Images expressed as Interplanetary Linked Data
ipld-image is an attempt to create a structured representation of images on IPFS that allows image processing operations to be performed directly on this structure, instead of having to operate on opaque blobs of serialized images (like a PNG or JPEG).
The data structure makes it possible to address parts of the image in X/Y (for cropping, collages), at multiple levels of detail (for downsampling, thumbnails). This is achieved by representing the image as a tiled mipmap pyramid.
0.0.1: Initial proof of concept works

- `ipld-image-fromblob` can take an image file (PNG) and upload it to IPFS as an IPLD image
- `ipld-image-toblob` can take the hash of an IPLD image and render an image file (PNG) from it

See TODO for more details.
Images are a huge part of web content today. Their primary (and usually only) representation is that of a file: a blob of bytes which we know nothing about apart from its MIME type. The file typically contains compressed pixel data, and sometimes some metadata.
So if we want to display the image, we have to download and process the whole file. For some formats one can stream only the beginning of a file and get a lower-quality image from it; this is intended to allow progressive rendering. Theoretically one could cancel the stream when one deems the quality high enough, but no available web browser does this - and unassisted, it cannot know what quality is considered good enough.
This means that no space savings are possible. This is inefficient and painful - especially on the slow, pay-per-MB connections that are typical on mobile. Furthermore, due to responsive design, the same image (semantically) may be presented at many different screen sizes, depending on the layout of the page it is included on. With smart cropping the image might not just be rescaled, but also show only a subset of the image.
To solve this today, one typically uses an image processing server which automatically creates (multiple) down-scaled versions of an image. Examples include imgflo-server.
However, the processing server must also download the entire image, even if it knows that only a downscaled cropped part would be needed.
(addressing this might be out-of-scope for v1)
When receiving a down-scaled image blob, there is (in general) no way to trace it back to the original source image. This means that, for instance, author attribution must be side-channeled (and usually is not).
Most processing services strip all metadata in the process of creating versions for display. In a few cases this can be a benefit, as privacy-invading metadata like geographic location is not present. But mostly it limits usefulness: one cannot know which camera settings were used, so one cannot do after-the-fact projection/lens correction.
IPLD is the InterPlanetary Linked Data format. It will form the base of IPFS, an effort to re-architect Internet protocols to be peer-to-peer, based on content addressing. IPLD thus serves a similar role to that of IP packets in the conventional Internet protocol stack.
Note: IPFS 0.5 will transition to IPLD as the underlying protocol, whereas IPFS 0.4 (and earlier) use MerkleDAG, a less generic version of the same basic idea.
A mipmap is a structure for efficiently storing images at different levels of detail. At the lowest level is the original image in full resolution; at each level up, the image resolution is halved in both width and height. So 2x2=4 tiles at level N become 1 tile at level N+1. For square images, a fully mipmapped structure is at most 33% larger than the original.
This also means that an image at level=2 (1/4 width and height) requires only 1/16 of the pixels to be downloaded and processed.
ipld-image uses a mipmapped structure, but instead of each level being a continuous buffer, it is a set of tiles, with each tile containing an encoded piece of the pixel data.
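The per-level arithmetic can be sketched as follows (a minimal illustration, not part of the spec):

```python
def mipmap_levels(width, height):
    """Yield (level, width, height), halving both dimensions until a 1x1 image remains."""
    level = 0
    while True:
        yield level, width, height
        if width == 1 and height == 1:
            break
        width = max(1, width // 2)
        height = max(1, height // 2)
        level += 1

# Example: a 2048x2048 image
sizes = list(mipmap_levels(2048, 2048))
# level 2 is 512x512: 1/4 the width and height, 1/16 the pixels
assert sizes[2] == (2, 512, 512)

# The storage overhead of all levels above level 0 approaches
# 1/4 + 1/16 + ... = 1/3 (the "at most 33%" figure) for square power-of-two images
total_pixels = sum(w * h for _, w, h in sizes)
overhead = total_pixels / (2048 * 2048) - 1.0
assert abs(overhead - 1 / 3) < 0.01
```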
The spec here is given as a pseudo-YAML structure.
## Image
# IPLD-image protocol version
'ipld-image-version': 1
# The Image this data was derived from, if any
# It SHOULD be used when processing an image, say when overlaying text, changing colors etc
derivedfrom: { '/': Image }
# If lossy compressed, this SHOULD be set to a losslessly compressed version
# If processing an image, and this is set, the client SHOULD use canonicalversion instead of this one
canonicalversion: { '/': Image }
# size of each tile
# The tile size should be such that each Tile is less than the IPFS block size
# Currently the IPFS block size is 256KB. A 256x256 px tile with 3-4 bytes per pixel should almost always be below this
tilesize: { x: 256, y: 128 }
# number of tiles spanned.
tiles: { x: 10, y: 15 }
# The size of the image spanned by the tiles (in pixels) is:
# [tilesize.x*tiles.x , tilesize.y*tiles.y]
# here, 2560 by 1920 px
# which part of the spanned data is visible
# this allows re-using tiles even when doing crops/views that don't align with tile boundaries
# If not specified, the boundary is implicitly
# x: 0, y: 0, width: tilesize.x*tiles.x, height: tilesize.y*tiles.y
boundary:
x: 10
y: 10
width: 1000
height: 1000
# mipmap structure containing the image data
level0: { '/': TileList } # n=tiles.x*tiles.y
level1: { '/': TileList } # n/=4
level2: { '/': TileList } # n/=4
...
levelH: { '/': TileList } # n=1
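For the example values above (tiles: 10 by 15), the tile count per level can be worked out as follows. Note that the halving rule for non-power-of-two tile counts (rounding up per axis) is an assumption here, since the spec leaves this as an open question:

```python
import math

def tiles_per_level(tiles_x, tiles_y):
    """Assumed rule: halve tile counts per axis, rounding up, until a single tile remains."""
    levels = []
    while True:
        levels.append((tiles_x, tiles_y, tiles_x * tiles_y))
        if tiles_x == 1 and tiles_y == 1:
            break
        tiles_x = math.ceil(tiles_x / 2)
        tiles_y = math.ceil(tiles_y / 2)
    return levels

levels = tiles_per_level(10, 15)
assert levels[0] == (10, 15, 150)  # level0: n = tiles.x * tiles.y
assert levels[1] == (5, 8, 40)     # roughly n/4 (rounding up per axis)
assert levels[-1] == (1, 1, 1)     # levelH: a single tile
```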
## Tile
# format of data
format: "png-idat"
# size of data in pixels
size: { x: 256, y: 128 }
# link to the chunk of image data
data: { '/': ... }
## TileList
# stored in a scanline fashion
# i.e. the first tile is at x=0, y=0; then follows y=0 with x=1, 2, ... up to (tiles.x-1),
# then everything in y=1. Repeat until all rows are included.
[ { '/': Tile }, { '/': Tile }, ... ]
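The scanline ordering above implies a simple index calculation (a sketch; `tiles_x` corresponds to `tiles.x`):

```python
def tile_index(x, y, tiles_x):
    """Index into the flat TileList for the tile at column x, row y (scanline order)."""
    return y * tiles_x + x

def tile_position(index, tiles_x):
    """Inverse: (x, y) grid position of a tile, given its TileList index."""
    return index % tiles_x, index // tiles_x

# With tiles: { x: 10, y: 15 } as in the example Image
assert tile_index(0, 0, 10) == 0    # first tile
assert tile_index(5, 1, 10) == 15   # second row, sixth column
assert tile_position(15, 10) == (5, 1)
```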
An advantage of this initial spec is that the `Tile`, containing the image data, is self-describing yet has no dependencies on its surroundings. This should allow reusing the `Tile` in other `TileList`s or `Image`s.
How to deal with the fact that images can be non-square and have a width/height which are not multiples of 2? This affects the number of levels (`log2(size)`), and the shape of each level (always `previoussize/2`).
Is storing all levels as part of the image the best approach?
Some alternatives are:
Should one allow multiple representations for a tile, say different compressions/formats?
Should we allow sparse images (some areas not covered by tiles)?
Should we allow non-uniform chunk sizes?
TODO: define the use cases which we want the data format to support (and which ones are less important).
TODO: write how each of these would be performed on example data.
`boundary` property

Right now, we can deduplicate parts of images when the encoded representations of tiles are identical. However, even the tiniest, imperceptible change, like a 1-bit quantization error, will invalidate deduplication. Some existing discussion here, with references to academic papers.
Right now each tile is stored as a proper PNG image. In order to construct an image file for rendering, we decode each of the necessary tiles, blit them into an in-memory RGBA image representation, and then encode this as a new PNG file.
What if instead we could store tiles as compressed data (without headers), then assemble an image file by concatenating a new header with a set of such pre-encoded tiles? This would skip both the decoding and re-encoding steps, reducing the necessary computation and memory usage significantly.
PNG might not be suitable for this, as the encoded stream seems to be in scanline order. It may require storing each scanline of each tile as a separate chunk.
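The blit step can be illustrated with plain byte buffers (a simplified sketch; real code would first decode each PNG tile into raw RGBA, e.g. with an image library):

```python
def blit_tile(canvas, canvas_w, tile, tile_w, tile_h, dest_x, dest_y, bpp=4):
    """Copy a decoded tile (raw RGBA bytes) into the full-image canvas, row by row."""
    for row in range(tile_h):
        src_start = row * tile_w * bpp
        dst_start = ((dest_y + row) * canvas_w + dest_x) * bpp
        canvas[dst_start:dst_start + tile_w * bpp] = tile[src_start:src_start + tile_w * bpp]

# 4x4 RGBA canvas; blit a 2x2 red tile at position (2, 0)
canvas = bytearray(4 * 4 * 4)
red_tile = bytes([255, 0, 0, 255]) * 4
blit_tile(canvas, 4, red_tile, 2, 2, 2, 0)
assert canvas[2 * 4:3 * 4] == b'\xff\x00\x00\xff'  # pixel (2, 0) is now red
assert canvas[0:4] == b'\x00\x00\x00\x00'          # pixel (0, 0) untouched
```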
This is primarily of interest when IPLD is the core protocol for IPFS (0.5) and there is support for IPLD paths/selectors, including ordering. That way, one could theoretically express the rendering of an output image file using only IPFS primitives.