Well, one thing that does work for decoding regular old PNGs is separating the entropy decoding (in PNG, DEFLATE) from the prediction (what PNG calls filtering). You can create a pipeline where one thread works on the entropy decoding and hands off completed rows to a thread that performs the prediction decode. This doesn't get a 2x speedup, because DEFLATE is usually slower than the prediction step, so it ends up bottlenecked on zlib, but it helps a good bit. (I implemented this in Rust at one point with the intention of shipping it in Firefox, but it never shipped.)
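To make the split concrete, here is a minimal sketch (mine for illustration, not the actual Firefox patch) of the prediction half: reversing PNG filter type 4 (Paeth) for one row. This is the kind of per-row work the second thread picks up once the inflate thread hands it a filtered row.

    #include <stdlib.h>
    #include <stddef.h>

    /* Paeth predictor from the PNG spec: a = left, b = above, c = upper-left. */
    static unsigned char paeth(int a, int b, int c)
    {
        int p = a + b - c;
        int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
        if (pa <= pb && pa <= pc) return (unsigned char)a;
        if (pb <= pc) return (unsigned char)b;
        return (unsigned char)c;
    }

    /* Reverse filter type 4 in place for one row. `bpp` is bytes per pixel;
     * `prev` is the already-reconstructed row above (all zeros for the first
     * row). Note the dependency on `prev`: unfiltering is serial from row to
     * row, which is exactly why it pipelines against DEFLATE rather than
     * running fully in parallel. */
    static void unfilter_paeth(unsigned char *row, const unsigned char *prev,
                               size_t len, size_t bpp)
    {
        for (size_t i = 0; i < len; i++) {
            int a = i >= bpp ? row[i - bpp] : 0;
            int b = prev[i];
            int c = i >= bpp ? prev[i - bpp] : 0;
            row[i] = (unsigned char)(row[i] + paeth(a, b, c));
        }
    }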
Of course, this only scales to 2 CPUs. Beyond that you will need to do some sort of splitting of the input to achieve wins (since no matter how much you optimize the filtering, you're still bottlenecked on DEFLATE, which is inherently serial).
How I read nyanpasu64: I think the point was that a web page will often have multiple PNG files, often more than you have cores, so the gains from decoding a single specially-formatted PNG might disappear in practice.
iDOT seems like overengineering to me: why not break up the PNG and tile the pieces in an SVG? This should be just as fast, about the same size, and you'd never run the risk of inventing your own format and implementing it badly[1]. Users would probably be happier with all their SVGs being faster than with specially-crafted PNGs being accelerated (if given the choice).
I'm not entirely sure how to begin to answer this question. Forgive me if I start too early.
An SVG is a list of compositing instructions for some (graphical) artefact. These instructions are given so that their dependencies (the things that must be composed before other things) are stated directly.
If you are familiar with Adobe's Photoshop, you might imagine an SVG as a set of nested layers: an SVG "renderer" will simply compose these layers together in order to have some pixels to display.
Now, to give a clear example of what I'm referring to, here is a simplified SVG that composes four PNG files together as tiles:
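    <svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">
      <!-- Simplified sketch: each PNG's data URI payload is elided as xxx. -->
      <image x="0"   y="0"   width="100" height="100" href="data:image/png,xxx"/>
      <image x="100" y="0"   width="100" height="100" href="data:image/png,xxx"/>
      <image x="0"   y="100" width="100" height="100" href="data:image/png,xxx"/>
      <image x="100" y="100" width="100" height="100" href="data:image/png,xxx"/>
    </svg>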
That data:image/png,xxx stanza is the SVG encoding of a PNG (a slight simplification: the format actually belongs to a number of different standards that the SVG specification leverages).
That is to say, I'm not exactly suggesting converting a PNG to an SVG: I am also suggesting breaking apart the (large!) PNG into several component PNG files (tiles) so that they can be decoded independently. Note carefully in my example how none of the tiles overlap. A decoder can trivially determine that these instructions are independent, and so process them independently.
Being able to decode parts of the resulting image independently is what the iDOT metadata makes possible: it is essentially a different encoding of the x/y/width/height information above, the difference being that SVG already existed.
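The layout of the iDOT chunk itself is undocumented, but per independently-decodable segment, any such side table has to carry roughly the following (a hypothetical sketch; the struct and field names are invented for illustration):

    #include <stdint.h>

    /* Hypothetical illustration only: iDOT is undocumented, so this layout
     * and these names are invented. Any such side table needs, per segment,
     * where the segment's pixels go and where its compressed bytes start. */
    struct segment_entry {
        uint32_t first_row;   /* y coordinate of the segment's first scanline */
        uint32_t row_count;   /* segment height in scanlines */
        uint32_t byte_offset; /* where the segment's compressed data begins */
    };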
Decompressing zlib/DEFLATE streams, in the general case, is an inherently serial task. It cannot be meaningfully parallelised.
The "trick" used by Apple, and a few other encoders, is to flush the zlib state every so often (ZLIB_FULL_FLUSH), which means that subsequent data is both byte alligned, and does not make any backreferences to before the sync.
If you know where these sync points are (Apple encodes this information in their non-standard "iDOT" chunk) then you can start decompressing from that point, in an isolated thread.
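As a rough sketch of the encoder side, assuming zlib and an output buffer large enough that deflate() never runs out of space: flush every seg_rows rows and record where each restart begins (Apple stores that table in iDOT; here it just goes into an offsets array).

    #include <string.h>
    #include <zlib.h>

    /* Compress `height` rows of `row_bytes` each, issuing Z_FULL_FLUSH every
     * `seg_rows` rows and recording the byte offset at which each new,
     * independently-decodable segment begins. Sketch only: assumes `out_cap`
     * is large enough that deflate() never stalls for output space. */
    int compress_with_restarts(const unsigned char *rows, size_t height,
                               size_t row_bytes, size_t seg_rows,
                               unsigned char *out, size_t out_cap,
                               size_t *offsets, size_t *n_offsets)
    {
        z_stream s;
        memset(&s, 0, sizeof s);
        if (deflateInit(&s, Z_DEFAULT_COMPRESSION) != Z_OK) return -1;

        s.next_out = out;
        s.avail_out = (uInt)out_cap;
        *n_offsets = 0;

        for (size_t y = 0; y < height; y++) {
            s.next_in = (unsigned char *)(rows + y * row_bytes);
            s.avail_in = (uInt)row_bytes;
            int last = (y + 1 == height);
            int boundary = !last && (y + 1) % seg_rows == 0;
            int flush = last ? Z_FINISH : (boundary ? Z_FULL_FLUSH : Z_NO_FLUSH);
            if (deflate(&s, flush) == Z_STREAM_ERROR) { deflateEnd(&s); return -1; }
            if (boundary)
                offsets[(*n_offsets)++] = s.total_out; /* next segment starts here */
        }
        deflateEnd(&s);
        return 0;
    }

On the decode side, a thread seeded at one of those offsets would (if I have this right) start a fresh raw-DEFLATE inflate there, e.g. inflateInit2 with negative windowBits, since the full flush guarantees byte alignment and an empty back-reference window.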
Technically speaking, you can also speculatively decode Huffman-coded bitstreams: because canonical Huffman trees are used, the end-of-block symbol is almost always the longest code and consists of all 1 bits, so you can start decoding at a long run of 1 bits. Of course, this doesn't solve the issue of the LZSS window being shared across multiple blocks.
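To illustrate just the heuristic half of that (and nothing more): scanning the byte stream for long runs of 1 bits yields candidate end-of-block positions, which a speculative decoder would then have to verify before trusting. A sketch:

    #include <stddef.h>
    #include <stdint.h>

    /* Report bit positions immediately after each run of at least `min_run`
     * consecutive 1 bits. In a DEFLATE stream these are only *candidate*
     * end-of-block positions (the canonical end-of-block code tends to be
     * the longest, all-ones code); every guess still needs verification,
     * and the shared LZSS window remains a separate problem. */
    static size_t find_candidates(const uint8_t *buf, size_t n_bits,
                                  size_t min_run, size_t *out, size_t cap)
    {
        size_t run = 0, found = 0;
        for (size_t i = 0; i < n_bits; i++) {
            int bit = (buf[i / 8] >> (i % 8)) & 1; /* DEFLATE packs bits LSB-first */
            if (bit) {
                run++;
            } else {
                if (run >= min_run && found < cap)
                    out[found++] = i; /* first bit position after the run */
                run = 0;
            }
        }
        return found;
    }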
There is indeed a hilarious backstory. A while back, one of my friends was looking for a task for the Distributed Computing course, where you write a program for the Cell microprocessor, and I half-jokingly suggested this, not realizing that it wouldn't work at all because of the shared window. The friend eventually came to realize that issue and couldn't finish the project in time, but still got a good grade solely because of the novelty of the task (!). (According to that friend, virtually everyone else did the parallel JPEG stuff.)
I think that's what restart markers are for in JPEG, from the original spec. IIRC in JPEG you can have markers that are marked as "crucial for proper decode", and programs should reject the image if they don't understand them. Maybe the latter was from PNG, I forget.
You'd also have to be kicking off PNG decodes in a way that is compatible with offloading them to a threadpool, and lots of software likely just loads images synchronously and moves on with its day.
Interesting. On the iPad on Firefox in the link preview when long-pressing the link it shows "Hello World", but "Hello Apple" when actually opened. On the same iPad in Safari, it says "Hello Apple" both in the long-press preview and the full page :)
Is there a conjecture that says that, for a field with a given degree of complexity, a given string or function can be encoded equivalently in a number of different ways that is proportional to its size and maybe its complexity class?
Between this and Project Zero's analysis (https://news.ycombinator.com/item?id=29568625) of NSO using a compression encoding to create a virtual machine for calculating exploit offsets, it reminds me (though the two are unrelated except at a very high level of abstraction) of cryptographic hash collisions, where over a large enough search space or field/domain of complexity there are many equivalent encodings or homonyms/isomorphisms.
The issue with the NSO exploit was they found that the compression encoding for a font was Turing complete, and then wrote a virtual architecture in it, and then ran programs on it that did the calculations necessary for their exploit.
This PNG encoding issue is different, but if you abstract it upwards to find a general principle it may be an effect of, it's as if there is a rule where, if you know the size or definition of the field of possibilities, and then have a definition of a given string in it, the function that describes or defines that string will also yield all strings whose evaluation is the same. It's like Kolmogorov complexity, but where instead of finding the smallest program to compute something, it's: given the number of instructions to define programs over a field of inputs of a given size, there are N programs beneath length L that are equivalent.
Sort of a showerthought, but it's interesting to think that our ideas of encodings and general isomorphisms may be instances of the same concept linked by a sort of "imaginary" function.
I suppose it's not too surprising that thumbnailing is single-threaded, since it's usually done in the background (where wall time doesn't really matter) and/or in large batches (where file-level parallelism already keeps the cores busy, so per-file threading wouldn't help).
Since the desktop icon is scaled down, it has to be parsed at some point and saved back out at icon size. The fact that it says HELLO WORLD on the icon even if you're saving it from Safari means that whatever code is shrinking it into a little icon is parsing it differently from what's being used to display it at full size.
AFAIK, the PNG format can't contain portable little icons. It wouldn't make sense, either, since it would need lots of different sizes of those for different platforms and devices. macOS is what creates the icons for individual files on a Mac desktop. Type something into a text file and save it... the document's icon will actually show a tiny miniature of the text you typed, not just some generic text-document icon. The PNG miniature is generated by the OS, and says HELLO WORLD, even though the same OS parses it as HELLO APPLE when you open it with Quick Look or Preview.
iOS and macOS may be a small player in worldwide computer use, but they're not small enough that you can identify anyone with just this flag. Unless you're in North Korea, saying you're using Apple software isn't really that much of an identifying feature.
It can definitely be used for fingerprinting, though, especially if Apple ever fixes their PNG decoder.
> on the first retina iPads, decoding PNGs was a huge portion of the total launch times for some apps, while one of the two cores sat completely idle.
Could they distribute the decoding of PNG files to a thread pool, instead of making multithreaded PNG files? Or would this fail for single large PNG files?