I had the opportunity to try to implement a "novel" algorithm for image downscaling. I contacted the authors: one replied that he couldn't reveal the source code, and the other didn't reply. So I went ahead and invested about two weeks implementing and optimizing it to the point where it worked, but the results were far from what we wanted. If they had just supplied a demo program where I could check whether it worked for our case, things would have been much better.
Academic work would have explained the benefit of the algorithm. It would have presented a side-by-side comparison with common algorithms and explained its pros and cons against them. It would have covered aspects like quality, size, and performance, to name a few. It would have told me whether I could use this new algorithm in my field, and why (not). None of this is present in the work shared here. You say this is half done; I would say this is not even 20% done.
To end on friendlier terms, I completely agree that more academic work should be made available like this. Yet I know the pressure in that world and can understand keeping it to yourself for a while. More often than not you can drag a couple more papers out of the same piece of work.
Furthermore, a traditional paper would have to make a lot of guesses about the kinds of images people were likely interested in and the range of characteristics that mattered. What kind of noise spectrum do your images have, how does increasing contrast affect things, what about the spatial frequency distribution in the images themselves, and so on? Different fields have radically different "typical" images, and attempts at covering a reasonable range of them in traditional papers were necessarily very limited.
Instead, I see this model of publication as exploiting the possibilities of the 'Net to allow more effective communication and collaboration. And it is publication: it is making public, which is what makes the difference between science and alchemy... if there had been a "Proceedings of the Alchemical Society" we'd have had chemistry a thousand years ago.
What this model of publication does not (yet) have is a reputation mechanism, but it isn't clear it needs one, because you can see the results (and the code) for yourself. As such, I think the author has not only done something interesting in the image compression space, but is also pointing the way to the future of scientific publication.
Measuring this model as if it could be described as a certain amount of progress along a line toward the old endpoint is mistaken. This is a paradigm shift, and the models are incommensurable.
The original post is certainly interesting, but that doesn't mean it extends our knowledge of image processing. For example, see this 20-year-old paper that proposes the idea:
This is something peer review would pick up on... That said, I don't mean to discourage the author. It's a great idea and nicely presented!
I completely agree with the rest of your comment. Depending on the audience you might prefer one approach over the other. A person searching for an algorithm to use in a (large) production environment will most likely look for traditional publications (as you called them), while researchers might look for the other type described.
That leaves the question of whether you can speak of academic publications when all the contents required for a publication are missing. ;) I would like to call work like this academic work, and the traditional kind academic publications.
And then it only works on grayscale images, maybe because it's easier to get funding for medical imaging. Just applying the algorithm to each color channel separately leads to color fringing when the channels get out of sync.
Finally, usability, distribution, and performance are afterthoughts. I don't disagree, but it makes a huge difference.
Is this effect bad enough that it's still visible in other color spaces that use a luminance channel?
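It should be cheap to test: convert to a luminance-based space like YCbCr, run the algorithm per channel, and convert back. A minimal sketch of the forward conversion, using the standard BT.601 full-range constants (the harness around it is assumed):

    using System;

    static (byte y, byte cb, byte cr) RgbToYCbCr(byte r, byte g, byte b)
    {
        // standard JPEG/BT.601 full-range conversion
        double y  =       0.299    * r + 0.587    * g + 0.114    * b;
        double cb = 128 - 0.168736 * r - 0.331264 * g + 0.5      * b;
        double cr = 128 + 0.5      * r - 0.418688 * g - 0.081312 * b;
        byte Clamp(double v) => (byte)Math.Clamp(Math.Round(v), 0, 255);
        return (Clamp(y), Clamp(cb), Clamp(cr));
    }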
* what is the related work?
* can you put any bound on the error of the reconstruction?
* how does performance vary across resolution, noise, and content?
* how does that compare to the other state-of-the-art methods?
Without that, how do we know whether this is worth using?
That could be the reason why they didn't want to release any source code.
I'd be curious to know how this stacks up in terms of speed and quality.
EDIT: Oh yes, and there's also "Image Compression Using Data-Dependent Triangulations" and "Survey of Techniques for Data-dependent Triangulations Approximating Color Images", both by Lehner et al., 2007. I don't mean to discourage you here, just pointing out the bar to be beaten. It's a cool idea.
See e.g. http://scribblethink.org/Courses/ScatteredInterpolation/scat...
I may have to read further, though. That's a lot of math.
Run the edge detection twice. That way you get better gradients along sharp edges.
Single run: http://i.imgur.com/kusJDRo.png
Double run: http://i.imgur.com/rHYSzpq.png
To me, at least, the double looks better. Especially in the stems.
Just add the following to FrequencyTable.cs, after line 18 (var edges = ...)
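// a second Sobel pass over the edge map: both sides of each original
// edge now give a response, so samples land on both sides of it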
detector = new SobelEdgeDetector();
edges = detector.Apply(edges);
Have you tried struct-of-array (AAA...BBB...) instead of array-of-struct (ABABAB...) layouts?
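Something like this is what I mean (the Sample record and field widths here are assumptions, not Trigrad's actual types):

    using System.Collections.Generic;
    using System.IO;

    // hypothetical sample type; Trigrad's real fields may differ
    record Sample(ushort X, ushort Y, byte R, byte G, byte B);

    static byte[] SerializeSoA(IReadOnlyList<Sample> samples)
    {
        using var ms = new MemoryStream();
        // all X values, then all Y, then each colour channel in turn
        // (AAA...BBB...) instead of per-sample interleaving (ABABAB...),
        // so gzip sees long runs of similar bytes
        foreach (var s in samples) { ms.WriteByte((byte)s.X); ms.WriteByte((byte)(s.X >> 8)); }
        foreach (var s in samples) { ms.WriteByte((byte)s.Y); ms.WriteByte((byte)(s.Y >> 8)); }
        foreach (var s in samples) ms.WriteByte(s.R);
        foreach (var s in samples) ms.WriteByte(s.G);
        foreach (var s in samples) ms.WriteByte(s.B);
        return ms.ToArray();
    }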
Sorting them seems tough, as the index of each value must match across channels, so any sorting would have to occur beforehand. Except the data is already sorted by x-y values, and my attempts otherwise have failed to produce results.
I did a quick and dirty experiment:
Seems about 30% smaller than before on http://i.imgur.com/5zwCEF5.png
Mind if I merge that? Or you could submit a pull request. Either would be great!
Also, do you know of any resources for learning about how to optimise for gzip compression? Google is just telling me about compression for websites.
I don't know any resources specifically about gzip compression. Demosceners have very practical and fun compression know-how, so maybe look into: http://www.farbrausch.com/~fg/seminars/workcompression.html
You'll likely find a compressor much more suited to your particular data than gzip.
> the samples can be saved and zipped up
Depending on the algorithm, the results could vary wildly - there could be some characteristic of the samples that makes them encodable in a smaller or more easily compressible way.
It also reminds me of this:
How might the final rendering look if it used some of the standard triangle shading techniques? Treat the sample points as coordinates in a mesh, assign colors to those coordinates based on what you sampled, then interpolate colors for the points between those coordinates using something like Gouraud or Phong shading (without the lighting). That might produce a satisfying result with fewer samples.
I wonder if this could be used as an image resizing mechanism? Take a large number of samples, then render the resulting image using those samples and a smaller or larger size. Or, generalizing further: turn the image into samples and associated colors, apply a transform to the sample coordinates, then render.
This also reminds me quite a bit of the algorithm used in http://research.microsoft.com/en-us/um/people/kopf/pixelart/... (for which, sadly, code is not available). I wonder if some of the techniques from there could improve the quality of the results with fewer samples?
What you are talking about is barycentric interpolation, which is what this is doing.
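For reference, interpolating vertex colours barycentrically is only a few lines (a sketch with illustrative names, not the actual Trigrad code):

    // interpolate vertex colours at point p inside triangle (a, b, c);
    // this is Gouraud shading minus the lighting term
    static double[] BarycentricColor(
        double px, double py,
        double ax, double ay, double[] ca,
        double bx, double by, double[] cb,
        double cx, double cy, double[] cc)
    {
        double den = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy);
        double u = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / den; // weight of a
        double v = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / den; // weight of b
        double w = 1 - u - v;                                            // weight of c
        var result = new double[3];
        for (int i = 0; i < 3; i++)
            result[i] = u * ca[i] + v * cb[i] + w * cc[i];
        return result;
    }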
There are already image resizing algorithms that use triangulation (and something called DDT, data-dependent triangulation), so the answer to your question is yes, it is absolutely a valid idea.
1. There are triangle interpolation schemes out there now that are smoother than barycentric coordinates, which should give much better results.
2. Look up DDT, data-dependent triangulation. It switches edges to connect points to neighbors that have similar values. It will get rid of some of the spikiness and leave more smooth gradients.
3. The run-the-edge-detection-twice scheme mentioned in the comments works because you want the change of the gradient, and you need both sides represented. The double edge detection will give you manifolds, which is good.
4. Instead of having arbitrary vertex positions, you can just specify the offset to the next point. Then instead of an x and y value you can use one (possibly uint8_t) value to encode where the next point will go (see the sketch after this list).
5. You can also chop some accuracy off the colors. In RGB, you can lose accuracy in blue and some in red. In luminance-based schemes you can keep accuracy in luminance and lose it heavily in hue and chroma/saturation, etc.
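To make item 4 concrete, here's a sketch of one way the offset encoding could look; the nibble packing and ranges are my own choices, not something from the post:

    using System;
    using System.Collections.Generic;

    // delta-encode vertex positions: pack dx and dy into one byte
    // (a nibble each, so offsets of -8..7); a real encoder would need
    // an escape code for points farther away than that
    static IEnumerable<byte> DeltaEncode(IEnumerable<(int x, int y)> sortedPoints)
    {
        int px = 0, py = 0;
        foreach (var (x, y) in sortedPoints)
        {
            int dx = Math.Clamp(x - px, -8, 7);
            int dy = Math.Clamp(y - py, -8, 7);
            yield return (byte)(((dx + 8) << 4) | (dy + 8));
            (px, py) = (px + dx, py + dy);
        }
    }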
So does that mean that there are other kernels that approximate derivatives "better"? Like with finite differences?
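For concreteness: Sobel's x kernel is just the central difference [-1 0 1] smoothed with [1 2 1] in the other direction, and Scharr's variant is one kernel often cited as a "better" derivative approximation:

    // Sobel x: outer product of smoothing [1 2 1] and derivative [-1 0 1]
    int[,] sobelX  = { { -1,  0,  1 }, {  -2, 0,  2 }, { -1,  0,  1 } };
    // Scharr x: [3 10 3] smoothing instead, with better rotational symmetry
    int[,] scharrX = { { -3,  0,  3 }, { -10, 0, 10 }, { -3,  0,  3 } };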
I'm a little too sleep deprived to think through this entirely, but just off the top of my head, it seems like over the course of a few frames that an edge in motion would wind up as a series of triangles where one of the points remains static while the other two shift away from it -- in other words, the Trigrad approach would yield motion blur as an artifact. And then when the next key frame comes along, the remaining point gets reselected, probably further along the motion path... So much like a normal differential approach you wouldn't need to store all of the point locations each frame, just the ones that change and which triangle they belong to.
It might be hard to make it stream-friendly though, since obviously the compression efficiency depends heavily on the storage structure (see pjtr's comments)...
It would be nice to have the original image for a more detailed comparison. As beautiful as the picture is, the depth-of-field effect makes comparison between the left and right sides of the image a little tricky.
EDIT: I've read the code - it seems to be using random sampling. Still not entirely sure how a point can be placed at a place with absolutely no Sobel response - maybe it can't, which would make sense. My question arose after looking at: https://i.imgur.com/9YHOtQ0.png and then https://i.imgur.com/XRF7mz4.png. It looks like samples have been placed in regions with no response, but perhaps my eyes just can't see the edges.
baseChance = 8 * samples / (width * height)
samples is the desired number of samples.
Again, this needs work. I'm pretty sure there's a way I can more accurately match the number of output samples to the desired number of samples.
Edit: And no, it can't produce a response if there's no edge (a value of zero). However, there's always some level of noise. This is true of the example you've shown - there's very light noise visible.
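Put together, the sampling step amounts to roughly this (a simplified sketch, not a verbatim extract from the code):

    using System;
    using System.Collections.Generic;

    // keep each pixel with probability baseChance * edge / 255, so a zero
    // Sobel response really does mean zero chance of a sample there
    static List<(int x, int y)> SamplePoints(byte[,] sobel, int samples)
    {
        int width = sobel.GetLength(0), height = sobel.GetLength(1);
        double baseChance = 8.0 * samples / (width * height);
        var rng = new Random();
        var keep = new List<(int x, int y)>();
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                if (rng.NextDouble() < baseChance * sobel[x, y] / 255.0)
                    keep.Add((x, y));
        return keep;
    }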
Have you tried using Canny edge detection rather than Sobel? It might help a bit with the noise. I've just attempted it, but can't seem to get Canny working under Mono (I've not got a Windows machine at the moment).
Since the weights u+v+w = 1 you actually don't need all three: u = 1-v-w, so P = (1-v-w) * A + v * B + w * C = A + v * (B - A) + w * (C - A).
Here's my first thought on what could improve that: samples are taken at edges, which is exactly where colors vary quickly. So perhaps samples should be taken in pairs, one on either side of the edge.
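The Sobel gradient direction already points across the edge, so placing the pair could look something like this (a sketch; the names and offset distance are arbitrary):

    using System;

    // offset one sample to each side of an edge pixel along the gradient
    // direction, which points across the edge
    static ((int x, int y) a, (int x, int y) b) PairAcrossEdge(
        int x, int y, double gx, double gy, double dist)
    {
        double len = Math.Sqrt(gx * gx + gy * gy);
        if (len < 1e-9) return ((x, y), (x, y)); // flat region: degenerate pair
        double nx = gx / len, ny = gy / len;
        return (((int)Math.Round(x + nx * dist), (int)Math.Round(y + ny * dist)),
                ((int)Math.Round(x - nx * dist), (int)Math.Round(y - ny * dist)));
    }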
The compressed data comes out to 303KB, which isn't that great. It's a pretty noisy image.
I'm sure this could be increased by a huge amount if I had a not-terrible CPU or if I did some major refactors to use the GPU.
I don't think this approach can compete with JPEG and newer transform-based variants for natural photos (even in edge cases), but it seems like it would be nice for lossy compression of logos and general internet pics.
The point is: is this technique in between JPEG and PNG in terms of quality/size, or is it worse than JPEG altogether?
However, it handles gradients amazingly well. A full colour gradient can be made a tenth of the size, since only ~500 samples are really needed.
> No text at all
Or indeed anything which maps poorly to gradients. For this I'm thinking: instead of storing pixel values per sample vertex, store only DCT coefficient(s) per tri, the result of which gets texture-mapped onto the tri surface. Think JPEG, but with variably sized tris instead of 8x8 quads.
JPEG artifacts would then be much more fine grained around edges.
EDIT: It would not need to be as complex as a full DCT, because a lot of information is already carried by tri shape/positioning on edges. The idea is to have a "library of various gradient shapes" to pick from, not a full 8x8 DCT matrix capable of reconstituting 8x8 bitmap approximations.
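As a very rough sketch of what "a few coefficients per tri" might mean in code (everything here, from the masked projection to the normalisation, is my guess at one possible shape of the idea):

    using System;

    // crude projection of the pixels inside one tri onto an n x n grid of
    // low-frequency cosine basis functions over its bounding box; not a
    // proper least-squares fit over the mask, just enough to show the idea
    static double[] TriDctCoeffs(double[,] lum, bool[,] insideTri, int n)
    {
        int w = lum.GetLength(0), h = lum.GetLength(1);
        var coeffs = new double[n * n];
        for (int u = 0; u < n; u++)
        for (int v = 0; v < n; v++)
        {
            double sum = 0;
            int count = 0;
            for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
            {
                if (!insideTri[x, y]) continue;
                sum += lum[x, y]
                     * Math.Cos((2 * x + 1) * u * Math.PI / (2.0 * w))
                     * Math.Cos((2 * y + 1) * v * Math.PI / (2.0 * h));
                count++;
            }
            coeffs[u * n + v] = count > 0 ? sum / count : 0;
        }
        return coeffs;
    }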
Once again, thanks for the inspiring implementation.
This is what I use for backups:
7z a -m0=PPMd:mem=256m -mx9 archive.7z file_to_compress