Show HN: Trigrad, a novel image compression with interesting results (ruarai.github.io)
175 points by ruarai 837 days ago | 67 comments

This is the difference between someone who actually builds something and academic work that merely claims to achieve something. This is half-done, but it WORKS, and you can use it and understand it right now.

I once had the opportunity to try to implement a "novel" algorithm for image downscaling. I contacted the authors: one replied that he couldn't reveal the source code, and the other didn't reply. So I went ahead and invested about two weeks implementing and optimizing it to the point where it worked, but the results were far from what we wanted. If they had just supplied a demo program where I could check whether it worked for our case, it would have been much better.

I mean the author no harm, nor do I want to speak ill of his work. His work is very cool and I like the amount of information he gives. Yet this is far from comparable to academic work. I am inclined to say that you mixed up the sides in your statement, IMHO.

Academic work would have explained the benefit of the algorithm. It would have presented a side-by-side comparison with common algorithms and explained its pros and cons against them. It would have covered all aspects, like quality, size, and performance, to name a few. It would have told me whether I could use this new algorithm in my field, and why (not). None of this is present in the work shared here. You say this is half-done; I would say it is not even 20% done.

To end on friendlier terms, I completely agree that more academic work should be made available. Yet I know the pressure in that world and can understand keeping things to yourself for a while; more often than not, you can get a couple of other papers in the same field out of the same work.

This is the traditional model of academic publishing that was driven by the limited communication and collaboration ability of the time. A study like the one you describe would be done by multiple co-authors (I don't think I've written an image processing paper that doesn't have at least three people as authors, all of whom actually contributed to the work one way or another.)

Furthermore, the traditional paper would have to make a lot of guesses about the kinds of images people were likely interested in and the range of characteristics that mattered. What kind of noise spectrum do your images have, how does increasing contrast affect things, what about the spatial frequency distribution in the images themselves, and so on? Different fields have radically different "typical" images, and the attempts at covering a reasonable range of them in traditional papers were necessarily very limited.

Instead, I see this model of publication as exploiting the possibilities of the 'Net to allow more effective communication and collaboration. And it is publication: it is making public, which is what makes the difference between science and alchemy... if there had been a "Proceedings of the Alchemical Society" we'd have had chemistry a thousand years ago.

What this model of publication does not (yet) have is a reputation mechanism, but it isn't clear it needs one, because you can see the results (and the code) for yourself. As such, I think the author has not only done something interesting in the image compression space, they are pointing the way on the future of scientific publication.

Measuring this model as if it could be described as a certain amount of progress along a line toward the old endpoint is mistaken. This is a paradigm shift, and the models are incommensurable.

> What this model of publication does not (yet) have is a reputation mechanism, but it isn't clear it needs one, because you can see the results (and the code) for yourself. As such, I think the author has not only done something interesting in the image compression space, they are pointing the way on the future of scientific publication.

The original post is certainly interesting, but that doesn't mean it extends our knowledge of image processing. For example, see this 20-year-old paper that proposes the idea:


This is something peer review would pick up on... That said, I don't mean to discourage the author. It's a great idea and nicely presented!

Maybe OP or someone else will take the initiative to add that and other reference material to the project's wiki.

I think there has been a misunderstanding. I never meant to say there is no place for the work of the author. I merely wanted to say that claiming academic work produces nothing, as the comment I replied to did, is shortsighted.

I completely agree with the rest of your comment. Depending on the audience, you might prefer one approach over the other. A person searching for an algorithm to use in a (large) production environment will most likely look for traditional publications (as you called them), while researchers might look for the other type described.

That leaves the question of whether you can speak of academic publications that are missing all the contents required of a publication. ;) I would call that kind of thing academic work, and the other kind academic publications.

In my experience[1], code is available maybe half of the time, if you really search for it: checking the academic and personal pages of every author, and scouring the code of every framework mentioned. You're lucky when the code is in C or C++ using a framework nobody uses (e.g. MegaWave), but half of the time the code is in MATLAB. All uncommented, using single-character variables, and under some restrictive or just weird license.

And then it only works on grayscale images. Maybe because it's easier to get funding for medical images. Just applying the algorithm to each color channel separately leads to color fringing when they get out of sync.

Finally, usability, distribution, and performance are afterthoughts. I don't disagree but it makes a huge difference.

[1] https://github.com/victorvde/jpeg2png

> Just applying the algorithm to each color channel separately leads to color fringing when they get out of sync.

Is this effect bad enough that it's still visible in other color spaces that use a luminance channel?

I was working in YCbCr and I found it noticeable. Compare the bottom-right tile in https://github.com/victorvde/jpeg2png/commit/64bf10789092ccf... with swipe. It's worst with the green around the top of the left black line, but there's green and red fringing everywhere. And this is a normal use case.

I agree that it's nice to have this kind of write up and code, but "works" is a low bar to pass, especially in image compression. If this were written up as an academic paper, a few questions would have to be answered:

* what is the related work?

* can you put any bound on the error of the reconstruction?

* how does performance vary across resolution, noise, and content?

* how does that compare to the other state-of-the-art methods?

Without that, how do we know whether this is worth using?

> the results were far from what we wanted

That could be the reason why they didn't want to release any source code.

Garland and Heckbert had a nice algorithm for this sort of thing in their 1995 paper, "Fast Polygonal Approximation of Terrains and Height Fields." The paper is mainly devoted to height fields, obviously, but at the end they demonstrate that their algorithm is also effective at triangulating color images for Gouraud shading.

I'd be curious to know how this stacks up in terms of speed and quality.

EDIT: Oh yes, and there's also "Image Compression Using Data-Dependent Triangulations" and "Survey of Techniques for Data-dependent Triangulations Approximating Color Images", both by Lehner et al., 2007. I don't mean to discourage you here, just pointing out the bar to be beaten. It's a cool idea.

To the OP: There are also several other tools for scattered data approximation/interpolation developed in the last few decades, both mesh-based and mesh-free. Linear interpolation using barycentric coordinates on a triangulation is fast (and might be the most practical method for this particular use case), but nowhere near as good a result as you can get via other methods.

See e.g. http://scribblethink.org/Courses/ScatteredInterpolation/scat...

Not sure if that applies for my purposes, since I'm not actually using linear interpolation with barycentric coordinates (I don't think that's possible). The barycentric coordinates supply the gradient by themselves.

I may have to read further, though. That's a lot of math.

What you’re calling a gradient is also known as linear interpolation.

I have an improvement to this:

Run the edge detection twice. That way you get better gradients along sharp edges.

Single run: http://i.imgur.com/kusJDRo.png

Double run: http://i.imgur.com/rHYSzpq.png

To me, at least, the double looks better. Especially in the stems.

Just add the following to FrequencyTable.cs, after line 18 (var edges = ...)

    // apply Sobel a second time so samples land on both sides of each edge
    detector = new SobelEdgeDetector();
    edges = detector.Apply(edges);

The double run does look a lot better.

Since the order of the samples doesn't matter, could you sort them somehow so the Gzipped stream of samples can be compressed better? (E.g. sort the color index by component average, by red, by maximum component, ... and sort the point index by color.)

Have you tried struct-of-array (AAA...BBB...) instead of array-of-struct (ABABAB...) layouts?
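If it helps, here's a rough Python sketch of both ideas on synthetic data (this is not Trigrad's actual serialization format): sorting samples by color plus a struct-of-arrays layout measurably shrinks the zlib/gzip output compared to an unsorted array-of-structs stream.

```python
import random
import struct
import zlib

# Synthetic stand-in for the sample list: (x, y, r, g, b) per sample.
rng = random.Random(0)
samples = [(rng.randrange(1024), rng.randrange(1024),
            rng.randrange(256), rng.randrange(256), rng.randrange(256))
           for _ in range(10000)]

def compressed_size(samples, sort_key=None, soa=False):
    """Serialize the samples and return the zlib-compressed size in bytes."""
    s = sorted(samples, key=sort_key) if sort_key else samples
    if soa:   # struct-of-arrays: all x, then all y, then r, g, b
        cols = list(zip(*s))
        data = (struct.pack(f"<{len(s)}H", *cols[0])
                + struct.pack(f"<{len(s)}H", *cols[1])
                + bytes(cols[2]) + bytes(cols[3]) + bytes(cols[4]))
    else:     # array-of-structs: x y r g b, x y r g b, ...
        data = b"".join(struct.pack("<HHBBB", *t) for t in s)
    return len(zlib.compress(data, 9))

aos = compressed_size(samples)
soa_sorted = compressed_size(samples, sort_key=lambda t: (t[2], t[3], t[4]), soa=True)
print(aos, soa_sorted)  # the sorted struct-of-arrays stream comes out smaller
```

The win comes from the sorted color channel turning into long near-constant runs that deflate handles very well.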

I tried your struct-of-array idea, and that produced an okay improvement of ~1%.

Sorting them seems tough, as the index of each value must match across channels, so any sorting would have to occur beforehand. Except that they are already sorted by x-y values, and my attempts otherwise have failed to produce results.

I'm missing where they are sorted by x-y values.

I did a quick and dirty experiment: http://pastebin.com/NjZNRjw1

Seems about 30% smaller than before on http://i.imgur.com/5zwCEF5.png

Wow, that works pretty well. I was mistaken in thinking that either the Dictionary class or the process of sampling would sort them.

Mind if I merge that? Or you could submit a pull request. Either would be great!

Also, do you know of any resources for learning about how to optimise for gzip compression? Google is just telling me about compression for websites.

Sure, feel free to merge.

I don't know any resources specifically about gzip compression. Demosceners have very practical and fun compression know-how, so maybe look into: http://www.farbrausch.com/~fg/seminars/workcompression.html

Compression gurus hang out at http://encode.ru

You'll likely find a compressor much more suited to your particular data than gzip.

This makes me feel obligated to share my ongoing master's thesis project. Among other things, my approach is the reverse of this, namely decimating a full-detail mesh using edge/ridge detection.



Nice pictures, but they should really have indicated the file size for the 3000-sample version and given more details about this part:

> the samples can be saved and zipped up

Depending on the algorithm, the results could vary wildly: there could be some characteristic of the samples that makes them encodable in a smaller or more easily compressible way.

It also reminds me of this:


Very impressive!

How might the final rendering look if it used some of the standard triangle shading techniques? Treat the sample points as coordinates in a mesh, assign colors to those coordinates based on what you sampled, then interpolate colors for the points between those coordinates using something like Gouraud or Phong shading (without the lighting). That might produce a satisfying result with fewer samples.

I wonder if this could be used as an image resizing mechanism? Take a large number of samples, then render the resulting image using those samples and a smaller or larger size. Or, generalizing further: turn the image into samples and associated colors, apply a transform to the sample coordinates, then render.
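For what it's worth, the generalized version is just a transform applied to the sample coordinates before re-triangulating and re-rendering. A toy sketch (the sample layout here is hypothetical, not Trigrad's actual format):

```python
def transform_samples(samples, matrix):
    """Apply a 2x2 linear transform to each sample's coordinates, leaving
    colors untouched; re-triangulating and re-rendering afterwards yields
    the resized (or rotated, sheared, ...) image."""
    (a, b), (c, d) = matrix
    return [(a * x + b * y, c * x + d * y, color) for (x, y, color) in samples]

samples = [(10, 20, (255, 0, 0)), (30, 40, (0, 255, 0))]
doubled = transform_samples(samples, ((2, 0), (0, 2)))  # 2x upscale
print(doubled)  # [(20, 40, (255, 0, 0)), (60, 80, (0, 255, 0))]
```

Since only coordinates change, the stored sample data is resolution-independent in the same way a vector format is.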

This also reminds me quite a bit of the algorithm used in http://research.microsoft.com/en-us/um/people/kopf/pixelart/... (for which, sadly, code is not available). I wonder if some of the techniques from there could improve the quality of the results with fewer samples?

That's exactly what it does, no? (Standard triangle shading technique, interpolating colors between the mesh, Gouraud shading without the lighting.) Phong shading (interpolate normal vectors) wouldn't make sense, as the mesh has no normals.

It isn't obvious from the article that the color interpolation used here matches Gouraud.

Phong is a lighting model.

What you are talking about is barycentric interpolation, which is what this is doing.

There are already image resizing algorithms that use triangulation (and something called DDT, data-dependent triangulation), so the answer to your question is yes, it is absolutely a valid idea.

"Phong" refers to two distinct but related things: The Phong reflection model (Ambient + Diffuse + Specular) and Phong shading (normal vector interpolation).

https://en.wikipedia.org/wiki/Phong_reflection_model https://en.wikipedia.org/wiki/Phong_shading

Correct (I oversimplified). Neither really applies here because the interpolation is based on barycentric coordinates, which is probably the Phong part being referred to.

This is neat and the illustrations are great. A few things that will probably give large gains while being low hanging fruit:

1. There are triangle interpolation schemes out there now that are smoother than barycentric coordinates which should give much better results.

2. Look up DDT, data-dependent triangulation. It switches edges to connect points to neighbors that have similar values. It will get rid of some of the spikiness and leave smoother gradients.

3. The run-the-edge-detection-twice scheme mentioned in the comments works because you want the change of the gradient, and you need both sides represented. The double edge detection will give you manifolds, which is good.

4. Instead of having arbitrary vertex positions, you can just specify the offset to the next point. Then, instead of an x and a y value, you can use one (possibly uint8_t) value to encode where the next point goes.

5. You can also chop some accuracy off the colors. In RGB, you can lose accuracy in blue and some in red. In other schemes, like YCbCr, you can keep accuracy in luminance and lose it heavily in hue and chroma/saturation, etc.

W.r.t. running the edge detection twice. Is there a name for this operation? Finding points that are near an edge but not on the edge?

The Laplacian.

Weird. I hadn't made the connection with Physics there. I suppose it is a general mathematical operator.


Yes, it's not exactly equivalent here but it's pretty close. Basically, Sobel kernels are analogous to the 1st derivative and the Laplacian is analogous to the 2nd. You'll also often see the Laplacian combined with a Gaussian (the "LoG" operator) for pre-smoothing since it tends to be particularly sensitive to any noise.

Being a second derivative, I suppose it would be.

So does that mean that there are other kernels that approximate derivatives "better"? Like with finite differences?

Hmmm... I wonder if there's potential to use this concept for video. Not this implementation, naturally, but the concept.

I'm a little too sleep deprived to think through this entirely, but just off the top of my head, it seems like over the course of a few frames an edge in motion would wind up as a series of triangles where one point remains static while the other two shift away from it; in other words, the Trigrad approach would yield motion blur as an artifact. And then when the next key frame comes along, the remaining point gets reselected, probably further along the motion path... So much like a normal differential approach, you wouldn't need to store all of the point locations each frame, just the ones that change and which triangle they belong to.

It might be hard to make it stream-friendly though, since obviously the compression efficiency depends heavily on the storage structure (see pjtr's comments)...

There's been some work on encoding video with triangles. See "Video Compression Using Data-Dependent Triangulations" (http://www-home.htwg-konstanz.de/~umlauf/Papers/cgv08.pdf) for example.

I really like the look at low sample rates, but the stems on the flowers look rather jaggy even with 100,000 samples.

It would be nice to have the original image for a more detailed comparison. As beautiful as the picture is, the depth-of-field effect makes comparison between the left and right sides of the image a little tricky.

How are the sample points selected? I get that they're weighted according to edge intensity, but what kind of distribution are you using in cases where there is no edge?

EDIT: I've read the code - it seems to be using random sampling. Still not entirely sure how a point can be placed at a place with absolutely no Sobel response - maybe it can't, which would make sense. My question arose after looking at: https://i.imgur.com/9YHOtQ0.png and then https://i.imgur.com/XRF7mz4.png. It looks like samples have been placed in regions with no response, but perhaps my eyes just can't see the edges.

This is an area that needs work, but basically there's the table of 'edge intensity' that gets multiplied by a constant baseChance variable for every pixel.

baseChance = 8 * samples / (width * height)

samples is the desired number of samples.

Again, this needs work. I'm pretty sure there's a way I can more accurately match the number of output samples to the desired number of samples.

Edit: And no, it can't produce a response if there's no edge (a value of zero). However, there's always some level of noise. This is true of the example you've shown - there's very light noise visible.
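In case it clarifies the scheme: a minimal Python sketch of the weighted sampling described above (the name baseChance follows the comment; the toy intensity map is made up). Pixels with exactly zero edge intensity can indeed never be selected:

```python
import random

def sample_points(edge_intensity, width, height, samples, seed=0):
    """Pick sample points at random, weighted by per-pixel edge intensity.
    Each pixel survives with probability baseChance * intensity (clamped
    to 1), so zero-intensity pixels can never be chosen."""
    rng = random.Random(seed)
    base_chance = 8 * samples / (width * height)  # constant from the comment above
    picked = []
    for y in range(height):
        for x in range(width):
            if rng.random() < min(1.0, base_chance * edge_intensity[y][x]):
                picked.append((x, y))
    return picked

# Toy intensity map: a single vertical "edge" of intensity 1.0 in column 5.
w, h = 10, 10
intensity = [[1.0 if x == 5 else 0.0 for x in range(w)] for y in range(h)]
print(sample_points(intensity, w, h, samples=20))  # only column-5 pixels appear
```

This also shows why the output count only approximates the requested count: it is the expected value of a per-pixel Bernoulli draw, not a hard target.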

I see - thanks for the explanation!

Have you tried using Canny Edge detection rather than Sobel? It might help a bit with the noise. I've just attempted it, but can't seem to get Canny working under Mono (I've not got a windows machine at the moment).

Not sure exactly, but wouldn't these images theoretically scale up better than JPEG? I.e., making a 600x800 image out of a moderately compressed 300x400 seems like it would potentially scale better than JPEG (for some types of images).

I think you'd just start to notice the fuzzy artifacts more in a larger image and need to have more and denser samples to make up for it.

Some more explanation about barycentric coordinates would be appreciated.

Barycentric coordinates are "weights" (u, v, w) that determine a point P in a triangle (A, B, C) by weighting the triangle's corner points: the Cartesian coordinates are P = u * A + v * B + w * C.

Since the weights satisfy u + v + w = 1, you actually don't need all three: u = 1 - v - w, so P = (1-v-w) * A + v * B + w * C = A + v * (B - A) + w * (C - A).
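A small Python sketch of that, in case it helps (the triangle and colors are made up). It computes the weights and uses them to interpolate per-vertex colors, which is exactly the gradient fill being discussed:

```python
def barycentric(p, a, b, c):
    """Weights (u, v, w) of point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    u = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    v = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return u, v, 1.0 - u - v

def interpolate(p, tri, colors):
    """Linearly interpolate per-vertex colors at p (Gouraud-style fill)."""
    u, v, w = barycentric(p, *tri)
    return tuple(u * ca + v * cb + w * cc for ca, cb, cc in zip(*colors))

tri = ((0.0, 0.0), (10.0, 0.0), (0.0, 10.0))
colors = ((255, 0, 0), (0, 255, 0), (0, 0, 255))  # one color per vertex
print(interpolate((0.0, 0.0), tri, colors))  # at vertex A: (255.0, 0.0, 0.0)
```

At a vertex the weight of that vertex is 1 and the others are 0, so the fill reproduces the sampled colors exactly at the sample points.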

Gotcha. That makes sense.

Reading how this works, I'm a bit disappointed in how the low-sample images perform, particularly for the stems. It feels like one should be able to tweak this to get much better results in this situation. I'm disappointed in the amount of yellow over the stem.

Here's my first thought on what could improve that: samples are taken at edges, which is exactly where colors vary quickly. So perhaps samples should be taken in pairs, one on either side of the edge.

Fun project!

The "easy" way to do this is to run the edge detection twice. In other words, run edge detection, then run edge detection on the result of the edge detection.

Congrats! but beware of Gavin Belson.

What about thresholding gradients (with an adaptive threshold), storing the resulting edges alongside (compressed with RLE, for example), and then blending them over the triangle gradients? It could remove some artifacts.

Can you try it with the Lenna benchmark image and post the results, please?

Here's Lenna reconstructed after 30,000 samples. http://i.imgur.com/hlPonsO.png

The compressed data comes out to 303KB, which isn't that great. It's a pretty noisy image.

How does the speed compare to other compression algorithms?

Currently compressing the example image (the one of the flowers) with 100,000 samples takes 1.5 seconds + 1.5 seconds for AForge's edge detection.

I'm sure this could be sped up by a huge amount if I had a not-terrible CPU or if I did some major refactoring to use the GPU.

How does it compare in quality/size vs. PNG? How about using examples where JPEG is traditionally bad, such as pictures with dark gradients leading to blocky artifacts? Would that process be more efficient there?

PNG is lossless, so I don't get the quality comparison.

I don't think this approach can compete with JPEG and newer transform-based variants for natural photos (even at edge cases), but it seems like it would be nice for lossy compression of logos/general internet pics.

> PNG is lossless, so I don't get the quality comparison.

The point is: is that technique in between JPEG and PNG in terms of quality/size, or is it worse than JPEG altogether?

It's not very comparable to PNG, since they're designed for different types of imagery. I know Trigrad currently cannot handle text at all. Regarding your other comment, it's currently definitely worse than JPEG for vector imagery.

However, it handles gradients amazingly well. A full-colour gradient such as [0] can be made a tenth of the size, since only ~500 samples are really needed.

[0] http://i.imgur.com/QzW0z2O.png

Love how simple the algorithm is compared to iDCT-based ones. Very good job!

> No text at all

Or indeed anything that maps poorly to gradients. For this, I'm thinking that instead of storing pixel values per sample vertex, you could store only DCT coefficient(s) per tri, the result of which gets texture-mapped onto the tri surface. Think JPEG using variably sized tris instead of 8x8 quads.

JPEG artifacts would then be much more fine grained around edges.

EDIT: It would not need to be as complex as a full DCT, because a lot of information is carried by tri shape/positioning on edges. The idea is to have a "library of various gradient shapes" to pick from, not a full 8x8 DCT matrix capable of reconstituting 8x8 bitmap approximations.

Once again, thanks for the inspiring implementation.

Sampling should take at most linear time in the number of pixels (ignoring the edge detection). Even less if the edge detection result is represented as a sparse array.

Try using something better than Zip. LZMA2/PPMd are very good.

This is what I use for backups:

    7z a -m0=PPMd:mem=256m -mx9 archive.7z file_to_compress

Haven't you tried applying the edge detection filter twice? I think this way the samples are taken not on the edge but before and after the edges; maybe it will end up with a blurrier image but with fewer artifacts.

This doesn't seem to have any benefit unfortunately. It seems like the current approach already produces the before-after edge effect.

This is very cool!

"Not really well enough to try to push onto people. Current results show that Trigrad can only beat something like JPEG with optimal conditions."

