It was ages ago, and has since been archived on Google Code :)
Still a very cool technique, and pretty easy to implement.
Simply resampling an image smaller keeps the large-scale structure (the shape and color), but loses the fine-scale structure (texture.) That’s why we don’t just resample photos down to 32x32 and use them as icons :)
Seam Carving (whether parallel or not) does the opposite: it preserves the fine-scale structure (texture) but distorts the large-scale structure (shape.)
But not always! The amount of shape distortion in Seam Carving is worst-case O(N) in the number of seams deleted, but best-case O(1). It depends on the algorithm’s choice of seams.
Regular iterative Seam Carving, as described in the original paper, isn’t stateful enough to minimize any whole-image quality metric (like loss of large-scale structure), so it’s no surprise it hasn’t been used for something like this.
But parallel Seam Carving might just be the ticket for this problem. A model could be trained to select an optimal set of seams that work together to minimize the large-structure deformation of the image (i.e. the difference between the original and the result in the low-frequency band), while each seam individually remains a low-entropy seam from the Seam Carving algorithm’s perspective.
Anyone want to take a crack at this?
This is still amazing all of these years later.
This discovery is 13 years old. Although it has clear limitations when it comes to humans, I'm surprised I don't see this kind of resize feature available in more software. Does Photoshop have this hidden in a drawer somewhere? Does ImageMagick support this?
Also none of the implementations I have tried (last time I checked, not up to date on this) were as slick / realtime / interactive as the SIGGRAPH demo.
$ apt install gimp-plugin-registry
Perhaps using seam carving was a little bit overboard but I thought it would be interesting and it works really well for most Earth landscape photos. I only rarely see artifacts and minor distortions in the resized images it produces. Also since I only needed to resize one photo per day, the computational overhead was not a very big deal.
Each pixel can be tagged with a rank according to its seam number along each axis. If you have 4,000x4,000 pixels then each pixel needs a 12-bit integer value for each axis; and if you want the optimal map it’s another 1 bit per pixel. So something on the order of an extra 25 bits per pixel for a naive encoding, which almost doubles the size of an 8-bit-per-channel image.
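Quick sanity check of that arithmetic in Python (just the numbers from the 4,000x4,000 example above):

    import math

    width = height = 4000                         # the 4,000x4,000 example above
    bits_per_axis = math.ceil(math.log2(width))   # 12 bits to hold a seam rank in 0..3999
    rank_bits = 2 * bits_per_axis + 1             # one rank per axis, plus 1 bit for the optimal map
    rgb_bits = 3 * 8                              # a plain 8-bit-per-channel RGB pixel
    print(rank_bits, rgb_bits)                    # 25 vs 24 -- the extra data roughly doubles the pixel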
It also has no explicit scene awareness. The energy function essentially defines an implicit heuristic of scene importance, which is surprisingly reasonable until you’re removing >25% of pixels along either axis.
I believe it shipped with Photoshop for a couple years (maybe still?) but in my experience it’s a bit more of an artistic effect than an automatic and scalable solution—and it would be really hard to make it run in real time at game frame rates. (I think the artifacts alone would make it not worth it for games.)
My current favorite: https://www.youtube.com/watch?v=NRB_ASog9QU
If you want one on a real video, I guess there's this: https://www.youtube.com/watch?v=a8k3b-QNbhs
If you were going to divide the image into K horizontal strips of triangles, instead divide the top half and the bottom half into K/2 horizontal strips of triangles each. To make things easier at the end, include a 1px-tall overlap between the top and bottom parts. Work downward from the top and upward from the bottom until you reach the middle. At the middle, for each column, you have the cost of the top half of the minimum seam through that pixel and the cost of the bottom half of the minimum seam through that pixel; add those together, then take the min as before.
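Here's a minimal numpy sketch of that meet-in-the-middle bookkeeping, just for the cost of the cheapest vertical seam (the function name and the sequential loops are mine; the two passes are independent, so they're what you'd hand to separate threads):

    import numpy as np

    def min_seam_cost_meet_in_middle(energy):
        # energy: (H, W) array. Returns the cost of the cheapest vertical seam,
        # computed with a top-down pass over the top half and a bottom-up pass
        # over the bottom half that share a 1px-tall overlap row in the middle.
        H, W = energy.shape
        mid = H // 2

        # Top-down: cost of the best partial seam ending at each pixel, rows 0..mid.
        top = energy.astype(float)
        for r in range(1, mid + 1):
            prev = top[r - 1]
            left  = np.concatenate(([np.inf], prev[:-1]))   # parent at column c-1
            right = np.concatenate((prev[1:], [np.inf]))    # parent at column c+1
            top[r] += np.minimum(prev, np.minimum(left, right))

        # Bottom-up: cost of the best partial seam starting at each pixel, rows H-1..mid.
        bot = energy.astype(float)
        for r in range(H - 2, mid - 1, -1):
            nxt = bot[r + 1]
            left  = np.concatenate(([np.inf], nxt[:-1]))
            right = np.concatenate((nxt[1:], [np.inf]))
            bot[r] += np.minimum(nxt, np.minimum(left, right))

        # The overlap row's own energy is counted by both passes, so subtract it
        # once, then take the min over columns as in the single-pass version.
        return (top[mid] + bot[mid] - energy[mid]).min()

To recover the seam itself, record which of the three parents won at each step in each pass and walk up/down from whichever middle-row pixel wins the final min.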
Also, I guess this is beside the point, but surely almost all of the DP structure can be reused if you're repeatedly removing seams...
It's not beside the point at all. If you can reduce this linear-time algorithm to, say, amortized constant, then you'll handily beat parallelism.
After you remove a seam, you need to recompute the cones below every removed pixel -- which ends up being the cone below the topmost removed pixel (sadly, you can't just zip down the neighbors of the seam, but you can avoid expensive data moves with a 2d linked list). If the image is super wide, that gives you a speedup by approximately the aspect ratio. If it's square, you should be able to cut the runtime in half. If it's tall and skinny, neither this nor the author's approach are terribly helpful.
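To put a rough number on the aspect-ratio claim, here's a little bound in Python, assuming the stale region of the cumulative-cost table widens by at most one column per side per row (each cell only looks at the three cells above it):

    def stale_cells_upper_bound(top_row, height, width):
        # Worst-case number of DP cells to recompute after removing a seam whose
        # topmost removed pixel sits in row `top_row`: a cone that widens by one
        # column per side per row, clipped to the image width.
        return sum(min(2 * (r - top_row) + 1, width) for r in range(top_row, height))

    # e.g. a wide 4000x500 image with a seam that reaches the top row:
    # stale_cells_upper_bound(0, 500, 4000) -> 250_000 of 2_000_000 cells,
    # i.e. ~8x less work, matching the 4000/500 aspect ratio.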
Because the seam is defined such that each pixel in the next row is within 1 column of the pixel in the preceding row, you can reuse all the calculations except for a pyramid-shaped area around the last removed seam (in the worst case). However, that area is on the order of half the total number of pixels, so it doesn’t save _too_ much to reuse the other half.
> in the worst case
I wonder if typical data is more forgiving. If, for example, we cut an optimal seam down the column n=x, and the DP algorithm tells us that the best seams starting at (0,x-1) and at (0,x+1) are also column seams, the next iteration can happen in linear time in the number of rows. (Plus number of columns -- pessimistically -- if the next-optimal seam isn't "nearby".)
Basically, you're checking the things that crossed the old seam, and I'd guess that'd happen often enough, but I don't know how far the effects would typically spread. Maybe it would work worse in "sparse" images like blue sky, and better in "noisier" images like forests and crowds and dogs' fur.
I'm not sure how much cost that data-dependence would incur though. Might not be worthwhile.
Unlike most typical data augmentation operations, this one alters spatial relationships in nonlinear but still realistic ways.
There is some interesting work on making ML invariant to rotations and such, but I'd be curious if there's an algorithm which is invariant to this. I could imagine that convolutions and pooling might be relatively invariant to this technique, for example, as long as the algorithm isn't doing a lot with the large scale structure of the image.
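If anyone wants to play with that, here's a small self-contained numpy sketch of seam carving used as an augmentation transform (plain iterative algorithm, vertical seams only, gradient magnitude as the energy function; the function names and the random-shrink policy are my own invention):

    import numpy as np

    def _remove_one_vertical_seam(img, energy):
        # img: (H, W, C) array; energy: (H, W). Removes the minimum-cost vertical seam.
        H, W = energy.shape
        cost = energy.astype(float)
        parent = np.zeros((H, W), dtype=np.int64)   # offset (-1, 0, +1) to the chosen parent
        for r in range(1, H):
            prev = cost[r - 1]
            left  = np.concatenate(([np.inf], prev[:-1]))   # parent one column to the left
            right = np.concatenate((prev[1:], [np.inf]))    # parent one column to the right
            choices = np.stack([left, prev, right])
            best = choices.argmin(axis=0)
            parent[r] = best - 1
            cost[r] += choices[best, np.arange(W)]
        # Backtrack from the cheapest bottom-row pixel.
        seam = np.empty(H, dtype=np.int64)
        seam[-1] = cost[-1].argmin()
        for r in range(H - 2, -1, -1):
            seam[r] = seam[r + 1] + parent[r + 1, seam[r + 1]]
        # Drop one pixel per row.
        keep = np.ones((H, W), dtype=bool)
        keep[np.arange(H), seam] = False
        return img[keep].reshape(H, W - 1, -1)

    def seam_carve_augment(img, max_shrink=0.1, rng=None):
        # img: (H, W, C) uint8. Removes a random number of vertical seams (up to
        # max_shrink of the width), recomputing a gradient-magnitude energy each time.
        rng = np.random.default_rng() if rng is None else rng
        out = img.astype(float)
        n_seams = int(rng.integers(1, max(2, int(img.shape[1] * max_shrink))))
        for _ in range(n_seams):
            gray = out.mean(axis=2)
            gy, gx = np.gradient(gray)
            out = _remove_one_vertical_seam(out, np.abs(gx) + np.abs(gy))
        return out.astype(np.uint8)

It's far too slow for anything real-time, but it may be fine for augmenting a training set offline.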