My brother has a YouTube channel full of content-aware scaling videos:
My first reaction is that the actors' faces look surprisingly like traditional caricatures by illustrators -- e.g. shrinking foreheads and chins, which are detail-light, while keeping eyes and ears, which are detail-heavy.
My second thought is that the extreme jumpiness between frames occurs because each frame is processed separately. If you treated each seam not as a "jagged line" from point A on one edge to point B on the opposite edge of a single frame, but rather as a "jagged plane" cutting through a series of frames -- all the frames in a single shot -- you could eliminate the jumpiness entirely.
You might need to build a bit more flexibility into it to allow for discontinuities generated from object movement and camera panning, but I wonder if anyone's tried to do something like that?
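A crude per-shot approximation of that idea (this is my own sketch with numpy, not anything from the videos, and the function names are mine): average the energy over all frames of a shot and carve the exact same seam from every frame, so the seam can't jump between frames at all.

```python
import numpy as np

def frame_energy(frame):
    """Simple gradient-magnitude energy of a grayscale frame."""
    gy, gx = np.gradient(frame.astype(float))
    return np.abs(gx) + np.abs(gy)

def shared_vertical_seam(frames):
    """Find one vertical seam on the energy averaged over all frames,
    so the same seam can be removed from every frame of the shot."""
    energy = np.mean([frame_energy(f) for f in frames], axis=0)
    h, w = energy.shape
    cost = energy.copy()
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            cost[i, j] += cost[i - 1, lo:hi].min()
    # Backtrack the minimal-cost path from bottom row to top.
    seam = [int(np.argmin(cost[-1]))]
    for i in range(h - 2, -1, -1):
        j = seam[-1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam.append(lo + int(np.argmin(cost[i, lo:hi])))
    return seam[::-1]  # seam[i] = column to drop in row i, for all frames
```

This is far weaker than a true space-time seam (it can't follow a moving object), but it shows the flavor of the fix: one decision per shot instead of one per frame.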
Though I imagine it might be quite a lot of programming for a tool that might only ever be used as a kind of video filter for entertainment -- I have a hard time imagining a cinematographer ever using it seriously.
Actually, the authors of the seam carving paper went on to do just that. From the abstract: "We present video retargeting using an improved seam carving operator. Instead of removing 1D seams from 2D images we remove 2D seam manifolds from 3D space-time volumes. To achieve this we replace the dynamic programming method of seam carving with graph cuts that are suitable for 3D volumes."
My favorite is:
An easier solution would probably be frame interpolation between the two separate frames.
It seems as though there's an additional effect (for extra... effect) when they scream. Not sure if that is a natural result of the content of the visual scene being processed, or if there's some sort of audio input into the visual processing, or if they manually/intentionally applied some sort of parameter change (at 0:29 and 0:38 in the video) that causes the video to get all chaotic.
For example, what if some ML tagging mechanism were used to find the silhouettes of interesting objects in the image (people, animals, traffic signs, etc.), and those regions were then "frozen" to prevent the energy function from operating on them, preserving those objects intact while resizing the rest of the image?
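One simple way to wire that in, assuming you already have an energy map and a binary object mask from a detector (both names here are hypothetical, my own sketch): set the energy inside the mask to a huge value so no minimal seam ever crosses it.

```python
import numpy as np

def protect_objects(energy, mask, big=1e6):
    """Make masked pixels (e.g. detected people or signs) prohibitively
    expensive, so seams route around them and the objects stay intact."""
    out = energy.astype(float).copy()
    out[mask.astype(bool)] = big
    return out
```

The same trick with a large *negative* value does the opposite: it attracts seams, which is how some implementations let you mark objects for removal.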
As an approach it seems to do a decent job with either very small changes (e.g. a slight change of aspect ratio) or uninteresting images, but aesthetically the results seem bad on most interesting images; I suspect that's because targeting "low information" regions of the image removes tension that is needed. Often a simple crop is much better, it seems.
Brilliant implementation anyway. Having a lot of fun!
I use Photoshop frequently, and I use content-aware removal A LOT (super handy). But it never occurred to me, not even once, to use content-aware resizing, even though it's been there for years. If I really need to change the ratio of an image/photo I usually just crop.
I am using a content-aware image resizing library whose features include "Face detection to avoid face deformation."
One of the biggest complaints I have about HN is that it promotes really crappy "Look at me! I just learned a thing and wrote a 300-word blog post doing a crappy job of explaining it because I don't really get it but want to pad my CV..." posts.
This article is exceptional. Thank you OP.
This is a broad brush; are you sure the intent is always resume padding? Some folks (like me) write poorly, but I find that writing tests my understanding (and shows me what I don't know). I share anyway so I can be corrected and learn more, and so others might benefit if they have a similar problem. Your comment felt like shaming.
> This article is exceptional. Thank you OP.
100% agree, OP’s writing and content are exemplary!
That's fine, just don't have such a big ego that you need to share your crap with the world unless you have something important to say. That's why when you try to google something to learn, you have to wade through pages and pages of half-baked crap: all the good stuff has been drowned out.
Since the task can be framed globally as displacing pixels while minimizing a perceptual loss, it should be reasonably easy to express in a differentiable way. The benefits I see are higher-quality semantics preservation, and potentially faster inference (one pass only).
The recent development of transformer models might provide just the tool to tackle variable sizes efficiently; maybe I should give it a go.
Edit: if you're interested too and want to play on it together, shoot me a message :)
1. It was first developed by Shai Avidan at MERL.
2. It was then introduced in a paper by Vidya Setlur, Saeko Takagi, Ramesh Raskar, Michael Gleicher and Bruce Gooch in 2005, which won a 10-year impact award in 2015.
3. Adobe Systems acquired a non-exclusive license to the seam carving technology from MERL and implemented it in Photoshop CS4.
Then, when I uploaded the Solar System, it managed to capture each planet and its label without distorting them, removing only the space in between... except for Saturn's rings, which became wobbly :)
Architecture pictures tend to perform horribly because they contain so many straight lines and perspective cues. Faces end up too stretched regardless of aspect ratio.
I do have one question: I see this is based on RGB, but how good is a "seam carving" implementation using RGB compared to one based on a color space closer to human vision (such as CIELAB)?
What are the performance implications of this? Would it be possible, and/or a good idea, to implement this in WebAssembly?
There's an obvious version of the algorithm in that direction. For a one-line "seam" it's easy enough: you just pull data from either side. But as you apply it repeatedly, the more often your new "seams" end up next to something already estimated, the less real information there is; I suspect this becomes visually noticeable pretty fast.
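The "pull data from either side" step for a single inserted seam could look like this (my own sketch for a grayscale image; the function name is made up):

```python
import numpy as np

def insert_vertical_seam(img, seam):
    """Widen an (H, W) image by one column: at each row, insert a new
    pixel at the seam position, averaging its left and right neighbors."""
    h, w = img.shape
    out = np.empty((h, w + 1), dtype=float)
    for i, j in enumerate(seam):
        left = img[i, max(j - 1, 0)]
        new = (left + img[i, j]) / 2.0  # estimated, not real, data
        out[i] = np.concatenate([img[i, :j], [new], img[i, j:]])
    return out
```

Every call adds one column of invented pixels, which is exactly why repeated insertion near the same region degrades: later seams average pixels that were themselves averages.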
Although I'm not really familiar with traditional algorithms for inpainting, I've seen some ML research do really impressive things with it.
One demo that really stood out to me was the following: https://shihmengli.github.io/3D-Photo-Inpainting/
The algorithm they describe is able to inpaint pixels AND depth information from existing RGB-D photos, enabling images to be viewed in 3D space and used with parallax effects. Really cool stuff!
Yes, too late to edit but that's the more common name.
It's not quite the same thing as super-resolution, since seam carving inserts or removes whole seams rather than upscaling the entire image.
But as the top comment pointed out, this algorithm is easy to implement and interesting, yet in real-world examples the results are no better than salient object detection + cropping.
Resize: 50% width, 70% height
The basketball hoop is heavily distorted, as are the court, the squares on the building and the three-point line.
This image should look better with a strong penalty against seams that depart from vertical or horizontal lines, but it wouldn't be enough: the purple pillars and panels would be straighter but still squeezed.
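Such a penalty is easy to bolt onto the usual dynamic-programming pass (this is my own sketch, not the article's code; the function name and default penalty are made up): charge extra for every diagonal step so seams prefer straight vertical cuts.

```python
import numpy as np

def straightened_seam_cost(energy, penalty=50.0):
    """Cumulative vertical-seam cost where every diagonal step pays an
    extra penalty, biasing seams toward straight vertical cuts."""
    h, w = energy.shape
    cost = energy.astype(float)
    for i in range(1, h):
        up = cost[i - 1]
        # Coming from column j-1 or j+1 costs `penalty` more than from j.
        left = np.concatenate([[np.inf], up[:-1]]) + penalty
        right = np.concatenate([up[1:], [np.inf]]) + penalty
        cost[i] += np.minimum(up, np.minimum(left, right))
    return cost
```

With `penalty=0` this reduces to plain seam carving; as the penalty grows, seams degenerate toward whole-column removal, i.e. ordinary cropping in disguise -- which matches the observation that the pillars would be straighter but still squeezed.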
As a plumber once said to me: you can't flush an 8 inch shite down a 4 inch hole.