
Real-world dynamic programming: seam carving - akdas
https://avikdas.com/2019/05/14/real-world-dynamic-programming-seam-carving.html
======
maaaats
I held a workshop once where people implemented this (and other image
algorithms). The end result can be seen here [0], and the tasks here [1]
(though they're in Norwegian). You can click the "run seam carving" button to
watch it unfold step by step.

[0]: https://matsemann.github.io/image-workshop/

[1]: https://github.com/Matsemann/image-workshop

------
alleycat5000
This was a homework assignment in Robert Sedgewick's Algorithms course on
Coursera!

https://www.coursera.org/learn/algorithms-part1

Great class and fun assignment!

~~~
fizwhiz
Wasn't the first assignment about detecting percolation using the union-find
data structure? IIRC, Sedgewick's class doesn't cover dynamic programming
directly at all.

~~~
Chickenosaurus
Seam carving is the second assignment of Algorithms Part II. I also highly
recommend Princeton's Algorithms course on Coursera.

~~~
fizwhiz
Ah, I haven't taken Part II of the course. Probably not a bad idea to start :)

------
milliams
It feels like those images of the energy could have been improved by plotting
the log of the energy. It would allow us to see changes at the low end where
the decisions are being made.
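
For instance, something like this hypothetical matplotlib snippet (with a
random array standing in for the real energy map):

```python
import numpy as np
import matplotlib.pyplot as plt

energy = np.abs(np.random.randn(200, 300))  # stand-in for a real energy map

# log1p avoids log(0) at zero-energy pixels while compressing the high end
plt.imshow(np.log1p(energy), cmap="gray")
plt.colorbar(label="log(1 + energy)")
plt.show()
```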

~~~
tylorr
I'm toying around with the algorithm as I learn it. Thought I might share a
log plot of the energy: https://imgur.com/a/YlrS7aV

------
GolDDranks
This brings to mind the Viterbi algorithm, which calculates the most likely
sequence of events in a hidden Markov model (a Markov model itself is just a
state machine on a graph with weighted edges, where each weight on a state
transition represents the probability of that transition).

It boils down to essentially the same algorithm: you can eliminate a sequence
if it leads to the same event as a sequence already part of a solution, but
with lesser probability. The number of paths would be exponential, but the
ability to eliminate them keeps it polynomial. That really brings forth the
beauty of dynamic programming.
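
A minimal sketch of that elimination step, with hypothetical names and the
emission/transition matrices assumed given:

```python
import numpy as np

def viterbi(init, trans, emit, observations):
    """Most likely hidden-state sequence of an HMM, via dynamic programming.

    init:  (S,)   initial state probabilities
    trans: (S, S) trans[i, j] = P(next state j | state i)
    emit:  (S, O) emit[s, o]  = P(observation o | state s)
    """
    best = init * emit[:, observations[0]]  # best probability ending in each state
    back = []                               # backpointers for path reconstruction
    for obs in observations[1:]:
        # For each next state, keep only the best incoming path; every
        # lower-probability path reaching the same state is eliminated
        # here, which keeps the search polynomial instead of exponential.
        scores = best[:, None] * trans      # scores[i, j] = best[i] * P(j | i)
        back.append(np.argmax(scores, axis=0))
        best = np.max(scores, axis=0) * emit[:, obs]
    # Walk the backpointers from the best final state.
    path = [int(np.argmax(best))]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]
```

(In practice you'd sum log-probabilities instead of multiplying raw ones, to
avoid underflow.)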

~~~
n4r9
One really nice application of the Viterbi algorithm is map-matching, i.e.
taking a GPS trail and "snapping" it to the road network to determine the
actual route taken. It's difficult because GPS trails are often sparse and
noisy, so you can't do something simple like "project onto the closest road
segment". If you take the "states" of the model to be road segments and derive
the transition probabilities from the shortest route between segments, you can
apply Viterbi and get a very accurate result most of the time. Of course,
calculating shortest routes involves Dijkstra's algorithm, another famous
dynamic programming algorithm.

~~~
benrbray
Cool! Has anyone written this up and made cool visualizations? It's one of
those many things I'd like to do but don't have time for :)

~~~
cechner
Yep - look up 'Hidden Markov Map Matching Through Noise and Sparseness' by
Newson and Krumm.

It calculates the final probability by combining 'emission' probabilities
(the probability that a GPS observation came from a particular road) with
'transition' probabilities (given that one observation was on a particular
road, the probability that the next one is on some other road segment).
Combining these two means the final probability incorporates both the
nearness of the GPS signals to the roads and the connectivity of the road
network itself.

I've found the formulas applied in this paper are good in practice only if
the GPS updates are relatively frequent.
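
In code, the two pieces might look roughly like this (a sketch in log space;
the default parameter values here are illustrative, not the paper's tuned
ones):

```python
import math

def emission_log_prob(gps_to_road_m, sigma=4.07):
    # Gaussian over the distance from the GPS fix to the candidate road
    # segment; sigma models the GPS noise in meters.
    return -0.5 * (gps_to_road_m / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def transition_log_prob(great_circle_m, route_m, beta=3.0):
    # Exponential over |straight-line distance - route distance| between
    # consecutive fixes: detours far longer than the straight line are
    # penalized heavily.
    d = abs(great_circle_m - route_m)
    return -d / beta - math.log(beta)

# The score of a candidate road sequence sums both terms at every step,
# and Viterbi picks the sequence with the highest total.
```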

~~~
n4r9
I've also found that it can be tricky to map "convoluted" routes that don't
necessarily go simply from point A to point B.

If you don't mind me asking, roughly at what sampling interval have you found
the algorithm to start performing badly, and are you aware of any algorithms
or formulae which perform better in these situations?

~~~
cechner
It should be in seconds - the problem is that the paper assumes the 'great
circle' (straight-line) distance between two points should be almost the same
as the 'route' distance between those points, with the difference following
an exponential probability distribution.

This means that if the path between two points is not simple (e.g. goes
around a corner), the probability drops off very quickly. If the time between
measurements is in minutes, this heuristic is pretty useless (and you should
really use a log scale for your numbers!)

edit: this is actually shown in figure 8 of the paper where they explore
different 'sampling periods'

edit 2: I have not explored other methods yet, but it would probably make
sense to start by deriving the formula the way they do, by exploring ground-
truth data.

edit 3: I just noticed that my comments are largely repeating what you're
saying - sorry!

~~~
n4r9
Ah, that rings a bell now. You can vary a parameter they call "beta" to allow
for more convoluted routes, and I think a larger value gives a little leeway
for less frequent fixes.

Agreed, the log scale is really important to avoid arithmetic underflow =] I
believe OSRM and GraphHopper both do it that way. In my implementation I've
flipped from thinking of measurement/transition "probabilities" to
"disparities", and I choose the final route that has the least disparity from
the trail. It seems to handle trails with around a 30-60s frequency over a
5-10hr period with decent accuracy.

~~~
cechner
Actually, beta is less useful than that! I think it represents the median
difference between the two distances; it's not a tolerance (at least as far
as I can recall after experimenting with tuning this value).

As with you, I have found that it still often gives OK results at slower
frequencies, as long as the transition probabilities stay relatively in the
same scale as each other for a particular observation pair. However, it means
that there's no point trying to 'tune' using the gamma and beta parameters.

------
andreareina
There's also the approach that calculates the energy of the resulting image,
as opposed to the energy of the seam being removed, which allows seams to
pass through objects where doing so will minimize artifacts.

Paper: http://www.faculty.idc.ac.il/arik/SCWeb/vidret/index.html

GitHub: https://github.com/axu2/improved-seam-carving
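
The tweak really is minor: instead of charging a seam for the energy it
removes, forward energy charges it for the new edges created when its
neighbors become adjacent. A sketch of the cumulative cost map on a grayscale
image (the cost terms follow the paper; the array handling, including the
lazy wrap-around at the borders, is my own):

```python
import numpy as np

def forward_energy(gray):
    """M[i, j] = minimal cost of a vertical seam ending at pixel (i, j),
    where each step is charged for the edges created by removing the pixel
    (the 'forward energy' of Rubinstein, Shamir & Avidan)."""
    h, w = gray.shape
    M = np.zeros((h, w))
    for i in range(1, h):
        left = np.roll(gray[i], 1)    # I(i, j-1); np.roll wraps at borders
        right = np.roll(gray[i], -1)  # I(i, j+1)
        up = gray[i - 1]              # I(i-1, j)

        cu = np.abs(right - left)      # new edge cost if seam came from above
        cl = cu + np.abs(up - left)    # ... from the upper-left
        cr = cu + np.abs(up - right)   # ... from the upper-right

        M[i] = np.min([np.roll(M[i - 1], 1) + cl,
                       M[i - 1] + cu,
                       np.roll(M[i - 1], -1) + cr], axis=0)
    return M
```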

~~~
vanderZwan
Thanks for sharing - it's kind of sad that nobody ever seems to know about the
significantly better forward energy version. Especially since it's such a
minor tweak.

------
gabeiscoding
Cool to see this popping up again. It always impresses if you haven't seen it
before and is a cool algorithm to work through.

The original paper was discussed on slashdot and back at that time I was
inspired to build a little GUI around an open source algorithm implementation
to play with my Qt skills.

It allows you to shrink, expand, and "mask out" regions you don't want
touched, etc.

Still available on Google Code archive:

https://code.google.com/archive/p/seam-carving-gui/

------
co0nsta
If you like DP imaging applications like this, this old Microsoft Research
technical report is neat: it uses DP to merge frames from two webcams placed
left and right to synthesize a view in the middle, like having a webcam in the
middle of your monitor. The DP is interesting because it has penalties set up
assuming planar content, since faces are pretty flat and sit in front of the
cameras. Link: https://www.microsoft.com/en-us/research/publication/efficient-dense-stereo-and-novel-view-synthesis-for-gaze-manipulation-in-one-to-one-teleconferencing/

------
bcp2384
Why are DP problems so popular for interviews? I am doing leetcode now and
they seem to be everywhere.

~~~
walrus1066
I'd honestly just walk out if a company asked this stuff when the actual work
is maintaining a CRUD app.

------
jharger
I remember implementing this for a class years ago, and then the professor
suggested doing the inverse to try to expand the image width. The idea was you
would duplicate the lowest energy seam... but all that did was create a lot of
repeats of the same seam.

I never did finish that weird idea, but I probably needed to try something
like increasing the energy of the chosen seam (and its duplicate)... I may try
that again, just because I'm curious what would happen.

~~~
Scaevolus
The original seam carving paper discussed expanding too:
https://youtu.be/6NcIJXTlugc?t=56

http://www.faculty.idc.ac.il/arik/SCWeb/imret/imret.pdf

> Figure 8: Seam insertion: finding and inserting the optimum seam on an
> enlarged image will most likely insert the same seam again and again as in
> (b). Inserting the seams in order of removal (c) achieves the desired 50%
> enlargement (d). Using two steps of seam insertions of 50% in (f) achieves
> better results than scaling (e). In (g), a close view of the seams inserted
> to expand figure 6 is shown.
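
The fix is to find all the seams up front. A sketch of that two-pass idea on
a grayscale image (not the paper's code; `find_seam` and `remove_seam` are
assumed helpers that produce/consume one column index per row, as numpy
arrays):

```python
import numpy as np

def insert_seams(img, k, find_seam, remove_seam):
    """Widen img by k pixels, duplicating the first k seams in removal order."""
    # Pass 1: record the first k seams, in the order removal would find them.
    work, seams = img.copy(), []
    for _ in range(k):
        seam = find_seam(work)              # array of one column index per row
        seams.append(seam)
        work = remove_seam(work, seam)

    # Pass 2: insert each recorded seam into the original image, averaging
    # with its left neighbor. Each insertion shifts columns rightward, so
    # later seams at or right of it get bumped by 2 (seam + its duplicate),
    # a common bookkeeping choice in seam-insertion implementations.
    out = img.astype(float)
    for s, seam in enumerate(seams):
        h, w = out.shape
        new = np.empty((h, w + 1))
        for i, j in zip(range(h), seam):
            avg = (out[i, max(j - 1, 0)] + out[i, j]) / 2
            new[i] = np.concatenate([out[i, :j], [avg], out[i, j:]])
        out = new
        for later in seams[s + 1:]:
            later[later >= seam] += 2
    return out
```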

------
petschge
This is also known as "liquid rescale" and there is (was?) a GIMP plugin for
it. It was last updated in 2013 or so; after that, the developer was hired by
Adobe to work on Photoshop.

~~~
BeetleB
Yes, Gimp had it as a plugin first - before Photoshop.

------
ttoinou
Great. Now adapt this to video (:

~~~
itronitron
I noticed they don't provide any images with human faces.

~~~
a-priori
Since the human visual system is so highly sensitive to faces, I think the
best approach here would be to apply a facial detection algorithm, then boost
the energy for the regions where faces are detected.

Basically you'd apply a heuristic: because faces are so special to the human
visual system, the _perceived_ energy of a face is higher than the pixels
would otherwise indicate.

This would make seams avoid altering any faces in the scene.
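
A sketch of that heuristic using OpenCV's stock Haar-cascade detector (the
boost constant is arbitrary, and the energy map itself is assumed to be
computed elsewhere):

```python
import cv2
import numpy as np

def face_boosted_energy(image_bgr, energy, boost=1000.0):
    """Add a large constant to the energy map wherever a face is detected,
    so that minimal-energy seams route around faces."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    boosted = energy.astype(float).copy()
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1,
                                                 minNeighbors=5):
        boosted[y:y + h, x:x + w] += boost
    return boosted
```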

~~~
akdas
> apply a facial detection algorithm, then boost the energy for the regions
> where faces are detected.

Great intuition! The original paper actually goes into this, and they come up
with that solution.

This solution is a special case of allowing the user to apply positive and
negative penalties as they wish. The latter allows targeted object removal.

------
bitL
Any good set of super hard dynamic programming problems to practice? (I mean
way harder than leetcode/hackerrank etc.)

~~~
bouk
You can look at competitive programming sites; they are a lot more serious
than leetcode/hackerrank:

https://dmoj.ca/problems/?type=5&show_types=1&order=-points

https://codeforces.com/problemset?order=BY_RATING_DESC&tags=dp

------
slig
Really interesting, thanks for sharing!

I believe there's a typo here: "the time complexity would still be 𝑂(𝑊)"
should be "the space complexity would still be 𝑂(𝑊)".

~~~
akdas
Thanks so much for pointing that out! I've updated the article.

------
trhway
Tangential - the turbulent water looks to me like the large-scale structure
of the Universe.

------
xemoka
Non-medium version: https://avikdas.com/2019/05/14/real-world-dynamic-programming-seam-carving.html

~~~
sctb
Thanks! We've updated the link from https://medium.com/@avik.das/real-world-dynamic-programming-seam-carving-9d11c5b0bfca.

------
amelius
Isn't there a more generic deep-learning approach (i.e. with fewer
assumptions) for this problem?

~~~
madhadron
It wouldn't really have fewer assumptions. In fact, it probably would have
more. We just wouldn't know what they are. Classical image analysis is still
interesting and valuable because you can construct an algorithm based on
desired properties without having to have a large, labelled training set
beforehand, and because it's computationally much less expensive.

~~~
amelius
> It wouldn't really have fewer assumptions. In fact, it probably would have
> more.

It depends on how you look at it. A deep learning approach is supposedly more
generic. Therefore I suppose the assumptions would be dynamic instead of
fixed.

~~~
activatedgeek
Assumptions are NOT dynamic. Once you have chosen a "loss" function, or
whatever fancy name you want to call the objective function by, you've
already made a choice. There are never dynamic assumptions (a classic
example: choosing an L2 loss in pixel space essentially assumes a Gaussian
likelihood, which is in principle kind of goofy, but hey, it works).
Although, as alluded to earlier, it is hard to understand the space induced
by the architectural assumptions (and many other moving parts).

I like to think of it this way: effectively, deep learning provides priors
learned from data for a downstream task, whereas the manual way encodes
expert knowledge without the learning part.
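
For what it's worth, the L2/Gaussian connection is a one-line derivation: if
you assume the n-dimensional target y is drawn from a Gaussian N(ŷ, σ²I)
around the prediction ŷ, the negative log-likelihood is

```latex
-\log p(y \mid \hat{y})
  = \frac{1}{2\sigma^2} \lVert y - \hat{y} \rVert_2^2
  + \frac{n}{2} \log(2\pi\sigma^2)
```

so minimizing L2 loss is exactly maximum likelihood under that Gaussian
assumption, constants aside.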

