Great class and fun assignment!
It's essentially bases to the same algorithm: you can eliminate sequences if they lead to the same event that is already part of a solution, but with lesser probability. The amount of paths would be exponential, but the ability to eliminate them keeps it polynomial. That really brings forth the beauty of dynamic programming.
Agreed, it's basically the same algorithm.
- OSRM (the main routing engine behind openstreetmap.org) https://blog.mapbox.com/matching-gps-traces-to-a-map-7373019...
- Graphhopper https://www.graphhopper.com/blog/2016/08/08/releasing-the-ma...
Unfortunately it's not a simple project to get started on because you already need to have a way to import road network data (e.g. from openstreetmap), build an internal graph model, do some geographic indexing and perform fast many-to-many routing calculations.
it calculates the final probability by combining 'emission' probabilities (the probability that a GPS observation was on a particular road) by the 'transition' probability that if an observation was on a particular road at one point, what is the probability that it is now on this other road segment. By combining these two the final probability incorporates both the nearness of the GPS signals to the roads and the connectivity of the road network itself.
I've found the formulas applied in this paper are good in practice only if the GPS updates are relatively frequent
If you don't mind me asking, roughly what frequency threshold have you found the algorithm to perform badly above, and are you aware of any algorithms or formulae which perform better in these situations?
This means that if the path between two points is not simple (around a corner) the probability drops off very quickly. If the time between measurements is in minutes, this heuristic is pretty useless (and you should really use log-scale for your numbers!)
edit: this is actually shown in figure 8 of the paper where they explore different 'sampling periods'
edit 2: I have not explored other methods yet, but it would probably make sense to start by deriving the formula the way they do, by exploring ground-truth data.
edit 3: I just noticed that my comments are largely repeating what you're saying - sorry!
Agreed, the log scale is really important to avoid arithmetic underflow =] I believe OSRM and Graphhopper both do it that way. In my implementation I've flipped from thinking of measurement/transition "probabilities" to "disparities", and I choose the final route that has the least disparity from the trail. It seems to handle trails with around a 30-60s frequency over a 5-10hr period with decent accuracy.
As with you, I have found that it still often gives ok results with slower frequencies, as long as the transition probabilities are still relatively in the same scale as each other for a particular observation pair. However it means that there's no point trying to 'tune' using the gamma and beta parameters
The original paper was discussed on slashdot and back at that time I was inspired to build a little GUI around an open source algorithm implementation to play with my Qt skills.
It allows you to shrink, expand and "mask out" regions you don't want touch etc.
Still available on Google Code archive:
I never did finish that weird idea, but I probably needed to try something like increasing the energy of the chosen seam (and its duplicate)... I may try that again, just because I'm curious what would happen.
> Figure 8: Seam insertion: finding and inserting the optimum seam on an enlarged image will most likely insert the same seam again and again as in (b). Inserting the seams in order of removal (c) achieves the desired 50% enlargement (d). Using two steps of seam insertions of 50% in (f) achieves better results than scaling (e). In (g), a close view of the seams inserted to expand figure 6 is shown.
This results in characters in frame apparently moonwalking around. It's a really cool effect, but not good for actually resizing a film. Might make for a cool bit of a music video.
Might make for a cool bit of a music video
Or maybe doing histogram equalisation for pairs of adjacent frames before the seam cutting might help.
If you visualize a video as a stack of frames, then the seam would cut through the stack, like cutting a cake with a knife, following the minimum-energy path through the stack.
You'd do this by modifying the recurrence relation to add a term for the energy of pixels in the previous frame as well as the previous line in the current frame.
In the single frame case, each pixel is a 0 dimensional point, and for each pixel, you evaluate the 3 adjacent pixels in the row above. Then to find the seam you just pick the lowest energy pixel at the edge of the frame and follow the thread up. So the total runtime of this algorithm is O(pixels).
In the multi-frame case, if you want the video to be totally smooth, you have to think higher-dimensionally.
In the multi frame case, each seam is a 1 dimensional list of pixels, and for each seam, you evaluate the $HUGE_NUMBER of adjacent seams in the previous frame.
That is, the 2D case's runtime is proportional to the image height times the number of adjacent pixels, which is a small constant (3). In the 3D case, the runtime is proportional to the number of frame times the number of adjacent seams, which is massive.
I can imagine some heuristic optimizations that would allow you to guide your search, though. For instance, you could significantly downsample the video in both pixels and framerate and solve that, and then use that low-resolution solution to constrain and prune an approximate high-resolution search.
The problem is that humans watching the video will build a 3D mental model of the scene; any transform that modifies multiple frames must do so in such a way as to maintain global spatial consistency, or it will look odd. Seam carving is too primitive to do this. You will have shots where (for instance) someone is walking obliquely away from a building towards the camera, and yet making no headway because the algorithm is carving away the (low energy, yet perceptually indispensible) pixels that seperate them - the result will be that they appear to be walking in place (and growing).
(HN really should make it easier to make "anchor" links, so people don't have to load a new page just to see a sub-thread. That probably took me almost 5 minutes, which is like a quarter of the "anti-procrastination" budget with the default settings. It should have taken 10 seconds.)
Basically you'd apply a heuristic that because faces are so special to the human visual system, the perceived energy of a face is higher than the pixels would otherwise indicate.
This would make seams avoid altering any faces in the scene.
Great intuition! The original paper actually goes into this, and they come up with that solution.
This solution is a special case of allowing the user to apply positive and negative penalties as they wish. The latter allows targeted object removal.
One part I didn't get into is artificial weights. The original paper discusses "painting" certain regions of the image as high energy, driving seams away from those regions. This ties into faces when the paper suggests face detection to automatically paint faces with high energy, avoiding these issues in the first place!
I think I have a list of curated DP problems bookmarked somewhere, I'll see if I can track it down.
The problem is that you'd need to get your hands on a large dataset and none of the true ones are public. I think some of the national labs have synthetic models of various sizes to play with. Seriously cool field.
I believe there's a typo here: "the time complexity would still be 𝑂(𝑊)" should be "the space complexity would still be 𝑂(𝑊)".
It depends on how you look at it. A deep learning approach is supposedly more generic. Therefore I suppose the assumptions would be dynamic instead of fixed.
I like to think it this way - effectively, deep learning provides learned priors from data for a downstream task whereas the manual way comes from expert knowledge without the learning part.