As mentioned in the blog post, the rolling-shutter version of this won the best paper prize at the International Conference on Computational Photography (ICCP), which was held last weekend in Seattle. This is a fairly new but very high-quality conference. In many respects, I prefer it to the standard-bearing vision conferences like CVPR, ICCV, or ECCV -- although of course, ICCP is more narrowly focused on computational imaging and photography applications.
In their talk, the authors of this work showed many more video results, and they were all quite impressive. In fact, they were good enough to fall into an "uncanny valley of motion", similar to the "uncanny valley" of faces or humans that most people are familiar with. That is, the motion correction was almost perfect, but just enough off that something felt vaguely surreal about the results. Nevertheless, it's a nice step forward.
Also, as others have pointed out, this is a fully uncalibrated method -- requiring no knowledge of how the video was captured. If you do have some knowledge, you can often exploit it to do better. But the authors mentioned that most videos uploaded to YouTube have either no calibration information or, if it is present, it's often incorrect. As such, it made sense for them to focus on the uncalibrated case.
Finally, I should point out that rolling shutter, standard on most mobile cameras, is causing all sorts of problems for traditional image and video analysis algorithms, which often make the assumption, sometimes implicitly, that the entire frame was captured at a single instant in time. This is no longer true, and can lead to gross errors in many methods. Hence the many recent papers on correcting for, and in some cases exploiting, rolling shutter effects.
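To make the geometry concrete, here is a minimal sketch of the per-row timing model (all numbers hypothetical: a 720-row sensor, a 30 ms readout sweep, and a camera panning at 200 px/s). Each row gets its own capture time, so horizontal camera motion during the readout shears a vertical edge:

```python
# Rolling shutter: each sensor row is exposed at a slightly different time.
# Hypothetical numbers for illustration only.
NUM_ROWS = 720
READOUT_TIME = 0.030     # seconds to sweep the whole frame, top to bottom
PAN_SPEED = 200.0        # horizontal camera motion in px/s

def row_capture_time(row):
    """Time offset (s) at which a given row is read out."""
    return row * READOUT_TIME / NUM_ROWS

def vertical_edge_skew(row):
    """Horizontal displacement (px) of a vertical edge at this row,
    caused by camera motion during the readout sweep."""
    return PAN_SPEED * row_capture_time(row)

# The bottom row is captured ~30 ms after the top row, so a vertical
# edge leans by roughly PAN_SPEED * READOUT_TIME pixels across the frame.
skew_total = vertical_edge_skew(NUM_ROWS - 1) - vertical_edge_skew(0)
```

A global-shutter model would have `row_capture_time` return the same value for every row, which is exactly the assumption the traditional algorithms bake in.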
A recent interesting work along these lines from my former lab at Columbia University is "coded rolling shutter photography: flexible space-time photography". This paper takes advantage of the fact that different rows in an image see the world at slightly different instants in time to do things like high-speed photography, HDR imaging, etc.
> Finally, I should point out that rolling shutter, standard on most mobile cameras, is causing all sorts of problems for traditional image and video analysis algorithms, which often make the assumption, sometimes implicitly, that the entire frame was captured at a single instant in time. This is no longer true, and can lead to gross errors in many methods. Hence the many recent papers on correcting for, and in some cases exploiting, rolling shutter effects.
This is not just true of mobile phones, but of any current imaging device with a CMOS sensor (most of those on the market) -- compact cameras and SLRs included.
It also looks like the stabilized video is magnified a little bit. Regardless, these tools are really neat and they are still in their infancy. It's good to see that GooTube still believes in user-generated content.
Just as a note for anyone using this, you'll want to read some of the options as well. The amount of smoothing is hard to get right automatically, and you will want different values for different effects.
Having said that, my results with this tool have been excellent in the past.
I really hope this remains optional, because otherwise a lot of the authentic value of the videos will be lost. Also, the demo they showed looked like it was algorithmically degraded to make the change more noticeable. There are shaky hands, and then there's Parkinson's-level shaking, which is what the demo showed...
Or the person holding the camera was shaking like mad to demonstrate the abilities of the algorithm. Artistically speaking, shake removal is also authenticity removal. But most of the time, personal videos are shot by people with no eye for framing and stability, while artistic (and professional) videos are shot with an eye toward exactly these things.
I can't foresee a reason not to keep this feature as an option rather than enforcing it on all uploads.
It's much easier to record it without much shake and then maximize the effect of their algorithm after the fact, instead of recording a bunch of versions with varying degrees of shake. They completely overdid it, in my opinion; the example doesn't seem authentic at all to me (though the technology remains cool).
If they used a telephoto lens, then the shaking would be much more noticeable. Also, the motion hints at someone actually attempting to correct for the movements, but because it's zoomed so far in, every movement made to counteract the shaking is greatly magnified.
You should already start to see that working its way out: the iPhone 4S does video stabilisation, and I am sure high-end Android phones do, or will start to do, the same.
The algorithm being discussed here is specifically designed for when information about the camera or environment is not available: there are much better ways of carrying out digital image stabilisation on the device itself, such as using the accelerometer data to compensate or, in significantly advanced cameras (DSLRs, for example), compensating by moving the lens itself.
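As a rough illustration of the sensor-based idea (everything here is hypothetical, not a real device API): a pan of θ radians measured by the gyro corresponds to a pixel shift of roughly f·tan(θ) for a lens with focal length f expressed in pixels, and the stabilizer moves the crop window by the opposite amount:

```python
import math

# Hypothetical gyro-based stabilization sketch: integrate the gyro's
# angular-rate samples into a pan angle, convert the angle to pixels
# via the focal length, and shift the crop window the opposite way.
FOCAL_PX = 1200.0   # focal length in pixels (made-up value)

def crop_shift(gyro_rates, dt):
    """gyro_rates: angular-velocity samples (rad/s) over one frame interval,
    each held for dt seconds. Returns the compensating pixel shift."""
    theta = sum(r * dt for r in gyro_rates)   # integrated pan angle (rad)
    return -FOCAL_PX * math.tan(theta)        # opposite-direction crop shift

# A brief 0.01 rad/s wobble over ten 1 ms samples -> 0.0001 rad of pan,
# about an eighth of a pixel at this focal length.
shift = crop_shift([0.01] * 10, 0.001)
```

The appeal over the uncalibrated approach is that nothing about the scene needs to be estimated; the correction comes straight from the motion sensor.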
If you read the Google paper, you'll notice that they actually refer to this and other work by Liu et al. The overall technique is the same: estimate the original camera path, calculate an optimal camera path, then retarget the input frames to a crop window that follows the optimal path.
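That three-step pipeline can be sketched in a few lines. This toy 1-D version (pure Python; a simple moving average stands in for Google's L1-optimal path solver, so it's an illustration of the structure, not the actual method) integrates per-frame shifts into a camera path, smooths it, and derives the crop-window offset per frame:

```python
# Toy 1-D stabilize-by-retargeting pipeline:
#  1) integrate per-frame motion into an absolute camera path,
#  2) smooth the path (moving average here; the paper optimizes an
#     L1-optimal path instead -- this is just a stand-in),
#  3) the crop-window offset is smooth path minus original path.

def camera_path(frame_shifts):
    """Cumulative sum of per-frame shifts -> absolute camera position."""
    path, pos = [], 0.0
    for s in frame_shifts:
        pos += s
        path.append(pos)
    return path

def smooth_path(path, radius=2):
    """Moving average over a window of up to 2*radius+1 frames."""
    out = []
    for i in range(len(path)):
        lo, hi = max(0, i - radius), min(len(path), i + radius + 1)
        out.append(sum(path[lo:hi]) / (hi - lo))
    return out

def crop_offsets(frame_shifts):
    """Per-frame offset to apply to the crop window."""
    orig = camera_path(frame_shifts)
    smooth = smooth_path(orig)
    return [s - o for s, o in zip(smooth, orig)]

shifts = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # jittery hand-held motion
offsets = crop_offsets(shifts)               # small corrections per frame
```

The crop window has to be smaller than the frame so these offsets never run off the edge, which is why stabilized output looks slightly zoomed in.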
The primary difference seems to be the estimation and calculation technique. Liu's work does a structure-from-motion reconstruction, i.e., it rebuilds a 3D model of the original scene. Google's work uses something called pyramidal Lucas-Kanade to do 'feature tracking' instead. This is a sort of localized reconstruction; it seems to only care about the viewport differences from frame to frame. They then feed it through some linear programming voodoo to get the best path.
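For intuition about what Lucas-Kanade does (stripped down to 1-D; the paper's tracker is the pyramidal 2-D version of the same idea), here is a toy that recovers a small sub-sample shift between two "frames" by solving the brightness-constancy relation frame2 − frame1 ≈ d · dI/dx in the least-squares sense:

```python
import math

# 1-D toy Lucas-Kanade: estimate a small shift d between two signals
# from   frame2[i] - frame1[i]  ~=  d * (spatial gradient at i).
TRUE_SHIFT = 0.5
N = 200
frame1 = [math.sin(i / 10.0) for i in range(N)]
frame2 = [math.sin((i + TRUE_SHIFT) / 10.0) for i in range(N)]

num = den = 0.0
for i in range(1, N - 1):
    ix = (frame1[i + 1] - frame1[i - 1]) / 2.0   # spatial gradient (central diff)
    it = frame2[i] - frame1[i]                   # temporal difference
    num += ix * it
    den += ix * ix
estimated_shift = num / den   # least-squares solution for d
```

The linearization only holds for small shifts, which is exactly why the real tracker is pyramidal: large motions are estimated coarse-to-fine on downsampled copies of the image.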
I don't understand either well enough to say why one is better than the other, although I'd guess that because Lucas-Kanade is temporally and spatially localized, it's easier to farm out to a parallel cluster than an SfM technique.
There also seems to be a difference at the back end of the technique: having feature detection allows them to add 'saliency' constraints, i.e., retarget based on the inclusion of certain features, like a person's face. Again, the math is beyond my understanding, but it seems like this isn't part of Liu's work.
Have you tried it? I have, and I'd say the quality is pretty close if not the same. The much bigger problem to solve now is that shaky videos shot in less-than-perfect lighting contain motion blur, which is extremely hard to remove. You'll notice that all of these demo videos were conveniently shot outside in direct sunlight and contain no motion blur at all.
I have only tried the one offered by YouTube, and not recently. I don't know if they have improved the algorithm in this respect, but from what I've seen in the past, the filter often creates a very eerie, wobbly effect on the video, an effect that makes it look fake, like being underwater or drunk. It is slightly visible in the demo video if you observe the borders. This strange effect is totally absent in the link I posted, which I believe is on a different level of quality. But I imagine it's computationally very expensive and can't be offered to millions of users for free.
Thanks for that link, this should have a submission of its own. I can easily see how something like this would make a 'point and shoot' video camera really useful. Think "Flip Camera meets James Cameron"
Going off on a tangent here, but from the distant memory of my film studies degree days: one of the reasons you get so much fast cutting and hard-to-make-out action in modern fight scenes is that choreographing and shooting a fight scene properly is hard work, particularly if your actors aren't that experienced in stage combat. It's a big cheat, designed to make shooting fight scenes much easier (this is especially true if you're shooting a fight scene where one participant is CG'd in).
Higher frame rates would help a lot with the blur. I think Peter Jackson made a mistake shooting The Hobbit at 48 FPS for the entire movie. He should have shot most of it at the traditional 24 FPS but used 48 or 72 for fast motion shots. Hopefully his blunder won't poison high FPS forever in the minds of filmgoers.
> Higher frame rates would help a lot with the blur. I think Peter Jackson made a mistake shooting The Hobbit at 48 FPS for the entire movie. He should have shot most of it at the traditional 24 FPS but used 48 or 72 for fast motion shots. Hopefully his blunder won't poison high FPS forever in the minds of filmgoers.
You can't shoot parts of a film at 24 FPS and parts at 48 FPS -- the 48 FPS parts would be transformed down to 24 FPS and would appear to be in "slow motion".
Jackson, for what it's worth, is sticking to his guns re: 48FPS and believes that part of the dislike is because it's "change".
You absolutely can shoot parts of a film at 24 FPS and parts at 48 FPS. Instead of transforming the 48 FPS parts to 24 FPS, you go the other way around and transform the 24 FPS parts to 48 FPS -- not by doubling the speed but by repeating each frame twice. In fact, film projectors have always displayed movies at 48 FPS with frame doubling to reduce the appearance of flicker: http://en.wikipedia.org/wiki/Frame_rate#Background
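The frame-doubling trick is trivial to express; a sketch in pure Python (frames represented as arbitrary objects, function name made up for illustration):

```python
# Conform mixed-frame-rate footage to a single 48 FPS timeline:
# 48 FPS shots pass through untouched; 24 FPS shots have each frame
# repeated twice, which preserves real-time playback speed.

def to_48fps(frames, source_fps):
    if source_fps == 48:
        return list(frames)
    if source_fps == 24:
        return [f for f in frames for _ in range(2)]
    raise ValueError("unsupported source rate: %s" % source_fps)

clip_24 = ["a", "b", "c"]          # three frames shot at 24 FPS
clip_48 = to_48fps(clip_24, 24)    # -> ["a", "a", "b", "b", "c", "c"]
```

Playback duration is unchanged: three frames over 1/8 s at 24 FPS become six frames over the same 1/8 s at 48 FPS, so motion stays at normal speed rather than slowing down.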
Isn't quality inherently lost because the same video has to be re-encoded, but without the shakes? Also, I just tested it on a video and it looked slightly smudgy. OK, so if I am filming while driving down a dirt road or after half a bottle of Jack Daniel's (or both), then it'd be good; otherwise it does more harm than good.
As I understand it, the motion blur is a product of lossy compression (a CCD has very short pixel-local exposure times; the shearing the article refers to appears when sweeping the whole image), which means that, yes, stabilisation algorithms would work best with source data that hasn't been compressed using a perceptual model of motion blur.