There is some motion blur (around 24 seconds, when the momma tiger lies down) that looks a bit puzzling once the image has been stabilised: the blur was caused by camera shake that you can no longer see. But hey, it's better than it was before.
In their talk, the authors of this work showed many more video results, and they were all quite impressive. In fact, they were good enough to fall into an "uncanny valley of motion", similar to the "uncanny valley" of faces or humans that most people are familiar with. That is, the motion correction was almost perfect, but just off enough that something felt vaguely surreal about the results. Nevertheless, it's a nice step forward.
Also, as others have pointed out, this is a fully uncalibrated method -- requiring no knowledge of how the video was captured. If you do have some knowledge, you can often exploit it to do better. But the authors mentioned that most videos uploaded to YouTube carry either no calibration information or, if it is present, it's often incorrect. As such, it made sense for them to focus on the uncalibrated case.
Finally, I should point out that rolling shutter, standard on most mobile cameras, causes all sorts of problems for traditional image and video analysis algorithms, which often assume, sometimes implicitly, that the entire frame was captured at a single instant in time. That is no longer true, and it can lead to gross errors in many methods. Hence the many recent papers on correcting for, and in some cases exploiting, rolling shutter effects.
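To make the failure mode concrete, here is a toy back-of-the-envelope model; the numbers are assumptions for illustration, not any particular sensor's spec:

    # Toy rolling-shutter model: each row is exposed slightly later than
    # the one above it, so one "frame" actually spans a window of time.
    rows = 1080
    readout_s = 0.030                        # assumed ~30 ms full-sensor readout
    row_dt = readout_s / rows
    t = [r * row_dt for r in range(rows)]    # capture time of each row
    print(t[-1] - t[0])                      # ~0.03 s between top and bottom rows
    # An object moving at 1000 px/s horizontally shears by ~30 px from the
    # top of the frame to the bottom -- the classic "leaning" artifact.
    print(1000 * (t[-1] - t[0]))

Any method that treats those rows as simultaneous mis-measures motion by roughly that margin.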
A recent interesting work along these lines from my former lab at Columbia University is "Coded Rolling Shutter Photography: Flexible Space-Time Sampling". The paper takes advantage of the fact that different rows in an image see the world at slightly different instants in time to do things like high-speed photography, HDR imaging, etc.
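For intuition only, here is a toy version of the underlying idea (this is not the paper's actual coding scheme): if you control the row readout order, interleaved row groups become lower-resolution snapshots taken at distinct sub-frame instants.

    import numpy as np

    H, k = 480, 4
    # Coded readout (toy): read every k-th row starting at offset 0, then
    # offset 1, and so on, so each row group is read in 1/k of the frame time.
    order = np.concatenate([np.arange(g, H, k) for g in range(k)])
    row_time = {int(r): i / len(order) for i, r in enumerate(order)}
    # Rows {g, g+k, g+2k, ...} all land in the time window [g/k, (g+1)/k):
    # effectively k quarter-resolution images, each from a different instant.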
This is not just true of mobile phones, but of any current imaging device with a CMOS sensor (most of those on the market), compact cameras and SLRs included.
    # pass 1: analyse the footage and write out the stabilisation data
    transcode -J stabilize --mplayer_probe -i $infile
    # pass 2: apply the computed transforms and encode with xvid4
    transcode -J transform --mplayer_probe -i $infile -y xvid4 -o $outfile
Having said that, my results with this tool have been excellent in the past.
Or the person holding the camera was shaking like mad to demonstrate the abilities of the algorithm. Artistically speaking, shake removal is also authenticity removal. But most of the time, personal videos are shot by people with no eye for framing and stability, while artistic (and professional) videos are shot with an eye toward exactly those things.
I can't foresee a reason not to keep this feature as an option rather than enforcing it on all uploads.
Edit: also, what's new here isn't the stabilization; it's that they will fix "rolling shutter" artifacts in each frame as well. Rolling shutter is something photographers generally dislike.
I've used video stabilizer filters in VirtualDub, but I doubt they could fix as much as was done in that demo.
Another interesting stabilization demo: http://www.youtube.com/watch?v=_Pr_fpbAok8
The algorithm being discussed here is specifically designed for when information about the camera or environment is not available. There are much better ways of carrying out image stabilisation on the device itself, such as using accelerometer or gyroscope data to compensate for the motion, or, in significantly more advanced cameras (DSLRs, for example), compensating optically by moving a lens element.
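A rough sketch of the sensor-assisted idea, with everything assumed (made-up gyro readings, a small-angle shortcut, and no real device API):

    # Integrate gyro angular velocity over one frame, then shift the crop
    # window to cancel the estimated image motion.  All values hypothetical.
    focal_px = 1400.0                          # assumed focal length in pixels
    gyro = [(0.002, 0.010, -0.004),            # (dt s, wx rad/s, wy rad/s)
            (0.002, 0.008, -0.002)]
    pitch = sum(dt * wx for dt, wx, _ in gyro)
    yaw = sum(dt * wy for dt, _, wy in gyro)
    # Small-angle approximation: a rotation of theta radians moves the image
    # by about focal_px * theta pixels, so crop in the opposite direction.
    dx, dy = -focal_px * yaw, -focal_px * pitch
    print(dx, dy)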
The primary difference seems to be the estimation and calculation technique. Liu's work does a structure-from-motion reconstruction, i.e., it rebuilds a 3D model of the original scene. Google's work uses something called pyramidal Lucas-Kanade to do 'feature tracking' instead (a toy sketch of that kind of tracking is below). This is a sort of localized reconstruction; it seems to only care about the viewport differences from frame to frame. They then feed it through some linear programming voodoo to get the best path.
I don't understand either well enough to say why one is better than the other, although I'd guess that because Lucas-Kanade is temporally and spatially localized, it's easier to farm out to a parallel cluster than an SfM technique.
There also seems to be a difference at the back end of the technique: having feature detection allows them to add 'saliency' constraints, i.e., to retarget based on the inclusion of certain features, like a person's face. Again, the math is beyond my understanding, but this doesn't seem to be part of Liu's work.
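For the curious, the feature-tracking half is easy to play with: OpenCV ships a generic pyramidal Lucas-Kanade tracker. To be clear, this is the textbook routine, not Google's actual pipeline, and the frame filenames are made up:

    import cv2
    import numpy as np

    prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frames
    curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

    # Pick corner-like features in the first frame...
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01,
                                  minDistance=8)
    # ...and track them into the next frame with a 3-level image pyramid.
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None,
                                                  winSize=(21, 21), maxLevel=3)

    # Median displacement of the successfully tracked points is a crude
    # estimate of the frame-to-frame camera motion.
    good = status.ravel() == 1
    shift = np.median((new_pts[good] - pts[good]).reshape(-1, 2), axis=0)
    print("estimated inter-frame shift:", shift)

Smoothing a whole sequence of those shifts into a pleasing camera path is where the linear programming comes in.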
You can't shoot parts of a film at 24FPS and parts at 48FPS - if every frame is kept, the 48FPS parts transformed down to 24FPS would appear to be in "slow motion", since each captured second contains twice as many frames as playback consumes.
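The arithmetic, with an assumed 10-second shot:

    # 10 s at 48 fps = 480 frames; keep every frame and play back at 24 fps:
    shot_s, capture_fps, playback_fps = 10, 48, 24
    print(shot_s * capture_fps / playback_fps)   # 20.0 s -> half-speed motion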
Jackson, for what it's worth, is sticking to his guns re: 48FPS and believes that part of the dislike is because it's "change".
Shows how far along we are in this whole cloud era.