Clearly it could be applied immediately to robotic manufacturing. Tracking parts, understanding their orientation, and manipulating them all get easier when it's 'cheap' to add additional tracking sensors.
Three systems sharing data (front, side, top) would give some very good expressive options for motion based UIs or control.
Depending on how well the computational load can be reduced to hardware, small systems could provide head-mounted tracking. (See CMUCam for an example of 'small'.)
The training aspect seems to be a weak link, in that some applications would need to have the camera 'discover' what to track and then track it.
A number of very expensive object tracking systems used by law enforcement and the military might get a bit cheaper.
Photographers might get a mode where they can specify 'take the picture when this thing is centered in the frame' for sports and other high speed activities.
Very nice piece of work.
He's clearly developing it on his laptop with a shitty webcam. That's why this is amazing. Screw robotic manufacturing, this is for my phone.
It says he's running it on an "Intel Core 2 Duo CPU 2.4 GHz, 2 GB RAM" according to his website. As a good rule of thumb, computer vision runs about an order of magnitude (10x) slower on a phone (like an iPhone) than on a desktop/laptop.
Also - a crappy webcam actually makes things computationally easier because there's less data to deal with. In a lot of computer vision algorithms the first step is to take the input and resize it to something that can be computed on in a reasonable time frame.
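That downscale-first step can be sketched in a few lines. A minimal illustration using naive decimation in NumPy - a real pipeline would use proper area interpolation (e.g. OpenCV's resize), so treat this as the idea, not the implementation:

```python
import numpy as np

def downscale(frame, factor=4):
    """Cheap downscaling by decimation: keep every `factor`-th pixel.

    Cuts the pixel count by factor**2, so everything downstream
    (detection, tracking) has far less data to chew through.
    """
    return frame[::factor, ::factor]

frame = np.zeros((480, 640), dtype=np.uint8)  # a VGA grayscale frame
small = downscale(frame)                      # 120x160: 16x fewer pixels
```

With 16x fewer pixels, an algorithm that was marginal at full resolution can suddenly run in real time, which is part of why a low-resolution webcam is no handicap here.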
I bet he isn't using the GPU though.
Also - a crappy webcam actually makes things computationally easier because there's less data to deal with
Perhaps, but lens distortion, motion blur and a rolling shutter don't make things easier.
Anyway, the inventor himself claims a phone implementation is feasible.
Also, completely agree on how camera blur would worsen the accuracy of said algorithm, I was trying to point out that it would run faster on a lower quality camera (with the caveat that it might not work nearly as well).
I have only a vague understanding of the math behind how it works, yet I'm using it very successfully in an art project I'm playing with. An afternoon's Googling found me the OpenCV plugins for Processing and some face detection examples, and I've got a prototype that really disturbs my girlfriend - I call it "Death Ray" for extra creepiness factor: an infra-red-capable camera mounted on a pair of servos to steer it, and another pair of servos aiming a low-power laser. An Arduino drives the servos and switches the laser, with Processing just "magically" calling OpenCV for face detection in the video stream - _all_ the "heavy lifting" has been done for me - vive l'open source!
The thing that _really_ creeps her out is when I sit it all on top of the TV and have it find the faces watching the TV, painting "Predator-style aiming dots" onto people's foreheads...
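For anyone curious, the servo-aiming part of a rig like that boils down to mapping the detected face centre (in pixels) to pan/tilt angles. A hypothetical sketch - the field-of-view numbers and the 90-degree servo neutral are assumptions for illustration, not measured values from this project:

```python
def pixel_to_servo(cx, cy, width=640, height=480, fov_h=60.0, fov_v=45.0):
    """Map a face-centre pixel (cx, cy) to (pan, tilt) servo angles in degrees.

    Assumes the camera's optical axis corresponds to the servo neutral
    position (90, 90) and an approximately linear pixel-to-angle
    relationship across the stated horizontal/vertical fields of view.
    """
    pan = 90.0 + (cx / width - 0.5) * fov_h
    tilt = 90.0 + (cy / height - 0.5) * fov_v
    return pan, tilt

# A face centred in the frame maps to the servo neutral position:
pixel_to_servo(320, 240)  # -> (90.0, 90.0)
```

OpenCV's face detector returns bounding boxes, so cx, cy would be the box centre; the resulting angles would then go over serial to the Arduino, which just writes them to the servos.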
Wait a year or two and phones will be as powerful as today's laptops.