As this doesn't seem to be an April Fools' joke (some of the papers were published last year :-)), it's interesting to think about what it might change. That said, I don't doubt for a minute that the university has locked up as much of the technology as possible in patents, but that's another story. We can speculate about what it will be like in 20 years when people can do this without infringing :-)
Clearly it could be applied immediately to robotic manufacturing. Tracking parts, understanding their orientation, and manipulating them all get easier when it's 'cheap' to add additional tracking sensors.
Three systems sharing data (front, side, top) would give some very expressive options for motion-based UIs or control.
Depending on how well the computational load can be offloaded to hardware, small systems could provide head-mounted tracking. (See the CMUCam for 'small'.)
The training aspect seems to be a weak link, in that some applications would need the camera to 'discover' what to track and then track it.
A number of very expensive object tracking systems used by law enforcement and the military might get a bit cheaper.
Photographers might get a mode where they can specify 'take the picture when this thing is centered in the frame' for sports and other high-speed activities.
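That trigger logic is straightforward once a tracker is reporting a bounding box per frame; a hypothetical sketch (all names and the tolerance value are illustrative, not from any real camera API):

```python
def is_centered(bbox, frame_size, tolerance=0.05):
    """Return True when the tracked box's center is within `tolerance`
    (as a fraction of each frame dimension) of the frame center."""
    x, y, w, h = bbox          # top-left corner plus width/height, in pixels
    fw, fh = frame_size
    cx, cy = x + w / 2, y + h / 2
    return (abs(cx - fw / 2) <= tolerance * fw and
            abs(cy - fh / 2) <= tolerance * fh)

# A camera loop would poll the tracker each frame and fire the shutter:
# if is_centered(tracker.bbox, (1920, 1080)): camera.capture()
print(is_centered((930, 510, 60, 60), (1920, 1080)))  # True: box center is (960, 540)
```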
It says he's running it on an "Intel Core 2 Duo CPU 2.4 GHz, 2 GB RAM" according to his website. As a good rule of thumb, computer vision runs about an order of magnitude (10x) slower on a phone (like an iPhone) than on a desktop/laptop.
Also - a crappy webcam actually makes things computationally easier, because there's less data to deal with. In a lot of computer vision algorithms the first step is to take the input and resize it to something that can be processed in a reasonable time frame.
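That first resize step can be as simple as block averaging; a minimal NumPy sketch (real pipelines would typically call something like OpenCV's `cv2.resize` instead):

```python
import numpy as np

def downsample(frame, factor):
    """Shrink an HxW grayscale frame by an integer factor via block averaging.
    Pixel count drops by factor**2, which is why every later stage gets cheaper."""
    h, w = frame.shape
    h, w = h - h % factor, w - w % factor   # trim so blocks divide evenly
    blocks = frame[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))         # average each factor x factor block

frame = np.arange(16.0).reshape(4, 4)
small = downsample(frame, 2)
print(small.shape)  # (2, 2) -- a quarter of the original pixels
```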
Yep, I'm sure he isn't. I don't doubt that you could optimize this algorithm to run on a phone, but that takes an insane amount of effort and expertise and is a feat in and of itself. The Word Lens guys, for example, spent about a year porting an optimized C implementation from i386 to ARM for the iPhone - they even used the GPU initially, but decided that the overhead of shuffling data between buffers wasn't worth the advantage gained by the iPhone's measly GPU (which only had 2 fragment shaders at the time, I think).
Also, I completely agree that camera blur would worsen the accuracy of said algorithm; I was just trying to point out that it would run faster on a lower-quality camera (with the caveat that it might not work nearly as well).
Specialized processing hardware != general-use CPU. Face tracking and image stabilization in dirt-cheap cameras are good examples, as are hardware video decoders and graphics cards. If a market emerges, specialized hardware will be built, and it'll be embeddable in just about anything.
Face tracking is a remarkably well-solved problem these days.
I have only a vague understanding of the math behind how it works, yet I'm very successfully using it in an art project I'm playing with. An afternoon's Googling found me the OpenCV plugins for Processing and some face detection examples, and now I've got a prototype that really disturbs my girlfriend - I call it "Death Ray" for extra creepiness factor. It's an infra-red-capable camera mounted on a pair of servos to steer it, with another pair of servos aiming a low-power laser. An Arduino drives the servos and switches the laser, and Processing just "magically" calls OpenCV for face detection in the video stream - _all_ the "heavy lifting" has been done for me. Vive l'open source!
The thing that _really_ creeps her out is when I sit it all on top of the TV and have it find faces watching the TV, painting "Predator-style aiming dots" onto people's foreheads...
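The aiming side of a rig like that is mostly a linear map from pixel coordinates to servo angles; a hypothetical sketch (the angle ranges and function names are made up - the real offsets depend entirely on how the camera and laser are mounted):

```python
def pixel_to_servo(cx, cy, frame_w, frame_h,
                   pan_range=(0, 180), tilt_range=(0, 180)):
    """Map a detected face center (in pixels) to pan/tilt servo angles
    by linear interpolation across the frame."""
    pan = pan_range[0] + (cx / frame_w) * (pan_range[1] - pan_range[0])
    tilt = tilt_range[0] + (cy / frame_h) * (tilt_range[1] - tilt_range[0])
    return pan, tilt

# A face centered in a 640x480 frame maps to mid-travel on both servos;
# the Arduino would receive these angles over serial and drive the servos.
print(pixel_to_servo(320, 240, 640, 480))  # (90.0, 90.0)
```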
That 'first step' is so dangerous it's mind-blowing. One thing that is seriously holding academic CV back is datasets made for slow computers. Eyes take advantage of every possible input, and the idea that you should start your CV task by throwing away data to make it 'easier' is so dumb it's laughable. While I admit industry demands speed, if you have the luxury of doing pure research today and you're using black-and-white images, you're not even wrong.