There's a lot of current work that gets better results by splitting a computer vision problem into multiple parallel tasks while still relying on previously well-known techniques (PTAM is another good example).
As an aside, I read through the paper, and it doesn't look like this could track, say, your index finger separately from your other fingers if your hand were occluded for a moment. That pretty much rules out using this on its own for a Minority Report-style interface (you'd need hand pose tracking like what Kinect does). Though, I'm just reiterating your point that this isn't the second coming of computer vision.
That being said, there are some really good ideas here.
A trackpad with a separate screen would be optimal (so you don't have to look at your hands).
At this point, Minority Report is practically a 145-minute tech concept video with a plot, starring Tom Cruise.