The tracking was incredibly good. The first seconds of each example are a little off because the algorithm doesn't have ANY training data to work with. It's 100% ad hoc.
The tracking improved during the mouse pointer example (which I found incredibly impressive). The point he was making during that example was that it learns the different scales/rotations of objects on the fly and tracking improves automatically.
Before commenting, I viewed it a second time, pausing it several times (it's at 1:20, if you'd like to confirm the following). After about 10 seconds of training data (500 frames), he starts to draw with it. It loses tracking four times (not counting the two times his hand goes off-screen), and it's not obvious to me why - there don't seem to be major rotations or changes of his finger placement. In practical use, that would be annoying.
btw: my comment originally included praise for his work - but I thought his merit was obvious and that it distracted from my comment's point, so I deleted it. Instead, I'll just note that the first telephone also had lots of room for improvement - practically, the first of anything does. The cool thing was the idea of the telephone, and then making it real. How good it was matters little compared with the fact that it became real. Room for improvement doesn't take away from the immense task of doing something that had not been done before, or even imagined. Quality is not as crucial, because once you have the basic thing, it's (relatively) easy to iterate and improve it. I think having the idea and making it real deserves far, far greater admiration than the quality of the prototype algorithm and implementation. Just as with the telephone.