
This is massively groundbreaking. You'll get it if you've ever used motion tracking in a game interface and had to rig perfectly white backgrounds with bright lights to make it work. This is incredibly accurate; really game-changing stuff.

>This is massively groundbreaking.

Sounds like you're used to bad algorithms. I think there is a serious disconnect between the state of the art in computer vision and what's used in industry.

The demo was cool, but the techniques are not that revolutionary. From a cursory glance through the papers, it is basically AdaBoost (for detection) and Lucas-Kanade (for tracking), with a few extensions.
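For anyone unfamiliar with the building blocks named above, the tracking half (Lucas-Kanade) reduces to solving a tiny least-squares system on image gradients. Here's a toy single-window, single-step version in plain numpy; this is an illustration of the classic algorithm, not the paper's implementation (which uses a pyramidal, multi-point variant with failure detection):

```python
import numpy as np

def lk_translation(prev, curr, window):
    """One Lucas-Kanade step: estimate the (dx, dy) translation of a
    patch between two grayscale frames from image gradients.
    `window` is a (row_slice, col_slice) selecting the patch.
    Toy sketch -- real trackers iterate this over image pyramids."""
    # Spatial gradients of the previous frame (np.gradient returns
    # the axis-0 (y) gradient first, then axis-1 (x)).
    Iy, Ix = np.gradient(prev.astype(float))
    # Temporal gradient between the two frames.
    It = curr.astype(float) - prev.astype(float)
    ix = Ix[window].ravel()
    iy = Iy[window].ravel()
    it = It[window].ravel()
    # Solve the 2x2 normal equations A d = b for the flow d = (dx, dy),
    # from the constraint Ix*dx + Iy*dy + It = 0 over the window.
    A = np.array([[ix @ ix, ix @ iy],
                  [ix @ iy, iy @ iy]])
    b = -np.array([ix @ it, iy @ it])
    return np.linalg.solve(A, b)
```

Feeding it two frames of a Gaussian blob shifted by one pixel recovers a displacement close to (1, 0), which is all the "tracking" part fundamentally does, frame to frame.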

Not to discount the guy's work at all; it's very cool and does a good job of pulling together existing algorithms. But it's not groundbreaking in the sense that, say, Viola-Jones was for object detection.

Valid comment, but it's analogous to claiming peanut butter cups aren't novel because peanut butter and chocolate were both well known. There is novelty in being able to synthesize new systems from known elements, which frankly I don't believe gets quite the credit it deserves. But that is just me.

The point, of course, is that being broadly familiar with a number of things can help you put together a novel thing out of a previously unknown combination of them.

Spot on.

There's a lot of current work going on that effectively splits computer vision into multiple parallel tasks for better results but uses previously well-known techniques (PTAM is another good example).

As an aside, I read through the paper and it doesn't look like this could track, say, your index finger separately from your other fingers if your hand was occluded for a moment. That pretty much rules out using this exclusively for a Minority Report-style interface (you would need hand-pose tracking like the stuff Kinect does). Though I'm just reiterating your point that this isn't the second coming of computer vision.

That being said, there are some really good ideas here.

I don't understand why everyone seems to have such a hardon for Minority Report-style systems. Gorilla arm pretty much rules that out from the start, and a tablet is more natural anyway.

A trackpad with a separate screen would be optimal (so you don't have to look at your hands).

Gorilla arm would prevent people from using that kind of system to replace mouse and keyboard, but I don't see why it could not work for some applications. I can think of several use cases where I would like a UI that does not require me to touch the hardware (think cooking, or watching videos in the bathtub).

Minority Report is tremendously well-remembered as an interface concept and almost forgotten as a Spielberg movie. The Wikipedia article's longest section is "Technology."

By now, it's practically a 145-minute tech concept video with a plot starring Tom Cruise.

Yes, using 2D displays in Minority Report was a huge mistake, but imagine it in 3D. And that doesn't mean you'd have to keep your arms out in front of your eyes. Ideally you don't have to sit in front of your computer the whole day using only a keyboard and a mouse, when you could have so much more freedom. Think of opening a book, ironing, or placing Lego blocks.

Tony Stark in Iron Man: JARVIS was his greatest invention. Without JARVIS he couldn't have made his later-generation suits.

Since you're familiar with the topic: does this look lightweight enough for, say, mobile applications, or does it require massive processing power?

According to his website [1], in response to "What kind of hardware it was running on?", he says: "TLD has been tested using standard hardware: webcam, Intel Core 2 Duo CPU 2.4 GHz, 2 GB RAM, no GPU processing is used and runs in a single thread. The demands of the algorithm depend on required accuracy of the algorithm. Implementation for mobile devices is feasible."

So, according to him, it is lightweight enough to run on mobile devices. I'd imagine there are also several optimizations that can be done (leveraging multi-core chips or GPUs, for instance) to make the performance significantly better than the prototype he's demonstrating now. Also, taking into account Moore's Law, we may not be able to run this on today's mobile devices, but surely could on tomorrow's. Given that research is generally a few years ahead of industry, I would expect that, by the time this would come to market, the devices will be more than capable.

[1]: http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html

I think it also has an online-training algorithm for the 2-bit binary patterns. Haven't checked out the paper yet, though.

Yeah, pruning and growing random forests (whatever that's formally called)
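The "growing" half of that idea can be sketched as an online fern classifier: each fern hashes a patch into a leaf via a handful of binary pixel comparisons and keeps per-leaf positive/negative counts that are updated as new labeled examples arrive. This is a toy illustration in the spirit of the paper's detector, not its actual code; the class name and parameters are made up, and TLD additionally prunes misleading examples:

```python
import random

class OnlineFern:
    """Toy online fern ensemble (illustrative; not TLD's implementation).
    Each fern maps a flat grayscale patch to a leaf code via random
    pairwise pixel comparisons, and keeps per-leaf pos/neg counts
    that grow online with each labeled example."""
    def __init__(self, n_ferns=10, n_bits=8, patch_side=15, seed=0):
        rng = random.Random(seed)
        n = patch_side * patch_side
        # Random comparison-point pairs, fixed at construction time.
        self.tests = [[(rng.randrange(n), rng.randrange(n))
                       for _ in range(n_bits)] for _ in range(n_ferns)]
        self.pos = [{} for _ in range(n_ferns)]
        self.neg = [{} for _ in range(n_ferns)]

    def _leaf(self, patch, f):
        # Pack the binary comparison results into an integer leaf code.
        code = 0
        for a, b in self.tests[f]:
            code = (code << 1) | (patch[a] > patch[b])
        return code

    def update(self, patch, is_object):
        # Online training: bump the count in the leaf each fern lands in.
        table = self.pos if is_object else self.neg
        for f in range(len(self.tests)):
            leaf = self._leaf(patch, f)
            table[f][leaf] = table[f].get(leaf, 0) + 1

    def posterior(self, patch):
        # Average the per-fern posteriors P(object | leaf).
        total = 0.0
        for f in range(len(self.tests)):
            leaf = self._leaf(patch, f)
            p = self.pos[f].get(leaf, 0)
            n = self.neg[f].get(leaf, 0)
            total += p / (p + n) if p + n else 0.0
        return total / len(self.tests)
```

Because the features are cheap comparisons and the "model" is just count tables, updates and evaluations are fast, which is part of why this family of detectors runs in real time on modest hardware.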

Agreed about the disconnect between the state of the art and industry applications of CV.

Also, the applications are enormous. Watch the end of the video for ideas. For example, it will be a huge enabler in robotics and some of the quadcopter applications you've been seeing.

Keep in mind that there's no report of the processing power required to do this in the video. It very well could be an algorithm that is extremely accurate, but at the expense of many CPU cycles. While it's obvious, remember that for use in games, you have to perform the detection and run the game in realtime. Whether or not this can be done with current hardware is what I'm interested in.

On the website he claims it's a fairly standard dual-core setup, and you can see the frames per second in the video; I noticed at one point it was staying around 15 fps. It may not work at this point for an FPS (especially because it would have to run in conjunction with graphics and other game-related processing), but it would likely be fine for other, less fast-paced games.

At least some of the processing could also be off-loaded to an accessory device. How much work is the Kinect doing vs. the actual 360?

Yesterday I bodged together something that's maybe half as good by gluing various OpenCV components together with numpy, and that runs at 15fps on an Atom netbook. I get the impression that this is just a lot more clever with the algorithms it uses, rather than specifically relying on the CPU grunt.
