
>This is massively ground breaking.

Sounds like you're used to bad algorithms. I think there is a serious disconnect between the state of the art in computer vision and what's used in industry.

The demo was cool, but the techniques are not that revolutionary. From a cursory glance through the papers, it is basically AdaBoost (for detection) and Lucas-Kanade (for tracking), with a few extensions.
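For anyone who hasn't met AdaBoost, here is a minimal sketch of the boosting idea in Python. It is not the code from the papers: the 1-D threshold "stumps", the toy data, and the function names (`train_stump`, `adaboost`, `predict`) are all assumptions made for illustration. A real detector would boost over image features instead.

```python
import math

def train_stump(xs, ys, weights):
    """Find the threshold and polarity with the lowest weighted error (1-D data)."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=3):
    """Train a weighted ensemble of stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified samples, lower the rest.
        updated = []
        for x, y, w in zip(xs, ys, weights):
            pred = polarity if x >= threshold else -polarity
            updated.append(w * math.exp(-alpha * y * pred))
        total = sum(updated)
        weights = [w / total for w in updated]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * (polarity if x >= threshold else -polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): positives on both ends, negatives in the middle.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
```

No single threshold can separate this toy data, but three reweighted stumps voting together can, which is the sense in which boosting combines weak classifiers.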

Not to discount the guy's work at all: it's very cool and does a good job of pulling together existing algorithms. But it's not groundbreaking in the sense that, say, Viola-Jones was for object detection.

Valid comment, but it's analogous to claiming peanut butter cups aren't novel because peanut butter and chocolate were both well known. There is novelty in being able to synthesize new systems from known elements, which frankly I don't believe gets quite the credit it deserves. But that is just me.

The point is of course that being broadly familiar with a number of things can help you put together a novel thing out of a previously unknown combination of those things.

Spot on.

There's a lot of current work that effectively splits computer vision into multiple parallel tasks for better results while using well-known techniques (PTAM is another good example).

As an aside, I read through the paper and it doesn't look like this could track, say, your index finger separately from your other fingers if your hand were occluded for a moment. This pretty much bars using it on its own in a Minority Report-style interface (you would need hand pose tracking like the stuff Kinect does). Though I'm just reiterating your point that this isn't the second coming of computer vision.

That being said, there are some really good ideas here.

I don't understand why everyone seems to have such a hardon for Minority Report-style systems. Gorilla arm pretty much rules that out from the start, and a tablet is more natural anyway.

A trackpad with a separate screen would be optimal (so you don't have to look at your hands).

Gorilla arm would prevent people from using that kind of system to replace mouse and keyboard, but I don't see why it could not work for some applications. I can think of several use cases where I would like a UI that does not require me to touch the hardware (think cooking or watching videos in the bath tub).

Minority Report is tremendously well-remembered as an interface concept and almost forgotten as a Spielberg movie. The Wikipedia article's longest section is "Technology."

By now, it's practically a 145-minute tech concept video with a plot starring Tom Cruise.

Yes, using 2D displays in Minority Report was a huge mistake, but imagine it in 3D. Also, that doesn't mean you have to keep your arms out in front of your eyes. Ideally you wouldn't have to sit in front of your computer the whole day using only a keyboard and a mouse when you could have so much more freedom. Think of opening a book, ironing, placing Lego blocks, etc.

Tony Stark, Iron Man: JARVIS. That was his greatest invention; without JARVIS he couldn't have made his later-generation suits.

Since you're familiar with the topic, does this look lightweight enough for, say, mobile applications, or does it require massive processing power?

According to his website for this [1], in response to "What kind of hardware it was running on?", he says: "TLD has been tested using standard hardware: webcam, Intel Core 2 Duo CPU 2.4 GHz, 2 GB RAM, no GPU processing is used and runs in a single thread. The demands of the algorithm depend on required accuracy of the algorithm. Implementation for mobile devices is feasible."

So, according to him, it is lightweight enough to run on mobile devices. I'd imagine there are also several optimizations that can be done (leveraging multi-core chips or GPUs, for instance) to make the performance significantly better than the prototype he's demonstrating now. Also, taking into account Moore's Law, we may not be able to run this on today's mobile devices, but surely could on tomorrow's. Given that research is generally a few years ahead of industry, I would expect that, by the time this would come to market, the devices will be more than capable.

[1]: http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html

I think it also has an online training algorithm for its 2-bit binary patterns. Haven't checked out the paper yet, though.

Yeah, pruning and growing random forests (whatever that's formally called).
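A rough sketch of how that could fit together. This is a guess at the general idea, not TLD's actual implementation: it assumes a 2-bit pattern compares the brightness of the left/right and top/bottom halves of a patch, and that "growing" and "pruning" amount to adding positive and negative counts per code. The names `two_bit_pattern` and `OnlineLeafTable` are made up for this example.

```python
def two_bit_pattern(image, top, left, height, width):
    """Return a code in 0..3 for a rectangle of a 2-D list of pixel values.

    Assumed encoding: high bit = left half brighter than right half,
    low bit = top half brighter than bottom half.
    """
    def region_sum(r0, r1, c0, c1):
        return sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))

    bottom, right = top + height, left + width
    mid_r, mid_c = top + height // 2, left + width // 2
    left_brighter = region_sum(top, bottom, left, mid_c) > region_sum(top, bottom, mid_c, right)
    top_brighter = region_sum(top, mid_r, left, right) > region_sum(mid_r, bottom, left, right)
    return (int(left_brighter) << 1) | int(top_brighter)


class OnlineLeafTable:
    """Per-code counts that can be updated one example at a time."""

    def __init__(self):
        self.counts = {}  # code -> [positives, negatives]

    def update(self, code, is_positive):
        entry = self.counts.setdefault(code, [0, 0])
        entry[0 if is_positive else 1] += 1

    def posterior(self, code):
        """Fraction of positive examples seen for this code (0.0 if unseen)."""
        pos, neg = self.counts.get(code, (0, 0))
        return pos / (pos + neg) if pos + neg else 0.0


# Hypothetical 4x4 patch that is brightest in the top-left corner.
patch = [
    [9, 9, 5, 5],
    [9, 9, 5, 5],
    [5, 5, 1, 1],
    [5, 5, 1, 1],
]
code = two_bit_pattern(patch, 0, 0, 4, 4)

table = OnlineLeafTable()
table.update(code, True)   # "grow": a confirmed positive patch
table.update(code, True)
table.update(code, False)  # "prune": a false positive with the same code
```

The point of the count table is that each new labeled patch is a constant-time update, so the detector can keep learning while the video is running.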

Agreed about the disconnect between the state of the art and industry application of CV.
