
Predator Object Tracking Algorithm - helwr
http://www.gottabemobile.com/2011/04/01/predator-object-tracking-algorithm-the-future-of-computer-interface/
======
6ren
The key thing seems not to be the specific algorithm, but the idea of using images obtained during performance for training - an algorithm that can do that. It's an early prototype algorithm, with lots of room for tweaking - and there are likely radically different learning algorithms, as yet untried or undiscovered, that work better. It seems that in the past, performance images have been religiously separated from training images.

It reminds me of early approaches to robot walking, which tried to plan everything out, and more recent approaches that incorporate feedback - which turned out to be much simpler and to work better. Sort of waterfall vs. agile.

It seems a tad unreliable (his "mouse pointer" was lost a few times while still on screen), but this is still a prototype. It's really impressive how the panda was tracked through a full 360° of rotation - probably helped by its distinctive colouring.

New input devices (this, Kinect, multi-touch) and applications that can really use them may be a major source of disruptive innovation in computing for the next decade or two.

~~~
jhuckestein
The tracking was incredibly good. The first few seconds of each example are a little off because the algorithm doesn't yet have ANY training data to work with. It's 100% ad hoc.

The tracking improved during the mouse pointer example (which I found
incredibly impressive). The point he was making during that example was that
it learns the different scales/rotations of objects on the fly and tracking
improves automatically.
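
If you want a feel for that on-the-fly learning, here's a crude toy (entirely my own sketch in Python/OpenCV - the real algorithm is far more sophisticated): keep a pool of grayscale templates of the object, and whenever a match is confident, remember the new appearance so later frames can match against more poses:

    import cv2

    def best_match(frame_gray, templates):
        # Best (score, top-left corner, template size) over the pool.
        best_score, best_loc, best_size = -1.0, None, None
        for t in templates:
            res = cv2.matchTemplate(frame_gray, t, cv2.TM_CCOEFF_NORMED)
            _, score, _, loc = cv2.minMaxLoc(res)
            if score > best_score:
                best_score, best_loc, best_size = score, loc, t.shape
        return best_score, best_loc, best_size

    def track(frames, first_patch, add_thresh=0.85):
        templates = [first_patch]  # seeded from the user's bounding box
        for frame in frames:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            score, (x, y), (h, w) = best_match(gray, templates)
            if score > add_thresh:
                # Confident hit: store this new appearance, so future
                # pose/scale changes have something closer to match.
                templates.append(gray[y:y + h, x:x + w].copy())
            yield (x, y, w, h), score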

~~~
6ren
Before commenting, I viewed it a second time, pausing it several times (it's
at 1:20, if you'd like to confirm the following). After about 10 seconds of
training data (500 frames), he starts to draw with it. It loses tracking four
times (not counting the two times his hand goes off-screen), and it's not
obvious to me why - there don't seem to be major rotations or changes in his finger placement. In practical use, that would be annoying.

btw: my comment originally included praise for his work - but I thought his merit was obvious and that it distracted from my comment's point, so I deleted it. Instead, I'll just note that the first telephone also had lots of room for improvement - practically, the first of anything does. The cool thing was the idea of the telephone, and then _making it real_. How good it was matters little next to the fact that it became real, and its roughness doesn't take away from the immense task of doing something that had not been done before or even imagined. Quality is not as crucial, because once you have the basic thing, it's (relatively) easy to iterate and improve it. I think having the idea and making it real deserves far _far_ greater admiration than the quality of the prototype algorithm and implementation. Just as with the telephone.

------
d2
This is massively groundbreaking. You'll get it if you've used motion tracking on several game interfaces and had to rig perfectly white backgrounds with bright lights to make it work. This is incredibly accurate - really game-changing stuff.

~~~
jallmann
> This is massively groundbreaking.

Sounds like you're used to bad algorithms. I think there is a serious
disconnect between the state of the art in computer vision and what's used in
industry.

The demo was cool, but the techniques are not _that_ revolutionary. From a
cursory glance through the papers, it is basically AdaBoost (for detection)
and Lucas-Kanade (for tracking), with a few extensions.
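
For the curious, the Lucas-Kanade half really is a few lines in OpenCV. A minimal sketch of plain pyramidal LK point tracking (my own illustration, not Kalal's pipeline; webcam index 0 is an assumption):

    import cv2

    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Automatically pick corner-like points worth following.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                                  qualityLevel=0.01, minDistance=8)

    while True:
        ok, frame = cap.read()
        if not ok or pts is None or len(pts) == 0:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Pyramidal LK: estimate where each point moved between frames.
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        good = new_pts[status.ravel() == 1]
        for x, y in good.reshape(-1, 2):
            cv2.circle(frame, (int(x), int(y)), 3, (0, 255, 0), -1)
        cv2.imshow("LK tracking", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
            break
        prev_gray, pts = gray, good.reshape(-1, 1, 2)

    cap.release()
    cv2.destroyAllWindows()

TLD's tracking component (Median Flow) is built on exactly this kind of LK point tracking, plus a forward-backward error check to discard points that can't be tracked reliably.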

Not to discount the guy's work at all, it's very cool and does a good job of
pulling together existing algorithms. But not groundbreaking in the sense of,
say, Viola-Jones was for object detection.

~~~
newhouseb
Spot on.

There's a lot of current work going on that effectively splits computer vision into multiple parallel tasks for better results but uses previously well-known techniques (PTAM is another good example).

As an aside, I read through the paper and it doesn't look like this could track, say, your index finger separately from your other fingers if your hand were occluded for a moment. That pretty much rules out using this on its own for a Minority Report-style interface (you would need hand-pose tracking like the stuff Kinect does). Though I'm just reiterating your point that this isn't the second coming of computer vision.

That being said, there are some really good ideas here.

~~~
StavrosK
I don't understand why everyone seems to have such a hard-on for Minority Report-style systems. Gorilla arm pretty much rules that out from the start, and a tablet is more natural anyway.

A trackpad with a separate screen would be optimal (so you don't have to look
at your hands).

~~~
fjh
Gorilla arm would prevent people from using that kind of system to replace the mouse and keyboard, but I don't see why it couldn't work for some applications. I can think of several use cases where I would like a UI that does not require me to touch the hardware (think cooking, or watching videos in the bathtub).

------
ChuckMcM
As this doesn't seem to be an April Fools' joke (some of the papers were published last year :-)), it's interesting to think about it in the context of what it might change. That said, I don't doubt for a minute that the university has locked up as much of the technology as possible in patents, but that is another story. We can speculate about what it will be like in 20 years when people can do this without infringing :-)

Clearly it could be applied immediately to robotic manufacturing. Tracking parts, understanding their orientation, and manipulating them all get easier when it's 'cheap' to add additional tracking sensors.

Three systems sharing data (front, side, top) would give some very good expressive options for motion-based UIs or control.
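
With calibrated cameras, fusing two of those views into a 3D position is textbook triangulation. A sketch (assuming OpenCV; the projection matrices below are placeholders that would really come from calibration):

    import cv2
    import numpy as np

    # 3x4 projection matrices from calibration (placeholder values).
    P_front = np.hstack([np.eye(3), np.zeros((3, 1))])
    P_side = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

    # The same tracked point as seen by each camera, in pixels.
    pt_front = np.array([[320.0], [240.0]])
    pt_side = np.array([[300.0], [250.0]])

    # Homogeneous 3D point, then dehomogenize.
    X = cv2.triangulatePoints(P_front, P_side, pt_front, pt_side)
    xyz = (X[:3] / X[3]).ravel()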

Depending on how well the computational load can be pushed into hardware, small systems could provide head-mounted tracking (see CMUCam [1] for 'small').

The training aspect seems to be a weak link, in that some applications would need the camera to 'discover' what to track and then track it.

A number of very expensive object tracking systems used by law enforcement and
the military might get a bit cheaper.

Photographers might get a mode where they can specify 'take the picture when
this thing is centered in the frame' for sports and other high speed
activities.
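
That mode is mostly glue once a tracker gives you a box. Something like this hypothetical check (bbox as (x, y, w, h) from whatever tracker you use):

    def should_capture(bbox, frame_w, frame_h, tol=0.1):
        # Fire the shutter when the tracked box's centre is within
        # tol * frame size of the frame centre.
        x, y, w, h = bbox
        cx, cy = x + w / 2.0, y + h / 2.0
        return (abs(cx - frame_w / 2.0) < tol * frame_w and
                abs(cy - frame_h / 2.0) < tol * frame_h)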

Very nice piece of work.

[1] <http://www.cs.cmu.edu/~cmucam/>

~~~
extension
_Depending on how well the computational load can be pushed into hardware, small systems could provide head-mounted tracking_

He's clearly developing it on his laptop with a shitty webcam. That's why this
is amazing. Screw robotic manufacturing, this is for my phone.

~~~
newhouseb
Phone != Laptop.

It says he's running on an "Intel Core 2 Duo CPU 2.4 GHz, 2 GB RAM" according to his website. As a good rule of thumb, computer vision runs about an order of magnitude (10x) slower on a phone (like an iPhone) than on a desktop/laptop.

Also - a crappy webcam actually makes things computationally easier because there's less data to deal with. In a lot of computer vision algorithms, the first step is to resize the input to something that can be computed on in a reasonable time frame.
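
For illustration, that first step is often a single resize (a sketch assuming OpenCV; the filename is a placeholder):

    import cv2

    frame = cv2.imread("frame.png")   # stand-in for one captured frame
    scale = 320.0 / frame.shape[1]    # shrink to a 320-pixel-wide image
    small = cv2.resize(frame, None, fx=scale, fy=scale,
                       interpolation=cv2.INTER_AREA)
    # Every later stage now touches a fraction of the original pixels.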

~~~
extension
_It says he's running on an "Intel Core 2 Duo CPU 2.4 GHz, 2 GB RAM" according to his website_

I bet he isn't using the GPU though.

_Also - a crappy webcam actually makes things computationally easier because there's less data to deal with_

Perhaps, but lens distortion, motion blur and a rolling shutter don't make
things easier.

Anyway, the inventor himself claims a phone implementation is feasible.

~~~
newhouseb
Yep, I'm sure he isn't. I don't doubt that you could optimize this algorithm to run on a phone, but that takes an insane amount of effort and expertise and is a feat in and of itself. The Word Lens guys, for example, spent about a year porting from an optimized C implementation on i386 to ARM for the iPhone - they even initially used the GPU, but decided that the overhead of shuffling data between buffers wasn't worth the advantage gained by the iPhone's measly GPU (which only had 2 fragment shaders at the time, I think).

Also, I completely agree that camera blur would worsen the accuracy of the algorithm; I was trying to point out that it would run faster on a lower-quality camera (with the caveat that it might not work nearly as well).

------
sbierwagen
Interesting that TFA mentions "Minority Report-like interfaces" several times when:

1.) The Minority Report interface is the canonical example of a UI that is visually impressive and beautifully mediagenic, but hideously fatiguing and impractical in a real-world scenario. (Hold your hand out at arm's length. Okay, now hold that pose for eight hours.)

2.) The MR UI has actually been commercialized, and has entirely failed to take the world by storm.

Also, computer vision demos are trivially easy to fake, and it's even easier to make an impressive demo _video_. You can have the guy who invented it spend a couple of hours in front of the camera trying it over and over, then edit it down to three minutes of the system working perfectly. It wouldn't be nearly as impressive with an untrained user trying it live, in the field.

------
mrleinad
From his webpage at Surrey: "We have received hundreds of emails asking for
the source code ranging from practitioners, students, researchers up to top
companies. The range of proposed projects is exciting and it shows that TLD is
ready to push the current technology forward. This shows that we have created
something "bigger" than originally expected and therefore we are going to
postpone the release of our source code until announced otherwise. Thank you
for understanding."

Also, the message where he states the source code is under GPL 2.0 disappeared. Seems that he chose to leave Richard Stallman empty-handed and go over to the dark side.

~~~
endian
Did anybody get a GPLv2'd copy?

~~~
mrleinad
Here you go: <http://news.ycombinator.com/item?id=2411459>

------
sp332
_With something like this we could have truly “Minority Report” style human-
computer interface._

Actually, the guy who invented the Minority Report interface commercialized it and has been selling it for years. Product website: <http://oblong.com>

Edit: better video: <http://www.ted.com/talks/john_underkoffler_drive_3d_data_with_a_gesture.html>

~~~
extension
Predator doesn't need gloves.

~~~
sp332
Well, he mentioned a "Minority Report"-style interface, and there it is. At least they could use cooler gloves: <http://singularityhub.com/2010/05/28/mits-ridiculously-colorful-glove-is-the-latest-hand-tracking-interfacevideo/> :)

------
jallmann
Technical details here, with links to relevant papers at the bottom.
<http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html>

~~~
julianc
Unfortunately, you cannot download the source code; the link is disabled. And the GPL license he says he is using requires that the source code be available for download without restrictions like "send me an email" or "create an account".

Edit: I sent him an email :)

~~~
nolanw
He's licensing the code to _you_ under the GPL. He's free to use his own code
however he likes.

~~~
reemrevnivek
...and you're free to redistribute it!

~~~
nolanw
Consider it done.

~~~
reemrevnivek
Done? Done where?

------
dotBen
OK, so the fact that he has produced this himself, using off-the-shelf commodity laptops etc., is really great.

But this technology doesn't seem new to me - technology already exists for surveillance cameras in police and military helicopters to track an object like a car and keep it in view as the helicopter turns and maneuvers.

Likewise, facial recognition - both in still images and in video streams - isn't new either.

Not taking anything away from the guy, but I'm just wondering what it is I'm not getting that's new/amazing about this particular implementation?

------
BoppreH
The face recognition part was _too_ good not to pick up other people's faces. Or was it detecting just the _most_ similar face?

But facial recognition aside, the uses are endless. If it can be brought to the same level the Kinect drivers are at, but with _finger tracking_ and _no custom hardware_, this could change everything.

------
pyrhho
Bah! I was hoping to download the source (from here: <http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html>) and check out his algorithm, but he requires you to email him about your project. If anyone knows how the algorithm works, or where it is described in detail, I'd love to read that!

Absolutely amazing stuff!

~~~
mrleinad
What if you just... email him and ask for it?

~~~
pyrhho
Don't really have a project, just curiosity...

~~~
ElliotH
I certainly intend to drop him an email and see if he's willing to share his
stuff with an interested compsci undergrad. It's always worth asking - he's
probably just doing it for metrics.

~~~
Devilboy
Let us know what happens, please; I'm also interested.

------
donnyg107
Every time something like this comes out, I feel we're taking a step away from "video camera mounted on a robot where the eyes should be" and a step toward real perception. I always wonder, though: if a computer can one day recognize all different types of hands, could it draw a new one?

~~~
Devilboy
To answer your question you can watch this presentation by Prof Hinton:
<http://www.youtube.com/watch?v=AyzOUbkUf3M>

He shows how he trained a restricted Boltzmann machine to recognize handwritten digits and how he can run it in reverse as a generative model; in effect, the machine 'dreams' up all kinds of numbers it has not been trained on, yet makes up properly formed, legible digits.
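
Roughly, "running it in reverse" is alternating Gibbs sampling until the visible layer settles on something the model believes in. A toy sketch in numpy (untrained random weights, so this one only dreams noise; the sizes match 28x28 digit images):

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Toy RBM. Real weights would come from training on MNIST.
    n_visible, n_hidden = 784, 128
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)

    def dream(steps=1000):
        # Start the visible units at random, then alternate Gibbs
        # sampling between layers; the fantasy drifts toward images
        # the model considers plausible.
        v = (rng.random(n_visible) < 0.5).astype(float)
        for _ in range(steps):
            h = (rng.random(n_hidden) < sigmoid(v @ W + b_h)).astype(float)
            v = (rng.random(n_visible) < sigmoid(h @ W.T + b_v)).astype(float)
        return v.reshape(28, 28)

    sample = dream()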

------
direxorg
The world becomes a better place when code like this is available for the public to build on, and not only to the military for missile homing heads. I guess it's one point for "make something free that was initially available only for pay", just like Plenty of Fish is doing... <http://www.vti.mod.gov.rs/vti/lab/e-tv.htm>

------
exit
_> Can Predator be used to stabilize and navigate a Quadcopter?_

 _> That is not straightforward._

Anyone know why not?

~~~
extension
Probably because this is just 2D tracking rather than 3D mapping. But tracking can be applied to mapping, for example: <http://www.robots.ox.ac.uk/~gk/PTAM/>

So the question is: can Predator be used to improve mapping? AFAIK, that would require a) automatically selecting trackable objects and b) tracking many of them simultaneously. That PTAM technique tracks thousands of points, but with tracking this reliable, you might get by with far fewer.

So, more work is required to apply it to mapping, but I have to imagine it could be done. And seeing how well Predator adapts to changes in scale, orientation, and visibility, I suspect it could improve mapping considerably.
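
On (a), automatic selection of trackable points is well trodden. E.g. with FAST corners in OpenCV, which is what PTAM itself builds on (my illustration, nothing from the Predator code; the filename is a placeholder):

    import cv2

    gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

    # FAST picks corner-like keypoints with no human in the loop; a
    # mapping system like PTAM tracks hundreds of these frame to frame.
    fast = cv2.FastFeatureDetector_create(threshold=25)
    keypoints = fast.detect(gray, None)
    print(len(keypoints), "candidate features selected automatically")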

~~~
y0ghur7_xxx
"AFAIK, that would require a) automatically selecting trackable objects and b)
tracking many of them simultaneously."

I'm not really sure I understood you, but this two problems are already
solved. Hugin[1] for example has automatic control point generation for photo
stitching. Were you talking about something else?

[1] <http://hugin.sourceforge.net/releases/2010.4.0/en.shtml>

~~~
extension
Yeah, there are many ways to detect features, but I haven't read the paper yet, so I don't know what kind of features it wants or whether there are any problems with choosing them automatically. Like, can it group features into distinct objects without a human pointing them out?

------
elvirs
The video where the system tracks Roy from The IT Crowd sucking his fingers is epic :) <http://www.youtube.com/user/ekalic2#p/u/2/tKXX3A2WIjs>

------
giardini
It must be shown what to track. That is, you (or some other external system) define the "object" to be tracked by drawing a bounding box around it.
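
In OpenCV terms, that seeding step looks like this (a sketch using the stock MIL tracker as a stand-in for Predator; tracker class names have moved around between OpenCV versions):

    import cv2

    cap = cv2.VideoCapture("video.mp4")   # placeholder input
    ok, frame = cap.read()

    # The user drags a box around the target; this seeds the tracker.
    bbox = cv2.selectROI("select object", frame)

    tracker = cv2.TrackerMIL_create()     # MIL here, not TLD itself
    tracker.init(frame, bbox)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        found, box = tracker.update(frame)
        if found:
            x, y, w, h = (int(v) for v in box)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("tracking", frame)
        if cv2.waitKey(1) & 0xFF == 27:   # Esc quits
            break

    cap.release()
    cv2.destroyAllWindows()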

A good addition would be an algorithm that automatically delineated "objects"
in the visual field, then passed them to Predator.

Which raises another question: how many "objects" can Predator track simultaneously (for a given amount of horsepower)?

------
helwr
Here is the code: <https://github.com/abelsson/TLD>

------
chops
Wow, this is pretty amazing stuff. I sincerely hope this guy makes a pile of
money off this.

------
motters
This looks impressive. I've written tracking systems previously, so can
appreciate that it's not an easy problem to solve.

------
bossjones
Extremely impressive. Can't wait to see how this is applied to everyday problems. Kudos to this gentleman.

------
marcomonteiro
This looks awesome! I want to build this into apps for iPad 2.

------
Tycho
Uhhh... 'Predator'? What's his next project, SkyNet Resource Planning? This seems like an April Fools' joke to me. I mean, I'm sure he's done work in the area... but the article is dated April 1 and the previous literature didn't mention 'Predator.' I could be wrong, but it seems too advanced, and scary.

~~~
Tycho
So nobody else found it suspicious that the source code was promised but never actually available? And now it's not even promised.

