
How to build a robot that “sees” with $100 and TensorFlow - nogaleviner
https://www.oreilly.com/learning/how-to-build-a-robot-that-sees-with-100-and-tensorflow?twitter=@bigdata
======
bernardopires
Just a nit, but the author keeps talking about object recognition when what
he was actually doing is image classification. Object recognition actually
consists of two tasks: one is classifying the object (this is a beer bottle)
and the other is saying where in the image the object is. Additionally, it
can/should detect multiple objects in the image. This is more complex than
classification, which only associates one category with the image.
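
To make the distinction concrete, here is a minimal sketch of the two output shapes. The names (`classify`, `Detection`, `keep_confident`), the 0.5 threshold, and the scores and boxes are hypothetical, not real model output; real detectors typically also apply non-maximum suppression to merge overlapping boxes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


def classify(scores: Dict[str, float]) -> Tuple[str, float]:
    """Image classification: exactly one label for the whole image."""
    label = max(scores, key=scores.get)
    return label, scores[label]


@dataclass
class Detection:
    """One detected object: what it is and where it is."""
    label: str
    score: float
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def keep_confident(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Object detection: any number of objects may survive the threshold."""
    return [d for d in detections if d.score >= threshold]


# Hypothetical example scores and boxes.
print(classify({"beer bottle": 0.80, "wine bottle": 0.15, "cup": 0.05}))
found = keep_confident([
    Detection("beer bottle", 0.90, (10, 20, 60, 200)),
    Detection("cup", 0.70, (100, 50, 160, 120)),
    Detection("cat", 0.20, (0, 0, 5, 5)),
])
print([(d.label, d.box) for d in found])
```

The classifier answers one question about the whole frame; the detector returns a list, so two bottles in one photo show up as two entries with two boxes.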

~~~
dharma1
[http://pjreddie.com/darknet/yolo/](http://pjreddie.com/darknet/yolo/),
[https://github.com/daijifeng001/MNC](https://github.com/daijifeng001/MNC),
[https://bitbucket.org/aquariusjay/deeplab-public-
ver2](https://bitbucket.org/aquariusjay/deeplab-public-ver2) or similar should
do the job. Choose depending on how fast it needs to be and how accurate the
segmentation boundaries need to be.

~~~
oxymoron
Here's another one:
[https://github.com/facebookresearch/deepmask](https://github.com/facebookresearch/deepmask)

------
rbanffy
> recognizing arbitrary objects within a larger image has been the Holy Grail
> of artificial intelligence

The Holy Grail is general AI. Recognizing objects is a side quest, perhaps a
required step, but, by no means, the end goal.

~~~
taneq
Like so many other things in the field of AI, general object recognition was
the "holy grail" because it was assumed that it required AGI. Now we've
figured out a way to do general object recognition without AGI.

~~~
argonaut
Is there a story somewhere of AI researchers concluding general object
recognition was the holy grail of AI?

I get that a lot of people downplay achievements in machine learning by saying
it's nothing like AGI, but it's almost a meme now that "once upon a time
everyone thought that was the holy grail and they're moving the signposts"
even when 1) nobody thought that, or 2) some people thought that and some
people didn't think that.

~~~
MrQuincle
I - for what it is worth - would still say general object recognition, with
the emphasis on general, is indeed the holy grail.

The ability to recognize objects like people do is not properly represented by
current benchmarks. I can imagine that you could build a perfect robotic "bird
spotter", but if you put that in a self-driving car I would not be surprised if
it stopped for something that's just a shadow, or if you put it on a humanoid
it was unable to distinguish its own hand from that of its clone. Imagine two of
them cleaning out the dishwasher. :-)

A lot of AI is still working only in lab conditions or restricted application
domains. That's why I consider robots and cars so important in driving AI
towards the "general" dimension.

~~~
MrQuincle
Here's a nice article that addresses some limitations of vision:
[https://medium.com/@andrewt3000/tesla-mobileeye-and-deep-
lea...](https://medium.com/@andrewt3000/tesla-mobileeye-and-deep-
learning-b7ceb8828482#.eqkz0yu62)

------
icemelt8
This was amazing. I am amazed at your command of both hardware and software
technology. Even as a software engineer, I have a hard time trying to make
TensorFlow do something for me.

~~~
dharma1
To do the task in the article (classify images with a pretrained model), it's
pretty easy; just follow the tutorial here:

[https://www.tensorflow.org/versions/r0.10/tutorials/image_re...](https://www.tensorflow.org/versions/r0.10/tutorials/image_recognition/index.html)
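
Once a pretrained classifier has produced a score per class, reporting "what it sees" comes down to a softmax followed by a ranking. This is a minimal pure-Python sketch of that post-processing step, assuming the model hands back raw logits; the tutorial's own script works on TensorFlow tensors and may already receive probabilities. The labels and numbers below are made up.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: Sequence[float], labels: Sequence[str], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k most likely (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical 4-class example; an ImageNet model has about 1000 classes.
labels = ["beer bottle", "water bottle", "pop bottle", "cup"]
logits = [3.0, 1.0, 0.5, -1.0]
for name, p in top_k(logits, labels, k=3):
    print(f"{name}: {p:.3f}")
```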

~~~
sidarape
I tried that
([https://www.tensorflow.org/versions/r0.9/how_tos/image_retra...](https://www.tensorflow.org/versions/r0.9/how_tos/image_retraining/index.html))
which seems to work well.

~~~
dharma1
yep, that's good if you want to retrain a pre-trained model for specific
categories on new image data (that are pretty close to imagenet type images)
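
Conceptually, that kind of retraining keeps the pretrained network frozen, uses its penultimate-layer outputs as fixed feature vectors (the TensorFlow retraining tutorial calls them "bottlenecks"), and trains only a new softmax layer for the new categories. Below is a minimal pure-Python sketch of just that last step, not the tutorial's code: the 2-D toy features stand in for the real, much larger feature vectors, and the learning rate and epoch count are arbitrary assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(z: Sequence[float]) -> List[float]:
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(W: List[List[float]], b: List[float], x: Vector) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) + bc for row, bc in zip(W, b)]


def train_final_layer(features: List[Vector], labels: List[int], num_classes: int,
                      lr: float = 0.5, epochs: int = 200,
                      seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Fit only a new linear + softmax layer on frozen feature vectors (plain SGD)."""
    rng = random.Random(seed)
    dim = len(features[0])
    W = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = softmax(logits_for(W, b, x))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c is p_c - [c == y].
                g = p[c] - (1.0 if c == y else 0.0)
                for i in range(dim):
                    W[c][i] -= lr * g * x[i]
                b[c] -= lr * g
    return W, b


def predict(W: List[List[float]], b: List[float], x: Vector) -> int:
    scores = logits_for(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy "bottleneck" features for two new categories (hypothetical data).
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
ys = [0, 0, 1, 1]
W, b = train_final_layer(feats, ys, num_classes=2)
print(predict(W, b, [0.95, 0.05]), predict(W, b, [0.05, 0.95]))
```

Because only this small layer is trained, retraining is far cheaper than training the whole network, which is also why it works best when the frozen features are already meaningful for the new images.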

~~~
sidarape
What do you mean by "pretty close"? What would be "not close"?

~~~
dharma1
Something that looks very different to the ImageNet dataset: microscope,
ultrasound, satellite, x-ray, etc. images.

~~~
sidarape
Ok, I see. Thank you.

------
urvader
I would like to know how long it "thinks". It is clear the camera is paused
for a while as the robot parses the image.

~~~
lukas
It thinks for around 3 seconds. It doesn't use the Pi's GPU; I bet that if it
did, it could get a lot faster. You could probably also monkey around with the
compiler flags and speed it up. If anyone working on TensorFlow has some ideas,
I'd love to hear them.
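
If you want to put a number on that "thinking" time yourself, a small timing harness like this minimal sketch can help. `fake_classify` is a hypothetical stand-in for one inference call on the robot; the warm-up run is discarded because the first call often includes one-time setup such as loading the model.

```python
import statistics
import time
from typing import Callable, List


def measure_latency(fn: Callable[[], object], runs: int = 5, warmup: int = 1) -> List[float]:
    """Return wall-clock seconds for each of `runs` calls, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def fake_classify() -> None:
    # Hypothetical stand-in: replace with a real classification call.
    time.sleep(0.01)


t = measure_latency(fake_classify, runs=3)
print(f"median {statistics.median(t):.3f} s over {len(t)} runs")
```

Comparing the median before and after a change (GPU, compiler flags, smaller model) is more trustworthy than eyeballing how long the camera pauses.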

------
salex89
My biggest current question is: which keyboard is this, in the image in the
article?!

[https://d3ansictanv2wj.cloudfront.net/Figure_2-985cd20ea0c0b...](https://d3ansictanv2wj.cloudfront.net/Figure_2-985cd20ea0c0bcd68a7d896f11312edf.jpg)

~~~
cezor
Looks like Karnotech Foldable Silicone Keyboard

~~~
salex89
Looks like you are right. Based on the image I thought it was something
mechanical. Looks nice, but I'm not fond of foldable keyboards :( .

------
visarga
Great project. Locomotion and vision are pretty advanced compared to grasping
and complex handling of objects. If we could have a workable arm, it would be
much more interesting in applications.

~~~
Animats
There are some affordable robot arms. uArm has a hobbyist-level arm. I have
one, and I made a 6DOF force sensor out of a 3Dconnexion SpaceNavigator on the
end and got it to talk to ROS. I want to hook a classifier system up to it so
it can use tools like end wrenches and get them onto a bolt by feel.

Robot manipulation in unstructured situations still sucks, though. Willow
Garage was making progress, but didn't last.

------
dharma1
Did the author publish a repo for this? It's easy to get TensorFlow going for
basic image classification, but the hard part is actually making the robot move
in a way that makes sense: using the camera and the sonar data to make
decisions and then drive the motors. Or is this not autonomous?
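
For the autonomous case, the simplest starting point is a reactive rule that maps the latest sonar reading and classifier label to wheel speeds. This is a minimal sketch, not how the article's robot works: `decide`, the 20 cm threshold, the target label, and the (left, right) speed convention are all hypothetical.

```python
from typing import Optional, Tuple

STOP_DISTANCE_CM = 20.0  # hypothetical: closer than this counts as "something ahead"


def decide(sonar_cm: float, label: Optional[str],
           target: str = "beer bottle") -> Tuple[float, float]:
    """Return (left, right) wheel speeds in [-1, 1] from one sonar reading and one label."""
    if sonar_cm < STOP_DISTANCE_CM:
        if label == target:
            return (0.0, 0.0)   # the thing in front is what we want: stop
        return (-0.5, 0.5)      # something else is in the way: spin in place
    return (0.6, 0.6)           # path is clear: drive forward


print(decide(100.0, None), decide(12.0, "cup"), decide(12.0, "beer bottle"))
```

In a real loop you would call this after each sonar read and classification, then pass the result to the motor driver; given a roughly 3 second classification time, the sonar check would likely need to run far more often than the classifier.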

~~~
OilDerek
"I then built a simple Python webserver to spin the wheels of the robot based
on keyboard commands that made for a nifty remote control car."

So, not autonomous it would seem. With that and an arm, though, you could
eventually get it to play fetch...
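
The quoted remote-control setup could look roughly like this minimal sketch using only Python's standard `http.server`. The URL scheme (`/drive?key=w`), the WASD key mapping, and `set_motors` are hypothetical; a real robot would call its motor driver where `set_motors` just prints.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical WASD mapping to (left, right) wheel speeds in [-1, 1].
KEY_TO_WHEELS = {
    "w": (1.0, 1.0),    # forward
    "s": (-1.0, -1.0),  # backward
    "a": (-1.0, 1.0),   # spin left
    "d": (1.0, -1.0),   # spin right
    "x": (0.0, 0.0),    # stop
}


def set_motors(left: float, right: float) -> None:
    # Hypothetical placeholder: replace with real motor-driver calls.
    print(f"motors: left={left:+.1f} right={right:+.1f}")


def handle_key(key: str) -> bool:
    """Drive the wheels for a known key; return False for unknown keys."""
    if key not in KEY_TO_WHEELS:
        return False
    set_motors(*KEY_TO_WHEELS[key])
    return True


class RemoteControl(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Read the key from the query string, e.g. GET /drive?key=w (the path is ignored).
        key = parse_qs(urlparse(self.path).query).get("key", [""])[0]
        if not handle_key(key):
            self.send_error(400, "unknown key")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok\n")


def run(port: int = 8000) -> None:
    """Serve until interrupted; call this on the robot."""
    HTTPServer(("", port), RemoteControl).serve_forever()
```

Calling `run()` on the Pi and requesting `http://<pi-address>:8000/drive?key=w` would drive forward; a small web page with key listeners could send those requests as you type.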

~~~
dharma1
Ah, right. You could do this project with a cheap RC car and a phone running
TF stuck on it.

~~~
ultrasounder
Since when did TF start running on stock phones?

~~~
dharma1
A few months ago.

[https://github.com/tensorflow/tensorflow/tree/master/tensorf...](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/ios_examples)

[https://github.com/tensorflow/tensorflow/tree/master/tensorf...](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android)

------
criddell
This reminds me of a low-res vision system I read about 20 years ago:

    http://www.seattlerobotics.org/encoder/jan97/lowresv.html

I've always been kind of intrigued by what is possible with very simple
hardware.
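
To get a feel for how little data a very low-res system works with, here is a minimal sketch that shrinks a grayscale image by averaging non-overlapping blocks. It is an illustration only, not the method from the linked article; the image is assumed to be a plain list of rows of 0-255 ints, and edge pixels that do not fill a whole block are dropped.

```python
from typing import List

Image = List[List[int]]


def downsample(img: Image, factor: int) -> Image:
    """Average non-overlapping factor x factor blocks (integer division)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h - h % factor, factor):
        row = []
        for x in range(0, w - w % factor, factor):
            block = [img[y + dy][x + dx] for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


img = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [10, 20, 30, 40],
    [50, 60, 70, 80],
]
print(downsample(img, 2))
```

Shrinking a 640x480 frame by a factor of 40 this way leaves a 16x12 grid of 192 values, the kind of budget very simple hardware can still process.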

~~~
gugagore
Do you know what ever happened to the Encoder? I used to be so excited as a
kid when a new issue came out.

~~~
criddell
No, I don't.

It was a fantastic resource and I bet it was a lot of work to put one
together. I sure appreciated the people that took the time to make it.

I always wanted to find some way to get to one of their meetings but it never
worked out.

------
nojvek
Oh my god. You are trying to build the exact thing I am trying to build,
albeit you've made much more progress.

I'm still soldering wires onto the motors. You should take the paper off the
acrylic. The transparent effect makes it look awesome.

My goal is to make a Raspberry Pi bot that plays indoor fetch. I would love to
have a chat with you.

------
forgotAgain
Sorry for going off topic, but is anyone else getting very high CPU usage from
O'Reilly websites? Any known resolution or workaround?

With Chrome developer tools I see one error: "Uncaught SecurityError: Failed
to read the 'localStorage' property from 'Window': Access is denied for this
document."

