
Robots that learn - grey-area
https://blog.openai.com/robots-that-learn/
======
jorgemf
The real value of this is the algorithm of "one-shot imitation learning" (the
paper is here:
[https://arxiv.org/pdf/1703.07326.pdf](https://arxiv.org/pdf/1703.07326.pdf)
). The title about robots is only there to catch the attention of the media.
The domain is simple to show the idea, but the approach can be applied to more
complex domains once you know how to define them (which is usually really
complex). The blocks-world domain is used because it has been used in
automated planning for decades, and it is well known in research. It feels
trivial to use only 6 blocks, but when you want to create an automatic plan of
the steps to reach the final position, it is not that simple for computers.

~~~
karpathy
+1 that the meta-learning spin on this approach is the really interesting
part. The normal approach would be as follows:

"you want to stack 6 blocks on one another? great, let me collect 1,000
examples of doing that in VR, and I'll train my policy on this and see how
that works"

instead, we change the question:

"you want to stack 6 blocks on one another? great, that's one possible thing
out of thousands you might want to do. so let's create a dataset of 1,000
examples of tuples: one 'query' demonstration, and a second demonstration as
the target behavior to train the network on, when it sees the query. The
training data is now 1,000 tuples of (query_demo, target_demo), trained again
with supervised learning."

Once this is trained, we can sub in (in theory) any arbitrary desired
demonstration, and the network will learn how to "extract" what is intended,
using the demonstration as a crutch to imitate. It's a bit of a change of
mindset, but a much more powerful, general, and exciting one.
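To make the data layout concrete, here is a minimal sketch of the tuple dataset described above. All names and dimensions (`make_demo`, `DEMO_LEN`, `OBS_DIM`, `ACT_DIM`) are made up for illustration, not taken from the paper; a real system would train a neural policy on these tuples.

```python
import numpy as np

# Hypothetical sketch of the (query_demo, target_demo) dataset described
# above. Dimensions and names are illustrative assumptions.
rng = np.random.default_rng(0)
DEMO_LEN, OBS_DIM, ACT_DIM = 20, 8, 4

def make_demo():
    """A demonstration: a sequence of (observation, action) pairs."""
    observations = rng.normal(size=(DEMO_LEN, OBS_DIM))
    actions = rng.normal(size=(DEMO_LEN, ACT_DIM))
    return observations, actions

# 1,000 training tuples: one full 'query' demonstration, plus a single
# (observation, action) sample drawn from a second demo of the same task.
dataset = []
for _ in range(1000):
    query_demo = make_demo()
    target_obs, target_act = make_demo()
    t = rng.integers(DEMO_LEN)  # one timestep of the target demonstration
    dataset.append((query_demo, target_obs[t], target_act[t]))

# Supervised objective: given (query_demo, target_obs[t]), predict
# target_act[t] -- ordinary supervised learning over the tuples.
```

The point of the layout is that the task identity is never given explicitly; the network can only recover it from the query demonstration.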

~~~
speby
karpathy, I see you are at Stanford for Deep learning and NLP... I'm working
on a project for audio/sound classification and have been sniffing around for
some folks who may have encountered a similar set of feature points for audio
data in deep learning. Would you be open to connecting? If so, let me know an
email or other way to contact you and I'll reach out.

~~~
IanCal
For some constructive feedback: this is a really awkward way of getting in
touch with someone. Assume they're busy and professional; you're sending them
a message to ask them to send you a message to tell you how to send them a
message...

If you want to contact someone, check their public profile on their website
and see if they've said there's a preferred way (some people want everything
sent to a certain email address, some want you to call them directly, some
never want to be called, some flat-out tell you not to contact them, or, most
commonly, they just say to email them and get to the point). Follow whatever
they suggest.

Write something simple and clear, and be upfront about what you're asking for.
Make it as easy as possible for the person to help you (this applies to both
reading _and_ answering the question). I'm far more likely to reply if I can
open an email, type a sentence or two, and then move on.

With your message, I don't know if you're just after datasets, help with a
particular problem, a mentor, business partner or what. I also don't know what
area of audio/sound classification so if I was actually in that area then I'd
not know right now if I could help or not (whereas if you'd said human voices,
bird chirps, etc. I'd have a better idea).

Essentially, assume most people are pleasant and helpful but also extremely
busy.

------
wonderous
Reminds me of this robot that was trained in simulation and then run in real
life, "Autonomous Drifting using Machine Learning":

[https://m.youtube.com/watch?v=opsmd5yuBF0](https://m.youtube.com/watch?v=opsmd5yuBF0)

------
gallerdude
Imagine how revolutionary a robot would be that could simply walk around, pick
things up, and set things down.

"Herbert: grab my car keys." "Herbert: set the dinner table." "Herbert: put
the mail in the mailbox."

Obviously, cost will be a big deal, but I think we'll get there.

~~~
sillysaurus3
The central problem with this idea is noise and creepiness. Nobody wants a
noisy robot, and roboticists tend to produce creepy ones. Both problems might
be solvable, but I think robotics tends to focus on the technical problems at
the exclusion of the aesthetics. It's definitely a startup idea.

~~~
drzaiusapelord
I wouldn't consider this creepy.

[https://cdn0.vox-cdn.com/thumbor/XV2kBAa21cI8yKgmfm90EwYalA0...](https://cdn0.vox-cdn.com/thumbor/XV2kBAa21cI8yKgmfm90EwYalA0=/0x71:1111x775/1200x800/filters:focal\(489x200:693x404\)/cdn0.vox-cdn.com/uploads/chorus_image/image/52599249/Retail___Dept_Store_B_rev2.0.jpeg)

Nor these:

[https://s-media-cache-ak0.pinimg.com/236x/2b/87/e5/2b87e5634...](https://s-media-cache-ak0.pinimg.com/236x/2b/87/e5/2b87e56348a9ec95296fd303206c96b2.jpg)

[https://s-media-cache-ak0.pinimg.com/originals/2e/a5/fc/2ea5...](https://s-media-cache-ak0.pinimg.com/originals/2e/a5/fc/2ea5fc68bc944d6d15aba9ac1be26cc3.jpg)

[https://3c1703fe8d.site.internapcdn.net/newman/gfx/news/hire...](https://3c1703fe8d.site.internapcdn.net/newman/gfx/news/hires/2016/11-robotsatcent.jpg)

[https://cdn.thisiswhyimbroke.com/images/kuri-home-robot1.jpg](https://cdn.thisiswhyimbroke.com/images/kuri-home-robot1.jpg)

~~~
Razengan
Has anyone asked WHY robots NEED to have humanoid bodies?

I can see the case for familiarity, especially for robotic pets and attendants
for children, but the human body and its limbs are the result of millions of
years of evolution, in conditions that don't exist or apply anymore.

It had to adapt to being born naked and having to grow up, possibly on its
own, and feeding itself and fending for itself...

I mean, consider wheels vs. legs. You don't see the former anywhere in nature,
but it's the most efficient mode of locomotion in human society.

Why aren't we exploring more efficient robotic bodies that would also be
easier to program?

Say, a rotating trunk-like body with multiple octopus-like tentacles with
suckers instead of fingers. Just an idea. Or a robot that is actually a swarm
of ant-like machines.

~~~
animal531
There will be useful cases for robots of all body types. BUT we humans have
created a humanoid world where the human form makes sense. As such we'll need
robots that mimic that form for certain functions.

For example wheels: they're easier to do than legs (by far), but they get
stuck on cables, carpets, and stairs.

------
hackpert
This transition from end-to-end differentiable 'black box' systems to multiple
networks dedicated to certain tasks working in conjunction is interesting, and
very probably the idea that will keep this field going. We might not
understand end-to-end systems in as much detail as we'd like, but this
abstraction layer at least lets us know, empirically, what part is doing what.

------
a_d
Is the vision network learning continuously, or has it been trained with many
configurations of the blocks, giving a continuous output?

The post says that the imitation network takes the input from the vision
network and processes it to infer the intent of the task. Isn't the "intent"
always "to stack"? Or can the imitation also be just re-arranging blocks in
another configuration?

This part is interesting, if I understood it well: > "But how does the
imitation network know how to generalize? The network learns this from the
distribution of training examples. It is trained on dozens of different tasks
with thousands of demonstrations for each task. Each training example is a
pair of demonstrations that perform the same task. The network is given the
entirety of the first demonstration and a single observation from the second
demonstration. We then use supervised learning to predict what action the
demonstrator took at that observation. In order to predict the action
effectively, the robot must learn how to infer the relevant portion of the
task from the first demonstration."

Does this mean that the imitation network has been trained on stacking,
unstacking, throwing... and other such tasks, and then it identifies that
"stacking" is what is being done in order to imitate it?

Is there an ELI5 for what the 2 NNs are actually learning?

~~~
npew
The vision network is trained before-hand on lots of different configurations
in simulation and then used to infer the block locations in the image from the
camera. So it’s not learning continuously. The imitation network takes the
block locations predicted by the vision network, together with the
demonstration trajectory in VR, and imitates the task shown in the
demonstration. So, it learns to look through the demonstration to decide what
action to take next given the current state (i.e. location of blocks and
gripper). To keep the setup simple, we only trained the imitation network on
stacking tasks (so no unstacking, throwing, etc). In future work, we want to
make the setup and tasks much more general.
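A toy sketch of the two-network flow described above might look like this. The function names, inputs, and placeholder return values are all assumptions for illustration, not OpenAI's actual code.

```python
# Toy sketch of the vision -> imitation pipeline described above.
# Everything here is an illustrative stand-in, not the real implementation.

def vision_network(camera_image):
    """Trained beforehand on simulated scenes; infers block locations."""
    # stand-in output: a predicted (x, y) for each block in the image
    return [(0.10, 0.20), (0.30, 0.20)]

def imitation_network(block_locations, demo_trajectory, gripper_open):
    """Conditions on one VR demonstration to choose the next action."""
    # stand-in logic: head toward wherever the demonstration goes next
    next_target = demo_trajectory[0]
    return {"move_to": next_target, "toggle_grip": gripper_open}

# One control step: camera image -> block locations -> next action,
# conditioned on the current state (block locations and gripper).
locations = vision_network(camera_image=None)  # a real image would go here
action = imitation_network(locations,
                           demo_trajectory=[(0.5, 0.5)],
                           gripper_open=True)
```

The key structural point is that the imitation network never sees raw pixels; it only consumes the vision network's state estimate plus the demonstration.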

~~~
a_d
Thanks for the explanation. Can you also explain the significance of "one-shot
imitation learning" generally (beyond the context of this experiment)?

------
Houshalter
The fact that NNs can generalize from simulated data to real data is very
interesting. We can generate tons of simulated data of stuff like driving in
simulators or even video games like GTA, and do experiments that would be way
too dangerous or costly to perform in real life. It's not good when the first
iteration of a reinforcement learner crashes the car at 60 miles per hour!

You can then add tons of randomization to the simulation to make sure it
doesn't overfit to the particulars of the simulated data. Like random filters
on the input, moving the camera around and vibrating it, making cars and
pedestrians behave unrealistically erratically, etc, or having sensors fail.
If it can learn to handle these extreme situations in the simulators,
hopefully it will generalize even to the rare scenarios that occur in real
driving.
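The randomizations listed above could be sketched as drawing a fresh simulator configuration per episode. Every parameter name and range below is a made-up assumption for illustration, not from any particular simulator.

```python
import random

# Illustrative domain-randomization sketch: each training episode samples
# a new simulator configuration so the policy can't overfit to one setup.
def randomized_sim_config(rng):
    return {
        "input_filter": rng.choice(["none", "hue_shift", "blur", "contrast"]),
        "camera_offset": rng.uniform(-0.05, 0.05),       # move the camera around
        "camera_vibration": rng.uniform(0.0, 0.02),      # and vibrate it
        "pedestrian_erraticness": rng.uniform(0.0, 1.0), # unrealistic behavior
        "sensor_failure_prob": rng.uniform(0.0, 0.2),    # occasional dead sensors
    }

rng = random.Random(0)
# One randomized configuration per training episode.
configs = [randomized_sim_config(rng) for _ in range(1000)]
```

The wider the sampled distribution, the more the real world looks like just another sample from training, which is the intuition behind sim-to-real transfer here.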

------
blazespin
This, IMHO, is the real revolution of the one-two punch of AI/VR: training
robots in VR and then using them to produce in a real-time environment that
maps well to the VR env.

Humans interact in the VR env (with oculus/vive/etc) to train / configure the
robots and assembly line.

Once complete, the factory line is built IRL.

~~~
lefnire
With current VR content, these robots will be damn fine archers.

------
tshadwell
'We've taught this robot to move small uniform blocks and we're going to make
it perform arbitrary complex tasks on a variety of objects' sounds a lot like
'I'm trying to draw the Mona Lisa and I got her eyebrow down really good.'

------
porter
can anyone recommend a good intelligent robot arm kit for building myself?

~~~
dbcurtis
Depends on your goals and budget. I've seen a lot of cheap hobby servo based
arms. Precision is awful, payload is minuscule. OTOH, you will learn why
precision is important :) and you will learn the software of arm motion
planning.

The next step up in cheap arms is to build around Bioloid/Dynamixel style
servos, which will increase the budget significantly, but you will end up with
useable but coarse precision. Meaningful payloads will still cause servos to
overheat but at least those brands will shut down rather than smoke --
usually.

If you want to do serious research you will need a serious budget. Arms are
hard. This is not to say you can't have a lot of fun and learn a lot with
something less. A friend built a smart task lamp around Bioloids -- the goal
was a lamp that would autonomously aim a work light where he was soldering by
following the soldering iron tip using CV and a cheap web cam. This is totally
within the payload and precision limits of hobby servos, and the software can
run on a RasPi.

------
tomdre
What are the benefits of using VR vs simulation or precollected training data?
Maybe I'm missing something...

~~~
npew
We actually use both. The data used to train the neural networks is all
generated through a scripted policy in simulation. We then use a single human
demonstration in VR to show the robot how to carry out a particular task. VR
lets a human do a demonstration quite naturally, and it would also be an
effective way to collect training data for more complex tasks, where we
wouldn't be able to create a scripted policy.

~~~
tomdre
Thanks for your reply. So essentially VR doesn't affect the training phase per
se, but simplifies the life of the trainer.

------
zardo
Looks like "Blocks World" with learning.

Can you out-stack SHRDLU?

------
deepnotderp
More awesome stuff from OpenAI!

------
briga
Imagine if you went back 50 years and told Terry Winograd or Marvin Minsky
that in 50 years we'd still be trying to figure out how to get robots to stack
blocks on top of each other. They'd think you were nuts.

~~~
ThomPete
On the other hand we have done things they wouldn't even imagine.

~~~
crush-n-spread
I don't think so. I'm sure they imagined the internet in some abstract way,
and smartphones really are quite a step backward if you want to cite that.
Search is also quite imaginable.

We have had hundreds of thousands of software engineers working in advertising
companies or making hardware that gets used to sell products and keep the
attention of the youth perpetually captured, hindering their growth as humans.
I say we've regressed in a big way and they would think the same.

Check out Alan Kay's talks from this week if you still feel good about today's
software industry.

~~~
romaniv
You mean this one?

[https://www.youtube.com/watch?v=ZDM33CMJvp8](https://www.youtube.com/watch?v=ZDM33CMJvp8)

[https://www.youtube.com/watch?v=DIR6Rmhm3To](https://www.youtube.com/watch?v=DIR6Rmhm3To)

(I'm trying to watch all of his recent talks. Despite [or because of?] being
critical of the industry, they are some of the most inspiring videos I've seen
in years.)

------
hagakure0c
Waiting for the first IKEA assembly robot that will follow you home, set up
your new furniture, and then return to base.

------
nojvek
I have to say openai website is very well designed. Love those gradients.

~~~
Houshalter
Gradients are their speciality.

------
Candles123
Is this 'learning' or just 'copying'?

~~~
kreutz
what is the difference?

~~~
backpropaganda
And I was enlightened.

------
known
"Show me the code" \--Linus

------
pplonski86
Does Tesla use this in their factory?

------
11thEarlOfMar
Now.

Make me a peanut butter and jelly sandwich.

~~~
rl3
It won't be long before a huge chunk of food service jobs are automated away
by robotics, and that will likely include sandwich making.

~~~
H1Supreme
That's all I want a homebot for. "Homebot, make me some Gordon Ramsey level
Beef Wellington". Beep boop beep, "Ready in 40 minutes".

