
Teaching Robots to Understand Semantic Concepts - rey12rey
https://research.googleblog.com/2017/07/teaching-robots-to-understand-semantic.html
======
skywhopper
The headline seems to misuse the word "semantic" (not to mention
"understand"). Does the door-opening robot now understand how to open all
hinged doors with a similar opening mechanism? Or was it just trained to
imitate a sequence of changes in a 2D image from a fixed angle? Can the same
software and robot also be taught to open windows? Boxes? We are talking about
"semantics" explicitly here. Does it understand "open" versus "closed" for
these different types of closures/portals?

I don't want to discount the value of this research. It's absolutely necessary
to do this sort of basic proof-of-concept testing of these ideas. But the
claim being made implicitly here is way beyond what's actually going on. The
software understands nothing, and the "semantics" extend to simple image-
matching of objects, but there's no deeper meaning associated with the labels,
so I think calling that "semantics" is a major stretch.

This approach is not going to teach a robot how to pick fruit, or serve food,
or clean floors anytime soon. In the best case where this is even a workable
approach, research like this is just the first of millions more tiny steps
along the path. Anyway, I think it's naive to assume that a good way to
approach automation is to write software to let robots learn by watching
humans do the desired task. As cool as that sounds, chances are that approach
would ultimately be a massively inefficient way to solve the problem. It'd be
like trying to invent the automobile by building a steam-powered horse robot
that can tow carriages. The critical purpose is being overlooked in favor of a
cool-looking but totally impractical toy demo.

~~~
suyash
That seems to be what they are aiming towards, but true AI is not the same as
Machine Learning.

Google is still using what we could call a very rudimentary form of AI, as
they describe: "Unsupervised learning on very small datasets is one of the
most challenging scenarios in machine learning. To make this feasible, we use
deep visual features from a large network trained for image recognition on
ImageNet".
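The transfer trick in that quote can be sketched in a few lines: keep the
pretrained network's features frozen and fit only a small head on a tiny
labeled set. This is purely illustrative — the "pretrained" featurizer below
is a fixed random projection standing in for real ImageNet features, and the
least-squares head is my stand-in for whatever classifier they actually train.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained" featurizer: a fixed ReLU projection standing in
# for deep visual features from a large ImageNet-trained network.
W_pretrained = rng.normal(size=(64, 32))

def features(x):
    return np.maximum(0, x @ W_pretrained)  # weights never updated

# Tiny labeled dataset: two classes separated along the first input dim.
X = rng.normal(size=(20, 64))
y = (X[:, 0] > 0).astype(float)

# Train ONLY a linear head on top of the frozen features (least squares
# against +/-1 targets) -- the cheap part that works on small data.
F = features(X)
w, *_ = np.linalg.lstsq(F, y * 2 - 1, rcond=None)

preds = (features(X) @ w > 0).astype(float)
accuracy = (preds == y).mean()
```

With enough frozen features relative to the handful of labels, the linear
head can fit the small set easily, which is the point of the quoted approach.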

------
QAPereo
To reference an earlier article on HN, this reads like a future of, "Robot,
observe these field workers picking fruit for a week. Now practice in this
field for a day. OK, now the job is yours."

What I can't tell at all from this article is whether that day is years or
decades away.

~~~
Swizec
> whether that day is years or decades away

Yes, it's both. Depends on the task.

Afaik we can already program industrial robots by showing them what to do.
The robot records the movement in its actuators, then keeps replaying it over
and over.
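That record-and-replay style of "teaching" can be sketched very simply. The
class below is a toy, with made-up names — a real teach pendant samples real
joint encoders and commands real motors — but the control flow is the same:
snapshot joint angles while a human guides the arm, then play them back.

```python
class RecordReplayArm:
    """Toy 3-DOF arm programmed by demonstration: record, then replay."""

    def __init__(self):
        self.trajectory = []            # recorded joint-angle snapshots
        self.joints = [0.0, 0.0, 0.0]   # current joint angles

    def record_step(self):
        """Sample current joint angles (called while a human moves the arm)."""
        self.trajectory.append(list(self.joints))

    def replay(self):
        """Drive the arm back through every recorded snapshot, in order."""
        for snapshot in self.trajectory:
            self.joints = list(snapshot)  # real code would command motors here
            yield list(self.joints)

arm = RecordReplayArm()
for pose in ([0.1, 0.5, -0.2], [0.3, 0.4, 0.0], [0.6, 0.1, 0.2]):
    arm.joints = list(pose)   # "hand-guiding" the arm to each pose
    arm.record_step()

replayed = list(arm.replay())
```

Note there is no generalization at all here: the arm repeats exactly what it
saw, which is why this counts as programming rather than learning.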

And we already have machine learning agents that can observe your behavior and
learn which news stories are "good" news stories and which aren't. (algo
newsfeeds). You could use them to observe a magazine editor and after a few
issues, you'd have a robo editor.
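The "observe the editor, learn their taste" idea can be sketched as a
bag-of-words perceptron over headlines. Everything here is illustrative — no
real newsfeed works from raw word counts like this — but it shows the shape
of learning preferences purely from observed picks and skips.

```python
from collections import defaultdict

class PreferenceLearner:
    """Toy perceptron that learns headline preferences from observed picks."""

    def __init__(self):
        self.weights = defaultdict(float)  # per-word preference weight

    def score(self, headline):
        return sum(self.weights[w] for w in headline.lower().split())

    def observe(self, headline, picked):
        """Perceptron update: nudge word weights when the model disagrees."""
        target = 1.0 if picked else -1.0
        if self.score(headline) * target <= 0:  # misclassified (or unsure)
            for w in headline.lower().split():
                self.weights[w] += target

learner = PreferenceLearner()
# Watch an "editor" pick science stories and skip gossip for a few issues.
for _ in range(3):
    learner.observe("new telescope images released", picked=True)
    learner.observe("celebrity gossip roundup", picked=False)

science_score = learner.score("telescope images")
gossip_score = learner.score("celebrity gossip")
```

After a few observed "issues", the learner scores unseen science headlines
positively and gossip negatively — the robo-editor in miniature.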

~~~
webmaven
That's not "showing them what to do"; it's more along the lines of "hand-
holding".

------
richard___
Summary--

Semantics part: Seems like the idea is we can "transfer" knowledge from prior
labeled samples so that we don't need to do as much new work labeling sample
images with semantic labels.

Grasping part: "Emulating human movements with self-supervision and
imitation." High-level imitation based on visual frame differences avoids
needing to manually control actuators. Not sure how this works exactly.

Two-stream model: the ventral network asks "What class is this?"; the dorsal
network asks "Is this how we should grasp this object?" The benefit is that
we can make use of all the automatically generated (robot-generated) grasping
data without having a human supervise all that automated grasping, e.g. "This
process is a successful way to pick up this object, and also this object is
an apple." The ventral network ties the grasping data (which has no object
labels) back to object labels, which allows for semantic control of the
trained robot, e.g. "Pick up that apple".
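That division of labor can be sketched as a selection rule: a command like
"pick up the apple" means "take the best-scoring grasp among candidates the
ventral stream classifies as apple". The two functions below are trivial
stand-ins for the paper's actual networks — in the real system both are
learned models over images, not dictionary lookups.

```python
def dorsal_grasp_score(candidate):
    # Stand-in for a network trained on robot-generated grasp outcomes.
    return candidate["grasp_score"]

def ventral_class(candidate):
    # Stand-in for a classifier built on transferred visual features.
    return candidate["predicted_class"]

def semantic_grasp(candidates, target_class):
    """Pick the most promising grasp whose object matches the command."""
    matching = [c for c in candidates if ventral_class(c) == target_class]
    if not matching:
        return None  # nothing in view matches the requested class
    return max(matching, key=dorsal_grasp_score)

candidates = [
    {"id": 1, "predicted_class": "apple",  "grasp_score": 0.4},
    {"id": 2, "predicted_class": "banana", "grasp_score": 0.9},
    {"id": 3, "predicted_class": "apple",  "grasp_score": 0.7},
]
best = semantic_grasp(candidates, "apple")  # skips the easier banana grasp
```

The key point the summary makes survives even in this toy: grasp scores come
from unlabeled robot self-supervision, and the class labels are only needed
to filter which grasps count as answering the command.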

