
Learning a hierarchy - gdb
https://blog.openai.com/learning-a-hierarchy/
======
canjobear
It seems to me there's been an interesting turn in AI recently, toward
focusing on adaptability as a goal in itself. Deep learning has shown that
there is incredible power in stochastic gradient descent over a space of
functions, but so far that has mostly been applied to rigid tasks. Now work
like this is about turning that power towards adaptability itself as a goal,
and it seems to me that this brings us towards "real" intelligence.

The logical extreme of this thinking would be agents that maximize the
entropy of their future actions as the only objective function, as in [1].

[1]
[http://paulispace.com/intelligence/2017/07/06/maxent.html](http://paulispace.com/intelligence/2017/07/06/maxent.html)
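A crude way to get intuition for that "keep your options open" objective (the gridworld and all names below are my own illustration, not from the linked post): an agent greedily steps toward whichever neighboring cell has the most distinct cells reachable within a fixed horizon, a rough proxy for maximizing the entropy of future action sequences.

```python
from collections import deque

# Toy gridworld: '#' is a wall, '.' is open. The agent has no task reward;
# it only moves toward the cell with the most reachable cells within a horizon.
GRID = [
    "#######",
    "#.#####",
    "#.#####",
    "#.....#",
    "#.....#",
    "#######",
]
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def open_cell(r, c):
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] == "."

def reachable(start, horizon):
    """Count distinct cells reachable from `start` in <= horizon moves (BFS)."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        (r, c), d = frontier.popleft()
        if d == horizon:
            continue
        for dr, dc in MOVES:
            nxt = (r + dr, c + dc)
            if open_cell(*nxt) and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return len(seen)

def options_step(pos, horizon=3):
    """Greedily pick the neighbor with the most reachable future cells."""
    candidates = [(pos[0] + dr, pos[1] + dc) for dr, dc in MOVES]
    candidates = [p for p in candidates if open_cell(*p)]
    return max(candidates, key=lambda p: reachable(p, horizon))

pos = (1, 1)  # top of a dead-end corridor: very few options
for _ in range(4):
    pos = options_step(pos)
print(pos)  # -> (3, 3): out of the dead-end and into the open room
```

With no reward signal at all, the agent drifts out of the corridor into the room, simply because the room is where more futures are available.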

~~~
ionforce
Is this like maximizing for movement options in a Chess AI?

~~~
Retra
This reminds me of a thought I had some years ago. The idea was that we can
think of general intelligence not as an optimization for a specific given
goal, but as optimization for a special position from which some wider set of
goals can be most rapidly converged upon. Thus optimizing for future
flexibility rather than current results.

At the time, I remember being excited to hear of a physics paper[1] concerning
an inverted pendulum, where they solved the system for some dynamic forces
which would keep the system at the position of maximum instability, and
claimed that it was, in some sense, a description of dynamic intelligence. The
analogy there is that this is the unique position from which the pendulum can
be efficiently made to move quickly in any 'required' direction (the 'goal').

I still think that idea has some merit, but putting together a coherent
formalization of it seems really tricky, and would require some genius far
beyond my own meager pondering.

[1] I found the article:
[https://physics.aps.org/articles/v6/46](https://physics.aps.org/articles/v6/46)
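A toy numerical sketch of the pendulum intuition (my own illustration, not the linked paper's method; the gains and constants are arbitrary): a simple PD controller holds an inverted pendulum at its most unstable point, and from that point equal-and-opposite nudges send it quickly in either direction.

```python
import math

# Inverted pendulum: theta'' = (g/L) * sin(theta) + u, Euler-integrated.
G, L, DT = 9.81, 1.0, 0.001

def simulate(theta, omega, steps, kp=40.0, kd=10.0, push=0.0):
    """PD control u = -kp*theta - kd*omega, plus an optional constant push."""
    for _ in range(steps):
        u = -kp * theta - kd * omega + push
        omega += ((G / L) * math.sin(theta) + u) * DT
        theta += omega * DT
    return theta, omega

# Balancing: starting slightly off upright, control drives theta back to ~0.
theta, omega = simulate(0.1, 0.0, 5000)
print(abs(theta) < 1e-3)  # True: held at the unstable equilibrium

# From upright with control off, opposite pushes give symmetric fast departures.
left, _ = simulate(0.0, 0.0, 1000, kp=0.0, kd=0.0, push=-1.0)
right, _ = simulate(0.0, 0.0, 1000, kp=0.0, kd=0.0, push=+1.0)
print(left < 0 < right)  # True: either 'goal' direction is immediately reachable
```

The upright position is the unique state from which both departure directions are equally cheap, which is the "special position" the comment describes.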

~~~
ewjordan
As a step in that direction, you can take some inspiration from what the brain
does: as you learn things better and better, the knowledge essentially gets
pulled down to neuronal layers that are closer to sensory input. This leaves
the higher layers more free to do other stuff (potentially reusing the results
from the surface layers), which is a step in the direction of optimizing for
future flexibility.

It's possible to create rules that operate on an already-trained network and
push it in this direction without totally destroying what it's learned, by
"fuzzing" the original network to generate a bunch of input/output pairs, and
then using that dataset to retrain smaller sub-networks. For instance, if you
have a 5 layer network that you've trained on a classification task, you can
often use that network as a teacher to train a smaller network to do pretty
damn well on the same classification task, even in some cases where training
the smaller network directly would have been very difficult. There are several
reasons that this trick can work, not the least of which is that in a sense it
is a way to expand the training set dramatically.
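A minimal NumPy sketch of that fuzz-and-retrain (distillation-style) trick, with a toy setup of my own rather than the commenter's exact recipe: a fixed "teacher" network labels randomly generated inputs, and a smaller "student" network is trained by plain gradient descent to match the teacher's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Teacher: a fixed, already-"trained" 2-16-1 network (random weights here).
W1, b1 = rng.normal(size=(2, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 1)), rng.normal(size=1)

def teacher(x):
    return np.tanh(x @ W1 + b1) @ W2 + b2

# "Fuzz" the input space to manufacture a large input/output training set.
X = rng.normal(size=(2000, 2))
Y = teacher(X)

# Student: a smaller 2-4-1 network trained on the teacher's outputs.
V1, c1 = rng.normal(size=(2, 4)) * 0.1, np.zeros(4)
V2, c2 = rng.normal(size=(4, 1)) * 0.1, np.zeros(1)

def student(x):
    return np.tanh(x @ V1 + c1) @ V2 + c2

def mse():
    return float(np.mean((student(X) - Y) ** 2))

before = mse()
lr = 0.01
for _ in range(500):  # batch gradient descent on the distillation MSE
    H = np.tanh(X @ V1 + c1)
    P = H @ V2 + c2
    dP = 2 * (P - Y) / len(X)          # d(loss)/d(prediction)
    dV2, dc2 = H.T @ dP, dP.sum(0)     # output-layer gradients
    dH = dP @ V2.T * (1 - H ** 2)      # backprop through tanh
    dV1, dc1 = X.T @ dH, dH.sum(0)     # hidden-layer gradients
    V1 -= lr * dV1; c1 -= lr * dc1
    V2 -= lr * dV2; c2 -= lr * dc2

print(mse() < before)  # True: the student tracks the teacher more closely
```

The key point matches the comment: the student never sees real labeled data, only inputs manufactured by fuzzing and outputs manufactured by the teacher, so the effective training set can be made arbitrarily large.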

NB: the above approach is probably not how you'd implement this; there are
less crude methods to incentivize shallower layers to carry more activation
than deeper ones, and those would probably work better.

I can easily imagine a phased training strategy that oscillates between a)
learning new things by making the deeper layers more malleable and the
shallower ones fairly rigid, and b) compressing all the data by opening up the
shallow layers to change and replaying input/output into itself. I have no
idea if there are any benchmarks around this sort of thing, though;
benchmarks typically have fixed goals, so the ability to retrain for
additional tasks is not really measured.

------
anon404123
super cool that this was done by a high schooler

~~~
akhilcacharya
More discouraging to me to be completely honest.

~~~
fjsolwmv
Why have a whole humanity if you only think a single best person has value?

~~~
anon404123
"It is not enough that I should succeed - others should fail."

~~~
akhilcacharya
No it's not that...as tempting as that is often...

It's that the spoils of the new economy are accumulating in a way that
completely forgets the middle 90% of the country. Kevin is obviously really
smart, but has access to things I don't even have in a state school by virtue
of being a sharp high schooler in Palo Alto, much less when I was in high
school.

~~~
nostrademons
Life is long. If his location gives him access to opportunities that you don't
have, figure out a way to get access to those opportunities and execute on it
once you graduate from college. Many prominent Silicon Valley people came from
small towns in the mid-west (Marc Andreessen, Evan Williams) or immigrated
from poor political situations abroad (Sergey Brin, Jan Koum, Elon Musk).

~~~
LrnByTeach
Very well said, with examples of people who made it to the top of Silicon
Valley ...

> Life is long. If his location gives him access to opportunities that you
> don't have, figure out a way to get access to those opportunities and
> execute on it once you graduate from college. Many prominent Silicon
> Valley people came from small towns in the mid-west (Marc Andreessen,
> Evan Williams) or immigrated from poor political situations abroad
> (Sergey Brin, Jan Koum, Elon Musk).

------
hacker_9
Does this optimise the hierarchy as the environment changes? For example when
cooking, I unpackage food as needed, but when it starts to clutter the
workspace I make a decision to fit in a 'clean up cycle' while waiting on some
other food to cook.

~~~
sharemywin
As far as I understood it, it learned sub-tasks, then learned to apply those
sub-tasks.

Kind of reminds me of the Soar system except using Deep learning instead.

[https://en.wikipedia.org/wiki/Soar_(cognitive_architecture)](https://en.wikipedia.org/wiki/Soar_\(cognitive_architecture\))
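A minimal sketch of that master/sub-policy split (a toy setup of my own, not the MLSH algorithm itself): two fixed sub-policies act as primitives, and a master policy learns, on a slower timescale via tabular Q-learning, which one to invoke.

```python
import random

random.seed(0)
SUBPOLICIES = [lambda p: p - 1, lambda p: p + 1]  # "step left", "step right"
N, GOAL = 3, 9  # each master decision runs a sub-policy for N primitive steps

q = {}  # master's tabular Q-values, keyed by (position, sub-policy index)

def run_episode(eps=0.2, alpha=0.5, gamma=0.9):
    pos = 0
    for _ in range(10):  # master decisions per episode
        s = pos
        if random.random() < eps:  # epsilon-greedy sub-policy choice
            a = random.randrange(2)
        else:
            a = max(range(2), key=lambda i: q.get((s, i), 0.0))
        for _ in range(N):  # the chosen sub-policy controls for N steps
            pos = SUBPOLICIES[a](pos)
        r = 1.0 if pos >= GOAL else -0.01  # reward only at the goal
        nxt = max(q.get((pos, i), 0.0) for i in range(2))
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (r + gamma * nxt - old)  # Q-learning update
        if pos >= GOAL:
            break

for _ in range(300):
    run_episode()

# The greedy master now picks "step right" (index 1) from the start state.
best = max(range(2), key=lambda i: q.get((0, i), 0.0))
print(best)
```

The point of the structure is that the master only decides every N primitive steps, so its effective horizon is N times shorter than the raw environment's, which is roughly the benefit the blog post claims for the learned hierarchy.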

------
zardo
I was mulling over this idea yesterday in the context of RTS games... There's
no reason to consider changing your overall strategy every frame. Nice to see
it works!

It will be interesting to see how it performs with more tiers in the
hierarchy, and with more structured tasks.

Controlling a virtual arm to play a board game for example.

------
sharemywin
Found the paper from the wired article below

[https://s3-us-west-2.amazonaws.com/openai-assets/MLSH/mlsh_paper.pdf](https://s3-us-west-2.amazonaws.com/openai-assets/MLSH/mlsh_paper.pdf)

~~~
ohitsdom
There are buttons below the first video to read the paper and view the code.

------
indescions_2017
Next step: transfer learning and sharing amongst sub-policies in the graph
hierarchy. If an Ant agent learns to "move up" to avoid an obstacle or reach
a goal, why can't it infer the same for any cardinal or diagonal direction
after observing the world around it? It's just a rotation or translation,
after all.

Also, for small numbers of sub-policies, would Monte Carlo playouts be
faster? There we'd be searching over the next step the Ant may encounter,
which presumably is a finite set of possible "wall-floor" configurations ;)

In any case, great work! Always love watching OpenAI vids...

~~~
jng
Well, it's really hard to read text upside down.

~~~
fiddlerwoaroof
I’ve heard this many times, but I’ve never had any issues reading text at any
angle.

------
sputknick
I don't understand where the 'hierarchy' comes into play. This reads to me as
a standard computer program where you execute code, and some of those lines
execute other segments of code which might be much more complex than what I
see. If I execute the line print('Hello World') I only executed one line,
but many other things happened that I did not directly execute. I'm sure I'm
missing something, and this is somehow different and novel, but I'm just
missing it from this blog post.

~~~
zardo
It is effectively a system of reinforcement learning agents working in a
command hierarchy to solve problems that single reinforcement learning agents
fail to.

It's (somewhat) obvious that this is an idea worth trying. But that doesn't
mean actually getting it to work is easy.

~~~
sputknick
Got it, okay, so it is different from a traditional computer program, and more
like a business or military unit, where the agent at a high level "determines"
an action, and delegates the action to a lower level entity that doesn't
necessarily have the knowledge as to why it's doing this thing?

------
setr
Is it just me or is there something revolting about the character model?

Good work nonetheless but for god's sake give it six legs and make it black

------
gthinkin
Great work, Kevin!

~~~
kevinfrans
:)

