
Agent57: Outperforming the human Atari benchmark - EvgeniyZh
https://deepmind.com/blog/article/Agent57-Outperforming-the-human-Atari-benchmark
======
sytelus
This whole evolution looks more and more like the expert systems of the 1980s,
where people kept adding more and more complexity to "solve" a specific
problem. For RL, we started with the simple DQN, which was elegant, but now
the new algorithms look like a massive hodgepodge of band-aids. NGU, as it is,
is extraordinarily complex and looks like an ad hoc mix of various patches.
Now, on top of NGU, we are also throwing in a meta-controller and even bandits,
among other things, to complete the proverbial kitchen sink. Sure, we get to
claim victory on Atari, but this is far away from elegant and beautiful. It
would be surprising if this victory generalizes to other problems where folks
have built different "expert systems" specific to those problems. So all this
feels a lot like a Watson-winning-Jeopardy moment to me...

PS: Kudos to DeepMind for pushing for the median, or even the bottom
percentile, instead of a simplistic average metric, which also hides variance.

~~~
chpatrick
Doesn't the human brain have millions of years of band aids?

~~~
jacobush
Yes and at this pace it may take thousands (or at least hundreds) of years to
patch up our A.I. to general A.I.

------
callmekit
"The human" is an "average human" from the "Human-level control through deep
reinforcement learning" paper.

For example, for Montezuma's Revenge:

- "average human" score is 4753.30

- Agent57 score is 9352.01

- human record is 1,219,200

------
WnZ39p0Dgydaz1
According to the paper this takes around 53,418 hours to train (5e10 frames /
260 frames/sec); distributed over 256 machines, that's around 200 hours of
wall-clock time.
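The arithmetic checks out; a quick back-of-the-envelope in Python, using only the figures quoted in the comment above:

```python
# Sanity check of the training-time figures: 5e10 frames at 260 frames/sec,
# spread over 256 machines.
frames = 5e10
frames_per_sec = 260
machines = 256

total_hours = frames / frames_per_sec / 3600    # ~53,418 hours of compute
wall_clock_hours = total_hours / machines       # ~209 hours, i.e. ~8.7 days

print(f"total compute: {total_hours:,.0f} h, wall clock: {wall_clock_hours:.0f} h")
```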

~~~
odnes
Mind-boggling. They train on each game for a total of 1e11 frames of
experience. At 30 FPS that is ~106 calendar years of constant gameplay.

~~~
londons_explore
This is how I explain AI to clients...

Think of your best employee. You show them how to operate the
videoconferencing system, and that's done.

Now think of your worst employee. You have to show them how to operate the
videoconferencing system 3 times because they keep getting it wrong or not
understanding how to start a meeting. Eventually, when they've got the hang of
it, they're fine too.

AI is like an employee that needs to be shown how to do it a million times.
Unless you have the time or the data to show the AI how to do a simple task
millions of distinct times, or someone else has already done it, AI can't help
you.

Here DeepMind is just showing that for slightly more complex tasks, that ~1e6
factor turns into ~1e11...

~~~
antpls
That's only the current state of the art, however. The goal is transfer
learning, meaning that when the AI learns something somewhere, it will improve
its knowledge on other tasks too.

A better analogy would be a baby and an adult. The adult is full of past
experiences and can use that experience to learn some other tasks faster.

Currently, our AI technology isn't even at the "baby" stage, as it is not able
to transfer knowledge between arbitrary tasks. This is an active research
domain.

------
thedance
Reading the blog post (haven't read the paper yet) makes it sound like this
technique might apply to fuzzing. If this thing seeks out and exploits novel
states in a large state space, that's the kind of direction you want in your
fuzzer too.

------
KaoruAoiShiho
Definitely getting closer, another landmark development just like alphago.

------
maerF0x0
Not exactly related to Agent57, but to the post itself. Perusing a bunch of
those games in the playlist gave me a deep respect for Atari game makers'
creativity. Many things that were "games" then may not be considered a game
today (e.g., a gopher digging dirt to get carrots from a farmer). I also
recognized many similarities between those game mechanics and NES games: for
example, Atari's road-running game is like Excitebike, Pitfall! is like A Boy
and His Blob, etc.

------
francismb
Shouldn't a comparison between a single agent and human performance be a
better measurement? I mean, OK, it's defined against "human performance", but
a human cannot play multiple environments in parallel, or explore a tree of
possibilities. A human just plays again and again, single-shot.

------
unwoundmouse
Would it be fair to say this is a ResNet moment for RL?

~~~
WnZ39p0Dgydaz1
This seems more like overfitting a complex system to a specific problem set.

~~~
KaoruAoiShiho
This feels like the typical HN "less space than a Nomad" comment. To me this
system reads as very important and necessary for the development of AI in
general.

~~~
WnZ39p0Dgydaz1
Meh, I'll take the bait.

There are no fundamentally new ideas in this paper compared to the preceding
papers. What they do is tune hyperparameters (BPTT length,
exploration/exploitation tradeoff, and policy parameterization) in a smart
fashion so as to fit the bottom 5% of Atari games. Obviously the parameters,
or equivalently the architecture choices, are tuned to achieve exactly that:
good performance on the bottom 5% of Atari games. None of these choices will
generalize outside of this specific set of Atari games.

The reasons we are doing badly at these games are well-understood. They
typically require "world knowledge" (what is a key? what is a door?) and
reasoning (I found a key, that can be used to open a door). That is, the
visual representations need to encode such knowledge. Algorithms don't possess
this world knowledge as they are not embodied in our world, so they need to
learn it from scratch, i.e. brute-force it. That's exactly what this paper is
doing - brute-forcing the solutions by finding just the right hyperparameters
with millions of hours worth of compute.

A good analogy is what would happen if you took the game but permuted the
pixels in some deterministic way, so that the screen would look like noise to
a human. A key would no longer be a key, but the structure would still be the
same. If someone asked you to solve Montezuma's Revenge with that
representation, you would not be able to. Does that make you stupid or
non-human? So, because these games require human world knowledge, solving them
in the same way as simpler games is kind of beside the point.
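The thought experiment is easy to make concrete: fix one deterministic permutation of pixel positions and apply it to every frame. The underlying game is unchanged (the mapping is a bijection, constant across the whole episode), but human visual priors become useless. A minimal sketch; the frame size and seed are arbitrary choices for illustration, not anything from the paper:

```python
import random

# One fixed permutation of pixel positions, drawn once and reused for every
# frame: deterministic and invertible, but the screen looks like noise.
WIDTH, HEIGHT = 160, 210           # typical Atari frame dimensions
rng = random.Random(0)             # fixed seed => same scramble every episode
perm = list(range(WIDTH * HEIGHT))
rng.shuffle(perm)

def scramble(frame):
    """Apply the fixed permutation to a flat list of pixels."""
    return [frame[perm[i]] for i in range(len(frame))]

def unscramble(frame):
    """Invert the permutation -- all the information is still there."""
    out = [0] * len(frame)
    for i, p in enumerate(perm):
        out[p] = frame[i]
    return out

frame = [rng.randrange(256) for _ in range(WIDTH * HEIGHT)]
assert unscramble(scramble(frame)) == frame  # lossless: same MDP, alien visuals
```

An RL agent that learns from scratch is indifferent to this transformation, while a human player would be lost, which is the commenter's point about world knowledge.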

~~~
KaoruAoiShiho
Thanks for explaining your take but this sounds very reductive. When it comes
down to it every problem is solved by tuning for that problem specifically.

While Never Give Up (NGU) is not fundamentally new, it is an important step in
how computers learn. You need to be able to generalize solutions to problems
where you don't have contextual information. Imagine you were a caveman asked
to operate an iPhone. You're not stupid if you don't know how, but if I tell
you "never give up" and put you in a room for 5 years, I'd expect some results
from a sentient being. This is an important process too.

We would be much closer to good AI if it can figure things out by itself
instead of being constantly fed "clean" data.

~~~
WnZ39p0Dgydaz1
There's a difference between research and engineering. Is this system
impressive? Definitely! It's a complex engineering effort, and highly tuned to
solve a specific problem - beat the Atari benchmark.

Does it teach us anything fundamentally new? No, it still has horrible sample
complexity and does not generalize to anything outside of Atari unless you
completely re-tune it. And I don't mean re-train. I mean changing the
architecture and assumptions. That's different from projects such as AlphaZero
or MuZero.

IMO this would have been more appropriate to publish as an open-source system,
so that it could be applied and tuned to other problems, rather than as a
research paper. As research, nobody outside of DeepMind can ever reproduce
this.

You are completely changing the topic with this:

> While Never Give Up (NGU) is not fundamentally new, it is an important step
> in computers learning. You need to be able to generalize solutions to
> problems where you don't have contextual information

We're not even talking about NGU, we're talking about the paper linked in this
post. This specific paper proposes little new in that regard. It just
engineers a system to do this specifically for Atari games by taking a
previous paper and changing some parameters. Neat, but it's not some kind of
breakthrough.

~~~
visarga
> Does it teach us anything fundamentally new? No, it still has horrible
> sample complexity and does not generalize to anything outside of Atari
> unless you completely re-tune it.

I thought it was interesting that they let the agent learn the
exploration/exploitation tradeoff, also combining memory and intrinsic
motivation.
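That learned tradeoff is essentially a non-stationary multi-armed bandit choosing, per episode, which member of a policy family (each with a different exploration setting) to run next. A toy UCB sketch, where the per-policy "returns" are invented purely for illustration and the paper's actual controller is a sliding-window variant:

```python
import math
import random

# Toy bandit meta-controller: each "arm" stands for one policy in the family
# (e.g. a different exploration-bonus weight); the bandit picks which policy
# runs the next episode based on the episode returns observed so far.
class UCBMetaController:
    def __init__(self, n_policies):
        self.counts = [0] * n_policies
        self.means = [0.0] * n_policies

    def select(self):
        # Try every arm once, then pick by upper confidence bound.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        t = sum(self.counts) + 1
        return max(range(len(self.counts)),
                   key=lambda i: self.means[i]
                       + math.sqrt(2 * math.log(t) / self.counts[i]))

    def update(self, arm, episode_return):
        self.counts[arm] += 1
        self.means[arm] += (episode_return - self.means[arm]) / self.counts[arm]

rng = random.Random(1)
true_values = [0.2, 0.5, 0.9]     # hypothetical mean return of each policy
ctrl = UCBMetaController(len(true_values))
for _ in range(500):
    arm = ctrl.select()
    ctrl.update(arm, true_values[arm] + rng.gauss(0, 0.1))
print("most-played policy:", max(range(3), key=lambda i: ctrl.counts[i]))
```

Over enough episodes the controller concentrates its plays on the policy with the highest observed return, while still occasionally re-checking the others.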

Another contribution of this paper would be that they showed all these tricks
can build on each other, thus, are complementary.

Human brains are also a bag of tricks, fine-tuned to the goal and requirements
of making more humans.

------
Tepix
Impressive! Even more impressive would be outperforming a human with the same
amount of training...

~~~
visarga
That comparison would be difficult to make. From first experience to the point
of beating Agent57's scores, the human takes a lot of time to train too.
------
trevyn
Ok, maybe it's time to slow down and consider if this is going in a good
direction, before just building whatever it is that can be built?

~~~
unwoundmouse
Moloch whose mind is pure machinery! Moloch whose blood is running money!
Moloch whose fingers are ten armies! Moloch whose breast is a cannibal dynamo!
Moloch whose ear is a smoking tomb!

