
Impala: Scalable Distributed Deep Reinforcement Learning - gwern
https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/
======
ysleepy
The name is taken [https://impala.apache.org/](https://impala.apache.org/) .
Both are "distributed and scalable".

~~~
LaPrometheus
This is typical Google/DeepMind ignorance. The rest of the world is non-existent to them.

~~~
krisives
I believe it started with Golang, where Go was already an existing language by
a hobbyist. Granted, nobody was using it, but it was still kinda lame to steal his name.

~~~
visarga
Go was taken by ancient China before that.

~~~
nielsbot
Maybe the game itself, but the name is derived from Japanese, fairly recently.

------
otoburb
I love the different yet similar 3D environments for RL agent learning. OpenAI
has a similar environment they call OpenAI Gym[1] to train agents, of which
RoboSchool is the closest environment[2] analogue to DMLab-30.

From a cursory glance, DMLab-30 levels are based on id Software's Quake III code, while
RoboSchool uses a full-blown physics engine (Bullet[3]).

Pretty exciting times!

[1] [https://gym.openai.com/envs/](https://gym.openai.com/envs/)

[2] [https://blog.openai.com/roboschool/](https://blog.openai.com/roboschool/)

[3] [https://pybullet.org/wordpress/](https://pybullet.org/wordpress/)

~~~
visarga
These game environments are the sandbox where AI agents need to play in order
to learn and evolve. Just like human children need to play, RL agents have the
same need. There is a fundamental difference between learning statistical
patterns from a fixed dataset (supervised learning) and learning by
interactivity and exploration (reinforcement learning), where the agent can
create and test hypotheses and thus gain causal-inference powers.

A simulator is an unlimited, dynamic dataset, so much more than, say, ImageNet
or even Wikipedia. The ability of an agent to create its own predictions about
the evolution of the environment is essential in planning and reasoning.
That's how we get to be so smart - we have a mental simulator we can apply to
any situation to check what the outcome would be. When agents learn to do the
same, they too can become smart. AlphaGo for example was doing that - planning
ahead (MCTS) and that's how it beat us at Go. A simple neural net without
planning wouldn't have beaten the best humans.

I think simulation is going to be the next ingredient we add to AI to make it
learn and reason like humans. It's the missing ingredient we need, that
combined with RL will lead to the next breakthrough.
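The "mental simulator" idea above can be made concrete with a toy sketch (all names here are illustrative, not from the article): an agent that queries a model of the environment to score each action's consequences before committing, rather than reacting purely from a learned policy. Full MCTS as used by AlphaGo adds tree statistics, UCB-style selection, and value-network evaluations, but the core loop is just depth-limited lookahead over a simulator:

```python
# Toy illustration of planning with a simulator: depth-limited lookahead on a
# 1-D chain where the agent moves left/right and is rewarded for reaching
# position 3. Illustrative only; real planners (e.g. MCTS) are more involved.

def simulate(state, action):
    """Deterministic toy simulator: returns (next_state, reward)."""
    next_state = state + (1 if action == "right" else -1)
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward

def plan(state, depth):
    """Exhaustive lookahead: best cumulative reward reachable in `depth` steps,
    plus the first action on that best path."""
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action in ("left", "right"):
        next_state, reward = simulate(state, action)
        future_value, _ = plan(next_state, depth - 1)
        if reward + future_value > best_value:
            best_value, best_action = reward + future_value, action
    return best_value, best_action

value, action = plan(0, 3)  # from state 0, the goal at 3 is reachable in 3 steps
```

A model-free policy would need many real interactions to learn the same preference; the planner gets it "for free" by spending compute inside the simulator instead.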

------
yazr
Reddit discussion with paper authors.

[https://www.reddit.com/r/MachineLearning/comments/7vkvg5/r_i...](https://www.reddit.com/r/MachineLearning/comments/7vkvg5/r_impala_scalable_distributed_deeprl_with/)

Learns a policy; transfers across tasks using the "V-trace" off-policy correction. Improves on A3C.

------
leblancfg
Imagine a swarm of small robots, say drones with sufficient compute power on
board.

If I understand the repercussions of this correctly, could it be applied so
that the swarm would be able to learn from the mistakes of a single individual?

~~~
gwern
Yes, but only if they are dispersed and not in a swarm. Once the agents are
interacting with each other in the swarm, the usual RL methods slow down
because now the 'environment' looks like it's changing every timestep (because
your fellow robots are executing slightly different policies than you and your
policy has been learned on the assumption of older policies).

But of course there are various RL algorithms for dealing with having multiple
interacting agents, for example, for Starcraft environments (in SC, there's
only one player/mouse/keyboard, true, but the idea there is that multiple-
agent RL methods have the advantage of, instead of learning one giant master
NN to do everything, you can let individual smaller NNs control a few or just
one unit and then they can coordinate for attacks).
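The non-stationarity point can be shown with a toy sketch (my own illustration, not from the comment): the true dynamics depend on every agent's action, but each agent only conditions on its own. So when a teammate updates its policy, the effective transition distribution one agent experiences shifts, even though the underlying environment never changed:

```python
import random

random.seed(0)

def step(state, a1, a2):
    """The real environment depends on BOTH agents' actions."""
    return state + a1 + a2

def expected_next_state(state, a1, policy2, n=5000):
    """Average next state seen by agent 1 for a fixed (state, action),
    marginalizing over agent 2's behavior."""
    return sum(step(state, a1, policy2()) for _ in range(n)) / n

old_policy2 = lambda: random.choice([0, 1])  # teammate early in training
new_policy2 = lambda: 1                      # teammate after a policy update

# Same state, same action by agent 1 -- different effective dynamics.
before = expected_next_state(0, 1, old_policy2)  # ~1.5
after = expected_next_state(0, 1, new_policy2)   # exactly 2.0
```

From agent 1's point of view, `P(s' | s, a1)` has drifted; that drift is exactly what invalidates the stationarity assumption standard single-agent RL methods rely on.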

------
spolu
Really like how V-trace reduces to A3C when on-policy. Quite elegant, but it’s
unfortunate they don’t give results on continuous tasks.
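For readers who want to see the reduction: here is a rough sketch of the V-trace target from the paper (variable names are my own; the λ parameter and batching details are omitted). Each TD error is scaled by a clipped importance ratio ρ_t = min(ρ̄, π/μ), and the errors are accumulated backwards with clipped traces c_t = min(c̄, π/μ). When the data is on-policy, all ratios equal 1, the clips are inactive (for ρ̄, c̄ ≥ 1), and the recursion telescopes into the ordinary n-step return that A3C bootstraps from:

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap, rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace targets v_s for one trajectory (sketch of Espeholt et al. 2018).

    rewards:   r_t, shape [T]
    values:    V(x_t) under the learner's value function, shape [T]
    bootstrap: V(x_T) for the state after the trajectory
    rhos:      importance ratios pi(a_t|x_t) / mu(a_t|x_t), shape [T]
    """
    T = len(rewards)
    clipped_rhos = np.minimum(rho_bar, rhos)  # clip the TD-error weights
    clipped_cs = np.minimum(c_bar, rhos)      # clip the backward traces
    values_ext = np.append(values, bootstrap)
    # Importance-weighted one-step TD errors.
    deltas = clipped_rhos * (rewards + gamma * values_ext[1:] - values_ext[:-1])
    vs = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        acc = deltas[t] + gamma * clipped_cs[t] * acc
        vs[t] = values[t] + acc
    return vs
```

With `rhos` all equal to 1, `vs[0]` is just `r_0 + γ r_1 + ... + γ^T V(x_T)`, i.e. the on-policy n-step target, which is the reduction to A3C that the comment is pointing at.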

------
lawlessone
Would it be feasible for me to run their learning agent at home, giving it
tasks to learn that I assign it?

Or do you really need a server farm for this?

~~~
Maybestring
If you allow untrusted users to define tasks for shared agents, then you must
assume you will have malicious or incompetent users defining 'poison' tasks.
These would be tasks that bias agents toward taking unproductive or harmful
actions.

Afaik, this problem hasn't been solved.

~~~
radarsat1
While you make a fair point, who mentioned anything about untrusted users? You
seem to be off on a tangent.

~~~
Maybestring
The question is about running a node in a distributed learning algorithm. The
algorithm depends on trusted nodes to function.

