Impala: Scalable Distributed Deep Reinforcement Learning (deepmind.com)
135 points by gwern 8 months ago | 28 comments

The name is taken https://impala.apache.org/ . Both are "distributed and scalable".

This is typical Google/DeepMind ignorance. To them, the rest of the world is non-existent.

I believe it started with Golang, where Go was already the name of an existing language by a hobbyist. Granted, nobody was using it, but it's still kinda lame to steal his name.

While I agree with the other two, I remember when Golang came out and no one except the author of the other Go language had ever heard of it.

Go was taken by ancient China before that.

Maybe the game itself, but the name is derived from Japanese, fairly recently.

First Flume and now this.

Flume is a MapReduce framework at Google, and it is also the name of an Apache product.

Ignorance, or arrogance? I find it hard to believe that devs of the caliber Google employs are ignorant of existing software, especially when it's in the same (or related) domain.

I'm starting to think we need namespaces for open source projects so we could refer to deep-learning:impala and database:impala. The Java reverse domain name notation does handle this to a degree, but I feel something could be done in this area to make the naming issue easier.

There are like 10 projects calling themselves Aurora. Naming clashes are nothing new, and can someone really claim a word or phrase as taken?

It doesn't matter if you "claim" it, but it's still confusing and harder to search for.

"Impala" != "IMPALA", and the domains are very different: "distributed database" vs "RL agent with game zoo". Nobody's going to encounter one in the same space as the other, so there's little room for confusion.

Err, I worked in a place that was using Spark-SQL (which is an Apache Impala competitor) and was doing deep learning (if not RL).

I love the different yet similar 3D environments for RL agent learning. OpenAI has a similar environment they call OpenAI Gym[1] to train agents, of which RoboSchool is the closest environmental[2] analogy to DMLab-30.

From a cursory glance, DMLab-30 levels are based on id's code, while RoboSchool uses a full-blown physics engine (Bullet[3]).

Pretty exciting times!

[1] https://gym.openai.com/envs/

[2] https://blog.openai.com/roboschool/

[3] https://pybullet.org/wordpress/

These game environments are the sandbox where AI agents need to play in order to learn and evolve. Just like human children need to play, RL agents have the same need. There is a fundamental difference between learning statistical patterns from a fixed dataset (supervised learning) and learning by interaction and exploration (reinforcement learning), where the agent can create and test hypotheses, thus gaining causal inference powers.

A simulator is an unlimited, dynamic dataset, so much more than, say, ImageNet or even Wikipedia. The ability of an agent to create its own predictions about the evolution of the environment is essential in planning and reasoning. That's how we get to be so smart - we have a mental simulator we can apply to any situation to check what the outcome would be. When agents learn to do the same, they too can become smart. AlphaGo for example was doing that - planning ahead (MCTS) and that's how it beat us at Go. A simple neural net without planning wouldn't have beaten the best humans.

I think simulation is going to be the next ingredient we add to AI to make it learn and reason like humans. It's the missing ingredient we need, that combined with RL will lead to the next breakthrough.

Reddit discussion with paper authors.


Learns a policy. Off-policy correction using "V-trace". Improves on A3C.

Imagine a swarm of small robots, say drones with sufficient compute power on board.

If I understand the implications of this correctly, could this be applied so that the swarm is able to learn from the mistakes of a single individual?

Yes, but only if they are dispersed and not in a swarm. Once the agents are interacting with each other in the swarm, the usual RL methods slow down because now the 'environment' looks like it's changing every timestep (because your fellow robots are executing slightly different policies than you and your policy has been learned on the assumption of older policies).

But of course there are various RL algorithms for dealing with having multiple interacting agents, for example, for Starcraft environments (in SC, there's only one player/mouse/keyboard, true, but the idea there is that multiple-agent RL methods have the advantage of, instead of learning one giant master NN to do everything, you can let individual smaller NNs control a few or just one unit and then they can coordinate for attacks).

Really like how V-trace reduces to A3C when on-policy. Quite elegant, but it’s unfortunate they don’t give results on continuous tasks.
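
For anyone who wants to check the on-policy reduction numerically, here is a minimal pure-Python sketch. It assumes the recursive form of the V-trace target from the paper, v_s - V(x_s) = delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1})), with delta_s = rho_s * (r_s + gamma * V(x_{s+1}) - V(x_s)) and truncated ratios rho_s = min(rho_bar, pi/mu), c_s = min(c_bar, pi/mu). The function names and the toy numbers are my own, not from the authors' code. When every ratio is 1 (on-policy), the targets should coincide with the ordinary n-step bootstrapped returns that A3C-style methods use.

```python
def vtrace_targets(rewards, values, bootstrap_value, ratios,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace targets v_s for one trajectory of length n.

    rewards[t] is r_t, values[t] is V(x_t), bootstrap_value is V(x_n),
    and ratios[t] is pi(a_t | x_t) / mu(a_t | x_t).
    """
    n = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * n
    acc = 0.0  # v_{t+1} - V(x_{t+1}); zero past the end of the trajectory
    for t in reversed(range(n)):
        rho = min(rho_bar, ratios[t])  # truncated weight on the TD error
        c = min(c_bar, ratios[t])      # truncated "trace cutting" weight
        delta = rho * (rewards[t] + gamma * next_values[t] - values[t])
        acc = delta + gamma * c * acc
        targets[t] = values[t] + acc
    return targets


def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Plain on-policy n-step returns, bootstrapped from V(x_n)."""
    ret = bootstrap_value
    out = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ret
        out[t] = ret
    return out


# Toy trajectory (made-up numbers, for illustration only).
rewards = [1.0, 0.0, 2.0]
values = [0.5, -0.3, 1.2]
bootstrap = 0.7

on_policy = vtrace_targets(rewards, values, bootstrap, ratios=[1.0, 1.0, 1.0])
off_policy = vtrace_targets(rewards, values, bootstrap, ratios=[0.5, 2.0, 1.0])
baseline = n_step_returns(rewards, bootstrap)
```

With ratios of 1, `on_policy` matches `baseline` up to floating-point error; with the mismatched ratios, the first target is pulled back toward V(x_0) because the correction term is down-weighted.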

Would it be feasible for me to run their learning avatar at home, giving it tasks to learn that I assign it?

Or do you really need a server farm for this?

The author addresses this on Reddit: https://www.reddit.com/r/MachineLearning/comments/7vkvg5/r_i...

DRL is currently CPU-heavy and GPU-lightweight, since you spend most of your time exploring the environment and only a little bit of work updating the neural network on the GPU. So if you have a fairly typical home computer, like 6 cores + an Nvidia 1080 GPU, your GPU will be under-utilized while it waits on the 6 actors. It'll still work, but obviously your wallclock time is going to be a lot longer than the numbers in OP, and it'll be somewhat worse than you think because your system is unbalanced and has reduced throughput.

I don't think it will work well unless you have several actors. IIRC, one of the main factors giving A3C an edge compared to previous solutions, is that multiple learners from different episodes contribute experience at the same time. This avoids weights being updated too much in one direction. This is similar to what using batches does in supervised learning.
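
To make the "several actors feeding one learner" idea concrete, here is a toy sketch using only the Python standard library. It is an assumption-laden illustration, not IMPALA's or A3C's actual code: the function name `run_actors_and_learner`, the queue-based hand-off, and the `(actor_id, step)` tuples standing in for real environment rollouts are all hypothetical. It only shows how experience from several concurrent actors ends up mixed together in the batches the learner consumes.

```python
import queue
import threading


def run_actors_and_learner(num_actors=4, steps_per_actor=5, batch_size=4):
    """Run toy actors in threads and group their experience into batches."""
    trajectories = queue.Queue()

    def actor(actor_id):
        for step in range(steps_per_actor):
            # Placeholder for a real environment rollout.
            trajectories.put((actor_id, step))

    threads = [threading.Thread(target=actor, args=(i,))
               for i in range(num_actors)]
    for t in threads:
        t.start()

    # The "learner": block on the queue until every item has arrived,
    # grouping items into fixed-size batches as they come in.
    total = num_actors * steps_per_actor
    batches, batch = [], []
    for _ in range(total):
        batch.append(trajectories.get())
        if len(batch) == batch_size:
            batches.append(batch)  # a real learner would do a gradient step here
            batch = []
    if batch:
        batches.append(batch)  # leftover partial batch

    for t in threads:
        t.join()
    return batches


batches = run_actors_and_learner()
```

With a single actor every batch would come from one stream of experience; with several, each batch can mix items from different actors, which is the decorrelation effect described above. The exact interleaving depends on thread scheduling, so it will vary between runs.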

If you allow untrusted users to define tasks for shared agents, then you must assume you will have malicious or incompetent users defining 'poison' tasks. These would be tasks that bias agents toward taking unproductive or harmful actions.

Afaik, this problem hasn't been solved.

While you make a fair point, who mentioned anything about untrusted users? You seem to be off on a tangent..

The question is about running a node in a distributed learning algorithm. The algorithm depends on trusted nodes to function.

"untrusted users"

I mean using this at home on my own machine, not uploading it..

I don't understand the question then. You want to run an actor disconnected from the learner, but you want it to learn?

Edit: Or... You mean running the whole distributed learning algorithm on one machine. I don't see why not, but I doubt it would outperform something intended to run on a single machine.

Alternatively, could this be some sort of Folding@Home model where a variable number of users can train actors for whatever learner they partner with?
