Flume is a MapReduce-based data-pipeline framework at Google; it shares its name with an unrelated Apache project (Apache Flume, a log-collection service).
From a cursory glance, DMLab-30 levels are built on id Software's Quake III code, while Roboschool uses a full-blown physics engine (Bullet).
Pretty exciting times!
A simulator is an unlimited, dynamic dataset, so much more than, say, ImageNet or even Wikipedia. The ability of an agent to create its own predictions about the evolution of the environment is essential for planning and reasoning. That's how we get to be so smart: we have a mental simulator we can apply to any situation to check what the outcome would be. When agents learn to do the same, they too can become smart. AlphaGo, for example, was doing exactly that, planning ahead with MCTS, and that's how it beat us at Go. A simple neural net without planning wouldn't have beaten the best humans.
I think simulation is going to be the next ingredient we add to AI to make it learn and reason like humans. It's the missing ingredient, and combined with RL it will lead to the next breakthrough.
Learns a policy with an actor-critic setup. Corrects for the off-policy lag between actors and learner using "V-trace" (the paper also demonstrates multi-task transfer). Improves on A3C.
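For anyone curious what V-trace actually computes: here's a minimal NumPy sketch of the V-trace targets from the IMPALA paper, for a single trajectory. The recursion and the truncation thresholds `rho_bar`/`c_bar` follow the paper; the function name and array layout are just my choices for illustration.

```python
import numpy as np

def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets for one trajectory of length T.

    behaviour_logp, target_logp, rewards, values: [T] arrays.
    bootstrap_value: scalar V(x_T) used to bootstrap past the end.
    """
    # Truncated importance ratios pi/mu, clipped at rho_bar and c_bar.
    ratios = np.exp(target_logp - behaviour_logp)
    rhos = np.minimum(rho_bar, ratios)
    cs = np.minimum(c_bar, ratios)

    values_tp1 = np.append(values[1:], bootstrap_value)
    # Importance-weighted temporal-difference errors.
    deltas = rhos * (rewards + gamma * values_tp1 - values)

    # Backward recursion: (vs_t - V_t) = delta_t + gamma * c_t * (vs_{t+1} - V_{t+1})
    vs = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(values))):
        acc = deltas[t] + gamma * cs[t] * acc
        vs[t] = values[t] + acc
    return vs
```

Sanity check: when the target and behaviour policies agree (all ratios are 1), the target collapses to the ordinary on-policy n-step bootstrapped return, which is a nice way to convince yourself the recursion is right.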
If I understand the repercussions of this correctly, could this be applied so that a swarm is able to learn from the mistakes of a single individual?
But of course there are various RL algorithms for dealing with multiple interacting agents, for example in StarCraft environments. (In SC there's only one player/mouse/keyboard, true, but the idea is that instead of learning one giant master NN to do everything, multi-agent RL methods let individual smaller NNs each control one unit or a few, and those can then coordinate for attacks.)
Or do you really need a server farm for this?
DRL is currently CPU-heavy and GPU-light: you spend most of your time exploring the environment and only a little updating the neural network on the GPU. So if you have a fairly typical home setup like a 6-core CPU + an Nvidia 1080, your GPU will sit under-utilized while it waits on the 6 actors. It'll still work, but your wall-clock time is going to be a lot longer than the numbers in OP, and somewhat worse than you'd expect, because your system is unbalanced and has reduced throughput.
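To make that imbalance concrete, here's a toy sketch of the actor/learner split using plain Python threads and a queue (these are stand-ins I made up, not anything from the actual IMPALA code): actors are CPU-bound environment loops that ship whole trajectories, and a single learner drains the queue. If the actors are slow, the learner (your GPU in a real setup) just sits there waiting.

```python
import queue
import threading

def actor(traj_queue, n_trajectories, steps_per_traj=5):
    """CPU-bound: step a (dummy) environment and ship whole trajectories."""
    for _ in range(n_trajectories):
        traj = [("obs", "action", 1.0) for _ in range(steps_per_traj)]
        traj_queue.put(traj)  # blocks if the learner falls behind

def learner(traj_queue, updates):
    """GPU-bound in a real setup: consume trajectories, apply gradient updates."""
    while True:
        traj = traj_queue.get()
        if traj is None:  # poison pill: all actors are done
            break
        updates.append(len(traj))  # stand-in for one training step

traj_queue = queue.Queue(maxsize=8)
updates = []
actors = [threading.Thread(target=actor, args=(traj_queue, 4)) for _ in range(6)]
learn = threading.Thread(target=learner, args=(traj_queue, updates))
for t in actors:
    t.start()
learn.start()
for t in actors:
    t.join()
traj_queue.put(None)  # shut the learner down
learn.join()
# 6 actors x 4 trajectories each = 24 learner steps
```

The bounded queue is the whole story: whichever side is slower throttles the other, which is exactly why an unbalanced CPU/GPU split wastes hardware.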
Afaik, this problem hasn't been solved.
I mean using this at home on my own machine, not uploading it.
Edit: Or... you mean running the whole distributed learning algorithm on one machine. I don't see why not, but I doubt it would outperform an algorithm designed to run on a single machine.