Ask HN: Reinforcement learning for single, lower end graphic cards?
49 points by DrNuke 9 days ago | 16 comments
On one side, more and more hardware is being thrown in parallel at ingesting and computing the astonishing amounts of data generated by realistic 3D simulators, especially for robotics, with big names like OpenAI now giving up on the field (see https://news.ycombinator.com/item?id=27861201 ); on the other side, more recent simulators like Brax from Google ( https://ai.googleblog.com/2021/07/speeding-up-reinforcement-learning-with.html ) are aiming at “matching the performance of a large compute cluster with just a single TPU or GPU”. So where do we stand on the latter side of the equation? What is the state of the art for single, lower-end GPUs like the GTX 1070 8GB in my 2016 gaming laptop? What do we lower-end users need to read, learn and test these days? Thanks.

For many RL problems you don't really need GPUs, because the networks used are relatively simple compared to supervised learning and most simulations are CPU-bound. Many RL problems are constrained by data, so running the simulations (CPU) is the bottleneck, not the network.

Yes. Unfortunately my experience in this area is about four years old, so maybe obsolete at this point. But at that time, for our problem -- which used many of the typical benchmark gym simulators -- the main bottleneck was the CPUs, not the GPUs.

I suspect the equation changes depending on how complex the neural network is (if it's simple, not much is gained from a GPU), whether the simulation can take advantage of GPUs (the ones we used didn't, but for 3D graphics-heavy simulation and other kinds of computation I'm sure it can help), and the algorithm -- some algorithms rely more on on-line evaluation and others make more of an effort to reuse older rollouts. (An extreme case is offline RL, which has also attracted a lot of interest recently. Since you were asking for references, this might be worth a read: https://arxiv.org/abs/2005.01643 ).
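
One way to check which side dominates on your own machine is to time the two halves of a rollout separately. What follows is only a rough sketch: `toy_env_step`, `tiny_policy` and `profile_rollout` are hypothetical stand-ins written for illustration, not part of gym or any other library, and the toy pendulum-like dynamics are an assumption. Swap in your real simulator and network to get meaningful numbers.

```python
import math
import random
import time


def toy_env_step(state, action, substeps=200):
    """Hypothetical stand-in for a CPU physics simulator: many small integration steps."""
    x, v = state
    dt = 0.01 / substeps
    for _ in range(substeps):
        v += (action - math.sin(x)) * dt
        x += v * dt
    return (x, v)


def tiny_policy(state, weights):
    """One hidden layer of tanh units in pure Python, standing in for a small MLP."""
    hidden = [math.tanh(w0 * state[0] + w1 * state[1] + b) for w0, w1, b, _ in weights]
    return math.tanh(sum(h * w[3] for h, w in zip(hidden, weights)))


def profile_rollout(steps=500, hidden_units=16, seed=0):
    """Return the fraction of wall-clock time spent in the policy vs. the simulator."""
    rng = random.Random(seed)
    weights = [tuple(rng.uniform(-1, 1) for _ in range(4)) for _ in range(hidden_units)]
    state = (0.1, 0.0)
    policy_time = env_time = 0.0
    for _ in range(steps):
        t0 = time.perf_counter()
        action = tiny_policy(state, weights)
        t1 = time.perf_counter()
        state = toy_env_step(state, action)
        t2 = time.perf_counter()
        policy_time += t1 - t0
        env_time += t2 - t1
    total = policy_time + env_time
    return {"policy_fraction": policy_time / total, "env_fraction": env_time / total}


if __name__ == "__main__":
    print(profile_rollout())
```

If `env_fraction` stays close to 1 with your real simulator, a faster GPU is unlikely to help much; more CPU cores or a faster simulator probably will.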

Yes. GPUs are extremely good at doing millions of small repetitive calculations, and not great elsewhere.

My capstone project used RL on a Raspberry Pi to train hardware-in-the-loop (essentially when to open and close valves based on sensor input). It was incredibly slow because it couldn’t be parallelized (without buying additional hardware for $500 each). Lots of professors asked why a Raspberry Pi was chosen when we had high end GPUs in the lab, and I had to explain that the Pi was NOT the bottleneck, and in fact stayed idle 95% of the time.

Would virtual physical simulation have helped in this case? What was the phase lag between valve actuation and the measured signal?

Create a metalearner that can do one-shot learning when it gets access to physical hardware?

Agreed, deep architectures are really only needed for feature engineering. There have been a few papers showing that even for these very deep setups, the actual policy can almost always be fully captured in a small MLP.

Can you share some recent references?

(Are you referring to the early papers showing that MPC and LQR solve SOME problems faster?!)

One example with model-based RL is "World Models" by Ha and Schmidhuber. They pre-train an autoencoder to reduce image observations to vectors, then pre-train a RNN to predict future reduced vectors, then use a parameter-space global optimization algorithm (but any RL algorithm would work) to train a policy that's linear in the concatenated observation vector and RNN hidden state.

The important thing here is that the image encoder and the RNN weren't trained end-to-end with the policy. The learned "features" captured enough information to be an effective policy input, even though they only needed to be useful for predicting future states.

It's also interesting that the image encoder was trained separately from the RNN. I think that only worked because the test environments were "almost" fully observable - there is world state that cannot be inferred from a single image observation, but knowing that state is not necessary for a good policy.
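
To give a feel for how small such a controller is, here is a minimal sketch of a policy that is linear in the concatenation of an encoder vector `z` and an RNN hidden state `h`, squashed with tanh. The function names, the tanh squashing and the dimensions (32, 256 and 3) are illustrative assumptions, not code from the paper. The encoder and RNN themselves are left out.

```python
import math
import random


def linear_policy(z, h, weights, bias):
    """Action = tanh(W @ concat(z, h) + b); the controller has no hidden layers."""
    features = list(z) + list(h)
    return [
        math.tanh(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weights, bias)
    ]


def init_params(z_dim, h_dim, action_dim, seed=0):
    """Small random weights; a parameter-space optimizer would search over these."""
    rng = random.Random(seed)
    n = z_dim + h_dim
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(action_dim)]
    bias = [0.0] * action_dim
    return weights, bias


def param_count(z_dim, h_dim, action_dim):
    """Number of trainable scalars in the linear controller."""
    return (z_dim + h_dim) * action_dim + action_dim


if __name__ == "__main__":
    z_dim, h_dim, action_dim = 32, 256, 3  # illustrative sizes (assumption)
    weights, bias = init_params(z_dim, h_dim, action_dim)
    z = [0.0] * z_dim
    h = [0.0] * h_dim
    print(linear_policy(z, h, weights, bias), param_count(z_dim, h_dim, action_dim))
```

With so few parameters, evaluating the policy is cheap on any CPU. The expensive parts are training the encoder and RNN, and running the environment.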

I don't really see the dichotomy. Surely the features you need to learn depend on the task? Or do you mean running linear / shallow RL on top of unsupervised learning?

Instead of using your low-end GPU, you could get a TPU like https://coral.ai/docs/edgetpu/benchmarks/ . Or rent a single GPU in the cloud, which costs less than $1/hour and can be free in certain cases.

In terms of APIs, you can try WebGPU, which is nominally meant for JavaScript in the browser, but there are native implementations of it, such as this one in Rust: https://github.com/gfx-rs/wgpu

The Coral TPU is mostly made for inference only, not training, with a few exceptions.

This is mostly in the realm of computer vision, but I would recommend checking out AlexeyAB's fork of Darknet: https://github.com/AlexeyAB/darknet . It's got decent CUDA acceleration; I personally run a GTX 960M for training.

Check out Andrej Karpathy's convnet.js and deep Q-learning web apps.

Convnet.js hasn't been updated in over half a decade.

Not answering the question directly, but you could use a free GPU from Colab: https://colab.research.google.com/github/tensorflow/docs/blo... . Note that you need to back up your checkpoints if you intend to run for more than a couple of hours.
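
A minimal sketch of what checkpointing with resume could look like, assuming the training state fits in a picklable dict. The `save_checkpoint`, `load_checkpoint` and `train` helpers are hypothetical names for illustration, and the "update" is a placeholder. On a hosted notebook you would point `path` at storage that survives the session.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write to a temp file first, then rename, so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the saved state if a checkpoint exists, otherwise the given default."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def train(total_steps, path, save_every=100):
    """Resume from the last checkpoint (if any) and save every `save_every` steps."""
    state = load_checkpoint(path, {"step": 0, "params": [0.0]})
    while state["step"] < total_steps:
        state["params"][0] += 0.01  # placeholder for a real gradient update
        state["step"] += 1
        if state["step"] % save_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state
```

If the session is killed, calling `train` again with the same `path` picks up from the last saved step instead of starting over.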

GPU memory is a key limit here, and RL worsens the problem because it requires some eligibility trace or memory system. One option you could try would be to store past S,A,R,S' tuples on disk using something like DeepMind Reverb, and have a small batch size and a simpler model. Or, as mentioned in other comments, you can just use the CPU and RAM, which often has a lot more capacity depending on your rig.
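
To illustrate the disk-backed idea without pulling in Reverb, here is a rough sketch of a replay buffer that keeps (S, A, R, S') tuples in a SQLite file. This is NOT Reverb's API: `DiskReplayBuffer` and its methods are hypothetical names made up for this example, and it assumes the transitions are picklable.

```python
import pickle
import random
import sqlite3


class DiskReplayBuffer:
    """FIFO replay buffer stored in SQLite, so only sampled batches live in memory."""

    def __init__(self, path, capacity):
        self.capacity = capacity
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transitions "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB)"
        )

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM transitions").fetchone()[0]

    def add(self, state, action, reward, next_state):
        blob = pickle.dumps((state, action, reward, next_state))
        self.conn.execute("INSERT INTO transitions (data) VALUES (?)", (blob,))
        overflow = len(self) - self.capacity
        if overflow > 0:
            # Evict the oldest rows once the buffer is over capacity.
            self.conn.execute(
                "DELETE FROM transitions WHERE id IN "
                "(SELECT id FROM transitions ORDER BY id LIMIT ?)",
                (overflow,),
            )
        self.conn.commit()

    def sample(self, batch_size, rng=random):
        """Uniformly sample up to batch_size transitions without replacement."""
        ids = [row[0] for row in self.conn.execute("SELECT id FROM transitions")]
        chosen = rng.sample(ids, min(batch_size, len(ids)))
        if not chosen:
            return []
        placeholders = ",".join("?" * len(chosen))
        rows = self.conn.execute(
            f"SELECT data FROM transitions WHERE id IN ({placeholders})", chosen
        )
        return [pickle.loads(row[0]) for row in rows]
```

Committing on every `add` keeps the sketch simple but is slow. A real setup would batch the writes, and a purpose-built store will scale much better.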

Depends on what sort of RL you are doing. If you are trying to train agents to play small games with vision, the agent will need a small CNN to process images. This will need a GPU, and what you have should be enough.

I was training on Atari for a while with a 1080 Ti. The games run on the CPU, so you need a decent CPU as well.
