For me, one of the most amazing things about this work is that a small group of people (admittedly well funded) can show up and do what used to be the purview of only giant corporations.
The 256 P100 optimizers are less than $400/hr. You can rent 128,000 preemptible vCPUs for another $1,280/hr. Toss in some more support GPUs and we're at maybe $2,500/hr all in. That sounds like a lot, until you realize that some of these results ran for just a weekend.
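Back-of-the-envelope, those rates pencil out to low six figures for a weekend. A quick sketch (all the per-hour rates are my reading of the figures above, not published pricing, and the "support" line is pure padding to reach the ~$2,500/hr total):

```python
# Rough weekend-cost estimate from the figures quoted above.
# All rates are assumptions taken from the comment, not official pricing.
gpu_rate = 400                # $/hr for the 256 P100 optimizers
vcpu_rate = 128_000 * 0.01    # 128,000 preemptible vCPUs at ~$0.01/vCPU-hr
support = 820                 # $/hr padding for support GPUs etc. (assumed)

hourly = gpu_rate + vcpu_rate + support   # = $2,500/hr all in
weekend_hours = 48
print(f"${hourly * weekend_hours:,.0f}")  # prints $120,000
```

So "ran for just a weekend" still means on the order of $100K+ at retail, which is the point: a budgetary decision, not a national lab.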
In days past, researchers would never have had access to this kind of computing unless they worked for a national lab. Now it's just a budgetary decision. We're getting closer to a (more) level playing field, and this is a wonderful example.
And the final run isn't the whole bill: determining the scale needed, fiddling with the state/action/reward model, massively parallel hyper-parameter tuning.
I may be overestimating, but I'd reckon that with hyper-parameter tuning and all, it was easily in the 7-8 figure range at retail cost.
This is slightly frustrating in an academic environment when people tout results from just a few days of training (even with much smaller resources, say 16 GPUs and 512 CPUs), because the cost of getting there just isn't practical, especially for timing reasons. E.g. if an experiment runs 5 days, it doesn't matter that it doesn't use large-scale resources: realistically you need hundreds of runs to evaluate a new technique and get it to the point of a publishable result, so you can only do that on a reasonable time scale if you actually have at least 10x the resources needed for a single run.
Sorry, slightly off topic, but it's becoming a more and more salient point from the perspective of academic RL users.
Depending on your institution, this is precisely why we (and other providers) give out credits though. Similar to Intel/NVIDIA/Dell donating hardware historically, we understand we need to help support academia.
Amazing, indeed. That's only 5/8 of my entire travelling allowance from my PhD studentship.
Hey, I'd even have some pocket money left over to go to a conference or two!
I agree. One of the most amazing things about watching this project unfold is just how quickly it went from 0 to 100 with minimal overhead. It's remarkable to watch companies and individuals push the boundaries of what is possible with just the push of a button.
Edit: I guess https://blog.openai.com/content/images/2018/06/bug-compariso... is approximately indicative (you currently need about 3 days to beat humans).
> This logic takes milliseconds per tick to execute, versus nanoseconds for Chess or Go engines.
So this is the game engine itself taking up the CPUs. Maybe the Dota code can be optimized 2x for self-play?!
IIRC AlphaZero was about 10x more efficient than AlphaGo Zero due to algorithmic improvements.
So overall, $100K for the final training run, which could maybe go down to $10K for a different domain of similar complexity.
Best case, I'd assume at least a few ms per tick, because games are made as complex as possible while still fitting in 30 fps (33 ms per frame; much of that is rendering, but plenty happens besides producing pixels).
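To see why a few ms per tick dominates the CPU bill, here's a rough throughput sketch. The numbers are illustrative assumptions on my part (a 3 ms tick and a 30 ticks-per-second simulation rate), not measurements from the post:

```python
# Why milliseconds per tick matters: a rough self-play throughput sketch.
# ms_per_tick and the tick rate are illustrative assumptions, not measured.
ms_per_tick = 3          # assumed engine cost per simulation tick
ticks_per_game_sec = 30  # assumed Dota-style tick rate

# Real CPU seconds needed to simulate one second of game time (per core):
real_per_game_sec = ms_per_tick * ticks_per_game_sec / 1000  # 0.09 s

# One year of in-game experience on a single core:
game_secs_per_year = 365 * 24 * 3600
cpu_hours_per_game_year = game_secs_per_year * real_per_game_sec / 3600
print(f"{cpu_hours_per_game_year:,.0f} CPU-hours per simulated year")
# prints: 788 CPU-hours per simulated year
```

At those assumed rates, every simulated year of experience costs on the order of 800 CPU-hours, so the years of self-play these agents consume is exactly where the 128,000 vCPUs go.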
Please don't. Every time they change something, several other things break.
Ok, just kidding.
But their fix logs really make it look like the game logic is built by piling hack on top of hack with no automated testing. Everything seems to rest on playtesting.
Getting budgetary approval isn't easy for everyone. Especially with an unproven process. And even then, there could be a mistake in the pipeline. All that money down the drain.
I will note this paragraph from the post:
> RL researchers (including ourselves) have generally believed that long time horizons would require fundamentally new advances, such as hierarchical reinforcement learning. Our results suggest that we haven’t been giving today’s algorithms enough credit — at least when they’re run at sufficient scale and with a reasonable way of exploring.
which is mostly about the challenge of longer time horizons (and therefore LSTM-related). If your problem is different / has a smaller space, I think this is soon going to be very approachable. For comparison, we recently demonstrated training ResNet-50 for $7.50.
There certainly exists a set of problems for which RL shouldn't cost you more than the value you get out of it, and for which you can demonstrate enough likelihood of success. RL itself, though, is still at the bleeding edge of ML research, so I don't consider it unusual that it's unproven.