
Disclosure: I work on Google Cloud (and vaguely helped with this).

For me, one of the most amazing things about this work is that a small group of people (admittedly well funded) can show up and do what used to be the purview of only giant corporations.

The 256 P100 optimizers are less than $400/hr. You can rent 128,000 preemptible vCPUs for another $1280/hr. Toss in some more support GPUs and we're at maybe $2500/hr all in. That sounds like a lot, until you realize that some of these results ran for just a weekend.
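
As a rough sanity check on those numbers, here is a small Python sketch. The per-unit prices are back-derived from the totals quoted above rather than taken from any price list, the support-GPU share is simply whatever is left of the $2500/hr, and treating "a weekend" as 48 hours is an assumption.

```python
# Back-of-envelope cost estimate. Every figure is either quoted in the
# comment above or an assumption; none comes from an official price list.

P100_COUNT = 256
P100_TOTAL_PER_HR = 400.0    # "less than $400/hr" for all 256, used as an upper bound
VCPU_COUNT = 128_000
VCPU_TOTAL_PER_HR = 1280.0   # quoted total for the preemptible vCPUs
ALL_IN_PER_HR = 2500.0       # quoted rough total, including support GPUs
WEEKEND_HOURS = 48           # assumption: "a weekend" means 48 hours


def hourly_breakdown():
    """Split the quoted hourly total into per-unit and leftover amounts."""
    support = ALL_IN_PER_HR - P100_TOTAL_PER_HR - VCPU_TOTAL_PER_HR
    return {
        "p100_each": P100_TOTAL_PER_HR / P100_COUNT,
        "vcpu_each": VCPU_TOTAL_PER_HR / VCPU_COUNT,
        "support_and_misc": support,
    }


def run_cost(hours=WEEKEND_HOURS, rate=ALL_IN_PER_HR):
    """Total cost of one run at a flat hourly rate."""
    return hours * rate


if __name__ == "__main__":
    for name, value in hourly_breakdown().items():
        print(f"{name}: ${value:,.4f}/hr")
    print(f"weekend run: ${run_cost():,.0f}")
```

Under these assumptions a single weekend run comes out in the low six figures, which covers only the final run and not any of the experimentation before it.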

In days past, researchers would never have had access to this kind of computing unless they worked for a national lab. Now it's just a budgetary decision. We're getting closer to a (more) level playing field, and this is a wonderful example.

I would just want to comment that while this is true in principle, it's also slightly misleading because it does not include how much tuning and testing is necessary until one gets to this result.

That means determining the scale needed, fiddling with the state/action/reward model, and running massively parallel hyper-parameter tuning.

I may be overestimating, but I would reckon that, with hyper-parameter tuning and all, that was easily in the 7-8 figure range at retail cost.

This is slightly frustrating in an academic environment when people tout results for just a few days of training (even with much smaller resources, say 16 GPUs and 512 CPUs) when the cost of getting there is just not practical, especially for timing reasons. E.g., if an experiment runs 5 days, it doesn't matter that it doesn't use large-scale resources, because realistically you need 100s of runs to evaluate a new technique and get it to the point of publishing the result. You can only do that on a reasonable time scale if you actually have at least 10x the resources needed to run it.
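
The arithmetic behind that claim can be sketched in a few lines of Python. The 5-day run length and the 10x figure come from the comment; the function name and the round figure of 100 runs (standing in for "100s of runs") are illustrative assumptions.

```python
import math


def wall_clock_days(num_runs, days_per_run, parallel_runs):
    """Days needed to finish num_runs when parallel_runs can execute at once."""
    batches = math.ceil(num_runs / parallel_runs)
    return batches * days_per_run


if __name__ == "__main__":
    # Compare having exactly enough hardware for one run with having 10x that.
    for capacity in (1, 10):
        days = wall_clock_days(num_runs=100, days_per_run=5, parallel_runs=capacity)
        print(f"{capacity}x resources: {days} days")
```

With hardware for only one run at a time, 100 five-day runs take well over a year; with 10x the hardware the same work fits in under two months.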

Sorry, slightly off topic, but it's becoming a more and more salient point from the point of view of academic RL users.

I hear you. I would say that this work is tantamount to what would normally be a giant NSF grant.

Depending on your institution, this is precisely why we (and other providers) give out credits though. Similar to Intel/NVIDIA/Dell donating hardware historically, we understand we need to help support academia.

Yes, thank you for that by the way, did not want to diminish your efforts. Just wanted to point out that papers are often misleading about how many resources are needed to get to the point of running the result. I have received significant amounts of money from Google, full disclosure.

That's so awesome. Thanks for the exchange you two had. I love seeing the technology permeate through its different causeways to become a useful and tangible product for more and more people. It's a thing of beauty to watch unfold each and every time, to me.

This is a very good point. While the final model might be a weekend of training, getting there is a lot more iterations/work.

>> Toss in some more support GPUs and we're at maybe $2500/hr all in.

Amazing, indeed. That's only 5/8 of my entire travelling allowance, from my PhD studentship.

Hey, I'd even have some pocket money left over to go to a conference or two!

This is more than many academic positions pay (or cost the uni) in a year, esp. in Europe. This is an absurd amount of money/resources and more of a sign that this part of academia is not about outsmarting but outspending the "competition".

(I too work for Google Cloud)

I agree. One of the most amazing things about watching this project unfold is just how quickly it went from 0 to 100 with minimal overhead. It's amazing to watch companies and individuals push the boundaries of what is possible with just the push of a button.

Agree 100%, pay-as-you-go compute has helped us tremendously. A large amount of our time is spent analysing results and interpreting models, and the ability to power up and train a new topology without the huge cap-ex is the reason my company is still alive!

I agree that 2500x48hrs is probably a reasonable cost to pay for this kind of sweet result. But it is a bit prohibitively expensive for an ML hobbyist to try to replicate in their own free time. I wonder if there is some way to do this w/o all the expensive compute. Pre-trained models are one step towards this, but so much of the learning (for the hobbyist) comes from struggling to get your RL model off the ground in the first place.

It'd be interesting to see in the graphs (when the OpenAI team gets to them) how good you get at X hours in. Because if you're pretty good at X=4, that's still amazing.

Edit: I guess https://blog.openai.com/content/images/2018/06/bug-compariso... is approximately indicative (you currently need about 3 days to beat humans).

Transfer learning is about the best we can do right now. Using a fully trained ResNet / Xception and then tacking on your own layers after the end is within reach of hobbyists with just a single GPU on their desktop. There's still a decent amount of learning for the user even with pre-trained models.

+1, this is what I do for my at-home (non-work) experiments using word embeddings and RNNs for generative text summarization. Using transfer learning makes this affordable as a hobby project.

Quoting from the original article.

> This logic takes milliseconds per tick to execute, versus nanoseconds for Chess or Go engines.

So this is the game engine itself, taking up the CPUs. Maybe the DoTA code can be optimized x2 for self play?!

IIRC AlphaZero was about x10 more efficient than AlphaGo Zero due to algorithm improvement.

So overall, $100K for the final training run, which maybe can go down to $10K for a different domain of similar complexity.

Interesting question! I assume in the Bot/headless mode, it's pretty optimized to skip the part needed for rendering, but you still need to do enough physics and other state update.

Best case, I'd assume at least a few ms per tick, because games become as complex as possible and still fit in 30 fps (33 ms, much of which is rendering, but still much happens regardless of producing pixels).
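
To put rough numbers on that, here is a minimal Python sketch. The 30 fps figure is from the comment above; the 3 ms tick cost is a hypothetical stand-in for "a few ms", and treating one simulation tick as one frame is an assumption about the engine, not something the article states.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def ticks_per_second(ms_per_tick):
    """How many ticks one core can simulate per second at a given tick cost."""
    return 1000.0 / ms_per_tick


def speedup_over_realtime(fps, ms_per_tick):
    """Simulated game time per unit of wall-clock time, assuming one tick per frame."""
    return frame_budget_ms(fps) / ms_per_tick


if __name__ == "__main__":
    fps = 30
    ms_per_tick = 3.0  # hypothetical value for "a few ms per tick"
    print(f"frame budget at {fps} fps: {frame_budget_ms(fps):.1f} ms")
    print(f"ticks per core-second: {ticks_per_second(ms_per_tick):.0f}")
    print(f"speedup over real time: {speedup_over_realtime(fps, ms_per_tick):.1f}x")
```

If the headless engine really needs a few milliseconds per tick, each core runs the game only about an order of magnitude faster than real time, which is consistent with the large CPU count.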

> Maybe the DoTA code can be optimized x2 for self play?!

Please don't. Every time they change something, several other things break.

Ok, just kidding.

But their fix logs really look like the game logic is built by adding a hack on top of a hack with no automated testing. Everything seems to rest on playtesting.

Does the approach work at small scale, though? Or would this project only bear fruit when run at large scale?

Getting budgetary approval isn't easy for everyone. Especially with an unproven process. And even then, there could be a mistake in the pipeline. All that money down the drain.

Good question! RL (and ML generally) definitely works better as you add more scale, but I still feel that this particular work is roughly "grand challenge" level. You shouldn't expect to just try this out as your first foray :).

I will note this paragraph from the post:

> RL researchers (including ourselves) have generally believed that long time horizons would require fundamentally new advances, such as hierarchical reinforcement learning. Our results suggest that we haven’t been giving today’s algorithms enough credit — at least when they’re run at sufficient scale and with a reasonable way of exploring.

which is mostly about the challenge of longer time horizons (and therefore LSTM related). If your problem is different / has a smaller space, I think this is soon going to be very approachable. For example, we recently demonstrated training ResNet-50 for $7.50.

There certainly exists a set of problems for which RL shouldn't cost you more than the value you get out of it, and for which you can demonstrate enough likelihood of success. RL itself, though, is still at the bleeding edge of ML research, so I don't consider it unusual that it's unproven.

Depends on the function you're approximating.

Great work! Having access to this scale of computing for so cheap really is amazing.
