Accelerating Deep Neuroevolution: Train Atari in Hours on a Single Computer (eng.uber.com)
155 points by mkvorwerck 9 months ago | 38 comments

> a run that takes 1 hour on 720 cores can be run on the CPUs of a 48-core personal computer in 16 hours

Is calling a 48-core machine a "personal computer" a bit of a stretch, or am I missing something?

I think the intended implication is that it's accessible to individual researchers who don't work at Facebook/Brain/DeepMind with gigantic clusters (and the engineers able to maintain those clusters). Unfortunately deep learning is still expensive, but many areas of science are (most biology research, high energy physics, etc.). That's why we have grants.

If you don't care about your electricity bill, you can buy used Xeon workstations on your favorite auction site.

A couple years ago, we picked up a HP Z800 workstation with two 12-core processors and 24GB of RAM for $400.

The main downside is that if the power supply (proprietary) or motherboard dies, there's no easy fix other than buying another of the same model.


I wanted something to host my private and experimental stuff. A friend of mine who tracks used server auctions pointed me at some Dell systems clearly custom-built for one of the large SV companies. I paid $450 for a box with 72GB of RAM and 16 cores. At that price, I just bought a second one for spares, which I haven't needed in the 4 or 5 years I've been using it.

It has been great.

Would you care to share the server auction site? I've thought about doing something like that.

This was 4+ years ago, so I have no idea of the current market, but I bought from this eBay vendor:


WeirdStuff also used to be great for this.

i also raised my eyebrows at this. i think "workstation" might have been a better choice of word than "personal computer". but people, especially in the sciences, do actually buy desktop computers with these sorts of specs -- they run like $10k-20k, probably.

Closer to the 20k end, I'm afraid. More options are opening up now with GPU processing power on less expensive desktops ( ~ $13k ).

Dunno, check out [1]. You can configure for 48 cores and 256GB ECC DDR4 for under $10k.

[1] http://www.titancomputers.com/Titan-W375-Dual-AMD-EPYC-Proce...

I think they're using "personal computer" there only in the narrow sense that a dual socket tower workstation conforms to the PC spec. But then they go on to GPU performance which allows a machine you could reasonably have as your home computer to do this quickly.

Dunno, I have a 32-core, 128GB RAM machine that I bought for $1000. I have a 60-core Xeon Phi card that I bought for $200 sitting by it, ready to go in when I get the chance. When GPU prices drop, I'm probably going to add two 1080 Tis.

I'm just a regular computer user.

You’re a computer hobbyist, you mean. You have to have done a whole lot of bargain-hunting to set that build up at that price. The “regular computer user” doesn’t bother with that—they just buy what they can afford at their local shop, which usually means a rather crap laptop.

(Mind you, it doesn’t take any more technical knowledgeability than most users to bargain-hunt like you have; but knowing that the product can be of varying quality at a given price, and caring enough to put in the time to get something of good quality, makes you an exception, in the same way that people who e.g. comparison-shop for quality suits or shoes are an exception. It’s a general mentality that most people don’t subscribe to—neither the “buying the best” of the rich, nor the “economizing” of the poor, but rather the optimizing ROI of the economics-minded. And even an economist will only apply this kind of thinking to cases where they actually believe this thing will affect their life enough that the research time-cost is worth it—i.e. when it’s a hobby of theirs. Economics-minded computer hobbyists are a rather more niche set than “regular computer users!”)

Define "regular computer user." What are you doing with these machines that comes close to using all of the cores?

I don't know what the OP is doing, but something like XGBoost scales really well across multiple cores.
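XGBoost manages its own internal thread pool, but the general pattern it exploits is embarrassingly parallel CPU-bound work. A stdlib-only sketch of that pattern (the work function here is a made-up stand-in for a per-tree or per-fold chunk):

```python
from concurrent.futures import ProcessPoolExecutor
import os

def cpu_bound(n):
    # Made-up stand-in for one independent chunk of work.
    return sum(i * i for i in range(n))

def run_parallel(chunks, workers=None):
    # Spread independent chunks across the available cores.
    workers = workers or os.cpu_count()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cpu_bound, chunks))

if __name__ == "__main__":
    print(run_parallel([100_000] * 8))
```

The speedup is roughly linear in cores as long as the chunks are independent and each one is big enough to amortize the process-dispatch overhead.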

I've used my 2014 MBP with a 4-core i7 (plus hyperthreading) plus 3 desktop i5 boxes for over 24 hours straight on the same XGB task before. The 8 CPU threads on the MBP let it outperform more recent machines on this kind of task.

Oh, I get that there are things you can do with a setup like this. I challenge that this is a "regular computer user" level of thing to do, though. Mining bitcoin is not a regular user task. Nor is being a render farm for an animated movie. :)

It's on a post called "Accelerating Deep Neuroevolution". I'm pretty sure it's a regular computer for anyone doing that....

I think the point of "regular" was that it's not exactly some extreme budget needed to do this.

That makes some sense. However, even in an article about Olympic athletes, I would expect a "regular person" to not be one of them.

I think there’s an awful lot of emphasis on the word “regular” and not much on the “32 cores for $1000”, which - given the context of this story - is much more relevant and interesting.

I specifically quoted "regular computer user." This thread was specifically about that being a personal computer.

I mean, yes, I can see how an individual can get one. And it is definitely neat to see them pushed to their limits. But calling yourself a regular computer user with a machine that strong is a stretch.


You're also one of "those" people

Where'd you get them? I could use an upgrade for that price.

The 32-core from Amazon's used listings, the Xeon Phi when it was on sale 2 years ago.

It's more than a bit of a stretch, I reckon. That's how the all but hermetically sealed SV tech scene warps some engineers' sense of what a regular nerd has access to, I suppose.

If you only had 10% of the cores, that hour would become a day, which is still pretty good.
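Under ideal strong scaling (constant total core-hours), the arithmetic from the quoted figures works out like this:

```python
def runtime_hours(baseline_hours, baseline_cores, cores):
    # Ideal strong scaling: total core-hours stay constant,
    # so runtime is inversely proportional to core count.
    return baseline_hours * baseline_cores / cores

# 1 hour on 720 cores, per the quote:
print(runtime_hours(1, 720, 48))  # -> 15.0 (the quoted 16h implies ~94% efficiency)
print(runtime_hours(1, 720, 72))  # -> 10.0 (10% of the cores)
```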

> Modern desktops also have GPUs, however, which are fast at running deep neural networks (DNNs).

That's the rub: deep learning is effectively restricted to CUDA/cuDNN/NVIDIA GPUs, which excludes a large number of desktops with AMD graphics (e.g. all modern Macs).

Current cloud GPU prices have dropped enough such that this approach may be pretty effective with spot/preemptible instances.

Does anyone know why AMD has dropped the ball so hard on not supporting Tensorflow/PyTorch? It seems like the kind of work 2-5 talented engineers could do (i.e. push harder on HIP, reimplement cuDNN, etc.), and it would seriously boost sales, right? I probably misunderstand either the hardware limitations or the business side from AMD's perspective.

You misunderstand the difficulty in the software side.

Developing a competitive CUDA equivalent is a much harder task than you assume. It takes a lot of tooling and software-level optimization to make these low-level libraries fast. The testing and benchmarking needed to make the right choices can keep many people working full time. Implementing the functionality is not enough.

AMD has had APIs for a long time, and it has an open-source deep learning stack, but it's not good enough. AMD also has a CUDA-to-HIP converter, but the results are not competitive and miss features.

Both AMD and Intel are probably getting there eventually, and Nvidia is cashing in on the monopoly phase while it lasts.

i don’t know that it has to be competitive in speed right away as long as it works and is easy to install.

i’d be thrilled to have the option of doing deep learning on an AMD card even if it ran at 1/3 the speed of a comparable (but probably somewhat more expensive) NVIDIA card. it would open up a lot of options even if it were still less economical in throughput/dollar.

if nothing else, how many machine learning researchers would like to prototype things on their MacBook Pros?

For prototyping (as in making sure you have your matrix shapes right) the CPU versions of TensorFlow and PyTorch are fine.

They are even ok for some kinds of training (e.g., if you are doing transfer learning with a fixed embedding/feature representation and have a pretty small parameter space to learn).

But it's so easy to fire up a cloud instance at Paperspace or somewhere and push some code across.
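The "matrix shapes" sanity check doesn't even need a framework; here's a stdlib sketch (layer sizes are made up, and the weight layout follows PyTorch's nn.Linear convention of (out_features, in_features)):

```python
def dense_out_shape(x_shape, w_shape):
    # x: (batch, in_features), W: (out_features, in_features);
    # the result of x @ W.T is (batch, out_features).
    (batch, x_in), (out_features, w_in) = x_shape, w_shape
    if x_in != w_in:
        raise ValueError(f"in_features mismatch: {x_in} != {w_in}")
    return (batch, out_features)

# Walk a hypothetical 784 -> 128 -> 10 MLP with batch size 32:
shape = (32, 784)
for w in [(128, 784), (10, 128)]:
    shape = dense_out_shape(shape, w)
print(shape)  # -> (32, 10)
```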

That's fair enough, and I was being a little flippant of a very difficult problem. I guess I was more curious about whether anyone has any insight into their possible upside (i.e. whether they can scale production if demand grows, how big a % of Nvidia's revenue is from scientific computing, etc.).

They are working on it pretty actively [1], you can see a progress chart on their website. I think some of the PyTorch devs have said they'd welcome AMD compatibility. If you go to the ROCmSoftwarePlatform github page you can see some of their ports, most of which seem to be actively developed.

[1] https://rocm.github.io/dl.html

Coriander [1] looks promising: "Build NVIDIA® CUDA™ code for OpenCL™ 1.2 devices".

[1] https://github.com/hughperkins/coriander

> cuDNN/NVIDIA GPUs, which exclude a large amount of desktops with AMD graphics

And whose fault is that?

The question isn't about fault, it's about accessibility and availability of commodity hardware. Consider the position of someone who made a financial decision to go with AMD to build a workstation and didn't have the budget to swap out the video card later.

If anyone’s interested, training a single game is affordable on spot instances.

I have a replication, including CloudFormation scripts, here: https://github.com/cshenton/neuroevolution

Nice work

Can you elaborate more on the network requirements? How much bandwidth is required? (I presume latency is less of a problem.)

The algorithm only communicates sequences of seeds, so the communication overhead is very low. The master server doesn’t even break 1% CPU servicing a few hundred workers.
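The seed trick is worth spelling out: a worker's entire mutation history compresses to a list of integers, because each Gaussian perturbation can be regenerated deterministically from its seed. A stdlib sketch (the parameter count and mutation step size are made-up values):

```python
import random

SIGMA = 0.05  # mutation step size (made-up value)

def perturbation(seed, dim):
    # Deterministically regenerate the noise vector from its seed.
    rng = random.Random(seed)
    return [SIGMA * rng.gauss(0.0, 1.0) for _ in range(dim)]

def reconstruct(seeds, dim):
    # A genome is just its seed list; replaying the seeds in order
    # rebuilds the full parameter vector from a zero initialization.
    theta = [0.0] * dim
    for seed in seeds:
        for i, noise in enumerate(perturbation(seed, dim)):
            theta[i] += noise
    return theta

# Master and workers derive identical parameters from a few ints,
# so only the ints ever cross the network:
print(reconstruct([7, 42, 1337], dim=4))
```

That's why bandwidth barely matters here: the messages are a handful of integers per genome, regardless of how many millions of parameters the network has.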
