
Accelerating Deep Neuroevolution: Train Atari in Hours on a Single Computer - mkvorwerck
https://ubere.ng/2qUDmwp
======
ironrabbit
> a run that takes 1 hour on 720 cores can be run on the CPUs of a 48-core
> personal computer in 16 hours

Is calling a 48-core machine a "personal computer" a bit of a stretch, or am I
missing something?

~~~
segmondy
Dunno, I have a 32-core, 128GB RAM machine that I bought for $1000. I have a
60-core Xeon Phi card that I bought for $200 sitting next to it, ready to go
in when I get the chance. When GPU prices drop, I'll probably add two 1080
Tis.

I'm just a regular computer user.

~~~
taeric
Define "regular computer user." What are you doing with these machines that
comes close to using all of the cores?

~~~
nl
I don't know what the OP is doing, but something like XGBoost scales really
well across multiple cores.

I've used my 2014 MBP with a 4-core i7 (plus hyperthreading), plus 3 desktop
i5 boxes, for over 24 hours straight on the same XGBoost task before. The 8
CPU threads on the MBP let it outperform more recent machines on this kind of
task.
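
Something like this (a minimal sketch using the scikit-learn wrapper and
synthetic data, not my actual task) is enough to peg every core:

    # Minimal sketch: XGBoost saturating all CPU cores on one box.
    # Synthetic data stands in for the real task.
    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=100_000, n_features=50)

    # n_jobs=-1 tells XGBoost to use every available CPU thread, so
    # the 8 hyperthreads on the MBP stay fully busy while trees build.
    model = XGBClassifier(n_estimators=500, n_jobs=-1)
    model.fit(X, y)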

~~~
taeric
Oh, I get that there are things you can do with a setup like this. I challenge
that this is a "regular computer user" level of thing to do, though. Mining
bitcoin is not a regular user task. Nor is being a render farm for an animated
movie. :)

~~~
nl
It's on a post called "Accelerating Deep Neuroevolution". I'm pretty sure it's
a regular computer for anyone doing that....

I think the point of "regular" was that it's not exactly some extreme budget
needed to do this.

~~~
taeric
That makes some sense. However, even in an article about Olympic athletes, I
would expect a "regular person" to not be one of them.

~~~
nl
I think there’s an awful lot of emphasis on the word _regular_ and not much on
the “32 cores for $1000” which - given the context of this story - is much
more relevant and interesting.

~~~
taeric
I specifically quoted "regular computer user." This thread was specifically
about that being a personal computer.

I mean, yes, I can see how an individual can get one. And it is definitely
neat to see them pushed to their limits. But calling yourself a regular
computer user with a machine that strong is a stretch.

------
minimaxir
> Modern desktops also have GPUs, however, which are fast at running deep
> neural networks (DNNs).

That's the rub: deep learning is effectively restricted to CUDA/cuDNN/NVIDIA
GPUs, which excludes a large number of desktops with AMD graphics (e.g. all
modern Macs).

Current cloud GPU prices have dropped enough that this approach may be pretty
effective with spot/preemptible instances.
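
To illustrate how baked-in the NVIDIA assumption is, here is the usual PyTorch
device-selection idiom (a generic sketch, not from the article): on an
AMD-only desktop it silently falls back to the CPU.

    # Standard PyTorch device selection: without an NVIDIA GPU and
    # CUDA/cuDNN, everything quietly runs on the CPU instead.
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(128, 10).to(device)
    x = torch.randn(32, 128, device=device)
    print(device)  # "cpu" on any AMD-graphics Mac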

~~~
singhrac
Does anyone know why AMD has dropped the ball so hard on supporting
TensorFlow/PyTorch? It seems like the kind of work 2-5 talented engineers
could do (e.g. push harder on HIP, reimplement cuDNN, etc.), and it would
_seriously_ boost sales, right? I probably misunderstand either the hardware
limitations or the business side from AMD's perspective.

~~~
Nokinside
You misunderstand the difficulty on the software side.

Developing a competitive CUDA equivalent is a much harder task than you
assume. It takes a lot of tooling and software-level optimization to make
these low-level libraries fast. The testing and benchmarking needed to make
the right choices may take many people working full time. Implementing the
functionality alone is not enough.

AMD has had APIs for a long time, and it has an open-source deep learning
stack, but it's not good enough. AMD also has a CUDA-to-HIP converter, but
the results are not competitive and are missing features.

Both AMD and Intel will probably get there eventually, and Nvidia is cashing
in on its monopoly phase before prices drop.

~~~
currymj
I don’t know that it has to be competitive in speed right away, as long as it
works and is easy to install.

I’d be thrilled to have the option of doing deep learning on an AMD card even
if it ran at 1/3 the speed of a comparable (but probably somewhat more
expensive) NVIDIA card. It would open up a lot of options even if it were
still less economical in throughput/dollar.

If nothing else, how many machine learning researchers would like to prototype
things on their MacBook Pros?

~~~
nl
For prototyping (as in making sure you have your matrix shapes right) the CPU
versions of TensorFlow and PyTorch are fine.

They are even OK for _some_ kinds of training (e.g., if you are doing
transfer learning with a fixed embedding/feature representation and have a
pretty small parameter space to learn).
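
For the shape-checking kind of prototyping, a toy CPU-only sketch (the model
here is hypothetical, just for illustration):

    # One dummy forward pass on the CPU catches mismatched matrix
    # shapes before any expensive GPU training run.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.Flatten(),
        nn.Linear(16 * 32 * 32, 10),
    )
    dummy = torch.zeros(1, 3, 32, 32)  # one fake 32x32 RGB image
    print(model(dummy).shape)          # torch.Size([1, 10])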

But it's so easy to fire up a cloud instance at Paperspace or somewhere and
push some code across.

------
cshenton
If anyone’s interested, training a single game is affordable on spot
instances.

I have a replication, including CloudFormation scripts, here:
[https://github.com/cshenton/neuroevolution](https://github.com/cshenton/neuroevolution)

~~~
yazr
Nice work!

Can you elaborate more on the network requirements? How much bandwidth is
required? (I presume latency is less of a problem.)

~~~
cshenton
The algorithm only communicates sequences of seeds, so the communication
overhead is very low. The master server doesn’t even break 1% CPU servicing a
few hundred workers.
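
Roughly, a genome is just a list of RNG seeds that any worker can replay
deterministically (a simplified sketch, not the actual code from the repo):

    # Simplified sketch of seeds-only communication: the full
    # parameter vector is rebuilt by replaying the seed lineage.
    import numpy as np

    N_PARAMS = 1_000_000  # hypothetical network size
    SIGMA = 0.002         # hypothetical mutation step size

    def rebuild_params(seeds):
        # First seed initializes the weights; each later seed
        # replays one mutation as a fixed Gaussian perturbation.
        params = np.random.RandomState(seeds[0]).randn(N_PARAMS)
        for seed in seeds[1:]:
            params += SIGMA * np.random.RandomState(seed).randn(N_PARAMS)
        return params

    # The master only ships a few ints like [42, 7, 1337] per
    # worker, instead of megabytes of network weights.
    weights = rebuild_params([42, 7, 1337])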

