
Japan to Unveil Pascal GPU-Based AI Supercomputer - jonbaer
https://www.nextplatform.com/2017/03/06/japan-unveil-pascal-gpu-based-ai-supercomputer/
======
deepnotderp
As someone who works at a deep learning chip startup, this is great news!
Looks like there's a market for our chips ;)

If anyone wants to learn more about AI chips, I'd be happy to answer
questions.

~~~
ericjang
NVIDIA is making a big bet on GPUs being the right abstraction layer for
research/production, and is nearly pivoting the entire company around this.
The software/developer tools layer is really important too. Custom AI chip
makers (e.g. Nervana) hope to compete with NVIDIA despite having fewer
resources. What are your thoughts on how a small AI chip company can best
NVIDIA/other competitors?

~~~
deepnotderp
We compete with them on efficiency. On a self-driving car, a 1000 W machine for
crunching 3D lidar data isn't really feasible. If we can provide that at 10 W,
which is what we're aiming for, then we have a selling point.

As for our training chips, without divulging too much on a public forum (I'd
be happy to talk more in private or over email), I think we provide the right
level of abstraction and precision to let a researcher one-click port a
TensorFlow model to our chips (we plan to support a few others like Caffe,
Torch, MXNet, etc. out of the gate as well).

~~~
imh
I expected 1 kW to be a drop in the bucket for a car, since moving that huge
hunk of metal from A to B seems like it should totally dominate. But I worked
it out from first principles, and sanity-checked against Tesla anecdotes [0].
At typical speeds and 2.5-3 mi/kWh [0], that's a power consumption in the tens
of kW [1], so call 1 kW a ~5% bump. Going by CNN's figures [2], that could
come to roughly $20/mo from super rough back-of-the-envelope math.

If that estimate is on the money or high, I'd totally pay <= $20/mo for my car
to drive itself. If it's low, I probably wouldn't pay $100+/mo for it.

[0] [https://forums.tesla.com/forum/forums/miles-kwh](https://forums.tesla.com/forum/forums/miles-kwh)

[1] X mph / (Y mi / kWh) = X/Y kW

[2] [http://money.cnn.com/2011/05/05/news/economy/gas_prices_income_spending/index.htm](http://money.cnn.com/2011/05/05/news/economy/gas_prices_income_spending/index.htm)
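The arithmetic above (footnote [1]'s formula plus the ~5% and ~$20/mo figures) can be sketched in a few lines. This is only a rough check: the 60 mph speed and the ~$370/mo fuel budget are assumptions I've plugged in, not numbers from the thread.

```python
# Back-of-the-envelope check of the estimate above.
# Assumed inputs (not from the thread): 60 mph cruising speed and
# a ~$370/mo average fuel budget as a placeholder for the CNN figure.

speed_mph = 60.0               # assumed typical speed
efficiency_mi_per_kwh = 2.75   # middle of the 2.5-3 mi/kWh anecdotes [0]

drive_power_kw = speed_mph / efficiency_mi_per_kwh  # X/Y kW, per [1]
extra_kw = 1.0                 # the hypothetical 1 kW compute load

bump = extra_kw / drive_power_kw        # fractional increase in draw
monthly_fuel_spend_usd = 370.0          # placeholder monthly fuel budget
extra_cost = bump * monthly_fuel_spend_usd

print(f"drive power: {drive_power_kw:.1f} kW")    # ~21.8 kW
print(f"compute bump: {bump:.1%}")                # ~4.6%
print(f"rough extra cost: ${extra_cost:.0f}/mo")  # ~$17/mo
```

So a 1 kW box costs on the order of $15-20/mo at these assumed numbers, consistent with the estimate.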

~~~
foota
You might be willing to pay $20/mo, but how much do you think Tesla would be
willing to pay to increase their range by 5%?

~~~
imh
I wrote out the numbers I worked out for a Tesla, but I had also done it from
basic physics (plus referenced efficiencies) for a gas car, and it's not
_horribly_ different. I didn't include it because I have no idea whatsoever
how big a deal it would be to add a 1 kW electrical load to a gas car that
still drew power from the gas engine, or what the losses look like, but the
order of magnitude stays about the same. I drive a Prius, and if my estimate
is close enough, the range isn't a factor for cars like mine. I was mostly
curious about order-of-magnitude kW stuff, and only tangentially about the $
stuff.

------
gwern
192 GPUs, eh. Interesting to compare that with the numbers being dropped in
some of the Google Brain and DeepMind papers, like 800 GPUs...

~~~
deepnotderp
AFAICT, this is a single supercomputer, not a cluster like the Google systems.

~~~
p1esk
What's the difference?

~~~
deepnotderp
Parallelizing SGD across multiple machines is a highly non-trivial task.
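For the synchronous flavor, the basic scheme is easy to state even though it's hard to do well at scale. A toy sketch (gradient averaging on a 1-D quadratic; the genuinely non-trivial parts like communication cost, stragglers, and large-batch effects aren't captured here):

```python
# Toy synchronous data-parallel SGD on a 1-D squared-error loss.
# Each "machine" holds a shard of the data, computes a local
# gradient, and gradients are averaged before one shared update.

def local_grad(w, shard):
    # gradient of mean of (w - x)^2 over one shard
    return sum(2.0 * (w - x) for x in shard) / len(shard)

shards = [[1.0, 2.0], [3.0, 4.0]]  # data split across two "machines"
w, lr = 0.0, 0.1

for _ in range(200):
    grads = [local_grad(w, s) for s in shards]  # done in parallel
    w -= lr * sum(grads) / len(grads)           # "all-reduce": average

print(w)  # converges to the mean of all the data, 2.5
```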

~~~
gwern
I don't think parallelizing SGD is that important. There are steeply
diminishing and even negative returns to increasing minibatch size (see the
sharp-minima paper), so you wouldn't want to use hundreds of GPUs to run huge
minibatches up to the size of the dataset; and you wouldn't want to split a
model across multiple GPUs, because no one has enough data+compute to train
monstrous models dozens or hundreds of GB in size (if anything, one trend has
been towards appreciating how powerful small well-trained NNs are and how
grossly overparameterized many past NNs have been). What you would use that
many GPUs for is hyperparameter optimization, training many models in
parallel using deep RL or evolutionary computation, asynchronous RL exploring
many environments simultaneously, or supporting many researchers working on
their own individual projects. None of which needs super networking
parallelism stuff.

------
deepnotderp
To address everyone's questions about whether or not 1000 W is too much for a
car: I should have clarified, the power itself is not too big of a concern.
But having a large machine on a car (I'm aware of the PX2 and its successor,
but that's simply way too weak for what we need) requires a lot of space,
energy to move, and energy to cool.

------
ksec
Slightly Off Topic:

How far behind is AMD on GPGPU and AI? It seems OpenCL is a dead end and CUDA
won. AMD announced some CUDA-to-(x) code conversion which never really caught
on.

~~~
exodos
I could be wrong, but I don't see OpenCL as dead. You usually hear more about
CUDA because of the marketing and the agreements signed between Nvidia and the
companies using CUDA. I know quite a few organizations that use OpenCL.

