
New chips for machine intelligence - jwhanlon
https://www.jameswhanlon.com/new-chips-for-machine-intelligence.html
======
sabalaba
Software. Software. Software. Just two companies, Google and NVIDIA, have
publicly launched a viable service or software stack. Just two companies have
successfully written a "sufficiently advanced compiler". Just two companies
actually have a product. And Google refuses to step into the arena and
actually compete with NVIDIA. Man, what a time we live in.

And no, AMD doesn't count. ROCm is a mess.

~~~
trust07007707
I wonder if WebGPU will reduce dependence on CUDA, especially as TensorFlow is
being ported to WebGPU. With WebGPU's improved performance and utility, and
the fact that it runs on top of Vulkan, Metal, and D3D with any GPU that has
drivers for those, I wonder if DL folks will find it more tempting to use
TFJS/WebGPU via Electron or the browser and just be done with CUDA (i.e. break
or soften NVIDIA's monopoly).

~~~
alexhutcheson
I've never heard anyone suggest that WebGPU would be appropriate for ML
training workloads. Maybe inference, but not training.

~~~
trust07007707
Well, the memory limit of a WebGPU process would be one limiting factor for
training. The bandwidth between the nodes and the parameter server, if
training in a data-parallel fashion, is another.
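
To make the second point concrete, here is a rough back-of-the-envelope estimate of parameter-server traffic per step. All numbers (worker count, FP32 weights, ResNet-50-sized model) are illustrative assumptions, not measurements:

```python
# Rough, illustrative estimate of parameter-server traffic per training step.
# All numbers are assumptions for the sake of the example, not measurements.

def ps_traffic_per_step_gb(num_params, bytes_per_param=4, num_workers=8):
    """Each worker pushes gradients and pulls fresh weights once per step."""
    one_copy = num_params * bytes_per_param   # bytes for one full model copy
    total = one_copy * 2 * num_workers        # push + pull, for every worker
    return total / 1e9                        # convert to GB

# e.g. a ResNet-50-sized model (~25M parameters) with 8 data-parallel workers:
print(ps_traffic_per_step_gb(25_000_000))    # -> 1.6 (GB per step)
```

At gigabytes of traffic every step, the interconnect dominates long before compute does, which is the usual argument against browser-hosted data-parallel training.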

------
cracker_jacks
Some of the numbers in that table do not make any sense, which makes me
question the quality of the entire article.

Where are the numbers for the Cerebras chip coming from?

- How do you have a TDP of 180W for an entire wafer of chips?

- Why is there a peak FP32 number when they are clearly working with FP16?

Each of these chips is a completely different architecture and it makes no
sense to compare them at this level. The only meaningful comparison is actual
performance in applications because that reflects how the entire system will
be used.

~~~
inetsee
One of the numbers that jumped out at me as very unusual about the Cerebras
chip was this one: "Speculated clock speed of ~1 GHz and 15 kW power
consumption."

15 kW power consumption for 1 chip?!?

~~~
borramakot
Reportedly, it's an insanely large (whole-wafer) single chip. But then why is
it listed with a size that isn't commensurate with that?

------
Symmetry
And this is ignoring all the bit players, some of which are doing crazy things
like using analog multiplication.

https://fuse.wikichip.org/news/2755/analog-ai-startup-mythic-to-compute-and-scale-in-flash/

That's just slideware at the moment.

------
carlsborg
Huawei looks like it has a strong game here:

"Ascend 910 is used for AI model training. In a typical training session based
on ResNet-50, the combination of Ascend 910 and MindSpore is about two times
faster at training AI models than other mainstream training cards using
TensorFlow."

https://www.huawei.com/en/press-events/news/2019/8/Huawei-Ascend-910-most-powerful-AI-processor

edit: The software framework "MindSpore will go open source in the first
quarter of 2020."

~~~
sgt101
When I first read about it I wondered if it would be a mobile chip, but
apparently not (with a TDP of 300 W).

I wonder how brittle the performance will be on other models, such as
transformers and DRL, versus CNNs like ResNet.

------
lopuhin
Small corrections:

> I’m focusing on chips designed for training

TPU 1 is designed for inference AFAIK.

> TPU v2: 45 TFLOPs

I think it would be great to clarify that what is commonly referred to as "TPU
v2" (e.g. on GCP pricing, and what is shown in the image in this article)
consists of 4 such modules with 8 cores total, which gives the more commonly
quoted value of 180 TFLOPs.

------
hoxmark
FYI: "DISCLAIMER: I work at Graphcore, and all of the information given here
is lifted directly from the linked references below."

------
yboris
I'm excited to see how the Habana Labs Gaudi performs in the real world:

https://www.jameswhanlon.com/new-chips-for-machine-intelligence.html#habana-gaudi

They claim some great features. Does anyone know if a consumer version is
coming, or whether any release dates have been promised?

------
brookhaven_dude
Are deep neural networks really that widely applicable that it's profitable to
design custom chips for them? What about other models of AI that involve, say,
discrete math or graph search?

~~~
modeless
Yes. They are far beyond any other AI technique in speech recognition, speech
synthesis, translation, OCR, object recognition, playing Go, and many other
diverse tasks. And their performance continues to increase with added
computing power with no limit that we've seen yet, so custom hardware improves
results.

~~~
streetcat1
Alas, you do not usually train models from scratch. I think transfer learning
will dominate, and it does not need this much power.
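
A toy sketch of the idea: the "pretrained" backbone is frozen and only a small linear head is trained on the new task. The backbone here is just a fixed random projection standing in for real pretrained weights, so everything below is purely illustrative:

```python
import numpy as np

# Toy transfer-learning sketch: freeze the "pretrained" feature extractor
# (a fixed random projection standing in for a real backbone) and train
# only a small logistic-regression head on the new task.

rng = np.random.default_rng(0)
W_backbone = rng.normal(size=(64, 16))          # frozen pretrained weights
X = rng.normal(size=(200, 64))                  # new task's inputs
y = (X[:, 0] > 0).astype(float)                 # new task's labels

# Frozen forward pass (ReLU), scaled for stable head training.
feats = np.maximum(X @ W_backbone, 0) / 8.0

w, b = np.zeros(16), 0.0                        # the only trainable parameters
for _ in range(500):                            # plain gradient descent
    p = 1 / (1 + np.exp(-(feats @ w + b)))      # sigmoid head
    grad = p - y
    w -= 0.1 * feats.T @ grad / len(y)
    b -= 0.1 * grad.mean()

acc = ((p > 0.5) == y).mean()
print(f"head-only training accuracy: {acc:.2f}")
```

The point is the parameter count: only the 17 head parameters get gradient updates, which is why transfer learning needs a fraction of the compute that full from-scratch training does.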

------
zapnuk
Will there be a consumer-grade TPU in the near future?

Or won't they be able to match the price/performance of (NVIDIA) GPUs?

~~~
Q6T46nT668w6i3m
Apple’s neural engine in their A-series of SoCs.

~~~
lern_too_spel
This article is about hardware for training neural networks, not the inference
chips that are in most phones today.

------
suyash
This list is missing mobile phone chips that are specially designed for Deep
Learning.

~~~
alexhutcheson
Existing mobile phone chips are designed for inference, not training. The list
is explicitly restricted to chips that are designed for training.

~~~
suyash
There is a lot of training happening on the edge (mobile devices) as well;
look up 'federated learning':
https://medium.com/syncedreview/federated-learning-the-future-of-distributed-machine-learning-eec95242d897
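
The core of federated learning is federated averaging (FedAvg): devices train locally and the server only aggregates their weights, weighted by local dataset size. A minimal sketch with toy weight vectors (not a real model):

```python
import numpy as np

# Minimal FedAvg sketch: the server's new global model is the average of the
# clients' locally-trained weights, weighted by how much data each client has.

def fedavg(client_weights, client_sizes):
    """Weighted average of per-client model weights."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)                      # (clients, params)
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

# Three phones holding different amounts of local data:
clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [10, 30, 60]
print(fedavg(clients, sizes))   # -> [4. 5.]  (server's new global model)
```

Note the hardware implication: the heavy lifting (local gradient steps) happens on the edge devices, so "inference-only" mobile accelerators are not enough for this workload.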

------
chips2001
https://towardsdatascience.com/how-to-make-your-own-deep-learning-accelerator-chip-1ff69b78ece4

How to make your own AI chip

------
ckastner
> Intel NNP-T TSMC 16FF+

Intel has stuff made by other foundries?

~~~
kingosticks
Stuff they acquired. This one originated from Nervana Systems, and I think
there are also some Altera chips out there. Intel's custom foundry offering
has historically been poor, so chances are anyone they acquire will have been
using someone else (why take the risk of changing that?).

------
HNLurker2
'''CONCLUSION Graphics has just been reinvented. The new NVIDIA Turing GPU
architecture is the most advanced and efficient GPU architecture ever built.
Turing implements a new Hybrid Rendering model that combines real-time ray
tracing, rasterization, AI, and simulation. Teamed with the next generation
graphics APIs, Turing enables massive performance gains and incredibly
realistic graphics for PC games and professional applications.'''

Quoted from Nvidia Turing datasheet

------
steve19
I am surprised Amazon has not jumped into the game, renting out an accelerator
like Google does with its TPUs.

~~~
Jack000
AWS GPU compute is extremely expensive. If this is due to datacenter licensing
costs, I hope they come out with their own hardware soon to reduce those
costs. If, on the other hand, it's because their value-add is not renting out
hardware but burst scalability, then I'm less optimistic that they'd
cannibalize their own cloud product.

Currently, it only takes about a month of training to break even if you buy a
consumer GPU like the RTX 2080 Ti instead of paying for AWS time. For training
purposes it doesn't seem to make sense.

- Just looked up the numbers, and Google TPUs are pretty similar in terms of
pricing. I think any AWS equivalent would probably be just as expensive
compared to a DIY PC.

~~~
dannyw
That's because NVIDIA's EULA requires you to use NVIDIA Tesla cards in
datacenter deployments, and Teslas are marked up by thousands of dollars.

------
kowarie
What are your thoughts on how federated learning might change the HW landscape
for edge devices?

