
Nvidia Is Building Its Own TensorFlow Processor Unit - mchugbl
https://www.forbes.com/sites/moorinsights/2017/05/15/why-nvidia-is-building-its-own-tpu/#45aef903347f
======
paulkrush
Finally! One of the big boys is talking about embedded deep learning at the
consumer level (i.e., low cost).

Open source? Wow, I did not see that coming. It's good to see Nvidia is not
trying to lock up the low-end IoT deep learning chips. It makes sense, as they
want to sell more GPUs for training.

What can you do with a $5 Pi Zero-like SoC with a TPU/DLA/DSP that runs on 200
milliwatts and can infer deep learning models as well as a desktop CPU? Yes,
onboard training would be nice, but models don't always need to be updated in
real time. Also, you can't always rely on the cloud...

~~~
jacquesm
$5? You still need to store the weights, and with hundreds of millions of
parameters at 32 bits each that still won't be cheap. It'll be a lot cheaper
than a GPU though, and it will use far less power.

~~~
Yrlec
You can quantize it to get the parameters down to 8 bits.

~~~
p1esk
Actually even 1 bit might be enough.
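
A rough back-of-the-envelope for the storage side, plus a minimal quantization
sketch (assuming the ~100 million parameters mentioned upthread):

    import numpy as np

    n_params = 100_000_000  # assumed model size, per the figure upthread
    for bits in (32, 8, 1):
        print(f"{bits:>2}-bit weights: {n_params * bits / 8 / 1e6:.1f} MB")
    # -> 400 MB at 32 bits, 100 MB at 8 bits, 12.5 MB at 1 bit

    # Minimal symmetric 8-bit quantization of one weight tensor
    w = np.random.randn(1024, 1024).astype(np.float32)
    scale = np.abs(w).max() / 127.0
    w_q = np.round(w / scale).astype(np.int8)   # what you'd store on-device
    w_hat = w_q.astype(np.float32) * scale      # dequantized for inference
    print("max abs error:", float(np.abs(w - w_hat).max()))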

------
RockyMcNuts
The strategy is: use expensive GPUs to train.

When your model is ready to deploy at scale, take the matrices and deploy them
in efficient and cheap ASICs.

Nvidia wins if the product of the GPU is as easy to deploy at scale as
possible for everyone, not just Google.

A lot of people who don't have the scale are going to use GPUs/CPUs for the
whole pipeline, and the people who have the scale weren't going to use GPUs in
the long run anyway; this just helps them get there faster and realize a
return on their GPUs.

~~~
joe_the_user
Well,

The broad paradigm of deep learning (and machine learning generally) is a
"train-test-deploy" cycle, and this fits with it. The thing is that however
successful this approach has been so far, it has rather clear limitations.
Unlike a scientific discovery, what's being discovered isn't a universal rule
but a heuristic mapping between massive, real-world data and labels/qualities.
As the world changes, the actual correlation changes but the deployed solution
doesn't (and sure, you can retrain, re-test and so forth, but what if the
required model changes, what if your experts leave, etc.).

So it seems like to make a deep-learning solution sustainable, you'd want a
method of including learning in your deployed solution (how you'd do that may
not yet be discovered but that doesn't mean it won't be discovered).

But this approach seems to do the opposite. It bakes the basic
train-test-deploy process into silicon, making this approach more obligatory.

~~~
amelius
> As the world changes, the actual correlation changes but the deployed
> solution doesn't

Are you saying that self-driving cars might start crashing when pedestrians
wear different fashion in the future, or when cars have different designs?

~~~
jacquesm
That's roughly the gist of it. If the validation set is a subset of the
training set then you are making a whole pile of assumptions that need not
necessarily be true.

Since the network can't explain what exactly those assumptions are, it may
make conclusions that work empirically at training time and that pass
validation but have no solid theoretical underpinning; as soon as those
assumptions fail, the system will fail.

See also: data leakage
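
A small illustration of one common form of leakage: fitting preprocessing on
the full dataset before splitting (a scikit-learn sketch with random
placeholder data):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X, y = np.random.randn(1000, 20), np.random.randint(0, 2, 1000)

    # Leaky: the scaler sees the validation rows before the split,
    # so validation statistics bleed into training.
    X_scaled = StandardScaler().fit_transform(X)
    X_tr, X_val, y_tr, y_val = train_test_split(X_scaled, y, test_size=0.2)

    # Better: split first, fit the scaler on the training data only.
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2)
    scaler = StandardScaler().fit(X_tr)
    X_tr, X_val = scaler.transform(X_tr), scaler.transform(X_val)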

Continuous learning is hopefully going to address some aspects of this.

[https://deepmind.com/blog/enabling-continual-learning-in-neural-networks/](https://deepmind.com/blog/enabling-continual-learning-in-neural-networks/)

And many more future tricks like it.

~~~
ethbro
Or essentially that deep learning is unable to discriminate between beliefs
and facts. And we can't read the models well enough to tell which is which.

So it would be like being handed a list of high level bridge building rules
with no context, then being asked to build a bridge in a geological
environment where one has never been built before.

Will it work? Who knows! How would we say?

~~~
jacquesm
Well, it's early days. We already know it is possible to generate pathological
inputs that elicit a false response from a network, much like optical
illusions can fool humans.

For instance, in the context of the adaptability of self-driving cars:

The retraining problem could theoretically be addressed with an OTA update
that periodically pools new training data from in-service vehicles to further
improve and generalize the models. You'd have to build that in from day one,
but presumably anybody busy with self-driving cars has this at the forefront
of their thinking.

Just so that if and when black clothing with white stripes becomes
fashionable, we don't end up with a large amount of roadkill because
self-driving cars interpret the wearers as empty lanes.
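
A rough sketch of what such a pooling-and-retraining loop might look like
(every name below is hypothetical, purely for illustration):

    # Hypothetical fleet-learning cycle; all names are made up for illustration.
    def collect_logged_samples(fleet):
        """Pool hard / low-confidence examples logged by in-service vehicles."""
        return [sample for car in fleet for sample in car.get("logged", [])]

    def fleet_update_cycle(fleet, base_model, finetune, validate, deploy_ota):
        new_data = collect_logged_samples(fleet)
        candidate = finetune(base_model, new_data)       # retrain centrally on GPUs
        if validate(candidate) >= validate(base_model):  # gate on a held-out set
            deploy_ota(fleet, candidate)                 # push new weights over the air
            return candidate
        return base_model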

------
freddealmeida
I think that might be a mistake. It should simply be Tensor Processing Unit.
Tensors, not TensorFlow (which is an application). Am I wrong?

~~~
dheera
It's possible they are building hardware specifically for TensorFlow's API,
but it's also possible the journalists are just not tech-savvy and don't know
the difference.

------
bla2
Vaporware announcement 2 days before Google I/O...I wonder what Nvidia expects
Google to announce.

~~~
Symmetry
That they're selling their TPUs to the public?

------
SomeStupidPoint
The article does use this title, but doesn't TPU stand for "Tensor Processing
Unit", not "TensorFlow Processing Unit"?

I remember early releases saying that, and Wikipedia[0] seems to agree.

I suspect that this is just lazy tech journalism.

[0]
[https://en.m.wikipedia.org/wiki/Tensor_processing_unit](https://en.m.wikipedia.org/wiki/Tensor_processing_unit)

------
sgt101
There's a big gap between the price of NVLink servers and servers with a
bunch of 1080 Tis or Titans in them. I think this must be to do with
commodity production and pricing. GPUs are sold to gamers. Gamers want 60fps
4K gaming, and there is a large market for cards that support this. I want
GPUs to train neural networks, but I think that very few businesses are
buying 20 GPUs and sticking them in servers. Even fewer are buying 20 NVLink
systems. There is a market here, but it's small and hard for a gaming
company to address. I think Nvidia knows there is a lot here for them, but
they must look at the numbers and wonder.

~~~
jacquesm
> but I think that very few businesses are buying 20 GPUs and sticking them
> in servers.

No, they're not buying 20, they're buying _hundreds_.

And not just for deep learning either; CFD and all kinds of other
computationally expensive workloads are more and more found on clusters of
servers with GPUs in them instead of mainframes. There is definitely a huge
shift happening there.

------
zitterbewegung
So correct me if I am wrong, but when Google created their TPU the only
advantage was inference with high power efficiency. Nvidia's response of
making their own took a year because Xeons are nearly everywhere and can do
inference, just less efficiently. I imagine that since a TPU is specialized
hardware, it makes more sense in mobile than in a data center, looking at how
slowly AWS deploys hardware.

~~~
jacquesm
This is targeted straight at self-driving cars and robotics applications, and
is presumably inference only.

------
chimtim
Intel needs to double or quadruple their processors' AVX width, add 8-bit
instructions, and make it slightly more power efficient (for AVX), and it
will easily beat many of these TPUs. Of course this is easier said than done,
but I am really surprised they haven't done it so far and instead bought
Nervana and other hardware chip startups.

~~~
phkahler
Have you read about the Google TPU? It's got 65536 ALUs running in parallel,
turning out a result every clock:

[https://www.nextplatform.com/2017/04/05/first-depth-look-googles-tpu-architecture/](https://www.nextplatform.com/2017/04/05/first-depth-look-googles-tpu-architecture/)

They claim faster memory (GDDR5?) could easily triple the performance, which
should bring it to 270 TOPS or so. I don't think extending AVX is going to
get there any time soon.
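
The headline number roughly checks out as a back-of-the-envelope, assuming
the published 256x256 MAC array at 700 MHz:

    macs = 256 * 256            # 65536 8-bit MAC units in the matrix unit
    clock_hz = 700e6            # published TPU clock
    ops = macs * 2 * clock_hz   # multiply + add counted as 2 ops
    print(ops / 1e12)           # ~92 TOPS peak; a 3x memory-bound gain -> ~275 TOPS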

~~~
chimtim
All this raw compute/parallelism is great, but it does not really help the
algorithm output in terms of efficiency. The CPU/GPU difference is much
smaller than Nvidia would like you to believe, especially for more complex
networks, which are becoming more and more common now. Of course, if you want
to just do convolutions (which the TPU paper claims are only 5% of their
Google workload), building hardware around them may work well for specific
algorithms.

------
gens
Yeah... it's not a "deep learning processing unit", but a "tensorflow
processing unit". And it certainly is _not_ just a normal modern graphics
card (GPGPU) that can do other-than-32-bit-float computations without dying
(as AMD's have been able to do for a while now).

* Written using an FFPU (Firefox processing unit).

------
dom0
How do I read Forbes articles?

I only get a blurred page with a "Quote of the Day" that I can't close;
navigating to the original URL of course redirects me back to this nonsense
page, and trying to delete the covering elements with the inspector does not
yield results. WTF?

If they don't want people to read their articles, they could just make it
"HTTP 403 Forbidden, Status Code of the Day".

~~~
Encounter
uBlock Origin seems to bypass the quote page.

~~~
dom0
It doesn't for me; I have almost all of the default lists enabled.

However, whitelisting forbesimg.com works.

