
 Google announces a new generation for its TPU machine learning hardware - deesix
https://techcrunch.com/2018/05/08/google-announces-a-new-generation-for-its-tpu-machine-learning-hardware/
======
sabalaba
This is great news. It’s extremely important to the research community that
large companies enter into the DL silicon space to contend with NVIDIA’s
monopoly.

NVIDIA is now exerting pricing power to the point where they’ve decided to
train their sales people to disregard the metric that is most important to
customers: cost of training. Talk with one of their enterprise sales people
and you’ll find they’ll say things like “FLOPS / $ doesn’t matter” to justify
a 10x increase in price for their TESLA line. As history has shown, a
monopolist sows the seeds of their own destruction and, by disregarding the
metrics that matter, they alienate their customers.

Here are open source projects that you can contribute to if you want to help
break the monopoly:

ROCm: [https://rocm.github.io/](https://rocm.github.io/)

MIOpen:
[https://github.com/ROCmSoftwarePlatform/MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen)

TensorFlow:
[https://github.com/tensorflow/tensorflow](https://github.com/tensorflow/tensorflow)

~~~
twtw
Instead of being able to buy your own NVIDIA GPU and run any CUDA or OpenCL
code on it, now you can run only TensorFlow, only on a TPU, and only in
Google's cloud.

How fantastic.

~~~
throwaway2048
Yeah, that seems like a way more abusive position to me, frankly. A Google
monopoly would be way worse than an NVIDIA one.

~~~
jacksmith21006
Not a Google monopoly, but instead another choice, to push Nvidia and others.

------
Voloskaya
> Google CEO Sundar Pichai said the new TPU is eight times more powerful than
> last year

Are we sure about this? He specifically said that a "pod" would be 8 times
faster than last year, not the TPU itself.

And the picture in the background showed what looked to be 8 racks of 64
TPUs (or maybe 32?). Until now a "pod" was a single rack of 64 TPUs. So if the
new definition of a "pod" is 8 times as many TPUs as before, the result is
less impressive...

Is there any actual spec released?

~~~
twtw
Google started playing this game with the TPUv2, wherein they define a "TPU"
as whatever makes it sound suitably impressive (i.e. a Cloud TPU is four
chips). This in turn led to NVIDIA calling the DGX-2 "the world's largest
GPU".

~~~
jacksmith21006
What does it matter? What we care about is the cost. The TPU gen 2 costs about
half as much as using Nvidia on the Amazon cloud for the same work. To me that
is what matters. If the TPU 3 drops the cost further, fantastic.

How many chips there are sounds a lot more like a pissing contest. What if it
were one giant chip with a bunch of chips inside? Who cares?

~~~
Voloskaya
I care because, assuming the cost of a single TPUv3 is roughly similar to the
cost of a single TPUv2 (at launch), paying for a "pod" of 256 or 512 TPUs
(whatever a pod is this year) will be nowhere near the cost of a 64-TPUv2 "pod".

So I care because I want to know if I will actually see this 8x speed
improvement or if this is just marketing BS.

If NVIDIA tells me a 2080 is 8x faster than a 1080, I know that the 2080, when
released, will be roughly the same price as the 1080 was at its release, and
so I can expect to actually see an 8x cost/perf improvement (putting aside the
fact that these are always best-case scenarios).

~~~
jacksmith21006
You can already hook TPUs together, so, IMO, what is happening inside the data
center is neither here nor there. What I care about is the cost. Time to train
is more a function of how many resources I want to use.

~~~
Voloskaya
So what? I just want to know if the 8x speed increase was achieved at roughly
the same cost (i.e. the same number of TPUs) or not.

If the 8x speed increase was achieved with 4x more TPUs (and roughly 4x the
cost), as it seems it was, then this is just marketing bullshit, and we should
really expect a 2x improvement in the cost/perf ratio, not 8x.
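
Back-of-the-envelope, with the purely hypothetical numbers above (none of
these are confirmed specs):

    # Hypothetical: an "8x faster" pod built from 4x as many chips at ~4x the price.
    speedup = 8.0               # claimed pod-level speedup over last year
    chip_count_increase = 4.0   # assumed increase in chips per pod (and roughly in cost)
    cost_perf_gain = speedup / chip_count_increase
    print(cost_perf_gain)       # 2.0 -- the cost/perf improvement under that assumption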

~~~
jacksmith21006
I am also curious, but really it comes down to cost. Right now TPUs cost about
half as much as using Nvidia on the AWS cloud.

It will be interesting to see if the difference grows further with the TPU 3.0
or whether Google will just take larger margins.

------
jamesblonde
What will be interesting to see is whether they are going the hardware
specialization route, like Nvidia with their support for efficient 4x4 fused
FP16 matmuls with FP32 matrix output (they call them 'tensor cores' - hah). I
suspect that with the liquid cooling they are just dialing up the matmul
speed, which is probably the right way to go, IMO.
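
For anyone unfamiliar with what that fused op actually is, here's a minimal
NumPy sketch of the D = A*B + C a tensor core computes (my own illustration,
not NVIDIA's actual API; 4x4 FP16 operands, FP32 accumulation):

    import numpy as np

    # 4x4 FP16 input tiles, FP32 accumulator/output: D = A @ B + C
    a = np.random.rand(4, 4).astype(np.float16)
    b = np.random.rand(4, 4).astype(np.float16)
    c = np.zeros((4, 4), dtype=np.float32)
    d = a.astype(np.float32) @ b.astype(np.float32) + c
    print(d.dtype)  # float32, even though the multiplicands were float16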

We are doing data-center experiments using oil cooling. It's surprising to
see circuit boards dumped in ordinary oil and working away, with no short
circuits.

~~~
deepnotderp
You mean immersion cooling? As long as the fluid isn't conductive you're good
to go.

If you're interested, Alibaba has been doing some work along the lines of
production immersion cooling.

~~~
jamesblonde
Yes, it's not my research - a colleague's. He has some big vats of oil. It's a
surreal experience to see servers just being oil-boarded (can I say that? :) )
and coming out OK!

~~~
polvs
I'd love to discuss your research. We've been working low-profile over the
last few years, doing a lot of R&D and experimenting with different fluids and
components.

Mineral oil is OK for experimentation, but for long-term material
compatibility and fire-risk reasons I wouldn't recommend it.

FWIW I co-founded [https://submer.com](https://submer.com) where we've
developed an all-embedded computing immersion cooling solution that is
virtually compatible with any kind of hardware (even fiber optics) and is
orders of magnitude more efficient than traditional data center cooling
technologies.

------
esmi
Does the Google Cloud API for TensorFlow expose any hardware details? I am
just curious why Google announces these hardware details at all.

~~~
jacksmith21006
Look at this thread. Because people want to know; that's just what "techies"
do.

But what I want to know is performance in terms of joules compared to the
TPU 2.

My hope is that we get a paper on the TPU 2 now that the 3 is released. We got
the TPU 1 paper as they were releasing the TPU 2, I suspect.

------
polskibus
Is this going to be available off-the-shelf like NVidia GPUs? That's the only
way to get wider and faster developer buy-in.

~~~
aseipp
No, they're going to be used by Google and nobody else, so they can establish
dominance in the cloud AI/AI SaaS space using their immense resources. This
will be used by internal Google projects to rapidly develop their large-scale
models and, if you're lucky, made available on Google Cloud one day (TPUv2
already is, at least).

Google doesn't need "developer buy-in" for these to make sense -- they need
better hardware for training and deploying deep learning models for their
products, which is the overwhelming motivation. And, if they offer these to
you, it's only because you're willing to pay for faster TensorFlow iteration.

~~~
option_greek
Models are the new bitcoins, and no one is willing to sell the hardware
directly. Weirdly enough, the exact same thing happened in mining (specialized
hardware passed Nvidia GPUs in performance at some point).

------
frisco
100 petaflops per what? All of google’s deployment? Per rack? 100 Pflop / chip
can’t be right.

~~~
blattimwind
Per pod (256 TPUs).
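
Taking that 256 figure at face value (it isn't an official spec as far as I
know), the per-device number works out to roughly:

    # 100 PFLOPS spread over 256 devices, using only the numbers in this thread
    print(100e15 / 256 / 1e12)  # ~390 TFLOPS per TPU device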

~~~
raverbashing
Can you allocate a whole pod to do your computations? How much does that cost?

------
ai2323
This is very typical Google. The V100 comes out, and they don't deploy it in
their cloud immediately. Then they launch their TPU cloud and spend two months
touting cost/speed benchmarks. Then, a week before I/O, they make the V100
available in GCE, nearly six months after AWS. Then at I/O they announce the
TPU3. This is supposed to be Nvidia's most captive customer, and it now looks
like it's going to be their biggest problem. I would love to see Google's
spend with them and how it's changed since they ramped the TPU2.

~~~
boulos
Disclosure: I work on Google Cloud (and even on the GCE parts).

No conspiracy here, we were simply late to market on the V100 Beta. As many
have noted, we also hold a really high reliability bar before going to Beta
(“Google’s Beta is like AWS’s GA”, though I wouldn’t claim that in all cases)
and we’ve been in Alpha for months. If you’d like to be included in Alpha
offerings, drop us a line.

As I’ve said in previous threads: Compute Engine intends to be the best place
for computing. That includes the kind of workloads TPUs are tuned for, but it
also includes all the stuff that GPUs excel at. That’s not going away, and GCE
isn’t favoring one over the other. We sell _infrastructure_, and we don’t try
to (overtly) pick favorites; we let our customers do that.

~~~
ai2323
Apologies, I stand corrected. Conspiracy theories are always more fun :) Guess
it was just a coincidence that after grabbing all this press for the V100
finally being available on GCE, the TPU3 was announced a few days later.

------
tim333
The announcement bit on video
[https://youtu.be/ogfYd705cRs?t=1h36m](https://youtu.be/ogfYd705cRs?t=1h36m)

------
thwd
It's 'Tensor Processing Unit' as far as I know.

~~~
tlb
Title corrected from "Tensorflow Processing Unit"

------
ai2323
Welcome to ASIC land... training is going the route of crypto mining.

~~~
Maybestring
It's just an arithmetic logic unit for tensors. It's not at all like a crypto
miner that implements a single algorithm.

~~~
ai2323
You kind of summed it up right there...'it's just an arithmetic logic unit for
tensors'...whether its a matrix multiply engine or implmenting sha256 still a
custom rapidly iterated chip for narrow very specific use case. Google
accomplishment here clearly the software, but doubt they only ones going to
crack asic systolic arrays. At some point china inc figures this out in
mass...maybe bitmain themselves.
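
For anyone who hasn't looked at one, a systolic matmul array is conceptually
pretty simple. A toy Python sketch of the weight-stationary dataflow (my own
illustration, with no clock skew modelled; not Google's actual design):

    import numpy as np

    def systolic_matmul(a, b):
        # Each PE in row k, column n holds weight b[k, n]. Activation rows stream
        # across the array; every PE fires one multiply-accumulate per beat and
        # passes its partial sum down the column.
        m_dim, k_dim = a.shape
        _, n_dim = b.shape
        out = np.zeros((m_dim, n_dim), dtype=np.float32)
        for m in range(m_dim):                    # activation rows stream in
            partial = np.zeros(n_dim, dtype=np.float32)
            for k in range(k_dim):                # k-th row of PEs fires
                partial += a[m, k] * b[k, :]      # one MAC per PE in the row
            out[m] = partial                      # column sums exit the bottom edge
        return out

    a = np.random.rand(8, 16).astype(np.float32)
    b = np.random.rand(16, 4).astype(np.float32)
    assert np.allclose(systolic_matmul(a, b), a @ b, atol=1e-4)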

~~~
wmf
Bitmain already announced TPUs (cue "they'll only sell them after they're
finished training Skynet").

~~~
sanxiyn
FYI, here is Bitmain's AI product homepage:
[https://www.sophon.ai/](https://www.sophon.ai/)

They shipped Deep Learning Accelerating Card SC1 at $589. You can't buy it
right now because it is sold out.

------
zackmorris
Just to play devil's advocate for a moment: I'm excited that there will
finally be some viable competition for GPUs, but am disappointed that this
isn't a general-purpose multiprocessor.

We're long overdue for a general-purpose CPU with, say, 1024 cores, one that
avoids a central main memory and where each core can be independently
programmed just like any other CPU. Google's may count as a somewhat
general-purpose DSP, which is definitely a step forward. But no matter how
mature or mainstream a framework like TensorFlow gets, it can never replace
full programmability.

Without seeing the internals, I'm going to have to give this a nay vote for
now. There are many other rather exciting problems that need to be opened up
to a new generation of tinkerers. Off the top of my head, it's things like:
content-addressable memory to provide high data locality (evolving various
interconnects instead of hardwiring them), exploring other types of general
vector processing like the kind MATLAB/Octave uses, and exploring hill-
climbing algorithms other than backpropagation/neural nets.

I picture something more like network topology-agnostic Docker containers
programmed in Elixir/Erlang/Go that can act as semi-autonomous agents and
switch into various modes in order to solve the problem at hand. I just find
that a much simpler metaphor to work with than OpenCL/CUDA/TensorFlow. Yes it
would take more silicon and would probably violate YAGNI, but only full
programmability gives us the freedom to explore the problem space at the level
that's going to be required to implement artificial general intelligence.

