
Introducing the TensorFlow Research Cloud - Anon84
https://research.googleblog.com/2017/05/introducing-tensorflow-research-cloud.html
======
aub3bhat
For comparison, at 180 teraflops per TPU (assumption: fp32), the announced
cluster of 1,000 Cloud TPUs is equivalent to offering a cluster of 15,000
Titan Xp GPUs ($1,200 per GPU, or roughly $18 million worth of compute).

If TPU teraflops are reported at fp16, then the number would be half.

The Titan Xp offers 12 TFLOPS per GPU:
[https://blogs.nvidia.com/blog/2017/04/06/titan-xp/](https://blogs.nvidia.com/blog/2017/04/06/titan-xp/)
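
A quick sanity check of the arithmetic, in Python (a back-of-the-envelope
sketch; the 1,000-TPU figure is the announced TFRC cluster size, and both
TFLOPS numbers are vendor peak claims):

    tpus = 1000            # Cloud TPUs in the announced TFRC cluster
    tflops_per_tpu = 180   # Google's claimed peak per Cloud TPU
    tflops_per_gpu = 12    # Titan Xp peak at fp32
    gpu_price_usd = 1200   # approximate Titan Xp price

    equivalent_gpus = tpus * tflops_per_tpu / tflops_per_gpu  # 15,000 GPUs
    total_cost_usd = equivalent_gpus * gpu_price_usd          # $18,000,000
    print(equivalent_gpus, total_cost_usd)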

~~~
fizixer
180 = 12 x 15

Where are you getting the extra 3 zeros from?

(Also, 'up to 180 TFLOPS' is a bit misleading. From benchmarking posts I've
read by both Google and NVIDIA, a TPU is much faster than a GPU at inference,
but they haven't released any data on training performance, which IMO is the
real bottleneck.)

~~~
superfx
Also, I suspect the more relevant number is the "Tensor" ops figure NVIDIA
recently reported for Volta, 120 TFLOPS [1], which would make Google's TPU a
"mere" 1.5x speedup over Volta. Of course, this is pure speculation on my
part.

1. [https://devblogs.nvidia.com/parallelforall/inside-volta/](https://devblogs.nvidia.com/parallelforall/inside-volta/)
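
To make the direction of the comparison explicit (a trivial sketch; both
figures are the vendors' claimed peaks, not measured numbers):

    tpu_tflops = 180           # Google's claimed peak per Cloud TPU
    volta_tensor_tflops = 120  # NVIDIA's claimed Tensor Core peak for V100
    print(tpu_tflops / volta_tensor_tflops)  # 1.5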

~~~
asdf_
Google's 180 teraflop number is for a module, which contains 4 TPU chips,
meaning each TPU is 45 teraflops. See: [https://arstechnica.com/information-technology/2017/05/googl...](https://arstechnica.com/information-technology/2017/05/google-brings-45-teraflops-tensor-flow-processors-to-its-compute-cloud/)

~~~
jlebar
I replied to this comment elsewhere in the thread [1], but to reply here as
well:

A GPU board also comprises multiple chips (RAM, etc.). I don't think
"performance per discrete piece of silicon" is an interesting metric.

[1]
[https://news.ycombinator.com/item?id=14364169](https://news.ycombinator.com/item?id=14364169)

~~~
asdf_
A single V100 board is vastly smaller than a TPU module with 4 TPU chips.

The closest comparison in terms of size would be 1 Volta DGX-1 (8x V100s)
compared to 2 TPU modules (8x TPU2 chips).

Volta DGX-1: [https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Cent...](https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/dgx-1/data-center-products-dgx-1-components-843-u1.jpg)

TPU Module: [https://storage.googleapis.com/gweb-uniblog-publish-prod/ima...](https://storage.googleapis.com/gweb-uniblog-publish-prod/images/tpu-V2-hero.width-1000.png)

And for completeness, this is the size of a single V100: [https://cdn.arstechnica.net/wp-content/uploads/2017/05/34446...](https://cdn.arstechnica.net/wp-content/uploads/2017/05/34446705711_76ab786243_o.jpg)

You can see that 8x V100s are still more computationally dense than 8x TPU2
chips. Density is a very important factor in datacenter design.

~~~
dgacmu
From my read of that Volta picture (here's a close-up glamour shot of the V100
module: [https://devblogs.nvidia.com/parallelforall/wp-content/upload...](https://devblogs.nvidia.com/parallelforall/wp-content/uploads/2017/05/SXM2-VoltaChipDetails.png)),
that's just the chip module itself. Note how none of the interconnect hardware
is present? The V100 supports NVLink just like the P100. When you actually
assemble that chip module onto a board, it looks more like this:

[http://hothardware.com/ContentImages/NewsItem/37034/content/...](http://hothardware.com/ContentImages/NewsItem/37034/content/small_QuantPlex-Deep-Learning-Server.jpg)

(That's a P100 server, obviously, but it's the same kind of design. The P100
module looks a lot like the V100 module.)

~~~
asdf_
The interconnect hardware is present on the DGX-1 board.

The point of my previous comment is that it doesn't make sense to compare an
entire TPU module (containing 4x TPUs) to a V100 board.

The closest comparison to a TPU module (with 4x TPUs) would be a DGX-1 board,
which contains the NVLink bus you mention but also carries 8x V100 boards.
That is why I said you should compare the compute performance of a Volta
DGX-1 (8x V100s) to 2 TPU modules (8x TPUs).

At the end of the day it's simple: in a given area of space, you get more
compute performance by provisioning that area with V100s (in the form of
DGX-1s) than by provisioning it with TPU modules.
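
Rough numbers (a back-of-the-envelope sketch using the vendors' claimed
peaks, which aren't necessarily at comparable precisions):

    v100_tensor_tflops = 120  # NVIDIA's claimed Tensor Core peak per V100
    tpu2_module_tflops = 180  # Google's claimed peak per 4-chip TPU2 module

    dgx1_tflops = 8 * v100_tensor_tflops         # 960 TFLOPS in one DGX-1
    two_modules_tflops = 2 * tpu2_module_tflops  # 360 TFLOPS in 8 TPU2 chips
    print(dgx1_tflops, two_modules_tflops)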

------
theCricketer
Question for any Google Cloud folks hanging out here: is it possible to use
Cloud TPUs without using TensorFlow? Is there a lower-level library/API to
run instructions on TPUs so that other frameworks can work with them?

~~~
jamesblonde
You can use Keras, with TensorFlow as the backend that runs it.
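
For example, a minimal sketch (the model and data here are made up, just to
show the shape of the API):

    # Keras running on the TensorFlow backend; model and data are made up.
    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential([
        Dense(64, activation='relu', input_shape=(100,)),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy')

    # Dummy data, just to exercise the API.
    x = np.random.rand(32, 100)
    y = np.eye(10)[np.random.randint(0, 10, size=32)]
    model.fit(x, y, epochs=1)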

~~~
alexcnwy
Keras is higher level; the parent asked about something lower level.

------
jaflo
I might be naive to think this, but are Google Cloud services making money
simply through the fees you pay or does Google also have an interest in the
data generated by and passing through its services? If the latter, what data
do they collect and how do they profit?

~~~
ihsw2
They let others use and sharpen their tools.

~~~
halite
One of the goals of this project is:

"Share the benefits of machine learning with the world"

It doesn't say who shares what :)

~~~
pm90
Google's stated goals seem very altruistic, and to their credit they do
genuinely contribute heavily to good causes and to open science and research.
I find it hard to believe a for-profit corporation is purely altruistic, but
their play here seems to be to encourage research and make it accessible to
more _people_, so that more human minds come up with novel ideas that Google
itself might some day use. E.g., I can see how investing millions of dollars
in funding PhD students pays off if even a single one of them discovers an
obscure algorithm that improves the efficiency of some process by just 0.1%;
at Google's scale, that could still save millions more.

~~~
moosingin3space
I believe Google is playing long-term here -- if they make it really easy for
someone to do ML research using their frameworks, they'll collect rent from
the researcher and possibly get something more innovative out of their work
later.

Not everything they do is about short-term data plays.

------
jjm
For me the major takeaway is that Nvidia is no longer the only supplier of
"decent" DL hardware.

To think that any entrant no matter the size could come into the space and
successfully develop a device in such short order is amazing.

Not only that, but they used it within their own infrastructure for some time
before later allowing external usage. Again, amazing from a planning,
execution, and product perspective. Ah shucks, from a hardware perspective too.

Especially coming from a company that, in the traditional sense, has no
business being in this sector.

It seems like Nvidia is an army, polished to pump out chips, and this Google
special-ops team kicked ass.

~~~
andreyk
"To think that any entrant no matter the size could come into the space and
successfully develop a device in such short order is amazing."

Is this a fair statement? I mean, this is Google. Hardware development is not
easy or cheap, and they've been at it for years. They ponied up lots of money
for AI starting as early as 2012, and this company is all about running huge
jobs on lots of compute over giant data. It is hard to think of a company more
suited to developing this kind of hardware (same for Facebook, in that sense).

~~~
jjm
You are right. Traditionally it was (and is) considered a difficult road to
enter any space outside your core. It seems like Google and Amazon are really
good at executing almost anything.

~~~
novaRom
It is about money. Google and Amazon can offer significantly better
compensation than many other companies. So, they can attract the most
productive engineers.

------
KaoruAoiShiho
If I'm invested in NVDA, should I sell? Is this a real competitor that will
end NVDA's AI leadership?

~~~
ruleabidinguser
Last I heard, they're creating their own custom chip as well.

~~~
KaoruAoiShiho
Nvidia's not going to suddenly collapse because of this, but a lot of the
froth in that company is due to its existence as pretty much the ONLY
provider of DL accelerators. They had no competition, there were no
competitors in sight, and it looked like they were going to dominate the space
in the short term. That changes now, I think. The 2nd-gen TPU is competitive
even if not strictly better. In a few years, instead of Nvidia being a clear
leader, it may be just one of the pack, and that's bad for the current
valuation.

~~~
nl
No, it's great news for Nvidia.

Nvidia is the _only_ current credible provider of DL hardware. If Google
starts using TPUs then every other company will be forced to buy more Nvidia
cards or they will be left behind.

Microsoft is the only exception here: they have some investment in an
FPGA-based ML solution. Even IBM uses Nvidia as its DL solution on Power
servers.

~~~
wmf
I know of one startup working on a TPU killer and there must be others. Amazon
is probably either already working on a deep learning ASIC or scouting
startups to buy. Apple is probably poaching people from Google/Nvidia to build
their deep learning core.

~~~
nl
There's at least one person on HN working on DL hardware.

------
jamesblonde
This basically tells me that Google have a new generation of TPUs arriving
and are giving us (researchers) the last generation. Not that I have anything
against this. 3-4 years ago, AWS made huge inroads in mind-share by giving us
researchers tens of thousands of dollars in education grants for writing a
half-page proposal. Google have learnt from them.

~~~
dgacmu
I think if you look at some of the other sub-threads comparing TPUs to Pascal
and Volta, you'll conclude that the Cloud TPUs are cutting-edge technology.
Your conclusion makes sense in the context of TFRC, but not when Google's
creating a cloud offering that they hope people will pay for.

------
killjoywashere
I suspect a number of people signing up for this will be getting invited to
job interviews in the coming year.

------
kruhft
delay 7

say "I'd like to introduce you to the concept of Intelligence Artificial."

delay 2

say "You might have heard of Artificial Intelligence, a term coined by John
MiCarthy in the 1950s to label his study of difficult problems that his
computer science lab was working on."

delay 2

say "Today we have a resurgence of Artificial Intelligence. You may or may not
have noticed, but the Artifically Intelligent are now controlling your lives."

delay 2

say "From Siri, to Facebook, to the Googles, to the Amazon, your data is being
fed into 'Machine Learning' algorithms at a rapid pace."

delay 2

say "Custom hardware is being designed this very minute to accelerate the
training of these algorithms."

delay 2

say "That is what Artificial Intelligence is these days. Machines learning,
about you."

delay 3

say "Intelligence Artificial turns this concept on it's head. By learning more
about the machine, YOU, you can learn to control the machine."

delay 1

say "I"

say "A, not, A I."

delay 7

say "BusFactor1 Inc. 2017"

delay 1

say "Putting the ology into technology."

delay 3

say "Or, is it the other way around."

delay 4

say "I am the machine telling you about learning, this is my reference _clip_
and this is where we are going"

~~~
kruhft
Rendered: [http://busfactor1.ca/ia.m4a](http://busfactor1.ca/ia.m4a)

