Comparing Google’s TPUv2 against Nvidia’s V100 on ResNet-50 (riseml.com)
171 points by henningpeters 9 months ago | 127 comments



Thanks for sharing and very insightful. Guess the TPUs are the real deal. About 1/2 the cost for similar performance.

I would assume Google is able to do that because of the lower power requirements.

I am actually more curious to see a paper on the new speech NN Google is using. It's supposed to push 16k samples a second through a NN; it's hard to imagine how they did that and were able to roll it out, as you would think the cost would be prohibitive.

You are ultimately competing with a much less compute heavy solution.

https://cloudplatform.googleblog.com/2018/03/introducing-Clo...

Suspect this was only possible because of the TPUs.

Can't think of anything else where controlling the entire stack including the silicon would be more important than AI applications.


Half the cost? Where are you reading that? Yeah, on-demand rental on AWS is expensive, but both long-term reservations and buying V100s yourself are significantly cheaper. Cloud companies have pretty fat margins on on-demand rentals.

You can't buy a TPU; it's a cloud-only thing. They also show it's not a huge difference in either perf or time to converge (albeit for only one architecture).

I would say kudos to V100 and this benchmark that breaks the TPU hype.


The chart has $6.7 per hour for 3186 images/s on Google and $12.2 per hour for 3128 images/s on AWS.

Or maybe I'm reading it wrong?

That is close to half as much to use Google, is it not?

BTW, the TPUs also do about twice the images per dollar.
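
Spelling out the arithmetic (a rough sketch using the throughput and hourly prices quoted above, treated as chart values rather than my own measurements):

    # Images per dollar from the quoted chart values (illustrative only).
    tpu_imgs_per_s, tpu_price_per_h = 3186, 6.7    # Cloud TPU (TPUv2)
    gpu_imgs_per_s, gpu_price_per_h = 3128, 12.2   # 4x V100 on AWS

    def images_per_dollar(imgs_per_s, price_per_h):
        return imgs_per_s * 3600 / price_per_h

    ratio = images_per_dollar(tpu_imgs_per_s, tpu_price_per_h) / \
            images_per_dollar(gpu_imgs_per_s, gpu_price_per_h)
    print(round(ratio, 2))  # ~1.85, i.e. roughly half the cost per image on the TPU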

Sounds like Google is pretty far ahead of Nvidia. Which really just makes sense, as Google does the entire stack and is just going to have the data to optimize the silicon.

About half the cost is hype?

I want to be in the cloud and not have to deal with updating, etc. I would think most people are the same for anything of any scale. I could not imagine building up rigs any longer and dealing with all the issues. Plus it's much harder to scale.


It's more a comparison of AWS vs. Google Cloud pricing than Nvidia vs. TPUv2.


Strongly disagree. If Google is able to offer it at about 1/2 the cost using their own silicon versus AWS using Nvidia, that is all about the silicon difference.

But we also have the V1 TPU paper and can see the TPUs use fewer joules per inference compared to an older Nvidia architecture. It was not that close. It just makes sense the Google V2 TPUs would do the same.

I hope Google does a V3 TPU and then shares a V2 TPU paper like they did for V1 of the TPUs.

What is far more impressive about the TPUs is this:

https://cloudplatform.googleblog.com/2018/03/introducing-Clo...

If they are really doing 16k samples a second through a NN, at a price they can now offer generally, that is incredible. I want this paper even more.


What makes you so sure it is all the silicon difference and not just AWS pricing their product at a more profitable price point?

These costs also ignore transferring and storing massive data sets in the cloud. In general the cloud is a huge pain and I'd avoid it like the plague unless I was caught and really, really needed the scalability. But even then that only works if you have a scalable implementation of the algorithm you are working on.


Maybe, maybe not. They have the advantage that they make the hardware, so they're not paying the retail prices Nvidia charges for its cards. I don't think there's any way you can say the TPU is cheaper compared to buying your own system. If Google decides to release it to the public, that's a different story. Also, keep in mind that Google allows you to mix and match the CPU core count with the GPU, whereas AWS doesn't. It's possible that the Google cloud price with fewer CPU cores will be much cheaper than the AWS instance.


That is true. But the cost of running is where all the cost really is, not so much in making the chips.

Yes I can say it is a lot cheaper. That is what this article is all about.

You can do about twice the images per dollar using the TPUs with GCP versus using Nvidia with AWS.

Or what am I missing?

BTW, Google has released them to the general public. What are you talking about?

"Google’s AI chips are now open for public use"

https://venturebeat.com/2018/02/12/googles-ai-chips-are-now-...


You misunderstood. They released them to the public on GCP only. Nvidia's cards are released to the public as a hardware device that you can customize around. Big difference.


Yes in the cloud as you would expect in 2018. Available to the general public.


They announced in 2016 they had TPUs. So no, I would not expect that 2 full years later they're just now being available in the public cloud. These are not new products to them; they likely just don't want to deal with supporting them in different configurations.


> they likely just don't want to deal with supporting them in different configurations.

It is a lot of work. But mainly that TPUv1 only did inference while TPUv2 does training+inference.


Exactly my point. It's a lot of work. That's the reason why Nvidia has such a large team doing it, and also why they spent 3 billion dollars to build the V100 ASIC.


The big difference is Google does the entire stack and also has the scale to conceptually be able to create a better solution. But that is theoretical.

Here we can see the results, where the Google TPU gets almost twice the result per dollar over Nvidia. And Google should be able to iterate more quickly.

Take the move from CNNs to capsule networks. The idea for capsules came from Hinton, and Google is going to be there first to optimize for them in hardware. This is the benefit of playing in all layers of the stack.

Or using a NN for text-to-speech and offering it at scale. Google just has inherent advantages over Nvidia, and now we get to see slightly more concrete results. But I hope we get a lot more benchmarks like this and see if the Google advantage holds up.


If anything, the pricing likely benefits Google. As in Google may be more profitable with the TPU usage, even at 1/2 the cost of Amazon's V100 usage.


fwiw, the "TPU instance" has more than one TPU chip on it.


The architectures are so radically different that I don't think it makes sense to try to compare anything but whole-system performance. Trying to do a 1-to-1 comparison for a core or a chip becomes pretty nebulous.


It has more than the chips, too, since the TPUs can't run a TCP/IP stack, gRPC server, etc.


See the chart titled: Performance in images per second per $.

TPUv2 has 1.27x-1.86x the images/s/$.

And the other chart titled: Cost to reach 75.7% top-1 accuracy.

There, TPUv2 costs 62.5% of the reserved GPU instance cost and 42.6% of the unreserved GPU cost.

Key takeaway from the article:

> While the V100s perform similarly fast, the higher price and slower convergence of the implementation results in a considerably higher cost-to-solution.
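
To make the cost-to-solution metric concrete, here is a minimal sketch of how it is computed (the hours-to-converge values below are placeholders for illustration, not the article's exact figures):

    # Cost to reach a fixed accuracy = hourly price x wall-clock hours to converge.
    # Placeholder inputs; plug in the article's actual prices and convergence
    # times to reproduce its 42.6%/62.5% ratios.
    def cost_to_solution(price_per_hour, hours_to_converge):
        return price_per_hour * hours_to_converge

    tpu_cost = cost_to_solution(6.7, 8.9)     # hypothetical TPUv2 run
    gpu_cost = cost_to_solution(12.2, 11.5)   # hypothetical on-demand 4x V100 run
    print(round(tpu_cost / gpu_cost, 2))      # similar raw speed, but price and
                                              # convergence both multiply into the ratio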


The impression I got was opposite: TPU is not the hot shit that Google claims it is. Pricing is kind of irrelevant since they can subsidize this to create that story.


I know an engineer who prototypes GPU-like systems with FPGA and he has told me to be skeptical about performance miracles.

No matter how fast a system is on the inside, you have to get data in and out of it -- at the very least to memory. SRAM takes too much area and there is a limit to DRAM bandwidth despite technologies such as eDRAM and HBM. Some tasks are compute intensive, but for general tasks, a processor that is 100x faster would need 100x faster memory to really be 100x faster.

Thus advances in real-life performance are likely to be more like a factor of 2.
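
A crude roofline-style sketch of that argument (the numbers are made up for illustration, not taken from any datasheet):

    # Attainable throughput is capped by whichever of peak compute or memory
    # bandwidth runs out first.
    def attainable_flops(peak_flops, mem_bytes_per_s, flops_per_byte):
        return min(peak_flops, mem_bytes_per_s * flops_per_byte)

    old = attainable_flops(10e12,   500e9, 10)   # modest chip, modest DRAM
    new = attainable_flops(1000e12, 900e9, 10)   # "100x faster" compute, slightly faster DRAM
    print(new / old)                             # ~1.8x: memory, not compute, sets the limit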

For training I never pay full price in the AWS cloud; rather, I run interruptible instances and pay a fraction of the list price. People I know who train in the Google cloud seem to get interrupted all the time even though they are paying full price.

Inference is another story. Once you have the trained model, you will usually need to run inference many many more times than you run training and this gets more so the bigger scale you are running at. That hits your unit costs and it is where you need to pinch every penny.


> Pricing is kind of irrelevant since they can subsidize this to create that story.

Depends on how much you plan to use the hardware. If it's running near continuously, total cost of ownership is very important. Power costs can quickly dominate TCO.


At the pricing extreme, Google could make their TPUs free to use and charge elsewhere in their cloud. This shows that literal pricing is pretty irrelevant.


So could AWS/Nvidia.


AWS yes. Nvidia, not so sure. When you buy a 1080ti you are competing with gamers and miners (and maybe others). There's nothing to subsidize, in fact those cards are selling above MSRP, because they aren't selling an ecosystem but a physical card.


> When you buy a 1080ti you are competing with gamers and miners (and maybe others). There's nothing to subsidize, in fact those cards are selling above MSRP, because they aren't selling an ecosystem but a physical card.

Those cards are also irrelevant to the comparison as they can't be bought in large capacities for ML workloads. We're talking about Titan-V's and DGX-1's here.


Are you suggesting the Titan-V price is subsidized by Nvidia?


Let's revisit your original point:

> Pricing is kind of irrelevant since they can subsidize this to create that story.

You seemed to imply they == google. My point is that it could cut both ways.


Did you get that impression from this line in the article?

> While the V100s perform similarly fast, the higher price and slower convergence of the implementation results in a considerably higher cost-to-solution.


Full disclosure, I currently work at Nvidia on speech synthesis.

You can definitely do this on a GPU. We use the older auto-regressive WaveNets (not Parallel Wavenet) for inference on GPUs, with the newly released nv-wavenet code. Here's a link to a blog post about it:

https://devblogs.nvidia.com/nv-wavenet-gpu-speech-synthesis

That code will generate audio samples at 48kHz, or if you're worried about throughput, it'll do a batch of 320 parallel utterances at 16kHz.
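
Just multiplying those figures out for a sense of the aggregate rate:

    # Aggregate throughput implied by the batched-inference figure above.
    single_stream = 48_000           # samples/s for one utterance
    batched = 320 * 16_000           # 320 parallel utterances at 16 kHz
    print(batched)                   # 5,120,000 samples/s in aggregate
    print(batched // single_stream)  # ~106x the single-stream rate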


> About 1/2 the cost for similar performance.

I would expect a dedicated accelerator to need at least a 5-10X advantage to outweigh all the other infrastructure and ecosystem costs.

GPUs are more useful for a wide variety of data-parallel tasks, and many more NN frameworks work on top of CUDA than work on the TPU.

In terms of horizontal scalability, Nvidia has been rapidly iterating on increasing both memory and interconnect bandwidth (including NVSwitch [1]), while each 'TPU' is actually 4 interconnected chips, so it likely has less upward scalability.

Also note that the tensor cores on a V100 take roughly 25-30% of the actual area. If Nvidia wanted to, they could probably easily make a pure tensor chip that beat the TPU in performance, could be produced in volume on their existing process, and also had full compatibility with their entire stack.

All in all, a 2x price/performance advantage for a hyper-specialized accelerator is basically a loss, just like how nobody installs a Soundblaster card anymore, or how consumer desktops don't run discrete GPUs even though integrated graphics are a few times slower.

[1] https://www.nextplatform.com/2018/04/04/inside-nvidias-nvswi...


If that 2x price/performance scales across all of Google's inferencing, then it is definitely not a loss for them. If they can halve their running costs for inferencing, they are saving themselves a ton of money. Their TPUv2 was announced slightly before the V100, and the money they save by not paying Nvidia premiums probably helps. From the customer's point of view, what is a GPU other than a specialised accelerator? Without more details we can't know how a TPU really compares, but if your aim is to train or run inference on Tensorflow models, then they're a really competitive product at the moment.


I agree, but chip development is an expensive business. There is nothing preventing Nvidia from immediately turning around and building a specialised ML accelerator with better software integration and higher bandwidth. For all we know they could already be working on one.


They already did two generations. Google has over $100B in the bank with less than $4B debt. So money is not an issue. It is tiny in the scheme of things.

Google has an advantage as they do the entire stack and can better optimize like we see here with half the cost.


Nvidia is actively building an entire deep learning stack internally, all the way to releasing a self-driving simulation platform which they are using to build their own self-driving software [1].

I think they are actually farther along and more aggressive about exploring deep learning use cases in production than Google today; augmenting real data with extensive simulation is really a far-reaching idea that comes directly from their gaming experience.

> So money is not an issue. It is tiny in the scheme of things.

Money of course is always an issue long term; otherwise why doesn't Google Fiber just spend tens of billions of dollars to build out its nationwide network? Because it will see negative ROI even if they succeed.

The TPU has to eventually make a real return to Google, and it won't if nvidia can spend the same amount of money and build a faster product and sell it to all the other cloud players, which I believe they definitely can.

Put another way, the TPU has to be cheaper to Google than buying nvidia GPUs after factoring in its development costs, whereas nvidia gets to amortize those dev costs over all other cloud providers and all other GPU customers. Google isn't about to sell the TPU to other cloud providers; the entire idea is to use it to drive Google Cloud adoption.

The TPU is a fine chip, but if you just look at the big picture, there is every sign that nvidia could build the same or better product for less money because it has far more synergies across the hardware and chip design stack; e.g. the TPU only has PCIe connectors, while nvidia has already worked with IBM to get NVLink into supercomputers [2]. For some workloads the TPU will likely be bandwidth-starved communicating with the CPU and main memory.

[1] https://nvidianews.nvidia.com/news/nvidia-introduces-drive-c...

[2] https://www.ibm.com/us-en/marketplace/power-systems-ac922/de...


The problem is Nvidia is never going to have the AI expertise up and down the stack like Google.

As far as I am aware, Nvidia does not even run a cloud, do they? They are obviously never going to have the production NNs that Google has.

Google now has well over 4k NNs in production, and I'm not sure if Nvidia has any. Well over a billion people a day are using Google NNs. That data allows Google to iterate in ways that Nvidia just never would be able to.

But this was all theory, and that is why it matters that we are starting to see more concrete results like this one, where Google with their TPUs is able to charge 1/2 the price of using Nvidia. Then we also have the paper from Google on the gen 1.

I would guess Google is working on a gen 3. Nvidia is trying to catch a moving target but without the data. So they are behind, trying to catch up, but missing an arm.

A perfect example of this phenomenon is capsule networks, pioneered by Hinton. They use dynamic routing, which will potentially require a different approach to memory access, as the pattern is different from a CNN or RNN.

Today the problem is memory access and no longer instruction execution. Google nailed the low-hanging fruit with the gen 1 TPUs: they have 65,536 very simple multiply-accumulate units. Now you have to go after memory access.

Your post is all over the place, so it's a bit hard to respond to. Google Fiber was NOT about cost. It was about AT&T and other established players, along with some local governments, making it difficult for Google to access what they needed to be able to compete.

I hate debating something with someone that is doing what you are doing. Google Fiber? Really?

"I think they are actually farther along and more aggressive about exploring deep learning"

I do a LOT of surfing on sites and can easily say this is the craziest thing I have read in a bit. You are honestly comparing Nvidia to Google? Really?

Google solved Go a decade early. Hinton did the capsule networks and is basically the father of DL. Well, he made it actually work. What breakthrough came from Nvidia?

A single one?

There is so much crazy stuff in your posts this must be driven by something else and something emotional? Your points are just not based on reality. Is this really about Google firing Damore?

BTW, Nvidia read the Google gen 1 TPU paper, and that is why we see them doing similar things. But Google is going to move on to addressing the memory access problems, as that is the next area to improve. Once Google figures it out, you will see Nvidia just copy the approach, like they are doing with the gen 1 TPUs.

I listened to this Nvidia presentation on YouTube and they were basically quoting the Google TPU paper. Talking about using 8-bit integers, etc., for inference.

Google will release the gen 3 and then share a paper on the gen 2, and we will see Nvidia then try to copy that one. Nvidia is always a couple of steps behind.

But I am a super curious person and can you share what this is really all about?


Well, that's quite a lot to digest.

I'm not sure why you think I must be conspiratorial. I will admit the thesis that 'Nvidia is an AI leader in software' is unusual, but ultimately I think it is well-supported by the public record and some diligent research.

I've been watching Nvidia for a while, and one thing you notice quickly is that, much like Apple, they don't pre-announce or oversell vaporware; they tend to only announce things that they have already worked on for years and are imminently available.

> As far as I am aware Nvidia does not even run a cloud do they?

They don't run a public cloud yet, although they are making noises in that direction [1]. GPU Cloud right now is just a place where you get packaged Docker images (and then run them on AWS, GCE, what have you), but I don't think the branding is accidental—they are setting it up so if they decide to build a public cloud, ML researchers will already be familiar with the term.

They are also doing distributed cloud GPUs direct to consumer via Cloud Gaming [2].

Internally, they have gone the HPC/supercomputing route to develop their own ML stack, rather than Google/MS/AWS hyperscaler route [3]. They basically built their own supercomputer based on Voltas, and they use it internally to do everything from developing self-driving car software [4], including the simulation platform.

Note that AFAIK, the simulation platform is far ahead of other players in the field. We have heard time and again that 'data' is going to be the competitive advantage to Tesla (miles driven) and Waymo (mapping data). What if you can partially sidestep the issue by leveraging the ability of humans to actually define dangerous scenarios and rigorously test them outside of the constraints of road driving?

The platform has literally taken the idea of 'regression testing' and translated it into the ML space, and they are planning to deploy this into production systems in the next 1-2 years. From what I've heard from ML researchers, the end-to-end testing and deployment of NNs is still rather in its infancy, in terms of being able to change your network and then do mass inferencing on prior 'test cases' that you think are important.

> Google Fiber was NOT about cost. It was about AT&T and other established players with some local governments making it difficult for Google to access what they needed to be able to compete.

You are defining 'cost' far too narrowly, or rather not seeing how non-economic costs eventually translate into economic ones. The established players made it difficult for Google. This eventually translated into 1) higher legal fees to fight them 2) slower deployment rates and 3) higher operational costs for expansion. All these things obviously cost lots of time and money and sharply lower the overall ROI of a project, hence why Google has essentially given up. There's only risk, no reward.

The point is not to compare the TPU project directly to Fiber (the two projects are very different), but just to address your point that 'cost doesn't matter to Google because they have a lot of money'. Companies that truly don't care about cost will very soon end up with very little money. Put another way, I don't think the eventual reward from continuing TPU development will be more profitable than simply buying GPUs from Nvidia down the line.

> Now you have to go after memory access.

Nvidia might be better-positioned to optimize memory access than Google is; they have their own fabric and work with a large variety of partners to optimize their ML/DL workloads.

> Your post is all over the place so a bit hard to respond.

Well, the crux of my argument is that:

1. Chip development is an expensive business

2. Nvidia is good at building chips; the Volta is already within striking distance of the TPU using only ~25% of its die area for tensor units. As NNs grow, inter-node scalability will become more important, and Nvidia has large advantages in interconnect that will show up in large-scale deployments (like supercomputers, where I expect a lot of DL to happen)

3. Google's business strategy only allows it to spread development costs over its own deployment, while Nvidia lets many other players pay for the dev cost, including competing hyperscalers, HPC, gamers, and carmakers. Nvidia's potential 'ecosystem' is much larger than Google's. Historically, we've seen that structural advantage be very hard to surmount.

1-3 means that in the long run, a 'go-it-alone' strategy like Google's is unlikely to win a protracted R&D fight.

> Google solved Go a decade early. Hinton did the Capsule networks and basically the farther of DL. Well made it actually work. What breakthrough came from Nvidia?

Yes, Deepmind has made some great strides, but how does that directly fund TPU development and give it a competitive advantage? The fact that those papers are published means that any talented researcher at Nvidia can replicate the work, then run and optimize it on their GPU architecture.

> There is so much crazy stuff in your posts this must be driven by something else and something emotional? Your points are just not based on reality. Is this really about Google firing Damore?

I'm not sure why you are so convinced that only a crazy person with a beef about Google can have a differing opinion from you. Do you work on the TPU team or something?

> Google will release the gen 3 and then share a paper on the gen 2, and we will see Nvidia then try to copy that one. Nvidia is always a couple of steps behind.

Where's your evidence that Nvidia is simply copying Google, rather than both engineering teams viewing the same problems and converging to similar solutions?

Note that even if it is true that Nvidia is simply 'copying Google', they have the resources to beat Google at its own game, by leveraging process, memory, CUDA, etc. You've studiously avoided addressing this point.

[1] https://www.nvidia.com/en-us/gpu-cloud/deep-learning-contain...

[2] http://www.nvidia.com/object/cloud-gaming.html

[3] https://www.nextplatform.com/2017/11/30/inside-nvidias-next-...

[4] https://www.youtube.com/watch?v=booEg6iGNyo


Wow! Thanks for taking the time on the long reply.

"You are defining 'cost' far too narrowly "

I agree, but it is the opposite here, as Google has the advantage compared to Nvidia. So something does not add up. I mean, what you pointed out is a disadvantage for Nvidia. And heck, Google controls the canonical AI framework. 100k stars on GitHub is just incredible, and I can only think of K8s doing anything similar.

A big difference I think you are missing is Google did NOT run their operation on Google Fiber. But they do on the TPUs. Google has over 4k production NNs, and the amount of money they save running them at less than 1/2 the cost for their own stuff, versus using Nvidia, is huge.

But it also keeps growing. Google has a fundamental advantage over their competitors in having the TPUs. A perfect example is their new text-to-speech.

Speech using a NN at 16k samples a second at a reasonable price would be impossible without the TPUs.

https://cloudplatform.googleblog.com/2018/03/introducing-Clo...

Does NOT appear Volta is in striking distance. But more importantly Google will do a gen 3 and 4, etc. They have the data to iterate and Nvidia just does not.

But more importantly, Google does the entire stack and Nvidia does NOT. In AI it is so important to do the entire stack for efficiency reasons. Plus Google controls the canonical AI framework with TF.

"Google's business strategy only allows it to spread development costs over its own deployment "

Well, that is clearly untrue. Just take their text-to-speech sold as a service; the cost of doing it on Nvidia would have been prohibitive. They could not even offer the service without the TPUs.

"Do you work on the TPU team or something? "

No. But I have been running into so much hate for Google from the alt-right around the firing of Damore that logic gets lost.

I was talking to a Russian this morning on Reddit and he was delusional because of his hate for Google, based on him thinking they are left-wing extremists.

"they have the resources to beat Google it its own game "

This is the exact problem for Nvidia. They do not have the resources to compete. That is the exact problem.

Chips will come from the big players and NOT third parties in the future.

The entire dynamics of the industry have changed and actually a lot more like the past ironically.

Google, Amazon, FB, and other big players will do their own silicon. Even Tesla is supposed to do the same.

The reason is that the people who buy the chips now run the chips, which was NOT true in the past. It used to be that a Dell purchased from Intel and sold the machine to someone else.

The big difference today is the users of the systems are centralized with the big cloud providers. So they now get the data to improve the chips which just was not true in the past.

Plus it is looked at as being a competitive advantage.

So Apple does their own. Google does their own including the PVC on the device. Amazon and FB will also do their own.

Google did the same thing years ago with networking. They quietly hired the Lanai team to build all their own network silicon, which significantly lowered their cost.

Heck, Google then created their own network stack to make it deterministic. That is how it was possible to create Spanner.

Tech companies are so much bigger today they have the resources to do all their own stuff and own every layer of the stack instead of using third parties.

Google could never be what they are today if they had not built their own stuff. Could you imagine the cost of using SAN instead of them creating GFS?


> Google has over 4k production NN and the amount of money they save running them at less than 1/2 the cost for their own stuff versus using Nvivida is a huge amount of money.

Yes, Google has a large ML deployment, but so does Nvidia, which is not (currently) focused on direct-to-consumer public APIs, but actually doing deep learning and simulation at scale.

The hyperscaler approach to ML is not the only possible way to scale up, Nvidia chose to go the HPC/supercomputing route and basically built their own supercomputer from the ground up.

Both approaches have their advantages and drawbacks, but one thing that supercomputing approaches have is a focus on vertical scalability. It's not just about samples/second, but how big can you feasibly make and train an NN? Note that the national research labs are getting into the act, and those supercomputers are basically built in close collaboration with Nvidia [1].

I would really recommend spending some time on their website and watching some of their videos, e.g. [2]. Jensen Huang is completely bought into deep learning and NNs and has re-oriented his company towards making sure Nvidia can dominate the space.

> They have the data to iterate and Nvidia just does not

This is where I fundamentally disagree with you. This was true 3 years ago, but not today, mostly because Nvidia is the default option for ML researchers right now and they are slowly but steadily enticing everyone to collaborate with them (not to mention their self-driving efforts, which generate troves of data directly).

> Just take their text to speech sold as a service and the cost of doing on Nvidia would have been prohibitive.

That's on their own deployment.

Google is #3 in the cloud space right now. It's Nvidia-powered AWS + Azure ML deployments competing against Google, which also deploys V100s as well as TPUs.

Although it's possible for a single vertically integrated player to beat the rest of the market (e.g. Apple) for a long period of time, it's a difficult, risky proposition and it usually helps if they started out with a huge advantage, which Google doesn't seem to have since they're starting at #3 in the cloud space.

> They do not have the resources to compete. That is the exact problem.

I think, perhaps, you are still imagining the company as it was in 2012 or 2015, but the company's revenues and profits have grown substantially in the past years.

Nvidia's market cap is $132bn and they have a profit run rate of about $4bn - 5bn / yr.

Their R&D spend has averaged about $2bn / yr for the last 5 years or so; in fact they beat AMD/ATI into the ground while spending less on R&D. They could basically triple the amount of money they pour into research if they wanted to.

By comparison, Google spends about $15bn/yr on R&D, but that's split across far more projects.

> Google does the entire stack and Nvidia does NOT.

I'm going to have to strongly disagree with you on that one.

Google owns more of the deep learning end-to-end cloud stack, but they do not own more of the hardware, software, or firmware stack for accelerated computing.

Which 'ecosystem' wins, easier access to data (which Google does have) vs. controlling the hardware + frameworks + partnerships, is an open question. I tend to believe the latter, because Nvidia has many options to get its hands on data (they can partner with the other cloud providers), while Google would have to invest quite a bit to compete on Nvidia's terms.

The easiest example, which I keep coming back to and which you haven't addressed, is how is Google going to compete on memory fabric and node architecture? Nvidia is out there building NVLink, NVSwitch, and basically their own supercomputing nodes (DGX-2).

They are working with ORNL to build some of the largest Volta deployments in the world, so they are rapidly building experience in doing deep learning at large scale as well. How would Google be able to match this if NN/DL development turns out to scale vertically (and we are seeing this in rapid YoY growth of layer depth and network size in DL)?

Again, TF is not really a direct advantage for Google because it runs equally well on Nvidia hardware. If Google is so confident in the TPU winning out, why are they busy deploying Voltas in GCE?

If you want to do deep learning today, Nvidia is the go-to option because every deep learning framework is on CUDA, including cuDNN. If I want to use the TPU, I am stuck using GCE + Tensorflow (although Keras / PyTorch may soon have support), but with Nvidia I have the choice between every single cloud provider or my own local deployment, which is always ultimately cheaper than paying for cloud time. Google seems unlikely to sell you a TPU for your own DL box.

> Google, Amazon, FB and other big players will do their own silicon

It's certainly an interesting space now. MSFT is busy buying FPGAs from Xilinx and Intel/Altera as part of their strategy. Ultimately though, you seem to think that Nvidia is still a niche GPU maker from 2013 or so; it's not, it is larger than Tesla and certainly has more than enough funding, plus a very focused execution team and CEO.

> Google could never be what they are today if they had not built their own stuff. Could you imagine the cost of using SAN instead of them creating GFS?

I agree that the hyperscalers found significant savings by looking up the stack, but that has limits. They aren't building their own CPUs, for example. Chipmaking is a very, very expensive game.

[1] https://insidehpc.com/2018/01/using-titan-supercomputer-acce...

[2] https://www.youtube.com/watch?v=Rn73n1HYYNs


I'm not aware of Nvidia having anywhere near the number of neural networks in production, or nearly the number of users.

I'm not even sure where they are hosting them or even what they do. How about some color, as you have me curious?

I have watched videos of Jensen, and also an excellent almost-2-hour presentation from one of their VPs. He said a lot of things that were in the Google TPU paper, which I found a bit funny. How you can use 8-bit integers for inference, for example. It said to me these guys are trying to catch up.

The problem is Amazon has that data, NOT Nvidia. It is not in Amazon's best interest to help Nvidia; this is my exact point. The entire dynamics of the chip business have changed. You will see Amazon do their own just like Google has.

Once Google did the gen 1 TPUs, they set the direction: you just can NOT buy off the shelf and compete long term.

The silicon is strategic for AI.

MS went the wrong direction in using an FPGA solution in addition to using Nvidia. But once again, no data for Nvidia.

Market cap does not give you the money. But Google in 2018 will spend on R&D about 2x Nvidia's 2017 sales! Yes, you read that correctly. Google's R&D spend will be 2x Nvidia's 2017 sales!

Google profits will be over 4x Nvidia total 2017 sales.

Once again, Nvidia does NOT do the entire stack. I am not aware of any algorithm breakthroughs that came from Nvidia. I cannot even name one AI expert at Nvidia.

But the scoreboard is papers accepted at NIPS. Nvidia did NOT get a single paper accepted that I saw at the conference.

Versus Google, which had more than anyone. 9% of all the papers accepted came from Google.

https://medium.com/machine-learning-in-practice/nips-accepte...

If Nvidia is playing in the entire stack how could they NOT get a single paper accepted at NIPS?

Or did I miss it?

If we look at self-driving cars, one of the most important AI applications, Nvidia does not even show up on patents. Once again, Google is ahead by a mile.

https://www.theatlas.com/charts/r1iEkmKkz

Something in your post does NOT add up. If Nvidia is a player in the stack beyond the silicon, why do they NOT show up anywhere?

Google deploys both, TPUs and Nvidia, for a number of reasons I suspect.

The biggest is they want TF to be the canonical framework for AI, and they MUST be seen as not favoring their own solution until it is a done deal, which is getting close.

In the end, TF will never run as well on Nvidia as it will on the TPUs. We can see it here, with about 1/2 the cost using the TPUs over Nvidia.

It is like saying Android would run as well as iOS on the Apple processors. It is all about controlling the entire stack like Apple has done and Nvidia is just not in a position to be able to.

It makes no sense to buy the processors, so it would not make any sense for Google to sell them to others. You are never going to see that happen.

But I do think it is possible Google will sell the PVCs.

The ultimate problem is Nvidia is in perpetual catch-up mode. Right now the big new thing that came from Hinton is capsule networks using dynamic routing. Google will have that optimized in silicon long before Nvidia will.

I suspect it will create the need for a different approach to how you access memory in the chip architecture.

But capsule networks are computationally heavy, so silicon will matter a lot. Google has the algorithms, knows how they want to use them in production at scale, and has the money to execute on supporting them in silicon. They just move way too fast for Nvidia to ever be able to catch up.


I'd really like to hear what you think about Nvidia's approach to self-driving, especially using supercomputing + simulation + backtesting to bootstrap the process. We keep going back and forth on this topic, but how can you develop a self-driving platform without a bunch of NNs in production, running on the Nvidia Saturn V supercomputer?

> How you can use 8 bits and integers for inference for example. Said to me these guys are trying to catch up.

I think it's interesting that you presume that Google came up with the idea first, rather than 'reducing precision' being a rather obvious idea that any chip designer or ML practitioner would have brought up. Again, can you please justify that?

I think where we're at a disconnect is that you equate AI leadership with publishing and patents, while looking at Nvidia, they are an extremely secretive organization that would probably avoid publishing what they see as a competitive advantage. This is similar to how Apple operates.

I used to work in finance, and the culture was the same way—banks had state-of-the-art models internally but would never share them. Published papers in academia were probably ~5 years behind what the banks had.

I do believe that Google (mostly Deepmind) is the leader in the research field, but note that they had to go out and buy that expertise.

> Google on R&D will spend 2x Nvidia 2017 sales!

Yes, but it's not all going into AI for sure, and definitely not into bankrolling the TPU effort. We should compare apples to apples here, surely?

> The entire dynamics of the chip business have changed. You will see Amazon do their own just like Google has.

So what about Nvidia's self-driving efforts? I've talked about it for about 3-4 posts now, with references to presentations and videos, and heard more or less crickets from you about it. I don't see how you can repeatedly say that Nvidia has no access to data when they clearly have a working product (Drive PX2) already, plus more (Drive Xavier) ready to be deployed in cars within the next ~18 months.

> Google deploys both, TPUs and Nvidia, for a number of reasons I suspect.

> The biggest is they want TF to be the canonical framework for AI and they MUST show not favoring their own solution until it is a done deal which is getting close.

Yes, but for those exact same reasons, the TPU will not be a strategic edge for Google and lower the ROI of working on the project.

You can't have it both ways: either the TPU is the secret sauce that drives Google Cloud adoption and gives them a big leg up in AI (in which case, they would want to leverage TF and make it 'run better' on the TPU than on other hardware), or else TF is a neutral platform and it doesn't benefit either party (which I actually agree with).

> It is like saying Android would run as well as iOS on the Apple processors. It is all about controlling the entire stack like Apple has done and Nvidia is just not in a position to be able to.

I think the analogy here is really apt, but also shows why I don't believe in Google's success here long-term.

The iPhone basically invented the smartphone market; its product was 10x better than any other competitor when it was introduced, and it was probably the majority of volume (and definitely profit) for years before Android was able to compete.

The TPU is not heads and shoulders above the competition. The Volta came out literally ~1 year after Pascal and had 10X the tensor throughput; you say that Google isn't standing still, but certainly neither will Nvidia.

Basically, Google is not starting from a 'commanding lead' position like Apple did. And we see today that even though Apple still leads in profits, Samsung is very close, and Android is the vast majority of the market.

Larger ecosystems tend to beat fully vertical stacks in the long term. We see this across many markets and products. So why do you think this will be the exception?


It is good to see Nvidia trying to create a virtual world like Google has. But the problem is Google has the real-life experience to use with their virtual California.

But honestly Nvidia is so far behind in SDC and without any patents it is hard to see them competing.

https://www.theatlas.com/charts/r1iEkmKkz

Yes, obviously I would agree. But Google implemented it in late 2014, and Nvidia did NOT in 2014 or 2015 or 2016, as far as I am aware.

AI is not a secretive area, so if Nvidia had something we would know. The lack of patents puts them in a very weak position, especially with SDCs.

" but note that they had to go out and buy that expertise."

This is one of the more stupid things I have read in a bit on the Internet.

In the late 90s, Larry Page was asked about using AI to make search better. He shared that they were doing search to make AI better.

TPUs did NOT even come from DeepMind. But honestly, knowing what to buy is important. Google is miles ahead of everyone even without DeepMind. TF also did NOT come from DeepMind, and so many other things. I would actually say the Brain team has done a lot more in actual production than even DeepMind. But DeepMind is Google, and that was a rather dumb comment, no offense. How old are you?

Yes the TPUs are very strategic. It is how they were able to do AlphaZero. Or more importantly their new Speech offering at a reasonable cost. Without the TPUs that would not be possible and that is a strategic advantage for Google and why Amazon and everyone else will copy.

Buying off the shelf can NEVER give you a strategic advantage.

Google does NOT run their inference at scale on Nvidia. Training has also been moving to the TPUs quickly for Google. They offer a choice, but with the TPUs at half the price, it shows how much better they are.

You want to get TF to be the canonical solution and then use your fundamental advantages. Just business 101.

"The TPU is not heads and shoulders above the competition. "

We can see the TPUs are heads and shoulders better. Heck they are half the cost. Much bigger advantage than the iPhone. But more importantly they will improve far faster than anything from Nvidia.

"Basically, Google is not starting from a 'commanding lead' position like Apple did. "

Google's lead in AI is much, much larger than any lead Apple had. Heck, Apple's market share is about 14% and Google with Android has over 80% market share.

"Larger ecosystems tend to beat fully vertical stacks in the long term. "

There is no Nvidia ecosystem that I am aware of. The AI ecosystem is built around TF.


I've been nothing but unfailingly polite to you, and I'm getting tired that you repeatedly resort to name-calling and insults when you encounter an opposing opinion.

> But DeepMind is Google and rather dumb comment, no offense. How old are you?

Why are you getting all worked up? That's not an insult to Google, simply pointing out that their own organically grown corporate org (including Brain) was not adequate to do the cutting-edge research they felt they needed.

> AI is not a secretive area. So if Nvidia had something we would know. On the lack of patents puts them in a very weak position. Especially with SDC.

I disagree. Think about this the other way; if some company was quietly plugging away with large AI advances and deciding not to publish them, how would you even know? My evaluation of Nvidia's technology is based on their public presentations and the products that have already been released: products that every single AI practitioner on the planet buys and uses, plus the 150+ car ecosystem partners that have decided to go with Nvidia's driving platform [1].

People who are far more deeply enmeshed in this technology than you or I have voted with their feet and decided to build their core competency for the next 5+ years on Nvidia's platform, while Waymo has maybe 2-3 major automotive partners?

> Google lead in AI is much, much larger than any of Apple. Heck Apple market share is about 14% and Google with Android has over 80% market share.

Bottom line, I definitely agree that Google is an AI leader, but I do not believe that the AI future will be run on TPUs, for the simple reason that chipmaking is a risky, expensive endeavour, and Nvidia has much more expertise than Google does in that regard, while having access to a larger partnership, ecosystem, and its own set of data and engineering.

Put it this way, the actual chipmaking stack is more important than the data stack when it comes to making chips. Just think about it—let's say you've run your thousands of NNs to benchmark the workloads on the TPU, and it turns out that CPU-TPU and TPU-TPU bandwidth is the real bottleneck. What do you do as Google? They have no expertise in building interconnects and scaling them, while Nvidia does.

Data only gets you so far, you still need to be able to do the semiconductor engineering + create partnerships, and in that regard Nvidia is light-years ahead.

To belabor the point, if the goal is to make chips, then being good at chipmaking is very important, and Nvidia is closer to Google in data, than Google is to Nvidia in chipmaking.

I will bet you that 2 years down the line, Nvidia will have abandoned its own TPU project and all major players just buying Nvidia chips, both for inferencing and training.

This is exactly the role Intel plays today in CPUs, and it's both natural and reasonable, and the largest reason is because of structural market factors, which you have never even responded to.

Google's cloud is a fraction of the size of AWS's and Azure's; that means Nvidia makes far more money from Voltas than Google will ever save on the TPUs, and can plough that right back into additional R&D. Business people demand a positive ROI. Where will the positive ROI from a TPU come from?

Google is and will be an AI leader. But it will certainly not be doing its own chips.

[1] https://www.nvidia.com/en-us/self-driving-cars/partners/


What name calling? Totally against name calling.

Deepmind is Google but only one aspect. Much of the best research does not even come from the Deepmind unit.

Nvidia had zero papers accepted at NIPS. Plus GANs, capsule networks, AlphaGo, and so many other breakthroughs came from Google.

But maybe I am just unaware. Can you provide some breakthroughs from Nvidia? Maybe I am just unaware?

SDC will be winner take all and Waymo is literally miles ahead of everyone else.

We have recent benchmarks done on Nvidia versus the TPUs and the TPUs are about 1/2 the price of using Nvidia for the same amount of work. That is a big advantage for Google.

But also that was gen 2 and suspect we will see a gen 3 soon which will be another step forward. Nvidia will constantly be trying to catch up.

"I will bet you that 2 years down the line, Nvidia will have abandoned its own TPU project and all major players just buying Nvidia chips "

This does not make sense to me and think you had a typo?

BTW, Unless Nvidia makes major advancement Google just could never use Nvidia for their own stuff. The cost would just be way too high. Perfect example is the new Google text to speech using a NN at 16k samples a second. There is just no way Google could have used Nvidia and offer at a competitive price. The joules per inference is just way too expensive with Nvidia.

Google would have loved to buy chips for their stuff from Nvidia. Problem is they just do not have anything they could use at a price they could offer at scale.

So unless Nvidia catches up you will not see Google use Nvidia for their services.

The big new advancement I suspect we will see with Gen 3 is different memory architecture to better support dynamic routing with Capsule networks which came from Hinton.

Then it will be a couple of years before we see the same from Nvidia.

BTW, what is different is nobody is going to be tied to any chip architecture like we had with Intel. Those days are gone. The common layer will be TF. It now has over 98k stars on GitHub.

Besides K8s, what else got to 100k stars faster?


> But maybe I am just unaware. Can you provide some breakthroughs from Nvidia? Maybe I am just unaware?

You must be joking. I've only been pointing you repeatedly to Nvidia breakthroughs, with references and links provided, for the last 5 posts. Either you are blind, or willfully ignorant.

More self-driving accidents are only going to accelerate the pace. 3 years from now the government will be requiring auto makers to use the Drive Constellation [1] for safety testing.

Google has nothing remotely comparable, neither in published research nor in announced products.

We've gone through something like 8 replies and thousands of words, and you have written exactly one sentence addressing Nvidia's developments.

> The joules per inference is just way too expensive with Nvidia.

The Drive Xavier is basically a giant inferencing engine on a power-constrained platform [2]. It will be shipping in quantity in 2019. There is no equivalent Google product even announced.

> Now has over 98k stars on GitHub.

The fact that you are resorting to GitHub stars to make your argument is utterly laughable.

CUDA and cuDNN is the real enabler, which every single DL framework today (including TF) supports. People trust Nvidia far more than they trust Google to be an ecosystem partner.

> Unless Nvidia makes major advancement

Looked through your social media posts, half of them are pro-Google fanboism.

Nvidia makes major advancements every 6 months across the entire deep learning stack. I keep pointing you towards what they're doing, hoping you have something interesting to say, but all you have to offer is the same tired Google cheerleading.

Look, I'm a long-time investor in both companies and like them both very much. But it's quite obvious you have zero interest in doing even a smidgen of research about Nvidia nor their technology.

Anyway, thanks for the replies, but I'm no longer interested in continuing this convo. You don't seem to know anything relevant at all about Nvidia, nor are you interested in learning more, despite all attempts to point you towards interesting things that they're doing, and why their approach is unique.

[1] https://www.youtube.com/watch?v=lVlqggTiTzY

[2] https://www.engadget.com/2018/01/07/nvidia-xavier-soc-self-d...


Just one Nvidia breakthrough? Just one algorithm? Capsules? No. GANs? No. AlphaZero? No. What? Why no papers accepted at NIPS?

I doubt we will see deaths with Waymo any time soon, but yes with Tesla and the others. Google has 20k cars on order for 2020. Google does it without a safety driver, and no one else is close. Nvidia even gave up on the real road and is now copying Google in creating a virtual world.

Does not seem like you even understand what TF is?

Ironically, I'm an investor in both Nvidia and Google. Google since their IPO; well, not in the IPO, but on the open market at the time of the IPO. Google will do better than Nvidia long term, but there will be scraps for Nvidia, so I wanted a position.

Plus you will get great price action with Nvidia because of the hype, but you have to watch it and know when to get out. I think it's safe for a couple of years.

The problem is Nvidia does not do the entire stack. In AI you MUST do the entire stack.

But the other problem is Google has a vision of creating the singularity. I do not think Nvidia has a similar vision.


I’ve just read this entire thread, and @smallnamespace has clearly won the debate.


Hi, author here. The motivation for this article came out of the HN discussion on a previous post (https://news.ycombinator.com/item?id=16447096). There was a lot of valuable feedback - thanks for that.

Happy to answer questions!


Don't TPUs get sustained use discounts? I know they're not preemptible. That would be comparable to AWS reserved instances.

EDIT: you don't get sustained use discounts, either, at the moment. You can get either for GCP GPUs, though. Perhaps that will change once TPUs are out of beta?


"As shown above, the top-1 accuracy after 90 epochs for the TPU implementation is 0.7% better. This may seem minor, but making improvements at this already very high level is extremely difficult and, depending on the application, such small improvements may make a big difference in the end."

Any idea of how much variation in accuracy you get on different training runs of the same model on the same hardware? My understanding is that model quality can and does vary from one run to the next on these kinds of large datasets - from a single observation, it's hard to know if the difference is real or noise.


I've been running a lot of these resnet-50 experiments lately and the run-to-run variation is very small, on the order of 0.1%. It's actually pretty amazing how consistent training is given that the initialization is always different and the data is sampled differently on each run. (As an aside, it took us about three weeks to track down a bug that was causing the model to consistently reach an accuracy 1% lower than it was supposed to.)


Indeed, that's also my experience. ImageNet is pretty huge (although 'it's the new MNIST') so that seems to help converging to very similar solutions and accuracies.

Tracking down bugs in convergence is really costly in these settings. We had a problem in pre-processing that took us quite a while to figure out...


AMD - Where does their hardware stand in the race for ML? What changes would AMD need to make to be competitive?


Their hardware is fine. Their software is starting to get good too now. They're finishing MIOpen, a set of CUDA compatible libraries with which you can use Tensorflow (TF uses the builtin CUDA libs too, not only CUDA itself, as does CNTK). ROCm provides a CUDA implementation for AMD systems.


Their hardware doesn't have the equivalent of a tensor core as far as I know, so they would be way behind on these benchmarks.


Nice work. I've only seen anecdotal stories about how TPU is faster, but never something as detailed as this.


I am not an ML guy, so I'm asking from a position of ignorance. (-:

But what's going on when some of the implementations of a standard algorithm don't converge, and different hardware has different accuracy rates on the same algorithm? Are DNNs really that flaky? And does it really make sense to be doing performance comparisons when the accuracy performance doesn't match?

Is the root problem that ResNet-50 works best with a smaller batch size?

And how do you do meaningful research into new DNNs if there's always an "Maybe if I ran it again over there I'd get better results" factor?

Thank you.


I found it interesting that they are so close together in performance - I mean what are the odds that they end up within 2% of each other?


The TPUs are doing almost 2x the images for the same cost.

That is not all that close is it?


Yeah, pretty big coincidence. However, this may change with the next TensorFlow versions, which supposedly have further speed improvements for the TPUv2.

Note also, that the ~2% performance difference is only on one model (ResNet-50) and cannot be generalized to all workloads/all of deep learning (at least not without further proof).


Do you have more information about this bit?

> the TPU implementation applies very compute-intensive image pre-processing steps and actually sacrifices raw throughput

Thanks


In general, you try to keep the TPU/GPU 100% busy, so enough data needs to be readily accessible at any point in time. In this example, images need to be read from disk, decoded, and transformed (cropped, resized, normalized, etc.) before they can be fed to the TPU. The transformations can be computationally intensive, so they can actually become a bottleneck.

In terms of how much compute power the TPU pre-processing needs I only have very rough numbers: I ran the same pre-processing while training ResNet-50 on a node with 4 GPUs and it was consistently utilizing >22 CPU cores (including all of the other CPU-tasks while training).
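
For anyone curious what such a CPU-side pipeline looks like, here is a minimal generic tf.data sketch (illustrative only; the benchmark's actual ResNet-50 pre-processing is more involved, and the data path below is hypothetical):

    # Sketch of an input pipeline that keeps the accelerator fed: decode,
    # crop, augment, normalize on the CPU, then batch and prefetch.
    import tensorflow as tf

    def parse_and_preprocess(path):
        image = tf.image.decode_jpeg(tf.io.read_file(path), channels=3)
        image = tf.image.random_crop(image, [224, 224, 3])
        image = tf.image.random_flip_left_right(image)
        return tf.cast(image, tf.float32) / 255.0

    files = tf.data.Dataset.list_files("/data/imagenet/train/*.jpg")  # hypothetical path
    dataset = (files
               .map(parse_and_preprocess, num_parallel_calls=16)  # parallel CPU work
               .batch(1024)
               .prefetch(2))  # overlap pre-processing with device steps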


What about your LSTM-based model that didn’t converge in your earlier TPU benchmarks in February?


Slower alternative: "fastai with @pytorch on @awscloud is currently the fastest to train Imagenet on GPU, fastest on a single machine (faster than Intel-caffe on 64 machines!), and fastest on public infrastructure (faster than @TensorFlow on a TPU!) Big thanks to our students that helped with this." - https://twitter.com/jeremyphoward/status/988852083796291584


One machine with 8 V100 GPUs. If you consider one TPU pod a single machine the TPU is faster. Those numbers also show that 8 GPUs are slower than 8 TPUs (so same conclusion as the article)


An important hidden cost here is coding a model which can take advantage of mixed-precision training. It is not trivial: you have to empirically discover scaling factors for loss functions, at the very least.
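
To illustrate what "scaling factors for loss functions" means in practice, here is a minimal static loss-scaling sketch in PyTorch; the model, sizes, and the scale of 128 are made up for illustration, and real setups usually also keep FP32 master copies of the weights:

    import torch
    import torch.nn as nn

    model = nn.Linear(512, 10).cuda().half()          # FP16 model
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_scale = 128.0                                # exactly the constant you find empirically

    inputs = torch.randn(32, 512).cuda().half()
    targets = torch.randint(0, 10, (32,)).cuda()

    loss = criterion(model(inputs).float(), targets)  # compute the loss in FP32
    (loss * loss_scale).backward()                    # scale up so small FP16 gradients
                                                      # don't underflow to zero
    for p in model.parameters():
        p.grad.data.div_(loss_scale)                  # unscale before the update
    optimizer.step()
    optimizer.zero_grad()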

It's great that there is now a wider choice of (pre-trained?) models formulated for mixed-precision training.

When I was comparing the Titan V (~V100) and the 1080ti 5 months ago, I was only able to get a 90% increase in forward-pass speed for the Titan V (same batch size), even with mixed precision. And that was for an attention-heavy model, where I expected the Titan V to show its best. Admittedly, I was able to use almost double the batch size on the Titan V when doing mixed precision. And the Titan V draws half the power of a 1080ti too :)

In the end my conclusion was: I am not a researcher, I am a practitioner - I want to do transfer learning or just use existing pre-trained models - without tweaking them. For that, tensor cores give no benefit.


Author here.

Yes, thanks for mentioning that! That's what the article is alluding to at the end. There's also something like a "cost-to-model" and that's influenced by how easy it is to make efficient use of the performance and how much tweaking it needs. It's also influenced by the framework you use... However, that's difficult to compare and almost impossible to measure.


How did you get your hands on Titan V 5 months ago? I still can't find it anywhere in retail in EU...


It was in stock on and off and I was able to order it directly from Nvidia US.

After 59 days of playing with it, I sent it back (initiated the return on the 30th day, after I had already figured out it doesn't live up to the hype, then had another 30 days to actually send it back).

With $3,000 I can buy four 1080tis, while only two are necessary to beat the Titan V (in the Titan V's best game). I only bought one though. NowInStock.net helped with buying a 1080ti directly from Nvidia.


Nvidia is currently in a cashing-out phase. They have a monopoly and money flows in effortlessly. The cost/performance ratio reflects this.

AMD will enter the game soon once they get their software working; Intel will follow.

I suspect that Nvidia will respond with its own specialized machine learning and inference chips to match the cost/performance ratio. As long as Nvidia can maintain high manufacturing volumes and a small performance edge, they can still make good profits.


"The cost performance ratio reflects this."

But the TPUs are half the cost per this article?

Plus Google does the entire stack and can better optimize the hardware versus Nvidia, so it seems Google can improve faster, I would think.

If there ever was a huge advantage to doing the entire stack, it is with neural networks.

A perfect example is Google's new speech synthesis doing 16k samples a second with a NN.

https://cloudplatform.googleblog.com/2018/03/introducing-Clo...

I do not think Google could offer this service at a competitive cost without the TPUs.

This new method is replacing one that was far less compute-intensive, so offering it at a competitive price requires lowering compute costs, which I suspect is only possible with the TPUs.


> But the TPUs are half the cost per this article?

Exactly. Nvidia can match the performance already without a 100% specialized processor. It's just the price they need to cut, by optimizing their architecture for tensor processing and reducing their profits when competition emerges.

Google is not in the business of becoming a major chip maker or competing with Nvidia head on. Putting hundreds of millions into a new microarchitecture every second year eats a lot of resources. They just want a competitive market and the prices to go down.


I'm not sure what you mean by google does the entire stack. Nvidia writes all of the major CUDA libraries used behind the scenes in the NN libraries, such as cuDNN, cuBLAS, etc. Nvidia can likely improve their hardware significantly faster/more efficiently than Google can because their entire business depends on it. Google has incentive for improving their TPU for internal use, but they don't make any money by selling TPU time on GCP yet.


> I'm not sure what you mean by google does the entire stack.

Consider that Google has some of the best machine learning researchers, compiler engineers, hardware engineers, and infrastructure in the business working on this.


Huh? Machine learning and infrastructure engineers, yes. Compiler and hardware engineers? No. What gives you reason to believe they have a lead in either of those departments, other than that they have a lot of money? They're forced to use the same foundry as Nvidia, and their hardware team is likely significantly smaller.


Google has been buying up AI talent since well before anyone else and has the strongest and deepest team at this point.

It is why so many of the breakthroughs have come from Google. A great example is winning at Go almost a decade earlier than anyone thought possible.

They probably have two of the strongest teams, with the Brain team and then the DeepMind team. But all the other engineers and infrastructure at Google are first rate too.

Really, at this point I do not think the $100B in cash is as important, as Google has already built the team, and experienced people are now far more difficult to get.

Another advantage for Google is their ability to attract the top engineers.

Google just got started a lot earlier on all of this.


Google got started a lot earlier on this? Did you read what you are saying? Nvidia has been making hardware longer than Google has been a company. No, Google does not have a better hardware team. Google has the luxury of making a device that is used for a single purpose that they control. Nvidia made a device that can be used for far more and works on commodity hardware. By the way, DeepMind/AlphaGo uses Nvidia GPUs, so that was an extremely bad example.


BTW, DeepMind now uses TPUs for both training and inference, and with these results we can see why.

https://www.theverge.com/circuitbreaker/2016/5/19/11716818/g... Google reveals the mysterious custom hardware that powers AlphaGo


Hardware optimized for NNs. Nvidia's dominant focus has been graphics. A big difference, and we can see the results in this article.

Plus there are benefits to not having the baggage that Nvidia has.

But you are never going to be able to use a TPU for graphics.

In the end it is about results.


Tensor cores are hardware optimized for NNs. You call it baggage, Nvidia calls it extra revenue. Because some people need double precision, and those people are willing to pay a lot of money. So the V100 continues to be the cheapest way to train and do inference on NNs, because you can actually amortize the server cost over time. With a TPU, you pay the hourly price forever. TPUs are better only for NN jobs that are short in length, or if you don't have the capital to buy a server. Anything longer, and you can buy a Titan V and come out far ahead.

By the way, the Tesla cards have no graphics output, so I'm not sure why you'd say they have graphics baggage.


The problem for Nvidia is they do NOT do the entire stack, so Google has the ability to optimize better, and here we are seeing those results, with TPUs costing about 1/2 the price of Nvidia hardware.

Baggage is a company thing. Google really has been an AI company since the late '90s, when Larry Page was asked about using AI to improve search and he replied he was using search to make AI happen.

Ha! When you amortize, you are still spending money, and you saying this really bothers me and is such a problem.

Too many people look at things the way you do, and that is why companies get into trouble. Capitalizing costs is not magic.

BTW, Google is also going to be able to iterate much quicker as the AI breakthroughs happen and come out with new versions that should stay well ahead of Nvidia.

The dynamics of the chip business have changed. It used to be that companies bought chips from someone, put them into servers, and sold the servers.

The problem is that the company making the chips is NOT running the chips and does not have any skin in the game or the data needed to improve.

Now we have companies like Google making the chips and also running the chips, which is why we see power footprint being the focus far more than in the past.

We will see all the big operations, including Amazon, make their own chips more and more.

A perfect example is capsule networks replacing some uses of CNNs. Google, with Hinton, developed the capsule network approach and will be supporting it far faster than you will see from Nvidia.

Then there is TF being the canonical framework for AI.

All of these were theoretical advantages for Google, and now we get to see that they appear to be real, with the pricing of the TPUs being about half the cost of using Nvidia.


You still haven't given a single example of what you mean by "doing the entire stack". I'm assuming that's because you don't have one?

You seem to have completely missed why Nvidia's stock has gone up 17x in 4 years while Google's went up only 3x. The dynamics of the chip business have not changed; you are focusing on a single market, DNNs, which is a small piece of the entire science/engineering community. Google made a chip that accelerates DNNs. They also chose not to make an API to use that hardware outside TF. So if you could buy a TPU and put it in your own server, it would beat the V100 in performance/watt. You can't do that, so Nvidia wins, because I can buy a V100, and in 51 days the price I bought it for ($8K) has already been burned through in GCP. If you need me to do the math to help you realize that your only recurring cost on the V100 (power) is now more than 100x less than the TPU, I can do that for you. But hopefully you understand now that the TPU is for a niche market outside of Google, and it will never be a large source of revenue for them at $6.50/hour.
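
To spell out the math behind the 51 days (purchase price of one V100 against the TPU hourly rate; this ignores power, the host server, and relative throughput):

    # Back-of-the-envelope break-even: buying one V100 vs. renting a Cloud TPU,
    # using the figures above ($8K card, ~$6.50/hour TPU).
    v100_price = 8000.0                    # USD, one-time
    tpu_rate = 6.50                        # USD per hour, recurring

    hours = v100_price / tpu_rate          # ~1231 hours
    days = hours / 24                      # ~51 days
    print(round(hours), "hours, or about", round(days), "days of continuous use")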

TF is not exclusive to Google. Nvidia has engineers working on TF.

Your capsule example is again extremely poor. You think Google can respin an ASIC quicker than Nvidia? Not only does history say the exact opposite, but they both use TSMC.


> Nvidia's stock has gone up 17x in 4 years while google only 3x

Not sure the market cap or the P/E are apples to apples there.

Also:

> https://www.cnbc.com/2018/02/23/secretive-chinese-bitcoin-mi...


Not sure why I could not reply to your post, so I will reply here.

I find the questioning of the "entire stack" point just baffling with Google.

People - Google has the strongest team of AI experts in the industry by a wide margin. At NIPS this year Google had more papers accepted than anyone else. The big AI breakthroughs come from Google. They solved Go a decade earlier than anyone thought possible. I would put FB at #2 in AI experts, but a very distant #2.

Google is miles ahead with self-driving cars.

Plus Google is able to attract the top talent better than anyone else.

https://unsupervisedmethods.com/nips-accepted-papers-stats-2...

Applications - Search, Photos, Speech, AlphaZero, Self Driving Cars; Google now has over 4k NNs in production. Nobody else is even in the ballpark. Hands down the leader in applications.

Infrastructure - TensorFlow now has 98k stars on GitHub. It is the canonical AI framework in the industry, and really nothing else is close. CNTK is #2 with 14k stars, and TensorFlow adds stars about 7x faster per day.

https://github.com/tensorflow/tensorflow

Then there is Google's cloud infrastructure and their other engineering talent, which are well ahead of anyone else's.

I can go on, but this is so incredibly silly. There is little question that Google is leading at every layer of the AI stack by a wide margin.

This is so silly I suspect something else going on here. We do not seem to be discussing things based on reality.

Is this about Damore?

BTW, Nvidia can only spin up what they know about. Google does not share everything, but luckily for Nvidia they do share a lot.

In 2018 you just have to run the infrastructure yourself to be long-term viable in the chip game.


Nobody is arguing Google is the best at AI. You are arguing that translates to Google being the best at making chips. They aren't, and there's no evidence they are. TensorFlow is open source, and Nvidia contributes. Would you be willing to place a bet on whether more people run TensorFlow on TPUs or GPUs?

Edit: and there are FAR more CUDA users in general than TensorFlow users, if you're trying to compare apples and oranges.


Well, we have a data point that suggests they created a better chip. It just makes sense, as they do the entire stack, and that gives them the information needed to build a better chip.

Look at capsule networks and dynamic routing. That potentially drives a different architecture, and Google has thousands of production models to use to optimize for that, which Nvidia just does not have.

Plus it is one company, so no IP issues.

But the big one is that we can see half the cost.


No, you have a data point that says they created a chip that performed better on a single test in a single domain of work. You also have a data point that says the TPU can't do any 64-bit simulations. It's not half the cost. See my previous comment: it's about 100x the cost after 51 days.


Well, do you have any test, or have you seen any, that indicates otherwise?

What we have here is the same work at half the cost. Which makes sense, as Google does all the layers of the stack and just has a fundamental advantage in optimizing.

With AI it is even more important.


Also, I have no idea what 100x the cost refers to. We can see half the cost.


This isn't worth going on about anymore. If you can't compare the cost of running a GPU in your own server versus the TPU in the cloud, then I'm not going to help you.


The last thing you want is to buy your own hardware. You use the cloud, so you only pay for what you actually use.

Google creating their own silicon, which is far cheaper for them to run, gets you 1/2 the cost of using Nvidia. Seems like an easy decision.

Help me? Seems like you are trying to hurt me? Just curious what drives it? It does not seem rational, but driven by some emotion?


Google does the applications at scale and then each layer below, and a big one is TF. A great example is the recent release of the new text-to-speech using NNs.


When you use a Google service that uses the TPUs, they are indirectly selling the TPUs.


>For GPUs, there are further interesting options to consider next to buying. For example, Cirrascale offers monthly rentals of a server with four V100 GPUs for around $7.5k (~$10.3 per hour). However, further benchmarks are required to allow a direct comparison since the hardware differs from that on AWS (type of CPU, memory, NVLink support etc.).

Can't you just buy some 1080s for cheaper than this? I understand there are electricity and hosting costs, but cloud computing seems expensive compared to buying equipment.


Yes, you can. The problem starts when "you" are a large company -- NVidia restricts "datacenter" use of consumer GPUs (see the previous HN discussion of that one: https://news.ycombinator.com/item?id=15983587 ). A single Titan V is somewhere in the 90% range of a V100 at less than 1/3 the cost, and a 1080ti, if you can find one, likely offers a slightly better price/performance spot. 4-GPU training may suffer due to the lack of NVLink, but not enough for it to matter too much. As you scale, though, the lack of NVLink will hurt more. And, of course, all of these things come with a capex vs. opex tradeoff, and a sysadmin vs. cloud tradeoff, that will appeal differently to different situations.


With a mining exception for some reason, and their drivers blocking themselves when running in a virtualized environment unless you do some hacks.


The new "datacenter" restriction only applies to GeForce branded cards. The Titan V is now called the "NVIDIA Titan V" and with no GeForce branding to be found anywhere.

So the restriction applies to the 1080ti but _not_ the titan V. I completely agree the restriction is total bullshit but it's important to get the facts straight.


Not according to the statement from NVidia quoted in this article: https://www.cnbc.com/2017/12/27/nvidia-limits-data-center-us...

It applies to both GeForce and Titan.


You're right - it seems like they have added "Titan" to the agreement since it was first posted on HN:

http://www.nvidia.com/content/DriverDownload-March2009/licen...

Thanks for the tip!


Hire people to buy 1080s at retail. This problem is easily solvable.


It's not about getting the cards (though supplies are limited because of cryptocurrency mining, but you could buy Titan V's off the shelf in batches of 2). It's about whether or not you're big enough of a target for Nvidia's lawyers if you violate the agreement and actually build a datacenter out with them.


It's hard to find 1080[ti]+ in retail. Whenever they become available they sell out pretty quickly.


Probably not the best phrasing in the post ("next to buying"). It's only comparing cloud pricing (since the TPUv2 is only available there). If you consider buying hardware the situation is different as you correctly point out.


1080s don't have the "tensor cores" of V100, or NVLink, so they will not get anywhere near the same performance on this benchmark.


Excellent! Thanks for these numbers, I wanted to see exactly this kind of benchmarks! Do you plan to try different benchmarks with the same setup for different problems, like semantic segmentation, DenseNet, LSTM training performance etc. as well?


Happy to hear the benchmark is useful to you! We'd love to try different setups and further models/networks. On the other hand, such benchmarks are a LOT of effort (which we initially underestimated), so we'll have to see.


Excellent work. Do you have plans to open source the scripts/implementation details used to reproduce the results? It would be great if others could also validate and repeat the experiment for future software updates (e.g. TensorFlow 1.8), as I expect there will be some performance gain for both TPU and GPU from CUDA and TensorFlow optimizations.

Sidenote: Love the illustrations that accompany most of your blog posts, are they drawn by an in-house artist/designer?


Happy you like the post! The implementations we used are open source (we reference the specific revisions), so reproducing results is possible right now. We haven't thought about publishing our small scripts around that (there's not much to it), but it's a good idea. There's also work towards benchmarking suites like DAWNBench (https://dawn.cs.stanford.edu/benchmark/).

The illustrations are from an artist/designer we contract from time to time. I agree, his work is awesome!


> The illustrations are from an artist/designer we contract from time to time. I agree, his work is awesome!

Kudos to them; they are awesome!


What they're not saying is that one can't use all NVLink bandwidth for gradient reduction on a DGX-1V with only 4 GPUs, because NVLink is composed of two 8-node rings. And given the data-parallel nature of this benchmark, I'm very interested in where time was spent on each architecture.

That said, they fixed this on NVSwitch so it's just another HW hiccup like int8 was on Pascal.


For this benchmark, NVLink and gradient reduction isn't the bottleneck. The performance scales almost perfectly linearly from one GPU to four.


Thanks for this, just a minor thing:

You have the price per hour but the performance per second, so the ratio of the two is not meaningful as-is; you need to scale one to the other's time unit. Also, the metric is not "images per second per $", but just "images per $".
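
Concretely, the conversion is just the following (the numbers here are placeholders, not the article's measurements):

    # Turning throughput (images/second) and an hourly price into images per dollar.
    images_per_second = 3000.0     # placeholder throughput
    price_per_hour = 10.0          # placeholder USD per hour

    images_per_dollar = images_per_second * 3600 / price_per_hour
    print(images_per_dollar)       # 1,080,000 images per dollar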


Thanks for catching this!


How much detail do we know about the TPUs' design? Does Google disclose it at a block-diagram level? ISA details? Do they release a toolchain for low-level programming, or only higher-level functions like TensorFlow?

EDIT: I found [1] which describes "tensor cores", "vector/matrix units" and HBM interfaces. The design sounds similar in concept to GPUs. Maybe they don't have or need interpolation hw or other GPU features?

[1] https://cloud.google.com/tpu/docs/system-architecture


There's a great paper on the generation 1 TPU, but Google has not shared much detail on gen 2 and in some ways has kind of hidden information.

I suspect we will need a gen 3 to get a paper on gen 2.

Here is the gen 1 paper, which I highly recommend. Pretty interesting: it uses a 256x256 systolic array of 65,536 very simple multiply-accumulate units.

https://arxiv.org/ftp/arxiv/papers/1704/1704.04760.pdf


So far only very few details are disclosed. Here are two presentations:

https://supercomputersfordl2017.github.io/Presentations/Imag... http://learningsys.org/nips17/assets/slides/dean-nips17.pdf

For the previous version of the TPU, Google provided more detail, e.g., in this paper:

https://arxiv.org/pdf/1704.04760.pdf

Hopefully, Google will publish something similar for TPUv2, but I have no knowledge whether or when that might happen.


> Maybe they don't have or need interpolation hw or other GPU features?

Definitely, no need to do any kind of rasterization here.


Great work, RiseML. This benchmark is sincerely appreciated.

I wonder whether NVLink would make any difference for Resnet-50. Does anyone know whether these implementations require any inter-GPU communication?


They don't require it but some of the ResNet-50 implementations can make use of it (e.g., the ones in the Docker containers on the Nvidia GPU Cloud). But even the ones without seem to scale to 4 GPUs pretty well. This may be a different story for 8 GPUs and larger/deeper networks, e.g., ResNet-152.
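
For context, the "gradient reduction" in question is essentially an all-reduce of gradients each step. A minimal PyTorch-style sketch for illustration (not the benchmark code; it assumes one process per GPU and an already-initialized process group):

    import torch.distributed as dist

    def average_gradients(model):
        # Sum each parameter's gradient across all GPUs, then divide by the
        # number of workers so every replica applies the same averaged update.
        world_size = float(dist.get_world_size())
        for p in model.parameters():
            if p.grad is not None:
                dist.all_reduce(p.grad.data, op=dist.ReduceOp.SUM)
                p.grad.data /= world_size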


Was this running the AWS Deep Learning AMI, or did you build your own?

Because Intel was involved in its development and made a number of tweaks to improve performance.

I'd be curious whether that actually was significant or not.


On AWS this was using nvidia-docker with the TensorFlow Docker images. The AWS Deep Learning AMI probably gives very similar performance (with the same versions of CUDA, TensorFlow, etc.). There's only so much you can tweak if the GPU itself is the bottleneck...


>For the V100 experiments, we used a p3.8xlarge instance (Xeon E5–2686@2.30GHz 16 cores, 244 GB memory, Ubuntu 16.04) on AWS with four V100 GPUs (16 GB of memory each). For the TPU experiments, we used a small n1-standard-4 instance as host (Xeon@2.3GHz two cores, 15 GB memory, Debian 9) for which we provisioned a Cloud TPU (v2–8) consisting of four TPUv2 chips (16 GB of memory each).

A bit odd that the TPUs are provisioned on a much weaker machine than the V100s, especially when there were comparisons which included augmentation and other processing outside of the TPU.


All of the computation, including pre-processing, is offloaded to the TPU. The weak machine is really just idling. A bigger one will only cost money and have no measurable effect on the performance.


What is the cost difference between the CPUs on Google Cloud vs. AWS? How would adjusting for it affect the cost/images ratio?


This is why my previous comment mentioned that GCP is a better benchmark for this since you can select the number of CPUs to match with the GPUs to some extent. You can get a rough idea of the savings by looking at their P100 instances.


The TPU is not really just the chip. It has an actual machine that is provisioned behind the scenes and accepts RPC calls. Good luck finding out its specs. All you're supposed to care about are the address and port it answers at.



