
Disclosure: I work on Google Cloud.

I want to highlight this paragraph from the post:

> Here at Google Cloud, we want to provide customers with the best cloud for every ML workload and will offer a variety of high-performance CPUs (including Intel Skylake) and GPUs (including NVIDIA’s Tesla V100) alongside Cloud TPUs.

We fundamentally want Google Cloud to be the best place to do computing. That includes AI/ML and so you’ll see us both invest in our own hardware, as well as provide the latest CPUs, GPUs, and so on. Don’t take this announcement as “Google is going to start excluding GPUs”, but rather that we’re adding an option that we’ve found internally to be an excellent balance of time-to-trained-model and cost. We’re still happily buying GPUs to offer to our Cloud customers, and as I said elsewhere the V100 is a great chip. All of this competition in hardware is great for folks who want to see ML progress in the years to come.

Any plans to support AMD GPUs and the Radeon Open Compute project? The AI/ML community really needs viable alternatives to NVIDIA; otherwise NVIDIA will continue to flex its pricing power. Google, via TensorFlow, is in a phenomenal position to promote open source alternatives to the proprietary Deep Learning software ecosystem that we see today with CUDA/cuDNN.

Google would happily accept patches to enable support for it.

AMD hopefully has a team writing such patches now. It makes business sense for them to do so.

Google is getting even more price gouging from Nvidia than the general public, and has even more incentive to level the playing field.

Or the opposite - they're getting nice savings in return for not actively developing or encouraging CUDA/cuDNN alternatives.

Did you guys ever reveal the internal math model of TPU 2?

We know V100 is FP16/FP32 on their tensor cores, when will you follow suit?

Edit: sort of, from https://www.theregister.co.uk/2017/12/14/google_tpu2_specs_i...

"32-bit floating-point precision math units for scalars and vectors, and 32-bit floating-point-precision matrix multiplication units with reduced precision for multipliers."

So what does "reduced" mean exactly?

We still don’t document it exactly, but [1] shows that bfloat16 is supported on lots of ops.

[1] https://cloud.google.com/tpu/docs/tensorflow-ops

That doesn’t prove that the chip operates at 16 bits. For example, we could do 18-bit multipliers (or anything >= 16) and still use 16-bit floats.
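For context on the format being discussed: bfloat16 keeps float32's sign bit and 8 exponent bits but only 7 explicit mantissa bits, so it is simply the top 16 bits of a float32. A minimal Python sketch of that conversion follows. The function names are made up for illustration, and it says nothing about what the TPU's multipliers actually do internally.

```python
import struct

def float32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def to_bfloat16_bits(x: float) -> int:
    """Round a float32 value to bfloat16 (round-to-nearest-even).

    Illustrative only: NaN inputs are not treated specially.
    """
    bits = float32_bits(x)
    # Add just under half of the dropped range, plus the lowest kept bit,
    # so that exact ties round to an even result.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bfloat16_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]
```

With this sketch, `to_bfloat16_bits(1.0)` is `0x3F80`, and `bfloat16_to_float(to_bfloat16_bits(3.14159))` comes back as `3.140625`, which shows how few significant digits survive.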

ATI demonstrated FP24 was frickin' awesome over a decade and a half ago. It wouldn't surprise me in the least if you went somewhere like that, but it perplexes me why you think that's secret sauce in any way, long after ATI nearly destroyed NVIDIA with FP24 back in the early days of DirectX 9 and NV3x.

This isn’t exactly correct. ATI pulled a “fast one” and went with 24-bit even though the initial DX9 spec called for 16/32-bit floats, which NVIDIA followed.

Once DX9 was split into DX9b and c that “advantage” went away and NVIDIA proved that 16/32 bit was better, something that ATI also had to adopt once MSFT told them enough is enough.

24-bit is only better as long as it can do everything 32-bit can do and it’s advantageous to build hardware with 24-bit FPUs instead of 32-bit FPUs that can also do 2x 16-bit ops per cycle.

Basically, only if the silicon cost allows you to fit far more 24-bit FPUs than 32/16-bit ones.

And history proved that this isn’t the case.

For gaming eventually even 2:1 FPUs went away since they are costlier than only 32bit FPUs with promotion.

Maybe in the future we’ll have a 24bit FPU that can also do 3 8bit ops or 16bit+8bit op per cycle if it will be more beneficial than the current 2:1 16/32bit model.

I personally would stick to FP32 across the board for my ML efforts, but we have an entire cottage industry of people coming up with approximations to drive up perf and perf/W, all of which will prove irrelevant until Moore's Law runs out IMO. And even then, I'll still stick to FP32 personally. Speaking from direct experience, bulletproof mixed precision is tough.

I don't think it is secret sauce. If you're gonna let customers send operations to these TPUs, one could figure out what kind of multiplier is used almost immediately by inspecting a few inputs and outputs.
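A rough sketch of that kind of probing, in Python: feed a black-box multiply the values 1 + 2^-k and 1.0, and find the largest k for which the product is still distinguishable from 1.0. Everything here is hypothetical. `simulated_multiply` is a stand-in that rounds its inputs to bfloat16, not a description of real TPU hardware, and the probe assumes the multiplier rounds ties to even.

```python
import struct

def round_to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even); NaN not handled."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000  # keep sign, exponent, and top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def simulated_multiply(a: float, b: float) -> float:
    """Hypothetical device: inputs are rounded to bfloat16, product kept wider."""
    return round_to_bfloat16(a) * round_to_bfloat16(b)

def probe_input_mantissa_bits(multiply, max_bits: int = 30) -> int:
    """Return the largest k <= max_bits such that multiply(1 + 2**-k, 1.0) != 1.0.

    That k estimates how many explicit mantissa bits the multiplier inputs keep.
    """
    bits = 0
    for k in range(1, max_bits + 1):
        if multiply(1.0 + 2.0 ** -k, 1.0) == 1.0:
            break  # the 2**-k term was rounded away on input
        bits = k
    return bits
```

Calling `probe_input_mantissa_bits(simulated_multiply)` reports 7 for the bfloat16 stand-in. A real probe would need more care (denormals, rounding modes, accumulation width), but a handful of inputs is enough to narrow things down.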

>We fundamentally want Google Cloud to be the best place to do computing.

Lower. Network. Egress. Pricing. By. Two. Orders. Of. Magnitude.

Market rate is close to $1 per TB outbound. Your rate is $80-$120 per TB. That's just embarrassing.

> high-performance CPUs (including Intel Skylake)

Any plans for ryzen?

We’re always exploring the best hardware for the dollar. We’re a founding member of OpenPOWER and to your question about AMD parts, we’ve previously (publicly) run Opterons when they were the best choice. At this time, we don’t have any announcements to make :).

But I’d like to note that even if we were to use parts internally at Google (or not!), for Cloud what matters is market demand. If there really were enormous customer demand for, say, ARM64, then we would look into it, even if the rest of Google wasn’t interested.
