I want to highlight this paragraph from the post:
> Here at Google Cloud, we want to provide customers with the best cloud for every ML workload and will offer a variety of high-performance CPUs (including Intel Skylake) and GPUs (including NVIDIA’s Tesla V100) alongside Cloud TPUs.
We fundamentally want Google Cloud to be the best place to do computing. That includes AI/ML, so you’ll see us both invest in our own hardware and offer the latest CPUs, GPUs, and so on. Don’t take this announcement as “Google is going to start excluding GPUs”; rather, we’re adding an option that we’ve found internally to be an excellent balance of time-to-trained-model and cost. We’re still happily buying GPUs to offer to our Cloud customers, and as I said elsewhere, the V100 is a great chip. All of this competition in hardware is great for folks who want to see ML progress in the years to come.
AMD hopefully has a team writing such patches now. It makes business sense for them to do so.
Google is being price-gouged by Nvidia even harder than the general public is, and has even more incentive to level the playing field.
We know the V100's tensor cores multiply in FP16 and accumulate in FP32; when will you follow suit?
Edit: sort of, from https://www.theregister.co.uk/2017/12/14/google_tpu2_specs_i...
"32-bit floating-point precision math units for scalars and vectors, and 32-bit floating-point-precision matrix multiplication units with reduced precision for multipliers."
So what does "reduced" mean exactly?
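The TPU's reduced-precision multiplier format was later documented to be bfloat16: a float32 with the low 16 mantissa bits dropped, keeping the full 8-bit exponent but only about 7 bits of mantissa. A minimal sketch of what that truncation costs, using round-toward-zero for simplicity (real hardware typically rounds to nearest):

```python
import struct

def to_bfloat16(x: float) -> float:
    """Truncate a float32 to bfloat16 by zeroing its low 16 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# Only ~3 significant decimal digits survive the truncation:
print(to_bfloat16(3.14159265))  # 3.140625
```

Because the exponent stays at 8 bits, bfloat16 keeps float32's dynamic range while shedding precision, which is why it works as a multiplier input format when the products are accumulated at full 32-bit precision.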
Once DX9 was split into 9.0b and 9.0c, that “advantage” went away, and NVIDIA proved that 16/32-bit was better, something ATI also had to adopt once MSFT told them enough is enough.
24-bit is only better as long as it can do everything 32-bit can do, and it’s only advantageous to build hardware with 24-bit FPUs instead of 32-bit FPUs that can also do 2x 16-bit ops per cycle if the silicon cost lets you fit far more 24-bit FPUs than 32/16-bit ones.
And history proved that this isn’t the case.
For gaming, eventually even the 2:1 FPUs went away, since they cost more than plain 32-bit FPUs that simply promote 16-bit operands to 32-bit.
Maybe in the future we’ll have a 24-bit FPU that can also do 3x 8-bit ops, or a 16-bit + 8-bit op, per cycle, if that turns out to be more beneficial than the current 2:1 16/32-bit model.
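The “2x 16-bit ops per cycle” idea, one wide datapath carrying two independent narrow lanes, can be sketched with an integer SWAR trick. This is a hypothetical illustration of the lane-splitting principle, not how an FPU actually partitions floating-point hardware:

```python
def packed_add16(a: int, b: int) -> int:
    """Two independent 16-bit adds performed by one 32-bit add (SWAR).

    The top bit of each lane is masked off before the add, so a carry out
    of the low lane cannot spill into the high lane; the masked-off top
    bits are then recombined with XOR.
    """
    H = 0x80008000                                   # top bit of each lane
    low = (a & ~H & 0xFFFFFFFF) + (b & ~H & 0xFFFFFFFF)
    return (low ^ ((a ^ b) & H)) & 0xFFFFFFFF

# 3+4 in the high lane, 5+7 in the low lane, in a single 32-bit operation:
print(hex(packed_add16(0x00030005, 0x00040007)))  # 0x7000c
```

The appeal for hardware is the same as in this sketch: the wide adder already exists, and splitting it into lanes costs only a little carry-isolation logic, whereas a separate 24-bit unit can do nothing a 32-bit unit can’t.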
Lower. Network. Egress. Pricing. By. Two. Orders. Of. Magnitude.
Market rate is close to $1 per TB outbound. Your rate is $80-$120 per TB. That's just embarrassing.
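For what it’s worth, the figures above really do work out to roughly two orders of magnitude (a quick sanity check on the cited numbers, nothing more):

```python
import math

market_rate = 1.0                # $/TB outbound, the cited market rate
gcp_low, gcp_high = 80.0, 120.0  # $/TB, the cited GCP egress range

markup_low = gcp_low / market_rate
markup_high = gcp_high / market_rate
# log10 of an 80x..120x markup is about 1.9..2.1, i.e. ~2 orders of magnitude
print(round(math.log10(markup_low), 2), round(math.log10(markup_high), 2))
```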
Any plans for Ryzen?
But I’d like to note that even if we were to use AMD parts internally at Google (or not!), what matters for Cloud is market demand. If there really were enormous customer demand for, say, ARM64, then we would look into it, even if the rest of Google weren’t interested.