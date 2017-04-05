* They actually deployed it in 2015, so they're probably already hard at work on a new version!
* The TPU only operates on 8-bit integers (and 16-bit at half speed), whereas the CPUs/GPUs it's compared against use 32-bit floating point. They point this out in the discussion section.
* Used via TensorFlow.
* They don't really break out hardware-vs-hardware performance for each model individually. It seems like the TPU suffers a lot whenever there's a really large number of weights and layers that it must handle, but without a per-model breakdown it's hard to see whether the TPU offers an advantage over the GPU for arbitrary networks.
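
To make the 8-bit point a bit more concrete, here's a rough sketch of what running a dot product on 8-bit integers instead of 32-bit floats can look like. This is not the TPU's actual quantization scheme (I'm not describing what the paper does here); the symmetric per-vector scaling and the names `quantize` and `int8_dot` are my own assumptions for illustration.

```python
def quantize(values, scale):
    """Map floats to signed 8-bit integers in [-127, 127] using a shared scale."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def int8_dot(weights, activations):
    """Approximate a float dot product with integer multiply-accumulates.

    Assumed scheme: one symmetric scale per vector, chosen so the largest
    magnitude maps to 127. Real hardware and frameworks may do this differently.
    """
    w_scale = max(abs(w) for w in weights) / 127 or 1.0
    a_scale = max(abs(a) for a in activations) / 127 or 1.0
    qw = quantize(weights, w_scale)
    qa = quantize(activations, a_scale)
    # All the heavy lifting is integer math; the accumulator is wider than 8 bits.
    acc = sum(w * a for w, a in zip(qw, qa))
    # Rescale once at the end to get back to real-valued units.
    return acc * w_scale * a_scale


weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]
exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(weights, activations)
print(exact, approx)
```

The result is close to the float answer but not identical, which is the tradeoff: cheaper arithmetic in exchange for a small rounding error.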
[1] https://drive.google.com/file/d/0Bx4hafXDDq2EMzRNcy1vSUxtcEk...
