
In-Datacenter Performance Analysis of a Tensor Processing Unit - lorenzhs
https://drive.google.com/file/d/0Bx4hafXDDq2EMzRNcy1vSUxtcEk/view
======
lorenzhs
Figure 2 shows the floor plan: a large 256x256 8-bit matrix multiplication
unit takes up nearly 1/4 of the die area, and the rest of the chip seems
designed to feed it data as quickly as possible. It runs at 700MHz and can
complete a vector-matrix multiplication (1x256 times 256x256) every cycle.
Neat!
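To put those numbers in perspective, here is a rough back-of-envelope sketch. It assumes each of the 256x256 cells does one 8-bit multiply-accumulate (MAC) per cycle, counts a MAC as two operations, and assumes 32-bit accumulation. The function `vecmat_int8` is a hypothetical illustration of the arithmetic only; it does not model the hardware's pipelined dataflow.

```python
# Back-of-envelope peak throughput for a 256x256 8-bit matrix unit at 700 MHz.
# Assumption: every cell performs one multiply-accumulate (MAC) per cycle.
ARRAY_DIM = 256
CLOCK_HZ = 700_000_000

macs_per_cycle = ARRAY_DIM * ARRAY_DIM            # 65,536 MACs per cycle
ops_per_second = 2 * macs_per_cycle * CLOCK_HZ    # multiply + add = 2 ops

INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def vecmat_int8(x, w):
    """Hypothetical sketch: multiply a 1xN int8 vector by an NxM int8 matrix.

    Products are summed into (assumed) 32-bit accumulators; we check that
    each result actually fits in that range.
    """
    n = len(x)
    if len(w) != n:
        raise ValueError("inner dimensions must match")
    for v in x:
        if not INT8_MIN <= v <= INT8_MAX:
            raise ValueError("input out of int8 range")
    m = len(w[0])
    out = []
    for j in range(m):
        acc = 0
        for i in range(n):
            acc += x[i] * w[i][j]  # 8-bit x 8-bit product, accumulated wide
        if not INT32_MIN <= acc <= INT32_MAX:
            raise OverflowError("accumulator overflow")
        out.append(acc)
    return out


print(f"{macs_per_cycle} MACs/cycle, ~{ops_per_second / 1e12:.2f} TOPS peak")
```

Even the worst case for a 256-long dot product (256 * 127 * -128) stays far inside the 32-bit range, which is one reason wide accumulators pair naturally with 8-bit inputs.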

It's a very simple CISC design, so little control logic is needed (only 2% of
the die area).

Overall, this is very clearly designed for TensorFlow.

