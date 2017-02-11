Meanwhile, the major upcoming AI processors are taking two directions: larger numbers of simplified GPU-type cores (such as NVIDIA Xavier and Intel's Lake Crest/Nervana chips), and FPGAs.
Simplifying GPU cores means dropping precision: fp32 and fp64 are overkill for neural networks and take up a lot of silicon. The current NVIDIA Pascal generation added fp16 arithmetic and byte operations such as DP4A, a four-way int8 dot product with 32-bit accumulation that maps well onto convolutions[1]. Even lower precision is practical, down to 1 bit with XNORnet[2]. The DoReFa paper[3] gives an excellent summary of how accuracy falls off as you go to 8, 4, 2, and 1 bits for weights, activations, and gradients.
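To make the low-precision arithmetic concrete, here is a rough Python sketch of the two ideas: a DP4A-style byte dot product, and the XNOR-plus-popcount trick that 1-bit networks rely on. It only emulates the arithmetic. The function names (dp4a, pack_signs, xnor_dot) are made up for illustration and are not NVIDIA's or the papers' APIs, and 32-bit accumulator overflow is not modeled.

```python
def dp4a(a, b, acc):
    """Emulate a DP4A-style op: dot product of four signed 8-bit values,
    added to an accumulator. Hypothetical helper; overflow is not modeled."""
    assert len(a) == len(b) == 4
    assert all(-128 <= x <= 127 for x in list(a) + list(b))
    return acc + sum(x * y for x, y in zip(a, b))


def pack_signs(values):
    """Pack a vector of +1/-1 values into an integer bit mask.
    Bit i is set when values[i] is +1 (an encoding assumed for this sketch)."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.
    XNOR marks the positions where the signs agree; each agreement
    contributes +1 and each disagreement -1, so the result is
    2 * matches - n."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # 1 where the signs match
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n


if __name__ == "__main__":
    print(dp4a([1, 2, 3, 4], [5, 6, 7, 8], 10))
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    print(xnor_dot(pack_signs(a), pack_signs(b), len(a)))
```

The point of the second function is that a multiply-accumulate over n binary weights collapses to one XOR, one NOT, and one popcount on a machine word, which is why 1-bit networks are so cheap in silicon.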
[1] https://devblogs.nvidia.com/parallelforall/mixed-precision-p...
[2] XNORnet, https://arxiv.org/abs/1603.05279
[3] DoReFa, https://arxiv.org/abs/1606.06160
