
Emergent Chip Vastly Accelerates Deep Neural Networks - dharma1
http://www.nextplatform.com/2015/12/08/emergent-chip-vastly-accelerates-deep-neural-networks/
======
dharma1
Here's the paper -
[http://arxiv.org/pdf/1510.00149.pdf](http://arxiv.org/pdf/1510.00149.pdf)

The compressed network achieves a decent speedup and energy saving on current
hardware too (desktop/mobile) without significant loss of accuracy

~~~
nl
Is this the correct paper? It seems to cover the (impressive!) speedups from
compression and encoding, but nothing about the EIE chip mentioned.

~~~
dharma1
Yep, it only covers the compression/pruning. Here is a talk by the author that
covers the chip too:
[http://web.stanford.edu/class/ee380/Abstracts/160106.html](http://web.stanford.edu/class/ee380/Abstracts/160106.html)

------
jph
Summary: The chip's impressive speed gains come from using on-chip SRAM
instead of off-chip DRAM.

It can fit much more of the model into SRAM because the network is compressed
first: pruning, quantization, Huffman encoding, etc.
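
Roughly, that pipeline looks like the sketch below; a minimal numpy
illustration with an arbitrary threshold and codebook size (not the paper's
values), and uniform binning standing in for the paper's k-means clustering:

    import numpy as np

    def prune(weights, threshold=0.01):
        # Magnitude pruning: zero out weights below an (illustrative) threshold.
        mask = np.abs(weights) > threshold
        return weights * mask, mask

    def quantize(weights, mask, n_clusters=16):
        # Weight sharing: map surviving weights onto a small codebook, so only
        # 4-bit indices plus the tiny codebook need to be stored.
        survivors = weights[mask]
        edges = np.linspace(survivors.min(), survivors.max(), n_clusters + 1)
        idx = np.clip(np.digitize(survivors, edges) - 1, 0, n_clusters - 1)
        codebook = np.array([survivors[idx == k].mean() if np.any(idx == k)
                             else 0.0 for k in range(n_clusters)])
        return codebook, idx

    w = np.random.randn(10000).astype(np.float32) * 0.05
    pruned, mask = prune(w)
    codebook, idx = quantize(w, mask)
    print(f"kept {100 * mask.mean():.0f}% of weights, {len(codebook)} shared values")

The Huffman step then just entropy-codes the index stream, since a few
codebook entries dominate.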

~~~
6502nerdface
Also, the on-board RAM is SRAM rather than DRAM.

~~~
fpgaminer
Where does it say that? On-board SRAM would be very strange, as it's
incredibly expensive and not at all performant compared to DRAM.

~~~
nl
From the paper's abstract: _This allows fitting the model into on-chip SRAM
cache rather than off-chip DRAM memory_

~~~
fpgaminer
Exactly, that says "on-chip SRAM" not "on-board SRAM". Very different things,
and my point stands.

~~~
nl
Pretty sure from the context the OP was saying on board the chip.

------
jerf
Is the embedded case really that interesting? Almost by definition, the
embedded device will be receiving a tiny fraction of the data in the world
that it may be concerned about. It seems unlikely to me that an embedded,
power-constrained device is going to "deep learn" anything all that useful
that wouldn't be better learned in something with more data and power
available. But I do mean this as a question, if anybody's got a really cool
use case in hand. (Please something more specific than text that boils down to
"something something sensor network internet of things local conditions
something".)

~~~
jessep
This isn't for training, is it? It's for using the results of training
immediately (inferring something), without the need for a network round trip
(as far as I understand it).

So, you might still send the request over the network to continue training the
model, but by the time you do, your answer has already been computed on the
local machine for local consumption.
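
A rough sketch of that split (the model interface and the queue are made up
for illustration, not taken from the article): the answer comes from on-device
inference immediately, and the example is only queued for a later,
non-blocking upload to wherever training happens:

    import queue

    upload_queue = queue.Queue()   # examples to ship off later for server-side training

    def handle_input(model, x):
        # Immediate on-device inference; no network round trip for the answer.
        y = model.predict(x)       # hypothetical compressed-model interface
        upload_queue.put((x, y))   # training/telemetry can lag far behind
        return y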

~~~
jerf
Thank you, everybody, that has helped me understand. Obvious in hindsight but
isn't that the way of these things.

------
nshm
Those interested in optimal neural network compression might consider the
paper "Bitwise Neural Networks" by Kim and Smaragdis
([http://paris.cs.illinois.edu/pubs/minje-icmlw2015.pdf](http://paris.cs.illinois.edu/pubs/minje-icmlw2015.pdf)),
which enables much better compression than simple quantization and pruning.

~~~
nharada
How do you mean "much better compression"? Won't replacing 32-bit weights with
single bits save at most 32x the memory[1]? Han et al. show not only a 35-49x
improvement, but on much more difficult benchmarks (AlexNet/VGG, versus the
bitwise paper's MNIST).

Combining these two techniques would be really cool, and if the bitwise
network can work with larger, more complex networks like VGG it would be a
massive game-changer, allowing these nets to fit on almost any device.

[1] [http://minjekim.com/demo_bnn.html](http://minjekim.com/demo_bnn.html)
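
To make the 32x figure concrete, a quick back-of-the-envelope check in numpy
(the layer size is just illustrative): a float32 weight takes 32 bits, while a
sign-only weight packed with np.packbits takes one.

    import numpy as np

    w = np.random.randn(4096, 4096).astype(np.float32)  # illustrative layer
    bits = np.packbits(w >= 0)                           # keep only the signs, 8 weights per byte

    print(w.nbytes / bits.nbytes)                        # -> 32.0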

------
jimbokun
In the past, specialized architectures tended to lose out to the performance
gains of generic x86 chips. I'm thinking RISC vs. CISC, or Sun's attempt at a
custom Java-optimized processor.

Is that no longer the case? ARM architectures seem to be beating x86 for low
power, mobile devices. GPUs are being used for many easily parallelizable
workloads.

Is an overall slowdown in Moore's law making chips designed for specific
tasks (like deep neural nets) attractive again?

~~~
emcq
I'm not sure CISC beat RISC, except by complicating the instruction set
enough to keep costs high for competitors.

Underneath, Intel processors translate x86 into simpler, RISC-like micro-ops.
They could make a more efficient chip without this translation step, which
probably holds them back a bit in low-power stuff.

~~~
cobaltblue
It's more like x86 beat everything, which is also saying that externally
CISC/RISC didn't matter so much. I like the section on RISC here:
[http://danluu.com/butler-lampson-1999/](http://danluu.com/butler-lampson-1999/)
Specifically part of the last paragraph: "It’s possible to
nitpick RISC being a no by saying that modern processors translate x86 ops
into RISC micro-ops internally, but if you listened to talk at the time,
people thought that having a external RISC ISA would be so much lower overhead
that RISC would win, which has clearly not happened. Moreover, modern chips
also do micro-op fusion in order to fuse operations into decidedly un-RISC-y
operations."

------
melted
Mark my words, in the next couple of years we'll see custom silicon that
massively improves performance per watt for DNNs by using fixed point,
quantization, and saturation arithmetic. The gain in performance per watt will
be at least an order of magnitude. This will make DNNs viable for a lot more
classification problems where they are currently simply too slow.
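
As a toy illustration of what fixed point plus saturation means in software
(the Q0.7 format and the scale are arbitrary choices for this sketch; real
silicon would bake them into the datapath):

    import numpy as np

    def to_q7(x):
        # Quantize floats in roughly [-1, 1) to 8-bit fixed point (Q0.7),
        # saturating rather than wrapping on overflow.
        return np.clip(np.round(x * 128.0), -128, 127).astype(np.int8)

    def saturating_add(a, b):
        # int8 addition with saturation arithmetic: clamp instead of wrapping.
        s = a.astype(np.int16) + b.astype(np.int16)
        return np.clip(s, -128, 127).astype(np.int8)

    a, b = to_q7(np.array([0.9, -0.5])), to_q7(np.array([0.3, 0.2]))
    print(saturating_add(a, b))   # 0.9 + 0.3 saturates at 127 (~0.99) instead of wrapping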

~~~
nshm
Mark my words, DNNs are not really the most efficient structure for
prediction: the "distributed" representation that makes them good predictors
also makes them hard to train and resource-intensive to apply. In a few years
DNNs will be replaced by more efficient models.

~~~
melted
You do realize you came up with this text using a deep biological neural
network, right?

~~~
nshm
Sure, but they aren't necessarily the most efficient model possible.

~~~
melted
Seems pretty efficient to me, at least per watt.

------
zhyan7109
This will be a game changer a few years down the line! Can't wait to see the
commercial SW stack complementing this.

------
bbhill
These guys need to work on their dataviz. Could have made some of the tables
into graphs, really.

