
Intel Gets Serious About Neuromorphic, Cognitive Computing Future
https://www.nextplatform.com/2017/02/11/intel-gets-serious-neuromorphic-cognitive-computing-future/
======
paulsutter
The article misses the core issue: neural network architectures are still in
flux. "Neuromorphic" chips are hardwired to one architecture, which makes them
power efficient but less flexible. When designs are more stable, such chips
could be more practical.

Meanwhile, upcoming (non-neuromorphic) AI processors are taking two
directions: larger numbers of simplified GPU-type cores (such as NVIDIA Xavier
and Intel's Lake Crest/Nervana chips), and FPGAs.

Simplifying cores means lower precision, as fp32 and fp64 are overkill for
neural networks and take up lots of silicon. The current NVIDIA Pascal
generation added fp16 and byte operations such as the DP4A dot-product
instruction[1]. Even lower precision is practical (down to 1 bit with
XNORnet[2], and the DoReFa paper[3] gives an excellent summary of the falloff
in accuracy through 32-8-4-2-1 bits for weights, activations, and gradients).

[1] [https://devblogs.nvidia.com/parallelforall/mixed-precision-programming-cuda-8/](https://devblogs.nvidia.com/parallelforall/mixed-precision-programming-cuda-8/)

[2] XNORnet,
[https://arxiv.org/abs/1603.05279](https://arxiv.org/abs/1603.05279)

[3] DoReFa,
[https://arxiv.org/abs/1606.06160](https://arxiv.org/abs/1606.06160)
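
To make the bit-width falloff concrete, here is a minimal NumPy sketch of
uniform k-bit quantization, loosely following the tanh-based weight squashing
described in the DoReFa paper (a simplified illustration, not the paper's
exact scheme):

    import numpy as np

    def quantize_k(x, k):
        """Uniformly quantize values in [0, 1] to 2**k levels."""
        n = 2**k - 1
        return np.round(x * n) / n

    def quantize_weights(w, k):
        """Simplified DoReFa-style weight quantization: squash weights into
        [0, 1] with tanh, quantize to k bits, then rescale to [-1, 1]."""
        w_squashed = np.tanh(w) / (2 * np.max(np.abs(np.tanh(w)))) + 0.5
        return 2 * quantize_k(w_squashed, k) - 1

    w = np.random.randn(64, 64).astype(np.float32)
    for k in (8, 4, 2, 1):
        w_q = quantize_weights(w, k)
        print(f"{k}-bit weights use {len(np.unique(w_q))} distinct values")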

~~~
deepnotderp
This is a common fallacy: although the architectures themselves may be
changing (quite a bit!), the basic computationally intensive operations, such
as convolutions and matrix multiplies, aren't. The simple way is to switch from
FP32 to 16-bit fixed point, and you're good to go, having saved almost 10x in
power. This is the strategy that Nervana/Intel, even Nvidia, and other startups
such as Wave are pursuing.
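
As a rough sketch of what that switch looks like in code (a Q1.15 fixed-point
format and NumPy are assumed here purely for illustration; real accelerators
typically pick scale factors per layer or per tensor):

    import numpy as np

    FRAC_BITS = 15          # Q1.15: 1 sign bit, 15 fractional bits
    SCALE = 1 << FRAC_BITS

    def to_fixed(x):
        """Quantize floats in roughly [-1, 1) to 16-bit fixed point."""
        return np.clip(np.round(x * SCALE), -32768, 32767).astype(np.int16)

    def fixed_mul(a, b):
        """Multiply two Q1.15 values: accumulate in 32 bits, shift back."""
        return ((a.astype(np.int32) * b.astype(np.int32)) >> FRAC_BITS).astype(np.int16)

    a, b = to_fixed(0.5), to_fixed(-0.25)
    print(fixed_mul(a, b) / SCALE)   # ~ -0.125, close to 0.5 * -0.25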

~~~
paulsutter
Exactly - evolving network architectures suggest using a non-neuromorphic
design (such as Xavier or Lake Crest), which trades precision for more cores
within the same power/real-estate budget.

Neuromorphic designs like IBM's TrueNorth are more hardwired, and that's what
limits their general-purpose use. Yann LeCun's remarks on TrueNorth:

[https://www.facebook.com/yann.lecun/posts/10152184295832143](https://www.facebook.com/yann.lecun/posts/10152184295832143)

~~~
emcq
The reality is that TrueNorth can run CNNs trained with backprop and achieve
good performance on tasks like ImageNet, despite LeCun's comments [0, 1].

LeCun's primary critique is that binary won't work: "to get good results on a
task like ImageNet you need about 8 bit of precision on the neuron states".
There was no evidence for his claim then, and now it is clearly false [2, 3].

LeCun's post is based more on pride than reason; he spends most of his time
talking about NeuFlow, which at one point was a competitor to TrueNorth for
funding. In the end, NeuFlow never became a chip, but TrueNorth did.

[0] [https://papers.nips.cc/paper/5862-backpropagation-for-energy-efficient-neuromorphic-computing](https://papers.nips.cc/paper/5862-backpropagation-for-energy-efficient-neuromorphic-computing)

[1]
[https://arxiv.org/pdf/1603.08270.pdf](https://arxiv.org/pdf/1603.08270.pdf)

[2] [https://arxiv.org/abs/1603.05279](https://arxiv.org/abs/1603.05279)

[3] [https://arxiv.org/abs/1602.02830](https://arxiv.org/abs/1602.02830)

~~~
deepnotderp
Yeah, this all sounds nice in theory, but look at the actual numbers:
[https://arxiv.org/abs/1603.08270](https://arxiv.org/abs/1603.08270)

1) Nothing on ImageNet

2) They already fall to 83% accuracy on CIFAR-10! Imagine how bad ImageNet
would be! If they string many chips together (exploding their power
consumption, since here comes the Von Neumann Bottleneck of data movement),
they get a paltry 89%...

Meanwhile, even SqueezeNet achieves better results.

~~~
emcq
I'm sure that today we could create an _even better_ chip that takes advantage
of recent advances in neural networks and chip design, but it's awesome that a
chip from 2012 can still take us so far! AlexNet was just a baby then. I doubt
any CPU, GPU, FPGA, or DSP from 2012 would hold up as well.

I don't think you understand their architecture, and neither did LeCun. The Von
Neumann Bottleneck is a specific term referring to the limited throughput
between data in memory and compute in the CPU. TrueNorth is not a Von Neumann
architecture and does not have this bottleneck: memory is located adjacent to
the compute elements in TrueNorth. For comparison, GPUs have very small amounts
of on-chip memory and have to spend lots of energy copying data back and forth
to off-chip memory, which is why they are investing heavily in approaches like
HBM. FPGAs also don't have as much memory as an ASIC because they need to
dedicate space to reprogrammable logic, integrated ARM cores/DSPs, etc.

The chips can be laid out in flexible topologies such as a grid. While it's
true that communication between chips is more power-intensive than within a
chip, this cost is only incurred for the relatively small amount of traffic
sent between chips versus computed locally. The hierarchy and small-world
nature of neural networks can mean that there is more local computation than
you would naively expect, and a grid means a spike routed from one core to the
furthest core travels O(sqrt(N)) hops instead of O(N).
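
A toy sketch of that routing argument, comparing worst-case hop counts on a
hypothetical chain versus a square grid of N chips (illustrative only):

    import math

    def worst_case_hops(n_chips):
        """Worst-case hop count between two chips: a 1-D chain is O(N),
        a square 2-D grid is O(sqrt(N)). Assumes n_chips is a perfect square."""
        chain = n_chips - 1
        side = math.isqrt(n_chips)
        grid = 2 * (side - 1)          # corner-to-corner Manhattan distance
        return chain, grid

    for n in (16, 64, 256):
        chain, grid = worst_case_hops(n)
        print(f"{n} chips: chain = {chain} hops, grid = {grid} hops")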

~~~
deepnotderp
...

Pretty sure LeCun and I understand what the Von Neumann Bottleneck is, thank
you very much.

The thing is, though, that TrueNorth isn't doing anything special by pouring a
ton of memory on die, and even on GPUs, CNN runtime and energy consumption are
dominated by compute.

~~~
emcq
I appreciate healthy discussion of technical topics. However, I'm not sure
you're having this discussion in good faith. I wrote this response in case you
are.

LeCun never said anything about the Von Neumann Bottleneck. TrueNorth is not a
Von Neumann architecture; it does not have a memory bus; it does not have the
Von Neumann Bottleneck [0,2,3]. From Wikipedia [1]:

"TrueNorth circumvents the von-Neumann-architecture bottlenecks and is very
energy-efficient, consuming 70 milliwatts, about 1/10,000th the power density
of conventional microprocessors"

If you disagree, please explain how you think the Von Neumann Bottleneck
applies here.

With regard to energy consumption, keep in mind that the smallest GPUs (TX1)
are ~10W and typical FPGAs ~1W, versus 70mW for TrueNorth! It's popular to hate
on TrueNorth, but you could throw 10 of them together and still be fantastically
more efficient than anything else today - that's super cool to me! It required
lots of special engineering effort to get right, such as building a lot of
on-chip memory.

On-chip memory is one of the most difficult components to get right,
minimizing transistors while not breaking physics. It's not as simple as
"pouring tons of memory on a die" and requires specialized engineers who lay
out these components by hand. The event-driven, asynchronous nature of
TrueNorth is fairly unique and undoubtedly added complexity to the memory
design.

Do you have any references or evidence for CNN runtimes being mostly dominated
by compute? The work in a CNN is more than just the arithmetic: every input
gets multiplied by a weight, and fetching those weights makes it a memory-bound
problem, which is much more expensive than ALU operations. Don't just take my
word for it, listen to Bill Dally (Chief Scientist at NVIDIA, Stanford CS prof,
and general computer architecture badass) [4]:

"State-of-the-art deep neural networks (DNNs) have hundreds of millions of
connections and are both computationally and memory intensive, making them
difficult to deploy on embedded systems with limited hardware resources and
power budgets. While custom hardware helps the computation, fetching weights
from DRAM is _two orders of magnitude more expensive than ALU operations, and
dominates the required power_."
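
As a back-of-the-envelope illustration of that point (the per-operation
energies below are rough 45 nm class figures of the kind cited in [4], the
model size is assumed to be roughly AlexNet-scale, and only the multiplies are
counted):

    # Rough, illustrative energy figures (45 nm class, assumed):
    FLOAT_MUL_PJ = 3.7      # ~pJ per 32-bit float multiply
    DRAM_READ_PJ = 640.0    # ~pJ per 32-bit DRAM read

    params = 60e6           # ~AlexNet-sized model: ~60M weights (assumed)
    macs = 720e6            # ~720M multiply-accumulates per forward pass (assumed)

    compute_mj = macs * FLOAT_MUL_PJ * 1e-9        # pJ -> mJ
    weight_fetch_mj = params * DRAM_READ_PJ * 1e-9 # each weight read once from DRAM

    print(f"compute (multiplies only): {compute_mj:.1f} mJ")
    print(f"weight fetches from DRAM:  {weight_fetch_mj:.1f} mJ")

Even in this crude estimate the DRAM traffic dominates unless weights are
cached and reused on-chip, which is exactly Dally's point.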

This is what TrueNorth got right, and it made its bet by completing its design
before AlexNet was published. That was a time when Hinton was viewed by the ML
community as a heretic talking about RBMs and backprop, and hardly anyone
believed him. TrueNorth, like NNs at the time, gets some shade for doing things
differently, but over time we're seeing those choices validated and
incorporated by other researchers and architectures.

I recommend reading [4] if you haven't already, as it is rich in insights for
building efficient NN architectures.

[0]
[https://en.wikipedia.org/wiki/Von_Neumann_architecture#Von_Neumann_bottleneck](https://en.wikipedia.org/wiki/Von_Neumann_architecture#Von_Neumann_bottleneck)

[1]
[https://en.wikipedia.org/wiki/TrueNorth](https://en.wikipedia.org/wiki/TrueNorth)

[2]
[http://ieeexplore.ieee.org/document/7229264/?reload=true&arnumber=7229264](http://ieeexplore.ieee.org/document/7229264/?reload=true&arnumber=7229264)

[3] [http://www.research.ibm.com/articles/brain-chip.shtml](http://www.research.ibm.com/articles/brain-chip.shtml)

[4]
[https://arxiv.org/pdf/1602.01528.pdf](https://arxiv.org/pdf/1602.01528.pdf)

~~~
p1esk
_GPUs (TX1) are ~10W, typical FPGAs ~1W, versus 70mW for TrueNorth!_

These numbers are meaningless. If you want to compare power consumption for
different chips, you need to make sure they:

1. Perform the same task: running the same algorithm on the same data.

2. Use the same precision (number of bits) in both data storage and
computation.

3. Achieve the same accuracy on the benchmark.

4. Run at the same speed (finish the benchmark at the same time). In other
words, look at energy per task, not per time (see the toy example below).

If even a single one of these conditions is not met, you're comparing apples
to oranges. No valid comparisons have been made so far, that I know of.
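
As a toy illustration of point 4 (every number below is made up): what matters
is joules per task at matched accuracy, not watts in isolation:

    # Hypothetical chips: (power in watts, inferences per second, accuracy)
    chips = {
        "chip_A": (10.0, 800.0, 0.90),
        "chip_B": (0.07, 4.0, 0.83),
    }

    for name, (watts, ips, accuracy) in chips.items():
        mj_per_inference = watts / ips * 1000.0   # W / (inf/s) = J per task -> mJ
        print(f"{name}: {mj_per_inference:.1f} mJ/inference at {accuracy:.0%} accuracy")

With these made-up numbers the 70mW chip actually costs more energy per
inference at lower accuracy, which is why a raw wattage figure by itself tells
you nothing.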

P.S. The numbers you provided are off even ignoring my main point: typical
power consumption of an FPGA chip is 10-40W, and I don't know where you got
70mW for TrueNorth or what it represents.

~~~
deepnotderp
Also those are teeny 32x32 images.

------
justinpombrio
Could someone explain what a neuromorphic chip is?

The article assumes we know, but I haven't heard of it. And the Wikipedia
article on "neuromorphic engineering" talks about stuff like analog circuits,
copying how neurons work, and memristors, none of which seem that related.

~~~
nomailing
What I remember from my neuromorphic engineering course (or analog VLSI
course) is that we designed the silicon layout (with n and p doping regions)
so that the transistors operate in the subthreshold regime of the I-V
characteristic. If I remember correctly, the drain current is exponential in
the gate voltage in the subthreshold region. In contrast, normal digital chips
use only the super-threshold region (a voltage above a certain saturation
threshold switches the transistor completely on). Using the subthreshold
region, it is possible to implement spiking neurons with only very few
transistors. It works completely differently from digital circuits: the
connections between the transistors don't transmit just 0s and 1s; instead,
all wires carry analog signals where the exact voltage matters. This makes
these chips extremely energy- and space-efficient. These chips can also work
much faster than biological neurons (obviously using some assumptions and
simplifications, such as neglecting certain special kinds of ion channels
found in real neurons).
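
A tiny numerical sketch of that exponential subthreshold behavior (the
prefactor, slope factor, and thermal voltage below are assumed,
typical-order-of-magnitude values):

    import math

    I0 = 1e-15      # leakage-scale prefactor in amperes (assumed)
    n = 1.5         # subthreshold slope factor (assumed)
    VT = 0.025      # thermal voltage kT/q at room temperature, in volts

    def subthreshold_current(v_gs):
        """Drain current grows exponentially with gate voltage below threshold."""
        return I0 * math.exp(v_gs / (n * VT))

    for v in (0.1, 0.2, 0.3):
        print(f"Vgs = {v:.1f} V -> Id ~ {subthreshold_current(v):.2e} A")

The currents stay tiny (picoamps and below), which is where the extreme energy
efficiency comes from.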

~~~
petra
Theoretically, analog is by far the best fit for neural networks. But why
aren't we starting to see chips offered? Heck, even an old process like 130nm
could have some practical uses.

~~~
alexmlamb2
Could you implement an approximate matrix multiplication in a direct analogue
way? If so, I wonder why it hasn't been used for graphics cards.

~~~
p1esk
Yes: [https://arxiv.org/abs/1610.02091](https://arxiv.org/abs/1610.02091)

------
emcq
The near future is not in putting learning inside a chip. We're a long way off
from the one-shot learning needed to make on-device, localized learning
actually interesting.

Instead, the future is recording and uploading your observations to the cloud,
with data scientists and neural net wizards training over this dataset on a
cluster with tons of GPUs, and then deploying an optimized model to scrappy
low-precision inference chips.

This is why FPGA-based designs will fail to be stunning. Specialized
low-precision ASICs more similar to DSPs, like Movidius' Myriad (in the Phantom
drones and Google's Project Tango devices), Google's TPU, upcoming Qualcomm
chips, or Nervana's, will become increasingly popular.

------
espeed
Are these the Neuromorphic chips [1] Jeff Hawkins of Numenta [2] has been
talking about?

[1] Neuromorphic Chips
[https://www.technologyreview.com/s/526506/neuromorphic-chips/](https://www.technologyreview.com/s/526506/neuromorphic-chips/)

[2] Numenta papers/videos [http://numenta.com/papers-videos-and-more/](http://numenta.com/papers-videos-and-more/)

It would also be good to see a major chip manufacturer or cloud provider that
makes its own chips (Google/IBM) get serious about graph processing chips
[3,4] and moving beyond floating point [5].

[3] Novel Graph Processor Architecture
[https://www.ll.mit.edu/publications/journal/pdf/vol20_no1/20_1_7_Song.pdf](https://www.ll.mit.edu/publications/journal/pdf/vol20_no1/20_1_7_Song.pdf)

[4] Novel Graph Processor Architecture, Prototype System, and Results
[https://arxiv.org/pdf/1607.06541.pdf](https://arxiv.org/pdf/1607.06541.pdf)

[5] Stanford Seminar: Beyond Floating Point: Next Generation Computer
Arithmetic
[https://www.youtube.com/watch?v=aP0Y1uAA-2Y](https://www.youtube.com/watch?v=aP0Y1uAA-2Y)

------
m3kw9
Seems like a fancy name for a parallel computing chip that specializes in
efficient parallel computing.

~~~
adevine
Is that what it actually is? I'm not familiar with the field, so is
"neuromorphic chip design" really "we wired a bunch of GPUs together"?

I don't mean to minimize the work involved, just trying to decipher the
marketing speak.

~~~
mrstone
It's a chip architecture that communicates similarly to how neurons do, i.e.
via spiking behavior. That makes it easy to approximate some neural systems
(such as a neuromorphic retina).
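
For a concrete picture, a minimal leaky integrate-and-fire neuron of the kind
such cores implement might look like this (the leak and threshold values are
illustrative):

    def lif_run(input_current, leak=0.9, threshold=1.0):
        """Leak the membrane potential each step, add the input, and emit a
        spike (1) with a reset whenever the threshold is crossed."""
        v, spikes = 0.0, []
        for i in input_current:
            v = leak * v + i
            if v >= threshold:
                spikes.append(1)
                v = 0.0              # reset after the spike
            else:
                spikes.append(0)
        return spikes

    print(lif_run([0.3, 0.3, 0.3, 0.3, 0.0, 0.6, 0.6]))   # -> [0, 0, 0, 1, 0, 0, 1]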

------
return0
Do 'neuromorphic' chips do anything useful? At least neural networks have
well-known utility, but AFAIK neuromorphic chips (i.e. heavily simplified
models of neurons that one only hopes, but cannot prove, are correct) have no
useful applications, or even theoretical functions.

~~~
bobsil1
Low-power video object recognition for military drones, security cams, etc.

------
mrfusion
Would it be possible to build an analog neural network with hardware?

~~~
deepnotderp
Yes, I work at a startup that's doing this, but it's very dangerous with the
amounts of noise, which seem fine on MNIST and even CIFAR, but you die on
ImageNet.

The key to circumventing this is very complicated and our "secret sauce".

~~~
38kkdiu
Can you say a bit more about what you mean by "it's very dangerous with the
amounts of noise"?

~~~
deepnotderp
Network accuracy crashes

------
aj7
For laughs, I searched this document for the term Nvidia.

