
IBM Chip Processes Data Similar to the Way Your Brain Does - finisterre
http://www.technologyreview.com/news/529691/ibm-chip-processes-data-similar-to-the-way-your-brain-does/
======
davmre
Yann LeCun (neural net pioneer and Facebook AI head) has a somewhat-skeptical
post about this chip:
[https://www.facebook.com/yann.lecun/posts/10152184295832143](https://www.facebook.com/yann.lecun/posts/10152184295832143).
His essential points:

1\. Building special-purpose hardware for neural nets is a good idea and
potentially very useful.

2\. The architecture implemented by this IBM chip, spike-and-fire, is _not_ the
architecture used by the state-of-the-art convolutional networks, engineered
by Alex Krizhevsky and others, that have recently been smashing computer
vision benchmarks. Those networks allow for neuron outputs to assume
continuous values, not just binary on-or-off.

3\. It would be possible, though more expensive, to implement a
state-of-the-art convnet in hardware similar to what IBM has done here.

Of course, just because no one has shown state-of-the-art results with spike-
and-fire neurons doesn't mean that it's impossible! Real biological neurons
are spike-and-fire, though this doesn't mean the behavior of a computational
spike-and-fire 'neuron' is a reasonable approximation to that of a biological
neuron. And even if spike-and-fire networks are definitely worse, maybe there
are applications in which the power/budget/required accuracy tradeoffs favor a
hardware spike-and-fire network over a continuous convnet. But it would be
nice for IBM to provide benchmarks of their system on standard vision tasks,
e.g., ImageNet, to clarify what those tradeoffs are.
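
For anyone who wants the distinction in point 2 made concrete, here's a toy
sketch in plain numpy (nothing to do with IBM's actual design) of a binary
spike-and-fire unit next to the continuous-valued ReLU unit modern convnets
use:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=8)        # made-up inputs to a single neuron
    w = rng.normal(size=8)        # made-up weights
    pre_activation = w @ x

    # Spike-and-fire style unit: the output is binary (it fires or it doesn't).
    spike_output = 1.0 if pre_activation > 0.0 else 0.0

    # Continuous unit (ReLU), as used in state-of-the-art convnets:
    # the output can take any non-negative real value.
    relu_output = max(pre_activation, 0.0)

    print(spike_output, relu_output)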

~~~
kastnerkyle
I find it interesting that no group (to my knowledge) has tried something
similar to [Do Deep Networks Need to Be
Deep?](http://arxiv.org/abs/1312.6184) for ImageNet-scale networks. There have
been several results showing that the knowledge learned in larger networks can
be compressed and approximated using small or even single-layer nets. Extreme
learning machines (ELM) can be seen as another aspect of this. There have also
been interesting results on the "kernelization" of convnets [from Julian
Mairal and co.](http://arxiv.org/abs/1406.3332) that, together with the strong
crossover between Gaussian processes and neural networks from back in the late
'90s, point to the possibility of needing different "representation power" for
learning vs. predicting, which may lead to the ability to kernelize the
knowledge of a trained net, ideally in closed form.
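
To make the compression idea concrete, here's a minimal sketch of the Ba &
Caruana-style setup: a tiny "student" net trained to regress on a stand-in
"teacher's" logits. All the sizes and weights below are made up; it just shows
the mechanism, not a real ImageNet-scale experiment:

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in "teacher": fixed random weights play the role of a big trained
    # network whose logits the student will try to match (hypothetical setup).
    Wt1 = rng.normal(size=(10, 64))
    Wt2 = rng.normal(size=(64, 3))

    def teacher_logits(x):
        return np.tanh(x @ Wt1) @ Wt2

    # Tiny single-hidden-layer "student".
    Ws1 = rng.normal(size=(10, 8)) * 0.1
    Ws2 = rng.normal(size=(8, 3)) * 0.1

    lr = 1e-3
    for step in range(2000):
        x = rng.normal(size=(32, 10))      # unlabeled inputs are enough
        t = teacher_logits(x)              # soft targets: the teacher's logits
        h = np.tanh(x @ Ws1)
        y = h @ Ws2
        err = y - t                        # L2 regression on the logits
        # Hand-rolled gradients for the two student weight matrices.
        gWs2 = h.T @ err / len(x)
        gh = (err @ Ws2.T) * (1 - h ** 2)
        gWs1 = x.T @ gh / len(x)
        Ws1 -= lr * gWs1
        Ws2 -= lr * gWs2

    print("final logit-matching error:", float((err ** 2).mean()))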

I am doing some experiments in this area, and would encourage anyone thinking
of doing hardware to look at this aspect before investing the R&D to do
hardware! If this knowledge can really be compressed it could be a massive
reduction in complexity to implement in hardware...

I am a bit biased on this topic (finishing a talk about this exact topic for
EuroScipy now) but I find the connections interesting at least.

------
pinkyand
For a technical article about the architecture, see:

[http://www.research.ibm.com/software/IBMResearch/multimedia/IJCNN2013.algorithms-applications.pdf](http://www.research.ibm.com/software/IBMResearch/multimedia/IJCNN2013.algorithms-applications.pdf)

------
zackmorris
I'm very excited about this, as it's at least 2 decades overdue. When Pentiums
were getting popular in the mid 90s, I remember thinking that their deep
pipelines for branch prediction and large on-chip caches meant that fabs were
encountering difficulties with Moore's law and it was time to move to
multicore.

At the time, functional programming was not exactly mainstream and many of the
concurrency concepts we take for granted today from web programming were just
research. So of course nobody listened to ranters like me and the world plowed
its resources into GPUs and other limited use cases.

My take is that artificial general intelligence (AGI) has always been a
hardware problem (which really means a cost problem) because the enormous
wastefulness of chips today can’t be overcome with more-of-the-same thinking.
Somewhere we forgot that, no, it doesn’t take a billion transistors to make an
ALU, and no matter how many billion more you add, it’s just not going to go
any faster. Why are we doing this to ourselves when we have SO much chip area
available now and could scale performance linearly with cost? A picture is
worth a thousand words:

[http://www.extremetech.com/wp-content/uploads/2014/08/IBM_SyNAPSE_20140807_005.jpg](http://www.extremetech.com/wp-content/uploads/2014/08/IBM_SyNAPSE_20140807_005.jpg)

I can understand how skeptics might think this will be difficult to program
etc, but what these new designs are really offering is reprogrammable
hardware. Sure, we only have ideas now about what network topologies could
saturate a chip like this, but just watch: very soon we’ll see some whizbang
stuff that throws the network out altogether and uses content addressable
storage or some other hash-based scheme so we can get back to thinking about
data, relationships and transformations.

What’s really exciting to me is that this chip will eventually become a
coprocessor and networks of these will be connected very cheaply, each
specializing in what are often thought of as difficult tasks. Computers are
about to become orders of magnitude smarter because we can begin throwing big
dumb programs at them like genetic algorithms and study the way that solutions
evolve. Whole swaths of computer science have been ignored simply due to their
inefficiencies, but soon that just won’t matter anymore.

~~~
sliverstorm
_I remember thinking that their deep pipelines for branch prediction and large
on-chip caches meant that fabs were encountering difficulties with Moore's
law_

It's really a combination of memory latency and pipelining.

Memory latency is absolutely terrible compared to processor speed, and that
has nothing to do with Moore's law. It's 60ns to access main memory, which is
ballpark 150 cycles. If you have no caches, your 2.5GHz processor is basically
throttled to 16MHz. You can buy some back with high memory bandwidth and a
buffer (read many instructions at a time). But if you have no predictor, every
taken branch flushes the buffer and costs an extra 150 cycles; in heavily
branched code your performance approaches 8MHz.
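
The arithmetic, if you want to check it yourself (back-of-envelope only;
ignores bandwidth, prefetching, and instruction-level parallelism):

    clock_hz = 2.5e9
    mem_latency_s = 60e-9

    miss_cycles = clock_hz * mem_latency_s        # ~150 cycles per access
    no_cache_rate = clock_hz / miss_cycles        # ~16.7 MHz effective
    branchy_rate = clock_hz / (2 * miss_cycles)   # extra flush per branch -> ~8 MHz

    print(f"{miss_cycles:.0f} cycles, {no_cache_rate / 1e6:.1f} MHz, "
          f"{branchy_rate / 1e6:.1f} MHz")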

Then think about pipelining. We don't pipeline because Moore's law has ended.
We pipeline because a two-stage pipeline is 200% as fast as an otherwise
identical unpipelined chip. A sixteen-stage pipeline is 1600% as fast. Why the
hell _wouldn't_ you pipeline? Now, of course in the real world branched code
can tank a deep pipeline. Which is where the branch predictor comes in, buying
back performance.

[http://stackoverflow.com/questions/4087280/approximate-cost-to-access-various-caches-and-main-memory](http://stackoverflow.com/questions/4087280/approximate-cost-to-access-various-caches-and-main-memory)

~~~
p1esk
>>> If you have no caches, your 2.5GHz processor is basically throttled to
16MHz.

No. This is only true if every instruction tries to access memory.

>>> We pipeline because a two-stage pipeline is 200% as fast as an otherwise
identical unpipelined chip. A sixteen-stage pipeline is 1600% as fast.

No. First of all, every stage in the pipeline can only run as fast as the
slowest stage. Second, there is significant overhead from passing data through
pipeline registers, and from the control logic for those registers.
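
A toy calculation makes the point (all numbers made up for illustration): with
unevenly balanced stages and per-stage register overhead, a 16-stage pipeline
comes out well short of the ideal 16x.

    stage_delays_ns = [1.2, 0.8, 1.0, 1.1] * 4      # 16 unevenly balanced stages
    register_overhead_ns = 0.2                      # setup + clock-to-Q per stage

    unpipelined_ns = sum(stage_delays_ns)           # one long combinational cycle
    cycle_ns = max(stage_delays_ns) + register_overhead_ns
    speedup = unpipelined_ns / cycle_ns             # ~11.7x, not 16x

    print(f"cycle = {cycle_ns:.1f} ns, speedup = {speedup:.1f}x")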

The reason we saw 32-stage pipelines in the P4 was mostly marketing: the
"megahertz race" between AMD and Intel.

~~~
kevinnk
>>> No. This is only true if every instruction tries to access memory.

Every instruction must be loaded from memory before it can execute. Hence
instruction caches.

~~~
p1esk
Yes, you're right, I missed that.

------
nightski
While the efficiency gains are nice and definitely welcome, it would be
interesting to see what the performance gains are over a GPU. The article
makes the chip sound somehow superior to existing implementations but really
this is just running the same neural network algorithms we know and love on
top of a more optimized hardware architecture.

Meaning I have no idea how this signals the beginning of a new era of more
intelligent computers, as the chip provides nothing to advance the state of
the art on this front. Unless I am missing something?

~~~
zniperr
A difference is that a GPU uses a lot of power and takes up a lot of space. I
can imagine an optimized, energy-efficient chip would be useful in embedded
systems. Something like a Raspberry Pi for image processing maybe?

~~~
wmf
Like Tegra K1? GPUs are more energy efficient than normal CPUs for some tasks,
so getting lower absolute power consumption is just a matter of using fewer
cores.

------
ckluis
I wonder what the possibilities are for adding a neuromorphic chip to a normal
stack for specialized tasks such as image/video recognition (CPU, GPU, NPU).
GPUs are very similar in their need for specialized code vs. CPUs.

Just an uneducated wild-thought.

~~~
stdgy
This is something I'm interested in discovering as well. I view most of these
developments as modular components that could be used in conjunction with
existing processor pipelines. For instance, with these 'neural' chips, I could
imagine an existing processor querying the neural chip to look for particular
activation patterns. Though I'm not too sure on the language one would use to
specify which patterns to look for... Perhaps you could extract the parameters
from the neural chip itself through a learning process, which you'd then use
to bootstrap the process a bit and know what to look for? I'd imagine a lot of
formal research is still needed here.

Neat developments, excited to see how they shake out.

~~~
apw
One possibility is to use the neuromorphic chips as souped-up branch
predictors -- instead of predicting one bit, as in a branch predictor, predict
all bits relevant for speculative execution. This can effect large-scale
automatic parallelization.

See this paper at ASPLOS '14 for details:

[http://hips.seas.harvard.edu/content/asc-automatically-scalable-computation](http://hips.seas.harvard.edu/content/asc-automatically-scalable-computation)

------
FD3SA
The interesting thing about this project is that they're using transistors to
physically simulate synapses and neurons, which is quite an inefficient
method. Transistors are expensive, and your brain has about 100 billion
neurons and trillions of synapses.

A recent discovery by Leon Chua has shown that synapses and neurons can be
directly replicated using memristors [1]. Memristors are passive devices which
may be much simpler to build at the scale of neurons than transistors.

1\.
[http://iopscience.iop.org/0022-3727/46/9/093001/](http://iopscience.iop.org/0022-3727/46/9/093001/)

~~~
stephenmm
Actually this chip is not _that_ far off. It has 5 billion transistors, so with
a process shrink and a board with 10-20 of these chips it should be roughly
equivalent to the number of neurons in a human brain. Now think of your home
with about 100 of these things connected to your network, and your house will
be pretty darn smart!

~~~
alok-g
You are quite off here. A transistor is simply not as powerful as a neuron.
The article itself notes that the chip is capable of simulating "just over one
million 'neurons'."

------
skywhopper
Lots of problems with the way this is presented in the article. Though the
chip is patterned after a naive model of the human brain, the headline
assertion is far too bold. Additionally, while the Von Neumann architecture
can be characterized as bottlenecked and inefficient, it has also allowed for
extremely cheap computing. A processor with all of its memory on the chip
would not be inexpensive. Note this article never mentions the cost of the
chip nor its memory capacity.

The comparison of this chip's performance with that of a nearby traditionally-
chipped laptop is questionable. A couple of paragraphs later it says that the
chip is programmed using a simulator that runs on a traditional PC. So I'm
guessing the 100x slowdown is because the traditional PC is simulating the
neural-net hardware, rather than using optimized software of its own.

Yes, this is important research, but engineer-speak piped through hype
journalists will always paint an entirely unrealistic and overoptimistic
picture of what's really going on.

------
WhitneyLand
What percentage of readers know that, with today's knowledge of software, you
could fill a football stadium with these chips and for many tasks it still
wouldn't come close to a human brain? I love news like this; it just feels
like analogies to brains are easy to overhype.

~~~
Lambdanaut
I think work like this is very important. In the 1940s you could fill a
football stadium with about 50 ENIAC computers and you wouldn't have 1/1000th
the processing power of an iPhone. Your statement gives useful perspective in
one direction, but exponential improvement cannot be ignored. There can't be
any doubt that neuromorphic chips have a lot of wiggle room to explode in
capability in the coming decades.

~~~
ars
> have a lot of wiggle room to explode in capability in the coming decades.

Are you sure about that? CPU speeds have not improved in years. We appear to
have hit a maximum, at least for now. (Of course I can't predict the future,
but it's been years now and no change.)

~~~
Kurtz79
Clock frequency != speed.

Otherwise we would still be using (very cheap) Pentium IVs.

In a way, it's a testament to human ingenuity that CPUs have kept improving
the way they have when the brute-force way of increasing performance was not
as viable as before.

------
mark_l_watson
Although IBM's hardware implementation does not support the current hotness in
neural models, I still think that this is a big deal, both for applications
with the current chip and for future improvements toward even lower energy
requirements and smaller, denser chips.

I was on a DARPA neural network tools advisory panel for a year in the 1980s,
developed two commercial neural network products, and used them in several
interesting applications. I more or less left the field in the 1990s but I did
take Hinton's Coursera class two years ago and it is fun to keep up.

------
slashnull
Anyone got something more technical? I googled a bit and I can't seem to find
anything beyond marketoid handwaving

~~~
caycep
Look for papers by Carver Mead from Caltech in the '80s; these are all based
on those concepts, I think.

------
caycep
Wonder if they are then going into direct competition with Qualcomm and
Samsung; all these companies have quite active neuromorphic chip research
groups going.

~~~
soperj
They did it using a Samsung die manufacturing process if I'm not mistaken.

------
dctoedt
NY Times article, by John Markoff:
[http://www.nytimes.com/2014/08/08/science/new-computer-chip-is-designed-to-work-like-the-brain.html](http://www.nytimes.com/2014/08/08/science/new-computer-chip-is-designed-to-work-like-the-brain.html)

------
scientist
If you are a scientist, here is the Epistemio page for rating and reviewing
the scientific publication discussed here:
[http://www.epistemio.com/p/AJ09k7Yx](http://www.epistemio.com/p/AJ09k7Yx)

------
fla
Is it a sort of general-purpose neural-network hardware?

------
lispm
'IBM Chip Processes Data Similar to the Way Your Brain Does'

Interesting, I did not know that we already know how the brain 'processes
data'.

~~~
Houshalter
They are referring to the fact it uses a connectionist architecture rather
than a von Neumann one.

[https://en.wikipedia.org/wiki/Connectionism](https://en.wikipedia.org/wiki/Connectionism)

~~~
lispm
That says almost nothing. The brain uses various forms of neural networks
whose data processing we know relatively little about. The IBM chip is at best
somehow 'inspired' by the brain. It's far from working like it.

~~~
Houshalter
More specifically, it's also a spiking neural network. You could probably
program it to efficiently run algorithms very similar to those of human
neurons.
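
For reference, the standard textbook spiking model is the leaky
integrate-and-fire neuron; a minimal version looks like this (illustrative
parameters, not the IBM chip's actual neuron model):

    import numpy as np

    def lif(input_current, dt=1.0, tau=20.0, v_thresh=1.0, v_reset=0.0):
        """Leaky integrate-and-fire: integrate input, leak, spike at threshold."""
        v = 0.0
        spikes = []
        for i in input_current:
            v += dt * (-v / tau + i)    # leak toward rest, integrate input
            if v >= v_thresh:           # fire and reset when threshold is crossed
                spikes.append(1)
                v = v_reset
            else:
                spikes.append(0)
        return spikes

    rng = np.random.default_rng(0)
    print(lif(rng.uniform(0.0, 0.2, size=50)))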

------
beefman
vonsydov's link is not dead, and I don't know why his comment was downvoted or
why I can't reply to it. There's nothing wrong with his link, though this one
may have been slightly better:

[http://dx.doi.org/10.1126/science.1254642](http://dx.doi.org/10.1126/science.1254642)

More broadly, I don't understand why HN seems to prefer press pieces (so often
containing more inaccuracies than useful information) to the papers on which
they're based.

In this case, even if you can't access the full text, the single-paragraph
abstract contains all of the new information in the 12-paragraph Tech Review
story.

~~~
sp332
vonsydov's account has been hellbanned for 52 days.

