
Yann LeCun on the IBM neural net chip - luu
https://www.facebook.com/yann.lecun/posts/10152184295832143
======
oldmanLecun
What LeCun is missing is that the IBM chip was designed to be ultra low power
(63mW to do pattern recognition on real-time video). One of the novel
techniques this chip uses to achieve low power is to use spikes to send
information between neurons.

It is not clear how LeCun would propose to send information in his
convolutional neural networks in specialized hardware (from what I can tell,
he uses FPGAs which are very high power in comparison). In the worst case,
that approach sends data between neurons on each time step, which is very
inefficient in power. If they do something more clever, I would bet it would
start to look like sending spikes.

It also seems very short-sighted to say that if the hardware is not
specifically designed for convolutional neural networks, then it is not the
right architecture. It seems like the IBM chip does support convolutional
networks, but it might require a few extra time steps to average.

Biology has evolved to use spikes (across many different species). Perhaps
evolution didn't get the memo that non-spiking convolutional neural networks
is the only architecture worth building. Maybe it will take a few more
thousand generations before evolution catches up, but until then spiking
neuron architectures seem like a decent gambit ...

~~~
jbarrow
I tend to agree that evolution reaches a local optimum given enough time, but
it seems that this chip is geared towards machine learning rather than
biological accuracy. And currently integrate-and-fire spiking neurons don't
appear to work better on the data we're interested in.

In this light, and although CNNs aren't the only architecture, his criticisms
may be a little more reasonable.

~~~
oldmanLecun
From my understanding, the chip is geared towards implementing neural networks
with low power consumption, which makes using spikes a reasonable design
choice. So one can argue that using spikes is not about biological accuracy,
but power efficiency.

It is great that people want general-purpose machine learning chips; the
question is how to do it at low power. My guess is that the right architecture
will be a mix of ML primitives and things like spikes (and perhaps other
primitives found in biology).

~~~
mjn
Yes, that's also my understanding. Important background is that this is not
(or at least not solely) a commercial initiative by IBM to produce a machine-
learning chip, though I'm sure they would love to sell some too. It's a DARPA
initiative to find a way to greatly reduce the power budget needed for large-
scale data-processing. And one of the starting hypotheses of this particular
program, SyNAPSE, is that sparseness in time, aka spikiness, is part of why
biological organisms seem capable of processing large amounts of video/etc.
data with lower power budgets than computers seem to require. Here's an
excerpt from their program statement [1]:

 _Current computers are limited by the amount of power required to process
large volumes of data. In contrast, biological neural systems, such as the
brain, process large volumes of information in complex ways while consuming
very little power. Power savings are achieved in neural systems by the sparse
utilizations of hardware resources in time and space. Since many real-world
problems are power limited and must process large volumes of data,
neuromorphic computers have significant promise._

That may or may not be a good hypothesis, but it seems interesting to
investigate. In any case, LeCun's real beef is with the DARPA program
managers: he thinks a different area of ANN research would've been a better
allocation of funds, because in his view this is not among the most promising
lines of research. Not an uncommon reaction to DARPA choices, and not always
wrong either, but DARPA's got the money.

[1]
[http://www.darpa.mil/Our_Work/DSO/Programs/Systems_of_Neurom...](http://www.darpa.mil/Our_Work/DSO/Programs/Systems_of_Neuromorphic_Adaptive_Plastic_Scalable_Electronics_%28SYNAPSE%29.aspx)

------
fiatmoney
IBM seems to be _really good_ at getting press releases published and patents
filed, and _really bad_ at actually turning any of that into purchasable
products or services. They've been gutting their core US businesses now for a
while (revenues down, costs down, profits up, and everything going to India)
and a lot of areas that should be strengths have been horribly mismanaged. I'm
curious to see whether they'll continue investing in these kinds of prestige
products, and wondering what their rationale is. Patent battles down the line?
Effectively part of their ad budget?

~~~
shadowmint
To be fair, in this case it's pretty clear this was a DARPA-funded project.

It would make even less sense for them to have just said, 'oh no, we're busy
focusing on other things at the moment' when DARPA tried to start throwing
money at them.

------
sanxiyn
My understanding is that TrueNorth is a product of DARPA SyNAPSE program, and
spiking neuron is a program requirement, so IBM delivered what it was asked to
deliver.

------
azakai
> Now, what's wrong with TrueNorth? My main criticism is that TrueNorth
> implements networks of integrate-and-fire spiking neurons. This type of
> neural net has never been shown to yield accuracy anywhere close to state of
> the art on any task of interest (like, say, recognizing objects from the
> ImageNet dataset). Spiking neurons have binary outputs (like neurons in the
> brain). The advantage of spiking neurons is that you don't need multipliers
> (since the neuron states are binary). But to get good results on a task like
> ImageNet you need about 8 bits of precision on the neuron states. To get this
> kind of precision with spiking neurons requires waiting multiple cycles so
> the spikes "average out". This slows down the overall computation.

Surely you can do this in the spatial domain instead of time? That is, each
neuron is one bit, so you have 8 of them, each computing one of the 8 bits he
says are necessary? Perhaps the problem is using that value afterwards, I
guess.
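
To make the two options concrete, here's a toy Python sketch (made-up numbers,
nothing from the actual chip) of averaging one binary neuron over many cycles
versus spreading the value across 8 one-bit lines:

    import numpy as np

    rng = np.random.default_rng(0)
    value = 0.7  # analog activation we want to transmit

    # Temporal option: one binary neuron, spikes averaged over many cycles.
    # Getting ~8 bits of precision this way needs on the order of 2^8 cycles.
    cycles = 256
    spikes = rng.random(cycles) < value      # fires with probability `value`
    temporal_estimate = spikes.mean()

    # Spatial option: 8 one-bit neurons, each carrying one bit of the value.
    quantized = int(round(value * 255))      # 8-bit fixed-point version of `value`
    bits = [(quantized >> i) & 1 for i in range(8)]
    spatial_estimate = sum(b << i for i, b in enumerate(bits)) / 255.0

    print(temporal_estimate, spatial_estimate)

Presumably the catch with the spatial option is the one hinted at above:
whatever reads those 8 lines now has to weight them by powers of two, which
starts to look like the multiplier hardware the spiking design is trying to
avoid.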

~~~
Houshalter
It's not clear to me how that would work. Sending 1 bit of information to 8
processors is not the same as sending 8 bits to one processor.

~~~
cowsandmilk
This neuron works by providing one bit of output. LeCun claims you need to
send the same signal to the machine 8 times to get 8 bits of output, so it is
slower than a neuron that can output 8 bits at once. The question is why you
can't send the signal to 8 different neurons and get your 8 bits in the same
time as one bit.

~~~
waps
The answer here is that you send different subsets of input pixels to
different neurons.

E.g. you have a 4x4 pixel image you want to feed into the net and recognize.
You subdivide it into 2x2 patches (of which you'll have 9), and you send each
of those 2x2 patches to its own neuron*. Then the output of a neuron is one if
it sees Waldo, zero if it doesn't. Why not send everything to every neuron?
That won't work, and the neurons don't have enough inputs anyway at real image
sizes.
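
For concreteness, here's a rough Python sketch of that subdivision (ignoring
the scale/rotation variants in the footnote below); toy_neuron is just a
stand-in for whatever the trained neuron actually computes:

    import numpy as np

    image = np.arange(16).reshape(4, 4)   # toy 4x4 "image"

    # Slide a 2x2 window over the 4x4 image: 3 x 3 = 9 overlapping patches,
    # each of which gets fed to its own one-bit neuron.
    patches = [image[r:r + 2, c:c + 2] for r in range(3) for c in range(3)]
    assert len(patches) == 9

    def toy_neuron(patch, threshold=30):
        # Stand-in detector: "fires" (returns 1) if the patch sum crosses a threshold.
        return int(patch.sum() > threshold)

    outputs = [toy_neuron(p) for p in patches]   # one bit per receptive field
    print(outputs)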

*Actually you'd send more than 9. You'd also include a 2x2 that only covers
the corner pixels, to achieve scale independence (recognizing a car whether
it's 5x5 pixels or 50x50). Then you'd send that 36 times, but each time
rotated by, say, 10 degrees. That's how human vision works ("how God does it")
and, well, that's how AI is trying to do it. In humans it's not a full Haar
cascade: we can only see small features using the centre of the retina, and
only large features using the rest, and we only allow for limited rotation
(meaning human brains rotate the source image a limited number of times along
an exponential curve that only goes up to ~40 degrees rotation).

(This is comparable to a Haar cascade.)

Now spiking neural networks have suboptimal performance, true, but they have a
major advantage : they do unsupervised learning only. You show them a world,
and they will build their own model of the world (which isn't as good as our
state-of-the-art models for known "worlds" like ImageNet).

Here's what you're doing with spiking nets (more or less). You show them
ImageNet (or any dataset) and you keep showing it to them. It will build up an
internal model of what the world looks like. After training you train a second
algorithm that searches for which neuron encodes what. So the idea is that one
of the neurons in the network will encode "I saw the letter A", another will
encode "I saw B", a third will encode "I saw C", and you look for which neuron
does which.

Spiking neural networks need time, which is another disadvantage. This is
"simulated" time, but it nevertheless requires computation to happen to
advance time. It takes spiking networks some amount of this simulated time to
recognize things (just like animals/humans need time). So to have it recognize
something you have to "put a picture in front of it for X time" (meaning keep
triggering its inputs in the same way for a while), then wait to see if any of
the identified neurons fires in the first, say, 5 seconds after showing the
picture.
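
Roughly, the present-then-decode step looks like this Python sketch
(everything here is hypothetical: the step function just fires at random where
a real simulator would be integrating membrane potentials):

    import numpy as np

    rng = np.random.default_rng(1)
    n_neurons, present_steps = 5, 50   # made-up network size and presentation window

    def step(input_pattern):
        # Placeholder for one simulated time step: a real simulator would update
        # membrane potentials here; this one just fires at random, slightly more
        # often for stronger input.
        return (rng.random(n_neurons) < 0.1 * (1 + input_pattern.mean())).astype(int)

    def present(input_pattern):
        # Keep triggering the inputs in the same way and count output spikes.
        counts = np.zeros(n_neurons, dtype=int)
        for _ in range(present_steps):
            counts += step(input_pattern)
        return counts

    # "Read the mind" of the trained net: show a known example of each class and
    # record which neuron fires most; that neuron becomes the readout for it.
    examples = {"A": np.ones(4), "B": np.zeros(4)}
    readout = {label: int(np.argmax(present(x))) for label, x in examples.items()}
    print(readout)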

Where spiking networks wipe the floor with convolutional nets is on
unpredictable tasks. Suppose you have an "open-ended" problem (like, say, the
kind a lifeform has). And the environment changes. A convolutional net that
was trained will, quite simply, start giving random results. A spiking network
will do something. Not necessarily the right thing, but it will try things.

Say you were building a robot that has to deliver supplies across Iraq.
Convolutional nets won't adapt. Spiking models will adapt (assuming you let
them, and I imagine DARPA will let them). Problem with letting them adapt, of
course, is that you may lose control.

And before you say "what about morality?", I would say that spiking nets are
actually more moral. In both cases, spiking or convolutional, you don't
actually know how it will respond to unpredictable stimuli. However, if a
convolutional net is confronted with something it wasn't trained for, it will
simply have random reactions (it's a robot, it'll send random instructions to
the higher levels, meaning if it has a gun, it will extremely likely fire the
gun, probably aimed at the first thing it recognizes), while a spiking model will
try something (which, of course, may be "kill all humans", but it might also
decide to wait and see if there are hostile moves, or ...). The difference is
the spiking model won't simply lose control. I would argue that spiking models
will respond much more like soldiers would.

You should think of convolutional nets as classifiers. You train them to
answer a yes/no question, and then they can respond. Spiking neural nets are
more like puppies. You can train them to bark if they see a car, and then use
that to detect cars, but you can also train them to retrieve a ball. (In
practice you "read the mind" of the spiking model, and because it's stored in
memory, that's easy)

------
mdda
It's unclear why the 'wait multiple cycles' issue is such a deal-breaker. Even when
using a GPU (in the standard way) there are cycles that get used in processing
the different stages of a computation. But, more significantly, the Spiking
Neuron thing doesn't (necessarily) have to work at a low clock rate, or any
clock rate, since the same integration-through-time approach could even work
asynchronously, or with jitter, etc. It's a pretty robust & low-power design
(and evolution found it, who would have guessed?)
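
As a toy illustration of the no-clock point, here's an event-driven leaky
integrate-and-fire sketch in Python (all constants are made up): the neuron
only does work when an input spike arrives, however jittered the arrival times
are.

    import numpy as np

    rng = np.random.default_rng(2)

    # Jittered, asynchronous input spike arrival times over one second.
    spike_times = np.sort(rng.uniform(0.0, 1.0, size=40))
    tau, threshold, weight = 0.1, 1.0, 0.3   # illustrative constants

    v, last_t, out_spikes = 0.0, 0.0, []
    for t in spike_times:
        v *= np.exp(-(t - last_t) / tau)   # membrane potential leaks between events
        v += weight                        # each input spike bumps the potential
        if v >= threshold:                 # fire and reset on crossing the threshold
            out_spikes.append(t)
            v = 0.0
        last_t = t

    print(len(out_spikes), "output spikes")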

------
joe_the_user
LeCun mentions disadvantages of the chip and then mentions a special purpose
chip he has had a hand in producing. It doesn't seem surprising that he likes
the idea of special purpose chips.

Aside from the particular criticism he makes toward the IBM algorithm, it
seems to me that the approach of jumping from one special chip to another
abandons the advantages of a general purpose computer itself. If your
algorithm has to be cast in silicon each time, tuning the algorithm would
depend on the chip's lifecycle. Also, only the few who have the resources to
build a chip would be able to supply algorithms, narrowing the number of minds
working on this.

The alternative I'd like to see is a general-purpose, highly parallel chip.

The one I know of is the Micron Automata chip.
[http://www.micron.com/about/innovations/automata-
processing](http://www.micron.com/about/innovations/automata-processing)

Anyone know of anything similar?

------
sphink
Reading the comments here suggests that it would be cool to use both of the
chips together. If it is true that spiking does well with unsupervised
learning, then you could feed a bunch of input to a spiker, then scan it for
components that could be mostly mimicked with a convolutional chip and reroute
the inputs/outputs. (Yeah, the interconnect would suck.) The point is not to
come up with some magic better-performing hybrid, but rather to explore an
intermediate point in the design space of augmenting/replacing actual neurons
with silicon. The IBM chip isn't that close to biology, but it's closer than a
convolutional network, and a convolutional network is a much smaller step than
a general purpose processor. We might learn about some simple augmentations
that are likely to work in practice.

Also, the whole "airplanes don't flap their wings" analogy can be taken too
far. Little flying things are qualitatively different from big flying things.
You'll notice that a lot of small artificial flying things are flapping, and
the biggest natural fliers tend to glide a lot. There are other reasons why
nature didn't evolve large fliers. (Although I'm willing to believe some large
fliers may occasionally have hot gases shooting out of their back ends, I do
not believe propulsion is their purpose.)

------
return0
The main criticism should be that these neurons are not like real neurons,
because integrate-and-fire is an oversimplification of neurons. So it's not
really like the brain at all. There is a lot of fanfare from IBM about it, but
truly we've had these models since the 80s. I think it's bad science to just
"build a machine with a shit ton of IF neurons and see if it does anything".

The fact that Truenorth can learn approximations is not really surprising, we
know that thresholded units can approximate well [1]. They should have
implemented compartmental neurons [2].

[1]
[http://en.wikipedia.org/wiki/Universal_approximation_theorem](http://en.wikipedia.org/wiki/Universal_approximation_theorem)
[2]
[http://en.wikipedia.org/wiki/Compartmental_modelling_of_dend...](http://en.wikipedia.org/wiki/Compartmental_modelling_of_dendrites)

~~~
JoeAltmaier
Bad science? Trying things? That's fundamentally what true science is all
about. Experimentation is where theories are supposed to come from. Remember
Nature just put a ton of neurons together to see if it did anything.

What would be called 'good science'? Reading articles and spinning tales about
what comes next? Regurgitating summaries of others' work?

~~~
return0
This money would be better spent on experiments to find out how neurons work.
IF neurons are well studied, and large scale models of the brain using IF
models have been done before[1]. The result? Nothing.

It would be an experiment if they were testing a new model. This is a
simulation.

[1] Simulation of Large-Scale Brain Models:
[http://www.izhikevich.org/human_brain_simulation/Blue_Brain....](http://www.izhikevich.org/human_brain_simulation/Blue_Brain.htm#Simulation)

~~~
JoeAltmaier
This experiment was different in some way? More 'neurons' this time? That
qualifies as an experiment.

~~~
sanxiyn
This experiment was different and inferior in almost all ways: both fewer
neurons and simpler neurons than had been simulated before. We had simulated
100x more neurons in more biological detail than the TrueNorth team did.

This is great asynchronous-circuit power-efficiency research, and not
neuroscience research at all.

------
jimfleming
Out of curiosity, does anyone know why they chose I&F instead of Izhikevich
neurons [1], which model more biological spike forms? Perhaps to meet the low
power consumption goals?

Also, how do convolutional neural networks model time? I thought that was one
of the benefits of spiking networks and STDP.

1\.
[http://www.izhikevich.org/publications/whichmod.pdf](http://www.izhikevich.org/publications/whichmod.pdf)

~~~
marmaduke
The Izhikevich neuron is a two-dimensional system, whereas I&F neurons are
usually one-dimensional, and the former has a larger number of parameters.
Mathematical analysis (and perhaps implementation) is easier for the simpler
I&F.
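
For concreteness, here's a rough Euler-stepped sketch of both models in Python
(the Izhikevich parameters are his standard regular-spiking values; the LIF
constants and the input drive are just illustrative):

    import numpy as np

    dt, T, I = 0.25, 1000.0, 20.0   # time step (ms), duration (ms), arbitrary input drive

    # One-dimensional leaky integrate-and-fire: a single state variable v.
    def lif(tau=20.0, v_rest=-65.0, v_thresh=-50.0, v_reset=-65.0):
        v, spikes = v_rest, []
        for t in np.arange(0.0, T, dt):
            v += dt * (-(v - v_rest) + I) / tau   # input resistance lumped into I
            if v >= v_thresh:
                spikes.append(t)
                v = v_reset
        return spikes

    # Two-dimensional Izhikevich model: membrane potential v plus a recovery
    # variable u, with parameters (a, b, c, d) selecting the firing pattern.
    def izhikevich(a=0.02, b=0.2, c=-65.0, d=8.0):
        v, u, spikes = -65.0, b * -65.0, []
        for t in np.arange(0.0, T, dt):
            v += dt * (0.04 * v**2 + 5.0 * v + 140.0 - u + I)
            u += dt * a * (b * v - u)
            if v >= 30.0:
                spikes.append(t)
                v, u = c, u + d
        return spikes

    print(len(lif()), "LIF spikes,", len(izhikevich()), "Izhikevich spikes")

The recovery variable u is what lets the Izhikevich model reproduce bursting,
adaptation and other biological firing patterns that a plain I&F neuron can't.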

------
itg
>My main criticism is that TrueNorth implements networks of integrate-and-fire
spiking neurons...Spiking neurons have binary outputs (like neurons in the
brain).

Isn't this better? LeCun is looking at this only from a machine learning
perspective.

~~~
modeless
Machine learning is the goal. This chip should be judged on its learning
performance, not on how well it adheres to an oversimplified and incomplete
model of how biological neurons might work in the brain.

~~~
oxtopus
FWIW, there is no learning on-chip. Machine learning is not the goal of this
project, nor is its success dependent on its learning capabilities (at least
not at this phase). Where it does succeed, however, is in low-power
computation in an architecture that is scalable and fault tolerant. LeCun is
criticizing an orange for not tasting like an apple.

~~~
modeless
Learning may not happen on-chip, but the network is still learned, and the
performance of the chip is dependent on the learning. The spiking architecture
of the chip means that the best learning algorithms can't be used. An ASIC
implementing a convolutional neural net could also be low-power, scalable, and
fault-tolerant, while taking advantage of the best currently known learning
algorithms and ultimately performing a lot better on real tasks.

------
leishulang
If this is the same team who did the "cat brain" in 2009, then I am on Yann's
side.

------
slurry
Published on Facebook? Really?

~~~
bigtones
Yeah that's what I thought - unless he actually works for Facebook. But then
the comments on his post are actually quite informative, so I guess each to
their own and maybe he has a great audience on FB.

~~~
sanxiyn
Yann LeCun _does_ work for Facebook.

