First Wave of Spiking Neural Network Hardware Hits

modeless · on Sept 13, 2018

82% accuracy on CIFAR-10? Is this a joke? State of the art on CIFAR-10 is 98.5% accuracy. That chart showing them near the top of a top-1 accuracy graph is borderline fraudulent.

Why not compare apples to apples? Take a CNN architecture that gets >95% accuracy, shrink it down until it only gets 82% accuracy, then run it on commercially available non-spiking NN hardware like a Movidius Myriad or Apple's Neural Engine and measure the speed and power consumption.

alfalfasprout · on Sept 13, 2018

I agree there's something very fishy here. Some of the points on that chart list the architectures, others don't. For instance, Movidius Myriad VPU 2 can run a variety of different architectures.

Also... LeNet isn't THAT bad on CIFAR-10. It can easily reach 75% after 200 epochs.

I think these specialized chips can be useful in mobile or low power applications, but for servers the flexibility of a GPU or reprogrammable FPGA is worthwhile. Advancements in ML are frequent and nobody wants to be stuck with hardware that can't run a newer architecture.

stochastic_monk · on Sept 13, 2018

The chart only covers so-called VPUs. This architecture claims to be best in class, but it's definitely unfair to omit how well state-of-the-art models perform on standard hardware (i.e., [GCT]PUs).

modeless · on Sept 13, 2018

TX2 is on the chart and it is a GPU. But reading between the lines, the TX2 numbers on the chart are for ImageNet, not CIFAR-10. The TX2 is classifying 224x224 images into 1000 classes. Brainchip is classifying 32x32 images into 10 classes. But accuracy and power consumption are directly compared on the chart almost as if the tasks are similar. That's why I say it's borderline fraudulent.
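A quick back-of-the-envelope comparison of the two tasks' inputs, assuming standard 3-channel RGB images at the resolutions cited above (224x224 for the ImageNet numbers, 32x32 for CIFAR-10). It only counts raw input values and output classes. It says nothing about the actual compute of either network.

```python
def input_values(height, width, channels=3):
    """Number of raw input values (pixels x channels) in one image."""
    return height * width * channels

# Resolutions and class counts as described in the comment above.
imagenet_inputs = input_values(224, 224)   # 224 * 224 * 3
cifar10_inputs = input_values(32, 32)      # 32 * 32 * 3
input_ratio = imagenet_inputs / cifar10_inputs
class_ratio = 1000 / 10

if __name__ == "__main__":
    print(f"ImageNet input values per image: {imagenet_inputs}")   # 150528
    print(f"CIFAR-10 input values per image: {cifar10_inputs}")    # 3072
    print(f"Input ratio: {input_ratio:.0f}x, class ratio: {class_ratio:.0f}x")  # 49x, 100x
```

Each ImageNet image carries 49 times as many input values, and the classifier chooses among 100 times as many classes, so accuracy and per-image power figures from the two tasks do not sit naturally on one chart.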

stochastic_monk · on Sept 13, 2018

Thank you for clarifying. I only looked at a few of the chips before assuming they were all the same. I agree, and I can't think of a case where I'd want to be stuck with one particular model for one kind of data, especially if the error rate is more than ten times higher.

chadmeister · on Sept 13, 2018

Oh wow that is a very important detail I had overlooked as well. Looking at them now the comparisons seem rather blatantly misleading.

buboard · on Sept 13, 2018

I don't get the appeal of neuromorphic spiking chips even in academia. We don't have a definite, final spiking neuron model, and these architectures are too constrained to use them for finding it. We also have very few real-world applications for spiking neurons that exceed their ANN counterparts. Multiple groups are working on them and they are apparently racing to compete with each other, but for all of them, the cart is before the horse.

burning_hamster · on Sept 14, 2018

> We also have very few real-world applications for spiking neurons that exceed their ANN counterparts.

You are being generous. "Zero" does not qualify for "few".

On a more serious note:

> I don't get the appeal of neuromorphic spiking chips even in academia. We don't have a definite, final spiking neuron model, and these architectures are too constrained to use them for finding it.

I think at the moment there are some people who simply want to grab a bunch of the IP rights and position themselves in the hardware market. I think they know that their networks are not up to scratch yet, but they also believe that somebody will soon figure out how to implement backprop in a spiking network, and then they will be in a good position to build something worthwhile. On the theory side, some papers that have come out in the last year come pretty darn close.

emcq · on Sept 13, 2018

This article makes a lot of apples to oranges comparisons that are confusing:

1. The article claims this is the first SNN processor, but then states that they will first make an FPGA. The article itself references existing processors with real, debugged, fully functional ASICs, and those clearly came first. This is neither the first SNN-based FPGA design nor the first functional SNN ASIC. For context, the TrueNorth SNN ASIC taped out in ~2012, and NeuroGrid in ~2010.

2. They compare CIFAR-10 results to ImageNet results on the same chart. This is not apples to apples, as an architecture can become bus-bound with larger image patches, weights, activations, etc. Once it becomes bus-bound, it can lose efficiency.

3. They talk about low-power (<5W) architectures being compelling, but that range does not include the TX2 (10-20W with the GPU going), while TrueNorth and NeuroGrid draw at least an order of magnitude less power (i.e. <<500mW). They also omitted mobile chips and Qualcomm's Hexagon, which is extremely compelling in the ~1-2W range.

4. One of the neat things about the TrueNorth and NeuroGrid architectures is that they are asynchronous: when little activity is happening, the chip can draw less power. Even the TX2 has some of this property, since it dynamically scales power for the GPU. Perhaps I missed it, but this does not seem to be supported by this architecture. Idle power draw can be important!
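A toy illustration of why asynchronous, event-driven designs can save power when activity is low. It counts synaptic updates only and assumes, as a simplification, that work is proportional to energy; real chips also pay for static leakage, memory traffic, and clocking. The function names and network sizes are hypothetical.

```python
import random

def random_spike_trains(num_inputs, num_steps, rate, seed=0):
    """Binary spike trains: each input fires with probability `rate` per step."""
    rng = random.Random(seed)
    return [[1 if rng.random() < rate else 0 for _ in range(num_steps)]
            for _ in range(num_inputs)]

def clocked_ops(num_inputs, num_outputs, num_steps):
    """A clocked, dense design updates every synapse on every step, spikes or not."""
    return num_inputs * num_outputs * num_steps

def event_driven_ops(spike_trains, num_outputs):
    """An event-driven design only does work when an input spikes; each spike
    is delivered once to every output it connects to."""
    total_spikes = sum(sum(train) for train in spike_trains)
    return total_spikes * num_outputs

if __name__ == "__main__":
    n_in, n_out, steps = 256, 64, 100
    for rate in (0.01, 0.1, 0.5):
        trains = random_spike_trains(n_in, steps, rate)
        ratio = event_driven_ops(trains, n_out) / clocked_ops(n_in, n_out, steps)
        print(f"spike rate {rate:.2f}: event-driven work = {ratio:.3f} of clocked work")
```

Under these assumptions the event-driven work tracks the spike rate, so a mostly idle input costs almost nothing, while the clocked count stays fixed regardless of activity.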

Once you remove a few of those data points and compare apples to apples, this architecture seems less compelling. For what reads like a sales pitch, they could do a better job of being straightforward about why this makes sense. And if it doesn't make sense, it will be yet another failed "AI" chip.

greatabel · on Sept 13, 2018

I think the sophistication of the developer ecosystem and the migration costs of moving code are important.

tehsauce · on Sept 13, 2018

Does anyone know how these things actually learn? I saw something in the article like "data passes over the network and creates reinforcement" but how well does this work, and how?

samhain · on Sept 13, 2018

That's a really good question... I know that normal network training is basically "y=mx+b, solve for m and b."
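For the single-neuron case, "solve for m and b" is usually done iteratively rather than in closed form. Below is a minimal sketch of gradient descent on mean squared error. It assumes a toy dataset generated from y = 2x + 1, which is illustrative and not from the thread.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y = m*x + b by gradient descent on mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current prediction.
        errors = [m * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean((m*x + b - y)^2) with respect to m and b.
        dm = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        db = 2 * sum(errors) / n
        # Step both parameters against their gradient.
        m -= lr * dm
        b -= lr * db
    return m, b

if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2 * x + 1 for x in xs]
    m, b = fit_line(xs, ys)
    print(f"m = {m:.4f}, b = {b:.4f}")  # m = 2.0000, b = 1.0000
```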

But then you add layers and transfer functions, so it's more like:

$$y_1 = f(m_1 y_2) + b_1$$

$$y_2 = f(m_2 y_3) + b_2$$

$$\vdots$$

And then you solve for each $m_n$ and $b_n$ using the derivative $f'$ via the chain rule, which is why smooth transfer functions are preferred. On some networks you can then visualize the training space, with the gradient showing which direction leads toward the optimum.
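A minimal sketch of that chain-rule computation (backpropagation) for two stacked layers in the form written above. It assumes tanh as the smooth transfer function f, a squared-error loss, and names the innermost input x. All of these are illustrative choices. A finite-difference check confirms that the analytic gradients, built from f' rather than an inverse of f, are correct.

```python
import math

def f(z):
    """Smooth transfer function (tanh, chosen for illustration)."""
    return math.tanh(z)

def f_prime(z):
    """Derivative of tanh; training needs this slope, not an inverse of f."""
    return 1.0 - math.tanh(z) ** 2

def forward(p, x):
    """y2 = f(m2*x) + b2, then y1 = f(m1*y2) + b1."""
    z2 = p["m2"] * x
    y2 = f(z2) + p["b2"]
    z1 = p["m1"] * y2
    y1 = f(z1) + p["b1"]
    return y1, z1, z2, y2

def loss(p, x, target):
    return (forward(p, x)[0] - target) ** 2

def gradients(p, x, target):
    """Backpropagation: apply the chain rule from the output back to each parameter."""
    y1, z1, z2, y2 = forward(p, x)
    dy1 = 2 * (y1 - target)               # dL/dy1
    dy2 = dy1 * f_prime(z1) * p["m1"]     # dL/dy2, through y1 = f(m1*y2) + b1
    return {
        "m1": dy1 * f_prime(z1) * y2,
        "b1": dy1,
        "m2": dy2 * f_prime(z2) * x,
        "b2": dy2,
    }

def numerical_gradient(p, x, target, name, eps=1e-6):
    """Central finite difference, used only to check the analytic gradients."""
    plus, minus = dict(p), dict(p)
    plus[name] += eps
    minus[name] -= eps
    return (loss(plus, x, target) - loss(minus, x, target)) / (2 * eps)

if __name__ == "__main__":
    params = {"m1": 0.7, "b1": 0.1, "m2": -1.3, "b2": 0.4}
    analytic = gradients(params, x=0.5, target=1.0)
    for name in params:
        print(name, analytic[name], numerical_gradient(params, 0.5, 1.0, name))
```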

But with a spiking neural network that isn't smooth, it seems like you wouldn't get clean gradients for training, so that's actually a really good question... It seems like it either wouldn't work correctly or would be incredibly difficult to train. Of course, maybe "spiking" is a name for another part of the behavior, and not the transfer function itself.
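The concern holds for the raw spike function. A threshold step has zero slope everywhere except at the threshold itself, so a naive gradient carries no training signal. One workaround from the literature, not mentioned in this thread, is a "surrogate gradient." The step is kept in the forward pass, but the slope of a smooth stand-in is used when computing gradients. A minimal sketch, assuming a fixed threshold of 1.0 and a sigmoid stand-in with an arbitrary sharpness `beta = 5.0`:

```python
import math

THRESHOLD = 1.0  # assumed firing threshold

def spike(v):
    """Heaviside step: emit a spike (1.0) once membrane potential v reaches the threshold."""
    return 1.0 if v >= THRESHOLD else 0.0

def finite_difference(fn, v, eps=1e-4):
    """Numerical slope of fn at v (central difference)."""
    return (fn(v + eps) - fn(v - eps)) / (2 * eps)

def surrogate_grad(v, beta=5.0):
    """Slope of a sigmoid centred on the threshold, used in place of the step's derivative."""
    s = 1.0 / (1.0 + math.exp(-beta * (v - THRESHOLD)))
    return beta * s * (1.0 - s)

if __name__ == "__main__":
    for v in (0.2, 0.5, 0.9, 1.5):
        print(f"v={v}: step slope={finite_difference(spike, v)}, "
              f"surrogate={surrogate_grad(v):.4f}")
```

Away from the threshold the step's slope is exactly zero, while the surrogate still provides a nonzero signal that tells the optimizer how close the neuron was to firing.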

buboard · on Sept 13, 2018

These architectures are hardware implementations of simple integrate-and-fire neuron models. You can usually implement simple plasticity algorithms on them, like spike-timing-dependent plasticity (STDP).
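The common pair-based form of STDP fits in a few lines. If a presynaptic spike arrives shortly before the postsynaptic neuron fires, the synapse is strengthened; if it arrives shortly after, the synapse is weakened. Both windows decay exponentially with the time gap. This is a generic textbook sketch, not the rule any particular chip uses. The amplitudes, time constants, and weight bounds are illustrative assumptions.

```python
import math

def stdp_delta(t_pre, t_post, a_plus=0.01, a_minus=0.012,
               tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair; times in ms."""
    dt = t_post - t_pre
    if dt > 0:
        # Pre spike precedes post spike: strengthen, more for closer pairs.
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        # Post spike precedes pre spike: weaken.
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

def apply_stdp(weight, pre_spikes, post_spikes, w_min=0.0, w_max=1.0):
    """Sum the update over all pre/post pairs, then clip to [w_min, w_max]."""
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            weight += stdp_delta(t_pre, t_post)
    return min(max(weight, w_min), w_max)

if __name__ == "__main__":
    # Causal timing (pre before post) versus anti-causal timing.
    print(apply_stdp(0.5, pre_spikes=[10.0, 50.0], post_spikes=[15.0, 55.0]))
    print(apply_stdp(0.5, pre_spikes=[15.0, 55.0], post_spikes=[10.0, 50.0]))
```

The rule is local (it only needs the spike times on either side of one synapse), which is part of why it maps onto neuromorphic hardware more easily than backprop does.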

They don't exceed ANNs at this point, but are closer to biological neurons. Some researchers are looking into how the backpropagation of errors might be implemented in spiking networks. Example: https://www.cell.com/neuron/abstract/S0896-6273(13)01127-6?c...

accurrent · on Sept 13, 2018

A common algorithm used in spiking neural networks is STDP (Spike-Timing-Dependent Plasticity). I do not know exactly what BrainChip uses, though. It is also possible to use gradient descent to train some SNNs, although this is very inefficient. It is also possible to directly convert many CNN architectures to equivalent spiking architectures (i.e., train a CNN using gradient descent, then approximate it as an SNN).
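The conversion route usually relies on rate coding. Suppose a non-leaky integrate-and-fire neuron with reset-by-subtraction receives a constant input x on every time step. Its firing rate then approximates ReLU(x) for 0 <= x <= threshold and saturates at one spike per step above that. Below is a minimal single-neuron sketch of that correspondence. A real conversion also maps the trained weights and normalizes activations to stay in that range; both steps are omitted here, and the function names are hypothetical.

```python
def if_firing_rate(x, num_steps=1000, threshold=1.0):
    """Firing rate of a non-leaky integrate-and-fire neuron driven by constant input x."""
    v = 0.0
    spikes = 0
    for _ in range(num_steps):
        v += x                  # integrate the input
        if v >= threshold:      # fire on reaching threshold
            spikes += 1
            v -= threshold      # reset by subtraction keeps leftover charge
    return spikes / num_steps

def relu(x):
    return max(0.0, x)

if __name__ == "__main__":
    for x in (-0.5, 0.0, 0.25, 0.6, 0.9):
        print(f"x={x:+.2f}  relu={relu(x):.3f}  IF rate={if_firing_rate(x):.3f}")
```

Negative inputs never push the membrane potential to threshold, so they map to zero just like ReLU. The approximation error shrinks as `num_steps` grows, which is why converted networks trade latency for accuracy.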