From the paper:

> Spiking Neural Networks (SNNs) have been an attractive option for deployment on devices with limited computing resources and lower power consumption because of the event-driven computing characteristic.
But training them requires new techniques compared to continuous neural networks — SNNs aren't differentiable and therefore you can't back-propagate (as I understand it; please correct me if I'm off here).
That you can't backpropagate is a common misconception. There is recent work (I am one of the co-authors) that derives a precise analog of the backpropagation algorithm for spiking neural networks (https://arxiv.org/abs/2009.08378). It computes exact gradients and requires communication only at spike times during the backward pass. The reason this works is that the statement "x is not differentiable" is incomplete without saying "with respect to what". It turns out that, since gradient computation is local, the gradient is defined almost everywhere: it is only ill-defined at the points where a spike would get added or deleted. This is similar to how, in a ReLU network, an input of exactly zero is "non-differentiable".
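To make the "defined almost everywhere" point concrete, here is a toy illustration (not the algorithm from the paper): for a single leaky integrate-and-fire neuron driven by a constant input, the spike time is a smooth function of the input weight as long as the spike exists, and the derivative only blows up at the boundary where the spike would disappear.

```python
# Toy sketch: leaky integrate-and-fire neuron with V(t) = w * (1 - exp(-t/tau))
# and threshold theta. The spike time t* = -tau * log(1 - theta/w) is smooth in
# the weight w whenever the spike exists (w > theta); the gradient only
# diverges at the boundary where the spike would vanish.
import numpy as np

tau, theta = 1.0, 1.0

def spike_time(w):
    """First threshold crossing, or np.inf if the neuron never spikes."""
    return -tau * np.log(1.0 - theta / w) if w > theta else np.inf

def grad_spike_time(w):
    """Exact derivative dt*/dw, defined wherever the spike exists."""
    return -tau * theta / (w * (w - theta))

for w in [2.0, 1.5, 1.1, 1.01]:
    eps = 1e-6
    fd = (spike_time(w + eps) - spike_time(w - eps)) / (2 * eps)
    print(f"w={w:5.2f}  t*={spike_time(w):7.4f}  "
          f"exact={grad_spike_time(w):9.4f}  finite diff={fd:9.4f}")
# As w approaches theta from above the spike is about to disappear and the
# gradient diverges; for w <= theta there is no spike to differentiate.
```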
I've got a technique where I simply record every spike event, with its tick, source, and target, in a SQL database. This allows quick recursive queries for determining which neurons contributed to an output. I feel that if we stop chasing biological emulation and superscalar hardware, we can start to get clever with more traditional data-wrangling methods.
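A minimal sketch of what that could look like, assuming SQLite and made-up table and column names: one row per delivered spike, and a recursive CTE that walks backwards from an output neuron to everything that fed into it.

```python
# Hypothetical sketch of the "spike log in SQL" idea: every delivered spike is
# a row (tick, source, target), and a recursive query finds all upstream
# contributors to a given output spike. Schema and data are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE spikes (tick INTEGER, source INTEGER, target INTEGER)")
con.executemany(
    "INSERT INTO spikes VALUES (?, ?, ?)",
    [(1, 10, 20), (2, 11, 20), (3, 20, 30), (4, 30, 99)],  # toy spike trail
)

rows = con.execute(
    """
    WITH RECURSIVE contributors(source, target, tick) AS (
        SELECT source, target, tick FROM spikes
         WHERE target = ? AND tick <= ?
        UNION
        SELECT s.source, s.target, s.tick
          FROM spikes AS s
          JOIN contributors AS c
            ON s.target = c.source AND s.tick < c.tick
    )
    SELECT DISTINCT source FROM contributors ORDER BY source
    """,
    (99, 4),  # trace everything that contributed to neuron 99 by tick 4
).fetchall()
print(rows)  # [(10,), (11,), (20,), (30,)]
```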
I don't think a high-quality SNN model is going to require purpose-built hardware or a supercomputer to run. I think you will see emergence even with extremely rudimentary single-core, CPU-bound models if everything else is done well.
Event-driven is a superpower when architected correctly in software: you can pick clever algorithms and lazily evaluate the network. The implication is that networks that would ordinarily be impossible to simulate in real time become feasible. You could even keep neuron state in offline storage and bring it online only as relevant action potentials are enqueued for future execution.
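A rough sketch of the lazy, event-driven idea, with a priority queue of pending spike deliveries and a plain dict standing in for the offline state store (all names, constants, and the fixed synaptic delay are illustrative):

```python
# Neuron state is only touched when a spike actually arrives; the membrane
# leak for the idle interval is applied retroactively at that moment.
import heapq, math

TAU, THETA = 10.0, 1.0
storage = {}                                 # neuron id -> (potential, last update time)
synapses = {1: [(2, 0.8)], 2: [(3, 1.5)]}    # neuron -> [(target, weight), ...]
events = [(0.0, 1, 2.0)]                     # (delivery time, target neuron, weight)

while events:
    t, nid, w = heapq.heappop(events)
    v, last = storage.get(nid, (0.0, t))     # lazily "page in" neuron state
    v = v * math.exp(-(t - last) / TAU) + w  # decay over the idle interval, then integrate
    if v >= THETA:                           # threshold crossed: emit spikes, reset
        v = 0.0
        for target, weight in synapses.get(nid, []):
            heapq.heappush(events, (t + 1.0, target, weight))  # fixed delay of 1.0
        print(f"t={t:4.1f}  neuron {nid} spikes")
    storage[nid] = (v, t)                    # write state back to the "offline" store
```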
> The gradient computation is only ill-defined at places where a spike would get added or deleted.
I feel like I'm missing something here. If you do it naively, the gradient is zero whenever a spike isn't added or deleted, and infinite when it is, which is completely unhelpful.
Now the "natural" solution is to invent some differentiable approximation of the spiking network, and compute derivates of that, and hope that the approximation is close enough that optimizing it leads to the spiking network learning something useful.
A more principled version might be to inject some noise into the network. Then you have a probability of spiking in a certain pattern (or better, a class of patterns that all share the same semantics). You could differentiate the probability of correct output with respect to the weights and try to drive it towards 1.
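One way to make that concrete, in a toy case with a single visible neuron and no recurrence so the probability stays tractable (all names and numbers below are illustrative): give the neuron an escape-noise firing probability and do gradient ascent on the exact log-probability of the target spike pattern.

```python
# Stochastic neuron with spike probability p_t = sigmoid(w.x_t - theta) at each
# time step; the probability of the whole target pattern is a product of
# Bernoullis, so its gradient w.r.t. the weights is exact.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                                  # input drive at 20 time steps
target = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)   # desired spike pattern
theta = 0.0

def pattern_prob(w):
    """Exact probability of emitting the full target pattern."""
    p = sigmoid(X @ w - theta)
    return np.prod(np.where(target == 1.0, p, 1.0 - p))

w, lr = np.zeros(3), 0.5
print("P(target pattern) before training:", pattern_prob(w))
for _ in range(2000):
    p = sigmoid(X @ w - theta)
    w += lr * X.T @ (target - p) / len(X)     # exact gradient of log P(target pattern)
print("P(target pattern) after training: ", pattern_prob(w))
```

This only stays exact because there are no hidden neurons; with hidden units or recurrence, the output probability becomes a sum over exponentially many hidden spike histories, which is where the intractability mentioned in the reply below comes from.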
People have tried using probabilistic SNNs because the gradient is well defined there. The issue is that it's computationally intractable to calculate, so you still have to use an approximation, and you're thus no better off than people using a surrogate gradient of the non-probabilistic SNN.
Trying to figure that out right now :). There is a tension between an algorithm like this and the biophysical reality of actual brains. My intuition is that whatever the brain does is not so much an approximation of any known algorithm but rather an analog of such an algorithm. Now that we know what gradient computation looks like in a spiking neural network, we can ask which assumptions are violated in the brain, just as people have done before for the vanilla backpropagation algorithm. There is some recent concrete experimental evidence that points towards a solution.
There is nothing theoretical stopping anyone from trying that; it is just a matter of engineering and architecture. It is (in principle) possible to define "spiking transformers", and there is more than one way to do so, but that does not necessarily mean you will immediately get good performance. Also, current-generation hardware doesn't simulate SNNs as efficiently, which is another constraint.
Maybe you're right, it's as good as it needs to be :)
I think it would be cool to have a better memory, ability to communicate and share ideas telepathically...but maybe it's a case of being careful what I wish for.