Caveat: the paper mainly focuses on the "standard quantum limit", i.e. the fundamental photon energy needed for the operations. Once other costs are taken into account (for example, modulation energy for the weights in this homodyne scheme, which scales with N² rather than N, or the limits of the ADC), the energy figures they propose are nowhere near achievable. Furthermore, free-space optical systems have substantial alignment and packaging problems, which will prevent them from beating integrated approaches in the near term. In fact, it seems that Fathom Computing has potentially pivoted away from free space, based on the latest verbiage on their website, and they've been trying to get it to work for 3 years now.
However, it still presents an interesting case for the fact that the fundamental floor on optical scaling is absolutely tiny. It'll be interesting to see who wins in this space :)
Does the paper propose building these devices in free space? I got the sense this was all intended to be produced lithographically with waveguides, so alignment wouldn't be a problem.
The title is wrong about it being an integrated design. Here's an excerpt from the paper abstract:
"This paper presents a new type of photonic accelerator based on coherent detection that is scalable to large (N≳106) networks and can be operated at high (gigahertz) speeds and very low (subattojoule) energies per multiply and accumulate (MAC), using the massive spatial multiplexing enabled by standard free-space optical components"
This is not too surprising, as analog techniques sometimes require much, much less power, particularly at the 8-bit resolution used in these GPUs. A prism can perform a Fourier transform more efficiently than an FPGA, as can a SAW device performing correlation. The first SAR radars used real-time optical processing with direct write to film.
A simple super-heterodyne receiver will run months off a single AA battery. Try that with SDR.
"If we don't worry about silly things like network accuracy, correctness, adc energy, detector energy and other weird things everyone else seems to worry about, then we get lower energy numbers than anyone else!"
When they use light to perform matrix multiplication, is the product deterministic, or an approximation? Digital computers spend an enormous amount of energy on always getting the same answer, so if an algorithm doesn't need that level of precision, it makes sense that an efficient hardware approximation could be practical.
It sounds like it's an approximation, but the abstract indicates that it can be used for training, not just inference, so it shouldn't matter a whole lot--if the training process includes the errors, the neural net will be trained to account for them.
In fact with current neural nets on floating point hardware, people use dropout which is just deliberately introducing random errors.
Which is why, only somewhat tongue in cheek, people sometimes describe the rounding errors from using lower-precision arithmetic as "free regularization".
So any noise from this being an analogue system would likely improve performance (at least in terms of training).
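As a toy illustration of that point (plain numpy, nothing from the paper): dropout and analog imprecision can both be modelled as stochastic perturbations applied only during training, which is exactly where they act as regularization.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W, p_drop=0.2, analog_noise=0.01, train=True):
    """One linear layer with dropout and additive 'analog' noise (toy model)."""
    y = x @ W
    if train:
        # Dropout: randomly zero activations, rescale to keep the expectation.
        mask = rng.random(y.shape) > p_drop
        y = y * mask / (1.0 - p_drop)
        # Analog imprecision modelled as small Gaussian noise on the result.
        y = y + analog_noise * rng.standard_normal(y.shape)
    return y

x = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 3))
print(forward(x, W, train=True))   # noisy training-time output
print(forward(x, W, train=False))  # deterministic inference output
```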
It's a popular misconception that neural nets don't really care about precision.
8-bit floats with 16-bit master copies show clear degradation even with careful swamping prevention and stochastic rounding. [0]
In practice, you want fp16 with fp32 master copies, and it turns out that if done improperly, even fp32 can cause accuracy losses sometimes [1].
So, you see, it's not actually the case that you can proceed directly to analog computing without worrying about silly things like correctness, accuracy, and precision.
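A minimal sketch of the master-copy pattern being referred to, in plain numpy rather than any particular framework's mixed-precision API; the toy numbers are chosen to make the swamping effect obvious:

```python
import numpy as np

update = np.float32(1e-4)   # lr * grad for a single step (toy value)
steps = 10_000              # total intended change: 10_000 * 1e-4 = 1.0

# Naive: the weight itself lives in fp16 and is updated in place.
w16 = np.float16(1.0)
for _ in range(steps):
    w16 = np.float16(w16 - np.float16(update))  # each step rounds back to 1.0

# Master copy: the weight lives in fp32; an fp16 cast would only be the
# working copy for the forward/backward pass (omitted here).
master = np.float32(1.0)
for _ in range(steps):
    master = master - update                    # small updates accumulate

print(w16)     # 1.0        -- every update was swamped by fp16 rounding
print(master)  # close to 0 -- the small updates actually took effect
```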
They don't use gradient descent though. The learning mechanism used by biological neural networks has almost nothing in common with the one used by most artificial neural networks.
The learning mechanism[1] is actually relatively well understood (at least as far as neuroscience goes), but it isn't used for ANNs because Spiking Neural Networks (SNNs) don't map well onto conventional computing hardware, whereas the matrix multiplication used by conventional ANNs does.
I agree that it's worth researching, but it is currently quite difficult to compete with conventional ANNs because processors well suited to ANNs are so common.
STDP is one of the early steps towards understanding the learning mechanism(s) in the brain, but we are a long way from enough understanding to actually reproduce it.
Not only do we not know how to train spiking networks, we don't even know how the information is encoded: pulse frequency, pulse timing, or something else entirely? No one knows. How can you compete with anything if you have no idea how it works?
Also, this has nothing to do with computing hardware. You can easily simulate anything you want on conventional processors. Huge computing clusters have been built for spiking model simulations, and nothing interesting came out of it. Invent the algorithm first, then we will build the hardware for it.
STDP-based SNNs don't take much secret sauce to function reasonably well for basic tasks[1]. Some people have even started to apply reward-modulated STDP to the problem, which is nice to see since it's a more complete version of STDP[2]. Even so, the amount of research in the area is tiny.

I agree that you need to figure out the whole working algorithm before trying to scale things up, but to do that you need a research base, and that is currently difficult because running large SNNs is very slow for an average researcher without access to specialized hardware. The primary factor which kicked off the deep learning revolution was the realization that you could suddenly train a DNN 10x faster on a GPU than on a CPU. Without that, most researchers just wouldn't have bothered because DL research took too long. GPUs are common and lots of people had access to them.

SNNs can obviously be simulated on any CPU, but for large models it's incredibly slow. SNNs have an incredible amount of potential parallelizability, but all of it is in the form of a massively asynchronous system where the order of events is critically important. CPUs and GPUs are unable to take advantage of most of the potential speedups inherent to this structure outside of specific cases where you have groups of non-interacting neurons. Due to the event-based rather than signal-based nature of SNNs, batched training also seems to be substantially less effective than in ANNs, further complicating training.
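For anyone who hasn't seen it, the core pair-based STDP rule itself is tiny; the difficulty is everything around it (encoding, credit assignment, scaling). A minimal sketch using the standard exponential window; the constants are illustrative and not taken from the cited papers:

```python
import numpy as np

# Pair-based STDP: potentiate if the presynaptic spike precedes the
# postsynaptic spike, depress if it follows it.  Constants are illustrative.
A_PLUS, A_MINUS = 0.01, 0.012     # learning rates for LTP / LTD
TAU_PLUS, TAU_MINUS = 20.0, 20.0  # time constants in ms

def stdp_dw(t_pre, t_post):
    """Weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:    # pre before post -> strengthen (LTP)
        return A_PLUS * np.exp(-dt / TAU_PLUS)
    elif dt < 0:  # post before pre -> weaken (LTD)
        return -A_MINUS * np.exp(dt / TAU_MINUS)
    return 0.0

w = 0.5
for t_pre, t_post in [(10, 15), (40, 32), (60, 61)]:
    w = np.clip(w + stdp_dw(t_pre, t_post), 0.0, 1.0)  # keep weight in [0, 1]
    print(round(w, 4))
```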
The main problem with SNNs is not that they don't work well. It's not that they are slow to run on CPUs/GPUs. Those would indeed be temporary problems. The fundamental problem is that we don't know the right abstraction level at which to imitate brain operations. It's far from clear that simulating spikes is necessary to implement the brain's "learning algorithms".

There are two main arguments for using spikes: "biological realism" and "energy efficiency". Neither is convincing. If you want to simulate a conventional CPU, you don't want to simulate IV curves for transistors. You don't want to simulate CMOS gates. No, because the lowest abstraction level necessary to understand the computer's operation is boolean logic. Anything below that is completely irrelevant. I strongly suspect that simulating spikes is below the lowest relevant abstraction level. By the way, Jeff Hawkins of Numenta agrees with me, and he's pretty strict about the "biological plausibility" of his neocortex algorithms.

As for energy efficiency: sure, spikes might be the most efficient way to compute and encode information given the constraints of biological brains. But why should we care about those constraints? We are building computing machinery in silicon, using vastly different technology, circuits, and tricks from what's available in "wetware". It does not make any sense to copy biological evolution's efficiency tricks when building things in silicon. Neuromorphic hardware papers always mention energy efficiency and then proceed to compare their analog spiking chips to digital (!) chips. That's ridiculous! I can't think of a single good argument for why spiking analog circuits would be more energy efficient than non-spiking analog circuits, if we are talking about any existing computing hardware technology (or one that's likely to be developed in the foreseeable future).
Deep learning took off in 2012 not because faster hardware allowed us to develop good algorithms. The algorithms (gradient descent optimization, backpropagation, convolutional and recurrent layers) had been developed -- using slow hardware -- long before (LeCun demonstrated state of the art on MNIST back in 1998). The fast hardware allowed us to scale up the existing good algorithms. I don't see any good algorithms developed for SNNs. Perhaps these algorithms can be developed, and perhaps faster hardware is indeed necessary for that, but as I argue above, the motivation to pursue this research is just not obvious to me.
Note that we shouldn't confuse this SNN research (such as the papers you cited) with efforts like the Human Brain Project, where they're actually trying to derive higher-level brain algorithms from accurate simulations of the low-level mechanics. Emphasis on accurate, because as any neuroscientist will tell you (e.g. [1]), these SNNs have very little to do with what's actually going on in a brain at any abstraction level.
One would hope that the production version contains health monitoring, because heat and age could similarly degrade an analog computer, and nobody wants a barrage of lawsuits after N years.
If the results depend on temperature, then it would be straightforward to climate control the processor; some cars already do this for batteries.
It's comparable to an old school analog computer (like the ones used long ago to simulate differential equations) using optics instead of analog electronics. They build a physical analogue of a neural network that evaluates the desired function.
Sounds amazing and scary at the same time... if it works, the company that implements it could concentrate enough AI power to get a monopoly in lots of industrial applications.
It'll likely be pursued by other companies if it's beneficial, but currently this is theoretical and doesn't scale, and while they speak to power consumption there's no mention of computation speed (FLOPS).
Exciting yes, but like battery tech, there's a lot to address in that last mile to market.
It's actually highly inefficient to do (general) computation with photons. In contrast to electrons, photons really don't like to interact, so you need a lot of optical power to implement anything like an optical transistor; this was studied by Prof. David Miller some time ago (doi:10.1038/nphoton.2009.240, I think this is the correct citation, I'm on my phone right now). Like others said, the device proposed in the paper here is more like an analog computer (with ML they have really seen a revival, both in electronics and optics); for a specific "calculation" they can be vastly more efficient, but they can't do general computation. The device here relies on interference, and the photodetectors provide the nonlinearity; people have been researching these types of devices for neural networks for quite a while.
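To make the "interference plus photodetectors" point concrete, here is an idealized toy model (plain numpy; lossless, noiseless, and only the general principle, not the specific architecture in the paper): mixing a signal field with a "weight" field on a 50/50 beamsplitter and subtracting the two detector photocurrents cancels the |x|² and |w|² terms, leaving the cross term, so the differenced and summed photocurrent is proportional to the dot product x·w.

```python
import numpy as np

rng = np.random.default_rng(1)

def homodyne_dot(x, w):
    """Idealized balanced detection: products from interference.

    x, w : real field amplitudes in N spatial modes (signal and 'weight'
    beams).  A 50/50 beamsplitter gives outputs (x+w)/sqrt(2) and
    (x-w)/sqrt(2); square-law photodetectors measure their intensities;
    subtracting the two photocurrents leaves 2*x*w per mode, and summing
    over modes yields the dot product.
    """
    i_plus = np.abs((x + w) / np.sqrt(2)) ** 2
    i_minus = np.abs((x - w) / np.sqrt(2)) ** 2
    return np.sum(i_plus - i_minus) / 2.0

x = rng.standard_normal(8)
w = rng.standard_normal(8)
print(homodyne_dot(x, w))  # matches...
print(np.dot(x, w))        # ...the electronic dot product
```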
I wondered similar things at various points. I’m still waiting for the Ga chips the guys at Intel said were right around the corner...in 1999. My rudimentary guess is tech or manufacturing just isn’t there.
I wrote some unpublished fiction once about a super computer that used light as the interconnect fabric between nodes/CPUs, where each core could talk to any other core with virtually no latency and high bandwidth. I guess it's never been used (widely) because it's easy enough to run copper.
For electrical computers connected via optics (and vice versa) there is also a latency cost to converting between the signal types, which makes doing so at sub-rack distances a painfully bad choice, not to mention the cost and power associated with converting electrical signals to laser light and back. That being said, any real-world supercomputer bigger than one rack has a significant amount of optical interconnect, as it's easier to get a coherent signal farther with optics at the same power.
Nothing that visionary or even self-aware, but imagine a giant sphere, sealed with a vacuum inside and coated on the inside with ICs connected via copper for power and basic communication, where each IC also had a nodule/transceiver that could communicate with every other IC via light, because every IC had line of sight to the others, all submerged in liquid nitrogen or some other coolant. The premise was a giant campus, like a school, where users would carry near-dumb terminals around, like smartphones with very little power, but if they needed computational power all they had to do was tap into the sphere by being in visible range of receptacles all around the campus, both inside and out. And the sphere could borrow processing power from unused devices if they were available. So light was the practical connection for users, but also used inside the sphere.
As long as Moore's law keeps trucking along, there's no economic incentive to do so. This is similar to the explosion of Electron and other heavy technologies because there's no need to be efficient, or so some developers think. Of course, now that the law is slowing down, we might see more novel techniques, in both hardware and software design.