
Chip design drastically reduces energy needed to compute with light - rbanffy
http://news.mit.edu/2019/ai-chip-light-computing-faster-0605
======
zgao
Caveat: the paper mainly focuses on the "standard quantum limit", which is the
fundamental photon energy needed for the operations. If other things are taken
into account (for example, modulation energy for the weights in this homodyne
scheme, which scales with N² and not N, or the limits of the ADC), then the
energy they are proposing is nowhere near achievable. Furthermore, substantial
alignment and packaging problems exist for free-space optical systems, which
prevent them from beating integrated approaches in the near term. In fact, it
seems that Fathom Computing has potentially pivoted away from free space,
based on the latest verbiage on their website, and they've been trying to get
it to work for 3 years now.

However, it still presents an interesting case for the fact that the
_fundamental_ floor on optical scaling is absolutely tiny. It'll be
interesting to see who wins in this space :)
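
For a rough sense of that floor (my own back-of-envelope arithmetic, not numbers from the paper): a single photon at a 1550 nm telecom wavelength carries only about an eighth of an attojoule, so a MAC done with a handful of photons is already in the subattojoule regime the authors quote.

    # Energy of one photon at 1550 nm (back-of-envelope, not from the paper)
    h = 6.626e-34         # Planck constant, J*s
    c = 3.0e8             # speed of light, m/s
    wavelength = 1550e-9  # telecom band, m

    E_photon = h * c / wavelength
    print(f"{E_photon:.3e} J = {E_photon / 1e-18:.2f} aJ")
    # -> 1.282e-19 J = 0.13 aJ per photon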

~~~
thatcherc
Does the paper propose building these devices in free space? I got the sense
this was all intended to be produced lithographically with waveguides, so
alignment wouldn't be a problem.

~~~
zgao
The title is wrong about it being an integrated design. Here's an excerpt from
the paper abstract:

"This paper presents a new type of photonic accelerator based on coherent
detection that is scalable to large (N ≳ 10^6) networks and can be operated at
high (gigahertz) speeds and very low (subattojoule) energies per multiply and
accumulate (MAC), using the massive spatial multiplexing enabled by standard
free-space optical components"

------
madengr
This is not too surprising, as analog techniques sometimes require much, much
less power, especially at the 8-bit resolution used in these GPUs. A prism can perform a
Fourier transform more efficiently than an FPGA, as can a SAW device
performing correlation. The first SAR radars used real-time optical processing
with direct write to film.

A simple superheterodyne receiver will run for months off a single AA battery.
Try that with an SDR.
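
As a loose software analogy (my own illustration, not from the article): the spectral decomposition a dispersive element does essentially "for free" in the analog domain is what we pay O(N log N) multiplies for digitally.

    import numpy as np

    # Digital equivalent of the analog spectral decomposition:
    fs = 1000                      # sample rate, Hz
    t = np.arange(0, 1, 1 / fs)    # one second of samples
    sig = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

    spectrum = np.fft.rfft(sig)                  # O(N log N) work on a CPU/FPGA
    freqs = np.fft.rfftfreq(len(sig), 1 / fs)
    print(sorted(freqs[np.argsort(np.abs(spectrum))[-2:]]))  # peaks at 50 Hz and 120 Hz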

------
deepnotderp
"If we don't worry about silly things like network accuracy, correctness, adc
energy, detector energy and other weird things everyone else seems to worry
about, then we get lower energy numbers than anyone else!"

------
p1mrx
When they use light to perform matrix multiplication, is the product
deterministic, or an approximation? Digital computers spend an enormous amount
of energy on always getting the same answer, so if an algorithm doesn't need
that level of precision, it makes sense that an efficient hardware
approximation could be practical.

~~~
dooglius
It sounds like it's an approximation, but the abstract indicates that it can
be used for training, not just inference, so it shouldn't matter a whole lot--
if the training process includes the errors, the neural net will be trained to
account for them.

~~~
currymj
In fact, with current neural nets on floating-point hardware, people use
dropout, which is just deliberately introducing random errors.

Which is why, only somewhat tongue-in-cheek, people sometimes describe the
rounding errors from using lower-precision arithmetic as "free
regularization".

So any noise from this being an analogue system would likely improve
performance (at least in terms of training).
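
Both ideas fit in a few lines of numpy; this is just an illustrative sketch (dropout plus additive noise as a crude stand-in for an analog MAC), not anything from the paper:

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(x, p=0.5, training=True):
        # Inverted dropout: randomly zero activations during training, rescale the rest
        if not training:
            return x
        mask = rng.random(x.shape) > p
        return x * mask / (1 - p)

    def noisy_matmul(W, x, sigma=0.01):
        # Crude stand-in for an analog multiply-accumulate: exact result plus noise
        return W @ x + sigma * rng.standard_normal(W.shape[0])

    W = rng.standard_normal((4, 8))
    x = rng.standard_normal(8)
    print(dropout(x))          # some entries zeroed, survivors rescaled
    print(noisy_matmul(W, x))  # product perturbed by small random errors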

~~~
deepnotderp
It's a popular misconception that neural nets don't really care about
precision.

8-bit floats with 16-bit master copies show clear degradation even with
careful swamping prevention and stochastic rounding. [0]

In practice, you want fp16 with fp32 master copies, and it turns out that if
done improperly, even fp32 can cause accuracy losses sometimes [1].

So, you see, it's not actually the case that you can proceed directly to
analog computing without worrying about silly things like correctness,
accuracy, and precision.

[0] [https://papers.nips.cc/paper/7994-training-deep-neural-networks-with-8-bit-floating-point-numbers.pdf](https://papers.nips.cc/paper/7994-training-deep-neural-networks-with-8-bit-floating-point-numbers.pdf)

[1] [https://arxiv.org/pdf/1710.03740.pdf](https://arxiv.org/pdf/1710.03740.pdf)

(links are in pdf, sorry mobile users)
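
For anyone curious what the fp16-compute / fp32-master-copy recipe in [1] looks like in practice, here's a heavily simplified sketch of my own (a real implementation backpropagates an actual scaled loss; the toy gradient here just stands in for that):

    import numpy as np

    rng = np.random.default_rng(0)
    master_w = rng.standard_normal(1000).astype(np.float32)  # fp32 master copy
    loss_scale = 1024.0   # scale so tiny fp16 gradients don't flush to zero
    lr = 0.01

    for step in range(100):
        w16 = master_w.astype(np.float16)                        # fp16 working copy for fwd/bwd
        grad16 = (loss_scale * 1e-6 * w16).astype(np.float16)    # toy "scaled" fp16 gradient
        grad32 = grad16.astype(np.float32) / loss_scale          # unscale in full precision
        master_w -= lr * grad32                                  # update only the fp32 master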

~~~
p1esk
And yet we have analog neural nets in our heads which most likely don’t need
more than a few bits of precision.

~~~
thunderbird120
They don't use gradient descent, though. The learning mechanism used by
biological neural networks has almost nothing in common with the one used by
most artificial neural networks.

~~~
p1esk
Then perhaps we should try to understand that learning mechanism. Currently
there's almost no communication between the DL and neuroscience communities.

~~~
thunderbird120
The learning mechanism[1] is actually relatively well understood (at least as
far as neuroscience goes), but it isn't used for ANNs because Spiking Neural
Networks (SNNs) don't map well onto conventional computing hardware, whereas
the matrix multiplication used by conventional ANNs does.

I agree that it's worth researching, but currently it is quite difficult to
compete with conventional ANNs due to the ubiquity of processors well suited
to ANNs.

[1][https://en.wikipedia.org/wiki/Spike_timing_dependent_plastic...](https://en.wikipedia.org/wiki/Spike_timing_dependent_plasticity)
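
For anyone unfamiliar, the pair-based form of the rule in [1] fits in a few lines; the constants below are purely illustrative, not from any particular model:

    import numpy as np

    A_plus, A_minus = 0.01, 0.012     # potentiation / depression amplitudes
    tau_plus, tau_minus = 20.0, 20.0  # time constants, ms

    def stdp_dw(t_pre, t_post):
        # Weight change for one pre/post spike pair, based on relative timing
        dt = t_post - t_pre
        if dt > 0:   # pre fired before post -> strengthen the synapse
            return A_plus * np.exp(-dt / tau_plus)
        return -A_minus * np.exp(dt / tau_minus)   # post before pre -> weaken

    print(stdp_dw(10.0, 15.0))   # small positive change
    print(stdp_dw(15.0, 10.0))   # small negative change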

~~~
p1esk
STDP is one of the early steps towards understanding the learning mechanism(s)
in the brain, but we are a long way from understanding it well enough to
actually reproduce it.

Not only do we not know how to train spiking networks, we don't even know how
the information is encoded: pulse frequency, pulse timing, or ...? No one
knows. How can you compete with anything if you have no idea how it works?

Also, this has nothing to do with computing hardware. You can easily simulate
anything you want on conventional processors. Huge computing clusters have
been built for spiking model simulations, and nothing interesting came out of
it. Invent the algorithm first, then we will build the hardware for it.

~~~
thunderbird120
STDP-based SNNs don't take much secret sauce to function reasonably well for
basic tasks[1]. Some people have even started to apply reward-modulated STDP
to the problem, which is nice to see since it's a more complete version of
STDP[2]. Even so, the amount of research in the area is tiny.

I agree that you need to figure out the whole working algorithm before trying
to scale things up, but to do that you need a research base, and currently
that is difficult because running large SNNs is very slow for an average
researcher without access to specialized hardware. The primary factor which
kicked off the deep learning revolution was the realization that you could
suddenly train a DNN 10x faster using a GPU rather than a CPU. Without that,
most researchers just wouldn't have bothered because DL research took too
long; GPUs are common and lots of people had access to them.

SNNs can obviously be simulated on any CPU, but for large models it's
incredibly slow. SNNs have an incredible amount of potential parallelizability,
but all of it is in the form of a massively asynchronous system where the
order of events is critically important. CPUs and GPUs are unable to take
advantage of most of the potential speedups inherent to this structure outside
of specific cases where you have groups of non-interacting neurons. Due to the
event-based rather than signal-based nature of SNNs, batched training also
seems to be substantially less effective than in ANNs, further complicating
training.

[1][https://arxiv.org/pdf/1611.01421.pdf](https://arxiv.org/pdf/1611.01421.pdf)

[2][https://arxiv.org/pdf/1705.09132.pdf](https://arxiv.org/pdf/1705.09132.pdf)
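
To make the event-ordering point concrete, here's a toy event-driven propagation loop of my own (not from either paper): everything funnels through a time-ordered queue, which is exactly the structure that batched dense matrix math on GPUs can't exploit.

    import heapq

    weights = {0: [(1, 1.2), (2, 0.4)], 1: [(2, 0.7)], 2: []}  # neuron -> [(target, weight)]
    potential = {0: 0.0, 1: 0.0, 2: 0.0}
    THRESHOLD, DELAY = 1.0, 1.0

    events = [(0.0, 0)]                      # (time, neuron that just spiked)
    while events:
        t, n = heapq.heappop(events)         # earliest event first -- order is critical
        print(f"t={t}: neuron {n} spikes")
        for target, w in weights[n]:
            potential[target] += w
            if potential[target] >= THRESHOLD:
                potential[target] = 0.0      # reset, then schedule the downstream spike
                heapq.heappush(events, (t + DELAY, target))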

~~~
p1esk
The main problem with SNNs is not that they don't work well, and it's not that
they are slow to run on CPUs/GPUs. Those would indeed be temporary problems.
The fundamental problem is that we don't know the right abstraction level at
which to imitate brain operations. It's far from clear that simulating spikes
is necessary to implement the brain's "learning algorithms".

There are two main arguments for using spikes: "biological realism" and
"energy efficiency". Neither is convincing. If you want to simulate a
conventional CPU, you don't want to simulate IV curves for transistors. You
don't want to simulate CMOS gates. No, because the lowest abstraction level
necessary to understand a computer's operation is Boolean logic; anything
below that is completely irrelevant. I strongly suspect that simulating spikes
is below the lowest relevant abstraction level. By the way, Jeff Hawkins of
Numenta agrees with me, and he's pretty strict about the "biological
plausibility" of his neocortex algorithms.

As for energy efficiency - sure, spikes might be the most efficient way to
compute and encode information given the constraints of biological brains. But
why should we care about those constraints? We are building computing
machinery in silicon, using vastly different technology, circuits, and tricks
from what's available in "wetware". It does not make any sense to copy
biological evolution's tricks to improve efficiency when building things in
silicon. Neuromorphic hardware papers always mention energy efficiency and
then proceed to compare their analog spiking chips to digital (!) chips.
That's ridiculous! I can't think of a single good argument why spiking analog
circuits would be more energy efficient than non-spiking analog circuits, if
we are talking about any existing computing hardware technology (or one that's
likely to be developed in the foreseeable future).

Deep learning took off in 2012 not because faster hardware allowed us to
develop good algorithms. The algorithms (gradient descent optimization,
backpropagation, convolutional and recurrent layers) have been developed --
using slow hardware -- long before (LeCun demonstrated state of the art on
MNIST back in 1998). The fast hardware allowed us to scale up the existing
good algorithms. I don't see any good algorithms developed for SNNs. Perhaps
these algorithms can be developed, perhaps faster hardware is indeed
necessary, but as I argue above, the motivation to pursue this research is
just not obvious to me.

Note that we shouldn't confuse this SNN research (such as the papers you
cited) with efforts like the Human Brain Project, where they're actually
trying to derive higher-level brain algorithms from _accurate_ simulations of
low-level mechanics. Emphasis on accurate, because as any neuroscientist will
tell you (e.g. [1]), these SNNs have very little to do with what's actually
going on in a brain at any abstraction level.

[1] [https://spectrum.ieee.org/tech-talk/semiconductors/devices/blue-brain-project-leader-angry-about-cat-brain](https://spectrum.ieee.org/tech-talk/semiconductors/devices/blue-brain-project-leader-angry-about-cat-brain)

------
continuations
Why does this photonic chip only work for neural networks? What stops this
technology from being made into a photonic CPU?

~~~
tntn
It's comparable to an old-school analog computer (like the ones used long ago
to simulate differential equations), but using optics instead of analog
electronics. They build a physical analogue of a neural network that evaluates
the desired function.

~~~
kylek
Not a physicist but I think the idea behind this (posted a while back) is
relevant

[https://penntoday.upenn.edu/news/penn-engineers-demonstrate-metamaterials-can-solve-equations](https://penntoday.upenn.edu/news/penn-engineers-demonstrate-metamaterials-can-solve-equations)

------
xiphias2
Sounds amazing and scary at the same time... if it works, the amount of AI
power it concentrates could give the company that implements it a monopoly in
lots of industrial applications.

~~~
uponcoffee
It'll likely be pursued by other companies if it's beneficial, but currently
this is theoretical, doesn't scale, and while they speak to power consumption
there's no mention of computation speed (FLOPS).

Exciting yes, but like battery tech, there's a lot to address in that last
mile to market.

------
writepub
These are still results from the simulation phase. It's best to wait for
actual results from a functional optical chip.

------
vturner
I've often wondered why we don't use light (photons) for computing. It seems
we could control its frequency and compute with it?

~~~
RandomTisk
I wrote some unpublished fiction once about a super computer that used light
as the interconnect fabric between nodes/CPUs, where each core could talk to
any other core with virtually no latency and high bandwidth. I guess it's
never been used (widely) because it's easy enough to run copper.

~~~
solarkraft
Could you elaborate on the work? Did you write about unexpected implications
of this technology?

~~~
RandomTisk
Nothing that visionary or even self-aware, but imagine a giant sphere, sealed
with a vacuum inside and coated on the inside with ICs connected via copper
for power and basic communication, but each IC also had a nodule/transceiver
that could communicate with every other IC via the light spectrum, because
every IC had line of sight to the others, all submerged in liquid nitrogen or
some other coolant. The premise was a giant campus, like a school, where users
would carry near-dumb terminals around, like smartphones with very little
power, but if they needed computational power all they had to do was tap into
the sphere by being in visible range of receptacles all around the campus,
both inside and out. And the sphere could borrow processing power from unused
devices if they were available. So light was the practical connection for
users, but it was also used inside the sphere.

