
New Hardware for Massive Neural Networks (1988) [pdf] - Cieplak
https://papers.nips.cc/paper/22-new-hardware-for-massive-neural-networks.pdf
======
carapace
Oh hey! This looks _very_ interesting!

> Transient phenomena associated with forward biased silicon p+ - n - n+
> structures at 4.2K show remarkable similarities with biological neurons.
> The devices play a role similar to the two-terminal switching elements in
> Hodgkin-Huxley equivalent circuit diagrams. The devices provide simpler and
> more realistic neuron emulation than transistors or op-amps. They have such
> low power and current requirements that they could be used in massive neural
> networks. Some observed properties of simple circuits containing the devices
> include action potentials, refractory periods, threshold behavior,
> excitation, inhibition, summation over synaptic inputs, synaptic weights,
> temporal integration, memory, network connectivity modification based on
> experience, pacemaker activity, firing thresholds, coupling to sensors with
> graded signal outputs and the dependence of firing rate on input current.
> Transfer functions for simple artificial neurons with spiketrain inputs and
> spiketrain outputs have been measured and correlated with input coupling.
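
That list of phenomena (threshold, action potential, refractory period, firing
rate rising with input current) maps pretty neatly onto a textbook leaky
integrate-and-fire neuron. A rough Python sketch of that generic model, just to
make the phenomena concrete; this is not the paper's p+ - n - n+ device
physics, and the constants are made up:

    def lif_spike_train(i_input, dt=1e-4, t_end=0.1, tau=0.02,
                        v_rest=0.0, v_thresh=1.0, v_reset=0.0, t_refrac=2e-3):
        """Leaky integrate-and-fire neuron driven by a constant input current
        (arbitrary units); returns the list of spike times."""
        v = v_rest
        refrac_until = -1.0
        spikes = []
        for step in range(int(t_end / dt)):
            t = step * dt
            if t < refrac_until:
                continue                        # refractory period: input ignored
            v += dt / tau * (-(v - v_rest) + i_input)   # leaky integration
            if v >= v_thresh:                   # threshold crossed -> "action potential"
                spikes.append(t)
                v = v_reset
                refrac_until = t + t_refrac
        return spikes

    # Firing rate rises with input current, as the abstract describes.
    for i in (1.5, 3.0, 6.0):
        print(i, len(lif_spike_train(i)) / 0.1, "Hz")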

~~~
darkmighty
I suspect that if you rely on certain analog properties (with some good
precision), the elements won't scale down very well (certainly not down to
current transistor scale). At the very limit of element size I would expect
reliable elements to inevitably degrade into binary gates, which are the
simplest elements in behavior (at some point even binary gates become
impossible, of course, due to phenomena like leakage, quantum tunnelling, etc.).

The brain, as far as I understand, does so much with large, slow elements
(neurons) by having them fill a volume, keeping them sparsely activated (i.e.
acting mostly as a huge memory), and using other advanced communication methods
(temporal pulse position modulation/frequency from spiking? neurotransmitters?).

Current ML uses more densely activated, high-frequency networks. I'm not sure
we could revert to a brain-like architecture unless we could get the cost of
silicon manufacturing down by several orders of magnitude, enough that we could
just fabricate a large block of stacked complex elements. A large part of the
design philosophy of the nodes would need to be reworked (much lower frequency,
lower leakage, lower power consumption), since processes are optimized for
>100MHz clocks, just so the internal memory elements would stay at acceptable
temperatures. Currently you could fit about 2000 GPUs in a 10 cm cube (assuming
1mm die thickness), which would cost about $1.5M USD. And it couldn't do much,
because it would quickly overheat under any reasonable load, and because I
don't think we have the technology to interconnect it all.
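
For what it's worth, the back-of-envelope arithmetic behind those last figures,
with my own assumed numbers for die size and price, so treat it as a sketch
rather than a real quote:

    # Assumptions (mine, not authoritative): ~500 mm^2 per GPU die,
    # 1 mm per stacked layer, ~$750 per GPU.
    cube_edge_mm   = 100                              # a 10 cm cube
    die_area_mm2   = 500                              # roughly a big GPU die
    layer_pitch_mm = 1                                # die thickness only, no cooling gap

    dies_per_layer = cube_edge_mm**2 // die_area_mm2  # 20 dies per layer
    layers         = cube_edge_mm // layer_pitch_mm   # 100 layers
    gpus           = dies_per_layer * layers          # 2000 GPUs
    print(gpus, "GPUs, roughly $%.1fM" % (gpus * 750 / 1e6))   # ~$1.5M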

~~~
dkfellows
The brain definitely is able to use not just spiking frequency (which is
roughly what EEG voltage reflects, though not exactly) but also spiking
patterns to encode information. We've observed some highly interesting
phase-locked loops showing various higher-order patterns in simulations (we
simply don't have fine enough tools to look for the equivalent in the biology).

The variation in neurotransmitters allows for different sorts of activation,
typically with different physical parameters (size of activation, time over
which it decays) and multiple ways they interact with the other
neurotransmitters.

Sparse networks are not understood to anything like the same extent as dense
matrices. And another key property that most ML is missing is large numbers of
feedback loops. Again, that makes predicting behaviours extraordinarily
difficult.
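
To make the contrast concrete, here's a toy NumPy sketch (purely illustrative,
with made-up sizes, and nothing to do with any particular hardware) of a dense
feedforward layer versus a sparsely connected network with a feedback loop; the
second one carries state that unfolds over time, which is exactly what makes
its behaviour hard to predict:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000

    # Dense feedforward layer: one matrix-vector product, no state carried over.
    W_dense = rng.normal(size=(n, n)) / np.sqrt(n)
    x = rng.normal(size=n)
    y = np.tanh(W_dense @ x)

    # Sparse network with feedback: ~1% of connections are nonzero and the
    # output is fed back in, so the behaviour unfolds over many time steps.
    mask = rng.random((n, n)) < 0.01
    W_sparse = (W_dense * mask) * 10       # crude rescale for the 1% density
    state = np.zeros(n)
    for _ in range(50):                    # the feedback loop
        state = np.tanh(W_sparse @ state + x)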

------
p1esk
Another interesting quote:

> We estimate that a system with 10^11 active 10μm x 10μm elements (comparable
> to the number of neurons in the brain) all firing with an average pulse rate
> of 1KHz (corresponding to a high neuronal firing rate) would consume about
> 50 watts. The quiescent power drain for this system would be 0.1 milliwatts.

Note they are referring to 10μm process technology. Modern state-of-the-art
technology would probably get the power consumption of such a brain-scale
system down to under a single watt.
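
Unpacking the quoted figures (just arithmetic on the numbers above, nothing
new), they work out to about 0.5 nW per active element, 0.5 pJ per spike, and
about 1 fW per quiescent element:

    n_elements = 1e11       # elements, comparable to neurons in the brain
    rate_hz    = 1e3        # average firing rate per element
    p_active_w = 50.0       # claimed total active power
    p_idle_w   = 0.1e-3     # claimed total quiescent power

    print(p_active_w / n_elements)              # 5e-10 W  = 0.5 nW per element
    print(p_active_w / n_elements / rate_hz)    # 5e-13 J  = 0.5 pJ per spike
    print(p_idle_w / n_elements)                # 1e-15 W  = 1 fW per element, quiescent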

------
stanfordkid
Wow, super interesting. I'm assuming this would be a fixed network though?
Could you adapt this hardware to change the weights and have the network learn
(rather than passively interpret input data)?

~~~
dkfellows
We can do dynamic networking (the routing tables for the hardware are
reloadable at runtime), but the routing computations are sufficiently difficult
that at the moment we only do them while the simulation is stopped. (We
actually usually use the machine to compute its own routing tables; that's the
fastest approach since it is a massively parallel problem.)

However, the _effective_ network can route dynamically (by faking things on
top of an initially-zero-weight all-to-all connection pattern between two
neuron populations). One of our PhD students is working on this, and on the
types of dynamic online learning that this enables, modelling the dynamic
generation and removal of synapses that occurs in biological neurons. We also
support tuning of connection weights in response to the history of synaptic
activity via Spike Timing Dependent Plasticity (STDP), and have done for a few
years (using earlier generations of the hardware config).
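
For anyone unfamiliar with it, pair-based STDP is usually written as an
exponentially decaying weight change that depends on the relative timing of
pre- and post-synaptic spikes. A generic sketch with made-up constants (the
textbook rule, not the exact configuration we run on the hardware):

    import math

    def stdp_dw(dt_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
        """Pair-based STDP: weight change for one pre/post spike pair, where
        dt_ms = t_post - t_pre. Pre-before-post potentiates, post-before-pre
        depresses, both decaying with the size of the timing gap."""
        if dt_ms > 0:
            return a_plus * math.exp(-dt_ms / tau_plus)
        if dt_ms < 0:
            return -a_minus * math.exp(dt_ms / tau_minus)
        return 0.0

    # The weight drifts according to the history of relative spike timings.
    w = 0.5
    for t_pre, t_post in [(10.0, 15.0), (40.0, 38.0), (70.0, 71.0)]:
        w = min(1.0, max(0.0, w + stdp_dw(t_post - t_pre)))
    print("weight after three spike pairs:", round(w, 4))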

~~~
dkfellows
FWIW, this is based on SpiNNaker, which is a system with a million CPU cores
and a custom low-power multicast network backplane. It's possible to do
simpler neural models with much less power than we do (and some of our
competitors do just that) but it's not at all clear that those simple models
are actually sufficiently biologically relevant to produce enough of the
phenomena that we care about. Having a system flexible enough to support a
dynamic research agenda is vital, but does increase the energy cost per neuron
and per synapse.

