This is really cool, but I was confused by the framing as a resistor network since I think that should be linear (to first order? I’m not an EE)
What they have is a transistor network, and they constrain all the transistors to the ohmic regime, so the resistance of an individual transistor can be some nonlinear function of its inputs, which is really cool: it's like detuning transistors to do analog computation instead of digital.
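To make the "transistor as a gate-tunable resistor" idea concrete, here's a toy long-channel square-law MOSFET model (textbook equations with made-up parameters, not the paper's actual devices): for small drain-source voltage the device looks like a resistor whose value is set by the gate voltage.

    # Toy square-law MOSFET in the ohmic/triode regime.
    # K and VTH are invented numbers, purely for illustration.
    K = 2e-3    # transconductance parameter k' * W/L  [A/V^2]
    VTH = 0.7   # threshold voltage [V]

    def drain_current(vgs, vds):
        """Triode-region drain current (valid while vds < vgs - VTH)."""
        vov = vgs - VTH                       # overdrive voltage
        assert 0 < vds < vov, "stay in the ohmic regime"
        return K * (vov * vds - vds ** 2 / 2)

    def effective_resistance(vgs, vds=0.01):
        """R = V / I at a small vds: the 'weight' the network adjusts."""
        return vds / drain_current(vgs, vds)

    for vgs in (1.0, 1.5, 2.0, 3.0):
        print(f"Vgs = {vgs:.1f} V  ->  R ~ {effective_resistance(vgs):.0f} ohm")

For small vds the resistance is roughly 1/(K*(Vgs - Vth)), so sweeping the gate voltage sweeps the effective resistance continuously, even though locally the device still looks like V = I*R.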
Once the training is complete, one thing I didn't see mentioned in the paper was how they maintain the charge on the gate capacitors, which is analogous to the weights in a traditional neural network if I'm understanding this correctly. Any practical implementation will need some way to refresh that charge on a continuous basis so that the weights don't drift. Was this perhaps mentioned somewhere and I missed it?
One can use MOS capacitors for this, which essentially double up as flash cells, so storage and refresh are built in. Just like flash, you can only do a limited amount of training on them.
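To put rough numbers on why storage matters at all: a plain gate capacitor leaks, so without a non-volatile cell you'd be refreshing DRAM-style. A toy leaky-capacitor model (the RC values here are invented for illustration, not taken from the paper):

    import math

    # Weight stored as charge on a gate capacitor that slowly leaks away.
    C = 1e-12        # gate capacitance [F] (made up)
    R_LEAK = 1e12    # leakage resistance [ohm] (made up) -> tau = 1 s
    TAU = R_LEAK * C

    def weight_voltage(v0, t, refresh_period=None):
        """Stored voltage after t seconds, optionally refreshed to v0 periodically."""
        if refresh_period is not None:
            t = t % refresh_period        # time elapsed since the last refresh
        return v0 * math.exp(-t / TAU)

    for t in (0.01, 0.1, 1.0):
        print(f"t = {t:4.2f} s   no refresh: {weight_voltage(1.0, t):.3f} V   "
              f"10 ms refresh: {weight_voltage(1.0, t, 0.01):.3f} V")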
>the framing as a resistor network since I think that should be linear
Transistors are essentially non-linear, but they amplify, and they can be made to amplify linearly through the use of feedback resistors: if you divide the output voltage across a resistor pair, which fixes the output as a ratio of the input voltage, that geometric relationship will hold across a broad range of inputs/outputs. For most applications you want linear amplification. (Transistors work as a function of current, but passing the current through resistors yields a voltage measurement.) A transistor can be thought of as resistive if you treat its voltage:current relationship as a measurement of resistance.
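A quick numerical illustration of the feedback point (this is just the generic negative-feedback gain formula, nothing specific to the paper): even if the transistor's open-loop gain A varies wildly, the closed-loop gain gets pinned near the resistor ratio.

    # Closed-loop gain with negative feedback: G = A / (1 + A * beta),
    # where beta is the fraction of the output fed back via a resistor divider.
    RF, RG = 9_000, 1_000
    beta = RG / (RF + RG)        # feedback fraction 0.1 -> ideal gain of 10

    for A in (100, 1_000, 10_000, 100_000):   # open-loop gain all over the place
        G = A / (1 + A * beta)
        print(f"open-loop gain {A:>7}  ->  closed-loop gain {G:.2f}")

The closed-loop gain converges on 1/beta = 10 as A grows, which is why the resistor ratio, not the messy transistor physics, ends up setting the behaviour.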
But seriously, my actual question is whether an ideal resistor network can compute nonlinear functions, since the individual resistors are linear in their inputs (ignoring possible nonlinear effects like the temperature dependence of the resistors' conductivity).
There is a separate interesting question of whether you could exploit the nonlinearity of a network of non-ideal resistors in practice. Given that tom7 was able to use floating point inaccuracy as a source of nonlinearity for ML, I’m guessing the answer is probably “yes, but now your system is incredibly sensitive to stuff like ambient temperature”.
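On the ideal-resistor question: no. With linear resistors and independent sources, the node voltages obey superposition, so the input-to-output map is linear. A quick numerical check on a two-input resistive summing node (arbitrary resistor values, just for illustration):

    # Two input voltages driving one output node through conductances G1, G2,
    # with a load conductance GL to ground. Nodal analysis gives
    #   Vout = (G1*V1 + G2*V2) / (G1 + G2 + GL),
    # which is linear in (V1, V2).
    G1, G2, GL = 1 / 1e3, 1 / 2.2e3, 1 / 4.7e3   # arbitrary resistor values

    def vout(v1, v2):
        return (G1 * v1 + G2 * v2) / (G1 + G2 + GL)

    a, b = (1.0, 2.0), (0.5, -1.5)
    print(vout(a[0] + b[0], a[1] + b[1]), vout(*a) + vout(*b))   # additivity
    print(vout(2 * a[0], 2 * a[1]), 2 * vout(*a))                # homogeneity

So with ideal resistors you only ever get weighted sums of the inputs; the nonlinearity has to come from somewhere else, which is exactly why the paper holds transistors in the ohmic regime instead.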
Just spitballing, but I'm thinking about how slide rules are linear yet can do calculations that, I believe, aren't limited to being linear. That's due, I imagine, to the logarithmic rulings and the fact that logarithms can be added and subtracted in a linear fashion.
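Right, that's exactly the trick: the scales are marked proportionally to log(x), so adding two lengths multiplies the numbers. In code terms (just arithmetic, nothing from the article):

    import math

    # Slide-rule multiplication: lengths are proportional to log10(x),
    # so sliding one scale along the other adds logs, i.e. multiplies.
    a, b = 3.0, 7.0
    length = math.log10(a) + math.log10(b)
    print(10 ** length)   # ~21.0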
That's more or less what we have now. Slide rules and computers are linear systems which can compute nonlinear functions. The problem is that computing anything takes time and has to be done as a sequence of instructions executed in series (multiplied across many parallel cores).
Using an analog approach could be vastly more efficient as operations are inherently parallel. You can fire off every neuron in a layer simultaneously and produce a result within nanoseconds, for basically any number of neurons. You could probably do the entire network as a single atomic operation, but that's a bit beyond my knowledge of neural networks
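This is the standard analog matrix-vector-multiply picture: with weights as conductances and inputs as voltages, Ohm's law plus Kirchhoff's current law gives you every dot product of a layer at once, as currents summing on the output wires. A sketch of that bookkeeping in numpy (idealized: no wire resistance, no noise, made-up conductance range):

    import numpy as np

    rng = np.random.default_rng(0)

    # A "layer" as a conductance crossbar: G[i, j] connects input line j to output line i.
    G = rng.uniform(1e-6, 1e-3, size=(4, 8))   # conductances in siemens (made up)
    v_in = rng.uniform(0.0, 1.0, size=8)       # input voltages

    # Each output wire collects I_i = sum_j G[i, j] * v_in[j].
    # In the physical array, all 4 x 8 multiply-accumulates happen simultaneously.
    i_out = G @ v_in
    print(i_out)

The nonlinear activation still has to happen somewhere (a diode, a comparator, or here the ohmic-regime transistors themselves); a purely resistive crossbar only gives you the linear half of a layer.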
However, in a real resistor, resistivity varies with temperature, and current through the resistor produces heat. Therefore, real resistors are in fact nonlinear circuit devices. Technically speaking. In reality, whenever electrical engineers are confronted with this nonlinearity, it's a bug that must be smashed and not a feature to exploit.
Yeah, a purely resistive network will be linear in all respects (discounting thermal and related drift).
A network of transistors operating below saturation makes much, much more sense. It's really directly analogous to how we compute neuron activation in software, but inherently massively parallel.
Essentially, when there's an analog (non-computational) change in resistance, like a drift in value, you can have a non-linear function over that circuit.
Here's a keynote talk by Andrea Liu, the lead of the project; it's a much better resource about one of the most exciting things going on in ML right now:
A couple years ago Veritasium did a video on analog computers, which included a segment on Mythic AI that uses NAND flash cells kind of "undervolted" as an analog computer to run neural networks.
Finding analog architectures for something that is largely continuous but quantized for digital circuitry right now (gradient descent) is pretty appealing. I’d love to see a toy network built out; I wonder how physically large this breadboard setup would have to be to get good results on MNIST, for instance.
As I understand it, they were able to train and classify XOR using 32 of their breadboard nodes, which visually looks like roughly a 1 metre square. XOR can usually be done with 1 or 2 neurons, and MNIST digit classification with ~25,000.
Bringing all those numbers together, 1 neuron = 32 nodes = 1 sqm, which would give about 25,000 sqm, i.e. a square roughly 160 m on a side, if breadboarded out for ordinary MNIST digits. (Assuming, of course, no power transmission losses...!)
I'm hoping, if not outright assuming, that I've made some kind of catastrophic error here.
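For what it's worth, the arithmetic under those same assumptions (32 nodes ≈ 1 neuron ≈ 1 sqm, ~25,000 neurons for MNIST; all figures from the comments above, not from the paper):

    import math

    nodes_per_neuron = 32        # from the XOR demo described above
    area_per_neuron_m2 = 1.0     # ~1 square metre of breadboard for those 32 nodes
    neurons_for_mnist = 25_000   # rough figure quoted above

    area = neurons_for_mnist * area_per_neuron_m2
    print(f"{area:,.0f} m^2  ->  a square about {math.sqrt(area):.0f} m on a side")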
When memristors came onto the scene, I thought they would be a reasonable substrate for network-based learning systems … anyone know more about what happened after?
15 years later and nobody can make them at an economical price and volume. Perhaps the demand just wasn't there, and we may see something take off in the coming years.
As the newcomer to a very entrenched block, I think the memristor has a lot of momentum to overcome. In an EE undergrad (2007, so it has been a bit) we spent plenty of time understanding resistors, capacitors, and inductors. We looked at example circuits and uses, we learned the math and theory... We developed intuition around them.
Memristors were the missing fourth, and "imagine what you could do with that!" My imagination did not extend very far. Everything was being built with those other three and the non-linear components.
It'll take a while to overcome that momentum.
I feel like IPv6 has a similar barrier. I'm mostly an infosec nerd and I've been through a lot of training and education. Never once seen IPv6 treated beyond, "it has more bytes, firewall it off".
The traditional problem that analog computers faced was that voltages could vary from run to run and thus give different results, to the point that analog computer manufacturers made their own power supplies and capacitors to extremely high tolerances.
It's not clear in the paper if this problem was addressed or if the rapid training possible meant that in practice they never had this issue.
IMHO this can become really cool if combined with mass customisation, e.g. by using printed electronics. My colleague, who will shortly defend his PhD, is working on this [1]
- used genetic algorithms to have the FPGA identify first tones and then more complex audio sequences
- there was no clock or timer used
- when they found a good solution, they tried to copy over the FPGA "configuration" to another identical FPGA.
- that didn't work!
- they assumed it was because the genetic algorithm + no timer had found a quirk of that specific FPGA unit and exploited it to improve the processing quality
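For anyone who hasn't seen it, the evolutionary loop itself is conceptually tiny; something like the sketch below, except that in the real experiment "evaluate" meant loading the bitstring onto the actual chip and measuring how well it discriminated the tones, so all the messy analog physics lived inside the fitness function. (Toy code with a placeholder fitness, obviously not the original setup.)

    import random

    random.seed(0)
    GENOME_BITS = 64            # stand-in for an FPGA configuration bitstring
    POP, GENERATIONS = 20, 50

    def fitness(genome):
        # Placeholder: in the real experiment this meant programming the FPGA
        # and listening to its output. Counting set bits just lets the loop run.
        return sum(genome)

    def mutate(genome, rate=0.02):
        return [bit ^ (random.random() < rate) for bit in genome]

    population = [[random.randint(0, 1) for _ in range(GENOME_BITS)] for _ in range(POP)]
    for _ in range(GENERATIONS):
        population.sort(key=fitness, reverse=True)
        parents = population[: POP // 2]              # keep the best half
        offspring = [mutate(random.choice(parents)) for _ in range(POP - len(parents))]
        population = parents + offspring

    print("best fitness:", fitness(max(population, key=fitness)))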
I was going to ask the same thing. "Thermodynamic computing" was what I interpreted as an ASIC for training models, and once you can train and run models on it, what do you need classical compute for?
Removing entropy from transistors is expensive - computers use just two states separated by large voltage differences. In AI, entropy isn't a problem, as we don't care about repeatable results. So why not use more of the linear or even nonlinear range of the transistor for this purpose?
There won't be any energy efficiency improvements until they are able to make analog VLSI chips like Carver Mead's. Nice to see this idea is getting more recognition. The potential has been there for a long time, but the business went digital.
I read George Dyson’s “Analogia” with a bit of skepticism a few years ago; now all of a sudden it feels relevant (if you can make it past all of the chapters on kayak-building).
Conceptually, I expect compiling NNs to hardware will be done at a small scale at first. Imagine you have a relatively simple task with a fairly small NN that's being used in a low-latency, mass-production application. If you could compile that network into a passive component array at the sensor, that could be massive.
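A rough sketch of what "compiling" a tiny trained layer into component values might look like, using the naive mapping of each weight to a conductance, with a positive/negative conductance pair for signed weights (my own illustration of the general idea, not a method from the paper):

    import numpy as np

    rng = np.random.default_rng(1)
    W = rng.normal(scale=0.5, size=(3, 4))   # a tiny "trained" weight matrix (made up)

    G_UNIT = 1e-4   # siemens per unit weight: an arbitrary scaling choice

    # Signed weights via a differential pair of conductances: w ~ (G_plus - G_minus).
    G_plus = np.where(W > 0, W, 0.0) * G_UNIT
    G_minus = np.where(W < 0, -W, 0.0) * G_UNIT

    # Resistor values you'd actually have to place (inf = leave that position open).
    with np.errstate(divide="ignore"):
        R_plus = np.where(G_plus > 0, 1.0 / G_plus, np.inf)
        R_minus = np.where(G_minus > 0, 1.0 / G_minus, np.inf)

    print(np.round(R_plus, 1))
    print(np.round(R_minus, 1))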
Here’s the preprint: https://arxiv.org/abs/2311.00537