You just helped me rediscover a lost aspect of my childhood. Sparkbangbuzz.com was one of a handful of science websites my 14-year-old self couldn't get enough of. I'm pretty sure that's where I first learned of the Leyden jar and was inspired to build one. I'm so glad to see that site still exists in all its Web 1.0 glory. :)
I'm excited by the potential of memristors, which are already being used in the lab for toy deep learning tasks -- for example, see https://www.nature.com/articles/s41467-017-02337-y . Over time, I would expect memristors to be used for increasingly larger, more challenging AI tasks.
The big deal about memristors for AI is that they have memory and therefore produce different readout outputs for different input sequences over time -- out-of-the-box, without requiring any training -- and in a way that can make sequences linearly separable under fairly general conditions. For instance, in the Nature paper I linked to above, the researchers took a network of memristors, which they call the "reservoir," added a linear layer with a SoftMax on top of the reservoir, trained this hybrid network on a lower-resolution variant of MNIST (feeding pixel values over time, as varying voltages), and achieved classification accuracy superior to a tiny neural net despite having only 1/90th the number of neurons.[1] Note that they only trained the added layer; they did not have to train the reservoir. Figure (a) in this image has a simplified diagram of the reservoir + added layer architecture: https://www.nature.com/articles/s41467-017-02337-y/figures/1 -- only the matrix Θ had to be learned. The potential, over time, is for having highly scalable hardware neural-net components for learning to recognize and work with sequences.
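To make the architecture concrete, here's a minimal numpy sketch of the reservoir + readout idea. The reservoir below is just a toy stand-in (a fixed random tanh map), not the paper's actual memristor dynamics, and the sizes merely mirror the figure; the point is that only Theta ever gets trained:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stand-in for the physical memristor reservoir: a FIXED (never
    # trained) random nonlinear map from an input sequence to a small
    # feature vector. In the paper the reservoir is hardware; only the
    # readout below is learned.
    N_RESERVOIR = 5      # mirrors the 5 memristors in the paper's figure
    N_CLASSES = 10       # 10 digit classes
    W_in = rng.normal(size=N_RESERVOIR)
    W_res = 0.5 * rng.normal(size=(N_RESERVOIR, N_RESERVOIR))

    def reservoir_readout(sequence):
        """Run a 1-D sequence (e.g. pixel values fed over time as voltages)
        through the fixed reservoir and return its final state."""
        state = np.zeros(N_RESERVOIR)
        for x in sequence:
            state = np.tanh(W_in * x + W_res @ state)
        return state

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # The ONLY trained parameters: a linear readout (Theta in the paper's
    # figure) mapping 5 reservoir features to 10 class scores = 50 weights.
    Theta = 0.1 * rng.normal(size=(N_RESERVOIR, N_CLASSES))

    def predict(sequence):
        return softmax(reservoir_readout(sequence) @ Theta)

    def train_step(sequence, label, lr=0.1):
        """One gradient-descent step on cross-entropy for a single example.
        Note that W_in and W_res (the 'reservoir') are never updated."""
        global Theta
        h = reservoir_readout(sequence)
        p = softmax(h @ Theta)
        Theta -= lr * np.outer(h, p - np.eye(N_CLASSES)[label])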
PS. For clarity's sake, I'm ignoring a lot of important details and playing fast and loose with language. If you're really curious about this, please read the IEEE article and the Nature paper.
Of course, because the nets we're talking about are really, really tiny.
As you can see for yourself in figure [1]c, the memristor net used in that paper has only 15 units (yes, fifteen): five memristors in the "reservoir" and 10 output neurons. The entire net is one linear transformation followed by a SoftMax, with a total of 5 × 10 = 50 parameters.
It's an impressive result that only hints at the things that will be possible when we have nets with millions or billions of memristor units.
I don't think it says anything about what a network of a million or a billion memristors would be able to do. I don't think they scale the same way normal machine learning algorithms do.
I was speaking recently to a veteran SV hardware engineer, with a master's in EE from Berkeley, who has worked at leading SV companies. They had never heard of a memristor, neither the device nor the term. Do I live in a bubble, or are you as surprised as I am?
Memristors are ultra cutting-edge and mostly theoretical / still being studied in labs. If this person is an engineer in industry, it makes sense that they haven't heard about them.
EE is a huge field. It's hard to keep up with new developments even if you have a broad foundation. That being said, memtransistors and memristors have been theorized and subsequently developed in research for quite some time now. Memristors were first theorized in the 50s, and in the 70s they were proposed as a fourth fundamental electrical circuit element alongside resistors, capacitors, and inductors.
Given that the existence of memristors was first predicted by a professor at Berkeley (Leon Chua), yes, I am surprised. Surely they had to have heard of them.
Memristors are a pretty niche topic. Until somebody made some, they were an obscure topic in circuit theory. Even now that they exist, they've yet to be commercialized. It makes sense for a practicing EE who may not follow the tech press all that closely to not know about them.
Physics. You don't have to pay the speed-of-light transfer latency when you store a bit locally. Not sure how much more efficient the memristor is compared to transistors + local storage, though.
Hopefully someone close to neural networks can describe how useful this primitive is as a node in a neural network. If it is useful, is it a better primitive than simply doing it in "software" on a GPU, for example?
Wouldn't the memtransistor be faster, though? Even if you use CUDA, you still have to compile down to assembly and probably go through a kernel-level module to communicate with the GPU.
A physical memtransistor will be a lot faster than emulating it in software, but you'll also be limited by the original connections and design of the chip. Not as easy to reconfigure as writing new code.
I mean, ideally I think it would be built as an FPGA-like piece of hardware where the connections can be synthesized from a high-level description language. A la [0].
Then you will rapidly run into the same problems that FPGAs have, which is that all that connective tissue isn't free and has to pay for itself. That turns out to be a pretty steep bar to leap.
Well, the problems of FPGAs have more to do with the synthesizability of high-level HDL and having to fit that generic representation into a predefined LUT model. It's alleviated quite a bit when you are designing an 'application-specific' FPGA with a different substrate. IIRC Mathstar tried to do this some time ago, and so did Ambrics; at the time it suffered from solution-looking-for-a-problem syndrome and failed.
The big idea here is that IF your base element (the memristor crossbar here) is suitable for such a rapidly reconfigurable bus architecture (which it seems it is), then you can use it to synthesize a single neuron directly, which is a huge leap over the next-best GPU/TPU architectures built on the instruction fetch-decode-execute model. Based on what I read a few years ago, you can have 20M neurons simulated with memristors in about a cm² die. That is human-level integration density, even if you totally ignore the vast difference in switching rate (100 Hz vs 1+ GHz).
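For anyone wondering why a crossbar maps so directly onto a neuron: in the idealized picture it performs a vector-matrix multiply in the analog domain, with conductances as the weights and Ohm's and Kirchhoff's laws doing the multiply-accumulate. A rough numerical sketch, with made-up values and all device non-idealities ignored:

    import numpy as np

    # Idealized model of a memristor crossbar doing an analog vector-matrix
    # multiply: each cross-point stores a conductance G[i, j] (the "weight"),
    # input voltages V drive the rows, and each column wire sums its currents
    # (Kirchhoff's current law), so the column currents are I = G^T @ V.
    # Real devices add nonlinearity, noise, sneak paths and wire resistance,
    # which this sketch ignores.
    rng = np.random.default_rng(1)

    n_inputs, n_neurons = 8, 4
    G = rng.uniform(1e-6, 1e-3, size=(n_inputs, n_neurons))  # conductances (S)
    V = rng.uniform(0.0, 0.2, size=n_inputs)                 # input voltages (V)

    I = G.T @ V  # one multiply-accumulate per device, all in the analog domain

    # A "neuron" is then just a nonlinearity applied to each column current.
    activations = np.tanh(I / I.max())
    print(activations)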
I'm not an expert, but from my basic knowledge of electronics and a little intuition: a neuron can hold memory while a transistor cannot. Since processors are built around separating memory from computation, it makes sense that a standard processing unit that can both retain memory and do calculation should scale better, just as it seems to in a biological brain.
Even when you look at CPU performance, you can often pinpoint bottlenecks right at the amount of available L1 or L2 cache. Cache locality has almost always been the limiting factor for performance, because to process data you must first read and write it. So if memory is closer at hand, everything should be faster.
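You can see the effect from userland by touching the same number of elements at different strides; the numbers are machine-dependent and the array size below is just a guess large enough to fall out of cache:

    import numpy as np, time

    n = 32_000_000
    a = np.ones(n, dtype=np.float64)   # ~256 MB, far larger than any cache

    def timed(f):
        t0 = time.perf_counter(); f(); return time.perf_counter() - t0

    # Contiguous pass over 4M doubles: each 64-byte cache line fetched from
    # memory yields 8 useful values.
    t_contig = timed(lambda: a[: n // 8].sum())

    # Strided pass over the SAME number of elements: every value sits in a
    # different cache line, so roughly 8x the memory traffic for identical math.
    t_strided = timed(lambda: a[::8].sum())

    print(f"contiguous: {t_contig * 1e3:.1f} ms   strided: {t_strided * 1e3:.1f} ms")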
Also, remember you cannot run general-purpose software on a GPU: even with CUDA or OpenCL, GPUs are not built for that, for the simple reason that GPUs don't do error correction. OpenCL and CUDA only help when processing data that can be parallelized, i.e. where the result won't be jeopardized if errors accumulate.
I don't know about AMD, but NVIDIA GPUs also have internal protection. Tesla and Quadro GPUs have had internal error detection on registers and cache since Fermi at least.
But this is totally irrelevant anyway: the comparison we started with is to an analog alternative. Anything analog will have strictly worse noise and error problems. In neural networks, errors are probably not even a problem for an analog implementation, so they definitely aren't a problem for GPU implementations.
A 1-bit SRAM cell consists of 6 transistors (or 4 transistors and 2 resistors). While a single transistor can't store a bit on its own, it is trivial to build small circuits of transistors that do have memory.
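To make "small circuits of transistors that have memory" concrete, here's a toy simulation of the classic cross-coupled NOR (SR) latch; in CMOS that's about 8 transistors rather than the 6T SRAM cell, so treat it purely as an illustration that gate feedback stores a bit:

    # Toy model of a cross-coupled NOR (SR) latch: feedback between two gates
    # is enough to store a bit.  Q = NOR(R, Qbar), Qbar = NOR(S, Q).
    def nor(a, b):
        return 0 if (a or b) else 1

    def settle(s, r, q, q_bar, passes=4):
        """Apply set/reset inputs and iterate the two gates to a fixed point."""
        for _ in range(passes):
            q, q_bar = nor(r, q_bar), nor(s, q)
        return q, q_bar

    q, q_bar = 0, 1                                # arbitrary initial state
    q, q_bar = settle(s=1, r=0, q=q, q_bar=q_bar)  # SET
    print(q)                                       # -> 1
    q, q_bar = settle(s=0, r=0, q=q, q_bar=q_bar)  # inputs released (hold)
    print(q)                                       # -> still 1: it remembers
    q, q_bar = settle(s=0, r=1, q=q, q_bar=q_bar)  # RESET
    print(q)                                       # -> 0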
The fundamental problem in modern computers isn't memory: it's in moving data around. The speed of light in a vacuum gives you only a few cm of distance to move information in a single clock cycle, and the actual electronic propagation inside the processors is substantially slower. In fact, the governing factor of the size of L1 cache is the time it takes to actually read a value. At the scale of supercomputers, the topology of the interconnect has major implications for the actual performance on HPC applications.
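Back-of-the-envelope numbers for the light-speed point, with the on-chip propagation factor being a rough assumption rather than a measured value:

    # Distance a signal can cover per clock cycle.  The 0.5 on-chip factor is
    # a rough assumption (RC delays keep real interconnect well below c).
    C = 299_792_458  # m/s, speed of light in vacuum

    for clock_ghz in (3, 5):
        cycle_s = 1.0 / (clock_ghz * 1e9)
        d_vacuum_cm = C * cycle_s * 100
        d_chip_cm = 0.5 * d_vacuum_cm
        print(f"{clock_ghz} GHz: {d_vacuum_cm:.0f} cm in vacuum, "
              f"~{d_chip_cm:.0f} cm with realistic propagation")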
Saying that bringing memory closer is the determining factor in speed ignores the fact that the size of a memory has implications for the time it takes to access it. The innovation in CPUs has been about minimizing latency, essentially by developing better heuristics for predicting what data will be needed next. GPUs innovate by not trying to minimize latency, but instead overprovisioning cores and relying on batched memory access (consequently, GPUs are not good at handling codes that rely on irregular memory access patterns).
Now we have something called a "memtransistor". We also have something called a "memristor".
But invariably in these discussions, there's another device that exists that almost never seems to get a mention. It's called a "memistor":
https://en.wikipedia.org/wiki/Memistor
It was first developed in 1960, and used for a couple of related hardware neural network architectures - ADALINE and MADALINE.
What is also interesting about this device, is that it can be relatively easily built by a hobbyist, as shown by:
http://www-isl.stanford.edu/~widrow/papers/t1960anadaptive.p...
Strangely, though - the memistor is hardly ever mentioned; it does have some downsides (mainly difficulty to miniaturize), so maybe that's it? Maybe somebody here can play with it...
Interestingly, on a side note - it's possible to homebrew a memristor as well:
http://sparkbangbuzz.com/memristor/memristor.htm