
I have often mused that, in some ways, it seems like the transistor is really being wasted in AI applications. We use binary states in normal computing to reduce entropy. In AI this is less of a concern, so why not use more of the available voltage range? Basically, re-think the role of the transistor and re-design from the ground up - maybe NAND gates are not the ideal fundamental building block here?



People are working on that [1]. In some sense, it's a step back to analog computing. Add/multiply is possible to do directly in memory with voltages, but it's less versatile (and stable) than digital computing. So you can't do all calculations in a neural network that way, meaning some digital components will always be necessary. But I'm pretty sure analog will make a comeback for AI chips sooner or later.

[1] https://www.nature.com/articles/s41586-023-06337-5
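
For a rough feel of what "add/multiply directly in memory with voltages" means, here's a minimal Python sketch (my own toy model, not the paper's method): conductances stand in for weights, Ohm's law does the multiply, currents summing on a shared wire do the add, and a bit of noise is thrown in.

    # Toy analog multiply-accumulate: I = G * V per weight, currents sum on a wire.
    import numpy as np

    rng = np.random.default_rng(0)
    voltages = rng.uniform(0.0, 1.0, size=16)       # input activations as voltages
    conductances = rng.uniform(0.0, 1.0, size=16)   # weights as conductances

    ideal = voltages @ conductances                 # what exact digital math gives
    currents = conductances * voltages * (1 + rng.normal(0.0, 0.02, size=16))
    analog = currents.sum()                         # the summed (noisy) current

    print(ideal, analog)                            # close, but not bit-exact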


Reminds me of my father saying something about how vacuum tubes are great integrators.


Chips are too. Opamps can add, multiply, subtract, divide, integrate and differentiate depending on how they're plugged in.
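As a concrete example, the classic inverting summing amplifier gives Vout = -Rf * (V1/R1 + V2/R2 + ...), so weighted addition falls straight out of resistor ratios. A quick numeric sketch (component values made up):

    def inverting_summer(inputs_and_resistors, r_feedback):
        # Ideal op-amp inverting summer: Vout = -Rf * sum(Vi / Ri)
        return -r_feedback * sum(v / r for v, r in inputs_and_resistors)

    # Weight 1.0 V by 2x and 0.5 V by 1x via resistor ratios.
    print(inverting_summer([(1.0, 10e3), (0.5, 20e3)], r_feedback=20e3))  # -2.5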


Hence the name 'operational' amplifier


Ternary, however, is an interesting middle ground. People built ternary hardware long ago; it feels like you could make natively ternary hardware for something like this, and it might even be quite a win.


People haven't built reliable ternary electronics, though. Soviets tried with Setun, but they eventually had to resort to emulating each trit with two hardware bits (and wasting one state out of the possible four).


If you are using two bits anyway, you might as well represent (-2, -1, 0, 1) instead of ternary?


Sure, but then you lose the symmetry that makes trits so convenient for many things.


Can you make a "CMOS" three voltage level circuit though? One where the only current flow is when the state changes?

I'm not in this field, but that's a question that's been bugging me for a while. If you can't do this, wouldn't energy consumption balloon?


My friend was working on this in the mid-90s at Texas Instruments. Not sure what the underlying semiconductors were, but it did involve making ternary logic via voltage levels. Just searched a bit and found this TI datasheet which might be an example of it (high logic, low logic, high impedance), but maybe not: https://www.ti.com/lit/ds/symlink/sn74act534.pdf


Hadn't thought about it this way before, but given that LLMs are autoregressive (they feed their own output back in as the next input), they're sensitive to error drift in ways that are rather similar to analog computers.


Analog computing for neural networks is always very tempting.

> We use binary states in normal computing to reduce entropy. In AI this is less of a concern, so why not use more of the available voltage range?

Transistors that are fully closed or fully open use basically no energy: they either have approximately zero current or approximately zero resistance.

Transistors that are partially open dissipate a lot of energy, because they have some current flowing through some resistance. They get hot.

In addition, modern transistors are so small and so fast that the number of electrons (or holes..) flowing through them in a clock cycle is perhaps in the range of a few dozen to a hundred. So that gives you at most 7 bits (~log_2(128)) of precision to work with in an analog setting. In practice, quite a bit less, because there's a lot of thermal noise. Say perhaps 4 bits.

Going from 1 bit per transistor to 4 bits (of analog precision) is not worth the drastically higher energy consumption, nor the deviation from the mainstream of semiconductor technological advances.
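A back-of-the-envelope version of that electron-counting argument (the sqrt(N) shot-noise model here is my own simplification):

    import math

    # Assumption (mine): ideal counting gives log2(N) bits; shot noise ~ sqrt(N)
    # roughly halves that.
    for n_electrons in (32, 64, 128):
        ideal_bits = math.log2(n_electrons)
        noisy_bits = math.log2(n_electrons / math.sqrt(n_electrons))
        print(n_electrons, round(ideal_bits, 1), round(noisy_bits, 1))
    # 128 electrons: ~7 bits ideal, ~3.5 bits once sqrt(N) noise eats into it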


As someone who knows almost nothing about electronics I assume you’d want a transistor which can open in two ways: with positive and negative voltage. I’ve seen TNAND built out of normal transistors, not sure if such exotic ones would help even if they were physically possible.


That's for building ternary gates. They are still discrete, so it might be possible to do something here.

I was talking about analogue computing.


The reason why digital/numeric processing won is the power loss in the analog world. When you design an analog circuit, the next processing stage you add at the end has an impact on the ones before it.

This then requires more skill from the engineers/consumers.

If you want to avoid that, you need to add op-amps with a gain of 1 at the boundary of each stage; this also takes care of the power loss at each stage.

The other part is that there's a limit to the amount of useful information/computation you can do with analog processing once you take voltage noise into account. When you do the comparison, there are stages where analog wins, but also places where digital wins.

I'll edit this later with a link to some papers that discuss these topics if I manage to find them in my mess.


Good explanation. When I was working at a semiconductor manufacturer, our logic thresholds were something like 0-0.2 V for low and 0.8-1.0 V for high. Additionally, if you look at QLC SSDs, their longevity is hugely degraded. Analog computing is non-trivial, to say the least.


For the specific case of neural networks they seem to be very resistant to noise. That's why quantization works in the first place.
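A quick toy illustration of that robustness (random matrices, not a real network): rounding weights to an 8-bit grid barely moves a layer's output.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(256, 512))                 # a random "layer"
    x = rng.normal(size=512)

    scale = np.abs(W).max() / 127                   # int8-style signed grid
    W_q = np.round(W / scale) * scale               # quantize, then dequantize

    rel_err = np.linalg.norm(W @ x - W_q @ x) / np.linalg.norm(W @ x)
    print(rel_err)                                  # around 1% for this toy case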


You also have literal power losses, as in waste heat, to deal with.

See https://news.ycombinator.com/item?id=39545817


The Veritasium Youtube channel did a video about this about a year ago: https://www.youtube.com/watch?v=GVsUOuSjvcg

They visit Texas company Mythic AI to discuss how they use flash memory for machine learning. There's a California company named Syntiant doing something similar.


I was thinking of this exact video; crazy to think that the principle is gaining momentum.


It would be something of a full circle, I feel, if we went back to dedicated circuits for NNs - that's how they began life when Rosenblatt built his Perceptron.

I remember reading a review of the history in grad school (can't remember the paper) where the author stated that one of the military's initial interests in NNs was their distributed nature. Even back then, people realized you could remove a neuron or break a connection and they would still work (and even today, dropout is a way of regularizing the network). The thinking was that being able to build a computer or automated device that could be damaged (radiation flipping bits, an impact destroying part of the circuit, etc.) and still work would be an advantage, given the perceived inevitability of nuclear war.

Compare that to a normal von Neumann machine, which is very fault-intolerant: remove the CPU and there's no processing; no memory means no useful calculation; etc. One reason people may have avoided further attempts at physical neural networks is that they're intrinsically more complex than von Neumann, since your processing and memory are intertwined (the NN is the processor, the program, and the memory at the same time).


>von Braun machine

von Neumann? Though it is funny to imagine von Braun inventing computer architecture as a side hustle to inventing rocket science.


Oh fuck, thanks for catching that!


The US military’s interest in network robustness led to the internet if I’m not mistaken.

Also preceding the perceptron was the McCulloch & Pitts neuron, which is basically a digital gate. NNs and computing indeed have a long history together.


>maybe NAND gates are not the ideal fundamental building block here?

It's my long-held opinion that LUTs (Look Up Tables) are the basis of computation for the future. I've been pondering this for a long time, ever since George Gilder told us that wasting transistors was the winning strategy. What could be more wasteful than just making a huge grid of LUTs that all interconnect, with NO routing hardware?

As time goes by, the idea seems to have more and more merit. Imagine a grid of 4x4 bit look up tables, each connected to its neighbors, and clocked in 2 phases, to prevent race conditions. You eliminate the high speed long lines across chips that cause so much grief (except the clock signals, and bits to load the tables, which don't happen often).

What you lose in performance (in terms of latency), you make up for with a homogeneous architecture that is easy to think about, can route around bad cells, and can be compiled for almost instantly, thanks to the lack of special cases. You also never have to worry about latency: it's constant.
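For anyone who wants to play with the idea, here's a tiny hypothetical Python sketch of one such cell: 4 input bits, 4 output bits, each output defined by its own 16-entry truth table, with outputs latched so neighbors only ever see the previous phase's values (the two-phase clocking above).

    # Hypothetical sketch of one cell in a grid of 4-in/4-out look-up tables.
    class LutCell:
        def __init__(self, tables):
            self.tables = tables          # 4 truth tables, each a list of 16 bits
            self.out = [0, 0, 0, 0]       # latched outputs (visible to neighbors)
            self._next = [0, 0, 0, 0]

        def compute(self, inputs):        # phase 1: read neighbors, compute
            addr = inputs[0] | inputs[1] << 1 | inputs[2] << 2 | inputs[3] << 3
            self._next = [t[addr] for t in self.tables]

        def latch(self):                  # phase 2: make new outputs visible
            self.out = self._next

    # Example: a cell whose first output is the XOR (parity) of its four inputs.
    xor_table = [bin(a).count("1") & 1 for a in range(16)]
    zero = [0] * 16
    cell = LutCell([xor_table, zero, zero, zero])
    cell.compute([1, 0, 1, 1]); cell.latch()
    print(cell.out)                       # [1, 0, 0, 0]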


It’s been a long time since I worked on FPGAs, but it sounds like FPGAs! What do you see as the main differences?


No routing, and no fast lines that cut across the chip. Those lines cut way down on latency, but they make FPGAs harder to build, and especially hard to compile for once you want to use them.

All that routing hardware, and the special function units featured in many FPGAs, are things you have to optimize the usage of, and route to. You end up using solvers, simulated annealing, etc... instead of a straight compile to binary expressions and a mapping onto the grid.

Latency minimization is the key to getting a design to run fast in an FPGA. In a BitGrid, you know the clock speed, and you know the latency just by counting the steps in the graph. BitGrid performance is determined by how many answers/second you can get from a given chip. If you had a 1 GHz rack of BitGrid chips that could run GPT-4 with a latency of 1 ms per token, you'd think that was horrible, but you could run a million such streams in parallel.
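The throughput-vs-latency point is just pipeline arithmetic; using the hypothetical numbers above:

    clock_hz = 1e9           # assumed 1 GHz clock
    latency_s = 1e-3         # assumed 1 ms per token through the grid
    in_flight = clock_hz * latency_s
    print(int(in_flight))    # ~1,000,000 results in flight at once: each stream
                             # sees 1 ms latency, but one result pops out per cycle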


I have heard of people trying to build analog AI devices, but that was years ago, and no news has come out about it recently. Maybe it is harder than it seems. I bet it is expensive to regulate voltage so precisely, and it's not a flexible enough scheme to support training neural networks like the ones we have now, which are highly reconfigurable. I've also heard of people trying to use analog computing for more mundane things. But no devices have hit the market after so many years, so I'm assuming it is a super hard problem, maybe even intractable.


Perhaps another variation on the idea is to allow a higher error rate. For example, if a 0.01% error rate was acceptable in AI, perhaps the voltage range between states could be lowered (which has a quadratic relationship to power consumption) and clock speed could increase.
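The quadratic part is dynamic CMOS switching power, P = C * V^2 * f, so even a modest reduction in swing buys a lot (toy numbers, not from any real process):

    def dynamic_power(c_farads, v_volts, f_hz):
        # Dynamic CMOS switching power: P = C * V^2 * f
        return c_farads * v_volts**2 * f_hz

    base = dynamic_power(1e-9, 1.0, 1e9)       # 1 nF switched at 1 V, 1 GHz
    reduced = dynamic_power(1e-9, 0.7, 1e9)    # drop the voltage swing to 0.7 V
    print(reduced / base)                      # ~0.49: roughly half the power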


Bits are copyable without data loss. Analog properties of individual transistors are less so.


Yes, but the whole point of the link submitted to HN here is that in some applications, like machine learning, precision doesn't matter too much.

(However, analog computing is still a bad fit for machine learning, because it requires a lot more power.)


Exact copies aren't just about precision but also about reproducibility.


You can keep your weights in a discrete format for storage, but do inference and training in analog.


That only prevents analog copy degradation. It doesn't give you reproducibility. Reproducibility means running the same process twice with the same inputs and getting the same outputs. E.g. to later prove that something came from an LLM and not a human you could store the random seed and the input and then reproduce the output. But that only works if the network is digital.
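The digital-reproducibility argument boils down to "same seed, same bits". A trivial sketch, with Python's PRNG standing in for an LLM's sampler:

    import random

    def sample(seed, prompt):
        # Stand-in for token sampling: deterministic given (seed, prompt).
        rng = random.Random(seed)
        return [rng.randrange(50000) for _ in prompt.split()]

    print(sample(42, "the quick brown fox") == sample(42, "the quick brown fox"))  # True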


This reminds me of this article[1] recently linked on HN, talking about how Intel had an analog chip for neural nets in the 90s, if I understood correctly

[1] https://thechipletter.substack.com/p/john-c-dvorak-on-intels...


It’s going to be funny if it turns out biology was right all along and we end up just copying it.


I have heard that the first commercial neural network chip (by Intel, in the 90s) was analog?


It sure looks like this might pair well with ternary optical computing advances:

https://ieeexplore.ieee.org/document/9720446


Hmm, maybe some (signaling) inspiration from biology other than neural signaling.


Next Up: Quantum AI


let's use cells


We already do.


You could call them Connection Machines and perhaps have an LLM trained on Feynman help with the design.



