Currently, not at all. You need low latency, high bandwidth links between the GP...

bob1029 · on May 31, 2023

You could easily fit a 1T parameter model on a MacBook if you radically altered the architecture of the AI system.

Consider something like a spiking neural network with weights & state stored on an SSD using lazy-evaluation as action potentials propagate. 4TB SSD = ~1 trillion 32-bit FP weights and potentials. There are MacBook options that support up to 8TB. The other advantage with SNN - Training & using are basically the same thing. You don't have to move any bytes around. They just get mutated in place over time.

The trick is to reorganize this damn thing so you don't have to access all of the parameters at the same time... You may also find the GPU becomes a problem in an approach that uses a latency-sensitive time domain and/or event-based execution. It gets to be pretty difficult to process hundreds of millions of serialized action potentials per second when your hot loop has to go outside of L1 and screw with GPU memory. GPU isn't that far away, but ~2 nanoseconds is a hell of a lot closer than 30-100+ nanoseconds.

Edit: fixed my crappy math above.

treprinum · on May 31, 2023

That's been done already. See DeepSpeed ZeRO NVMe offload:

https://arxiv.org/abs/2101.06840

znagengast · on May 31, 2023

What if you split up the training down to the literal vector math, and treated every macbook like a thread in a gpu, with just one big computer acting as the orchestrator?

Filligree · on May 31, 2023

You would need each MacBook to have an internet connection capable of multiple terabytes per second, with sub millisecond latency to every other MacBook.

AnthonyMouse · on May 31, 2023

FWIW there are current devices that could fit a model of that size. We had servers that support TBs of RAM a decade ago (and today they're pretty cheap, although that much RAM is still a significant expense).

tmaly · on June 1, 2023

I have an even more of a stretch question.

What pieces of tech would need to be invented to make it possible to carry a 1T model around in a device the size of an iPhone?