
Currently, not at all. You need low-latency, high-bandwidth links between the GPUs to be able to shard the model usefully. There is no way you can fit a 1T (or whatever) parameter model on a MacBook, or any current device, so sharding is a requirement.
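
To put rough numbers on that (a back-of-envelope sketch; the 80 GB per GPU and the ~16 bytes per parameter for mixed-precision Adam are assumptions about the setup):

  # Back-of-envelope: why a 1T-parameter dense model has to be sharded.
  params = 1e12                # 1 trillion parameters
  bytes_per_param = 2          # fp16/bf16 weights
  weights_gb = params * bytes_per_param / 1e9
  print(f"weights alone: {weights_gb:,.0f} GB")                            # ~2,000 GB

  gpu_mem_gb = 80              # assumed: one 80 GB A100/H100-class card
  print(f"GPUs just to hold the weights: {weights_gb / gpu_mem_gb:.0f}")   # ~25

  # Training needs far more: mixed-precision Adam keeps fp32 master weights,
  # momentum, and variance alongside fp16 weights and grads (~16 bytes/param).
  train_gb = params * 16 / 1e9
  print(f"training state: {train_gb:,.0f} GB (~{train_gb / gpu_mem_gb:.0f} GPUs)")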

Even if that problem disappeared, propagating the model weight updates between training steps poses an issue in itself. That's a lot of data at this size.
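
For a sense of scale, a sketch of the per-step sync cost under plain synchronous data parallelism (the fp16 gradients and the 1 Gbit/s link are assumptions):

  # Per-step communication for synchronous data parallelism over 1T params.
  params = 1e12
  grad_bytes = params * 2                      # fp16 gradients, same size as the weights
  print(f"gradient payload per step: ~{grad_bytes / 1e12:.0f} TB")

  # A ring all-reduce moves roughly 2x the payload per participant per step.
  link_gbit_s = 1                              # assumed: ~1 Gbit/s consumer uplink
  seconds = (2 * grad_bytes * 8) / (link_gbit_s * 1e9)
  print(f"time to sync ONE step over {link_gbit_s} Gbit/s: ~{seconds / 3600:.0f} hours")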




You could easily fit a 1T parameter model on a MacBook if you radically altered the architecture of the AI system.

Consider something like a spiking neural network with weights & state stored on an SSD, using lazy evaluation as action potentials propagate. A 4TB SSD = ~1 trillion 32-bit FP weights and potentials, and there are MacBook options that support up to 8TB. The other advantage of an SNN: training & using are basically the same thing. You don't have to move any bytes around; they just get mutated in place over time.
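
A toy sketch of that lazy-evaluation idea, assuming the weights and membrane potentials live in preallocated memory-mapped files on the SSD so the OS only pages in the rows of neurons that actually spiked (the file names, sizes, thresholds, and the Hebbian-style update are all made up for illustration):

  import numpy as np

  N = 1_000_000          # 1M neurons -> 10^12 fp32 weights -> ~4 TB on disk
  THRESHOLD = 1.0        # assumed firing threshold
  DECAY = 0.95           # assumed leak per tick

  # Weights and potentials live on the SSD; the OS pages in only what we touch.
  weights = np.memmap("weights.f32", dtype=np.float32, mode="r+", shape=(N, N))
  potential = np.memmap("potentials.f32", dtype=np.float32, mode="r+", shape=(N,))

  def step(spiking: np.ndarray) -> np.ndarray:
      """One tick: only the rows of neurons that spiked are ever read from disk."""
      potential[:] *= DECAY
      for src in spiking:                  # lazy: one ~4 MB row read per spike
          potential[:] += weights[src]
      fired = np.flatnonzero(potential > THRESHOLD)
      potential[fired] = 0.0               # reset neurons that fired
      # "training & using are the same thing": nudge weights in place, Hebbian-style
      for src in spiking:
          weights[src, fired] += 0.001
      return fired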

The trick is to reorganize this damn thing so you don't have to access all of the parameters at the same time... You may also find the GPU becomes a problem in a latency-sensitive, time-domain and/or event-driven approach. It gets pretty difficult to process hundreds of millions of serialized action potentials per second when your hot loop has to go outside of L1 and screw with GPU memory. The GPU isn't that far away, but ~2 nanoseconds is a hell of a lot closer than 30-100+ nanoseconds.

Edit: fixed my crappy math above.
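
A quick sanity check on the event-rate point above (the 300M events/sec and the 60 ns midpoint are assumptions based on the figures quoted):

  # Latency budget for a serialized, event-driven hot loop.
  events_per_sec = 300e6        # assumed: "hundreds of millions" of action potentials/sec
  print(f"budget per event: ~{1e9 / events_per_sec:.1f} ns")                       # ~3.3 ns

  # If every event pays one trip to GPU-attached memory (~60 ns midpoint of 30-100+):
  gpu_ns = 60
  print(f"ceiling at ~{gpu_ns} ns/event: ~{1e9 / gpu_ns / 1e6:.0f}M events/sec")   # ~17M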


That's been done already. See DeepSpeed ZeRO NVMe offload:

https://arxiv.org/abs/2101.06840
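
For reference, the NVMe offload is driven by the DeepSpeed config; something along these lines (paths, batch size, and other values below are placeholders, and the exact knobs vary by DeepSpeed version):

  # Sketch of a DeepSpeed ZeRO stage-3 config with parameters and optimizer
  # state offloaded to NVMe; values and paths are placeholders.
  ds_config = {
      "train_micro_batch_size_per_gpu": 1,
      "fp16": {"enabled": True},
      "zero_optimization": {
          "stage": 3,
          "offload_param":     {"device": "nvme", "nvme_path": "/local_nvme", "pin_memory": True},
          "offload_optimizer": {"device": "nvme", "nvme_path": "/local_nvme", "pin_memory": True},
      },
  }

  # Typically handed to the engine via something like:
  #   engine, optimizer, _, _ = deepspeed.initialize(
  #       model=model, model_parameters=model.parameters(), config=ds_config)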


What if you split the training down to the literal vector math and treated every MacBook like a thread in a GPU, with just one big computer acting as the orchestrator?
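
A toy sketch of what that would look like for a single matrix-vector product, with each "MacBook" stubbed out as a local function call (the layer width, worker count, and row-parallel split are arbitrary choices for illustration):

  import numpy as np

  HIDDEN = 4096          # assumed layer width (kept small so this actually runs)
  N_WORKERS = 64         # assumed number of MacBooks

  def worker_compute(weight_shard: np.ndarray, x: np.ndarray) -> np.ndarray:
      # What one "MacBook" would run; in reality x arrives over the network.
      return weight_shard @ x

  def orchestrator_step(W: np.ndarray, x: np.ndarray) -> np.ndarray:
      shards = np.array_split(W, N_WORKERS, axis=0)       # row-parallel split
      partials = [worker_compute(s, x) for s in shards]   # one network round trip each
      return np.concatenate(partials)

  if __name__ == "__main__":
      rng = np.random.default_rng(0)
      W = rng.standard_normal((HIDDEN, HIDDEN), dtype=np.float32)
      x = rng.standard_normal(HIDDEN, dtype=np.float32)
      assert np.allclose(orchestrator_step(W, x), W @ x, atol=1e-3)

The catch is that every matvec ships x out and the partial results back over the network, and a transformer does thousands of these per token, which is what the reply below is getting at.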


You would need each MacBook to have an internet connection capable of multiple terabytes per second, with sub-millisecond latency to every other MacBook.
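
Rough, order-of-magnitude reasoning behind that (the 6 FLOPs per parameter per token rule of thumb is standard for dense transformers; the throughput target and the bytes-per-FLOP ratio are assumptions):

  # Order-of-magnitude check on sharding training "down to the vector math".
  params = 1e12
  flops_per_token = 6 * params          # ~6 FLOPs per parameter per token (fwd + bwd)
  tokens_per_sec = 1_000                # assumed (modest) training throughput

  # If operands cross the network instead of staying in on-chip SRAM/HBM,
  # traffic scales with compute. Even at an optimistic 1 byte per 100 FLOPs:
  bytes_per_sec = flops_per_token * tokens_per_sec / 100
  print(f"~{bytes_per_sec / 1e12:.0f} TB/s of cross-machine traffic")      # ~60 TB/s
  # ...and each transfer sits on the critical path of the training step,
  # hence the sub-millisecond latency requirement on top of the bandwidth.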


FWIW, there are current devices that could fit a model of that size. We had servers that supported TBs of RAM a decade ago (and today they're pretty cheap, although that much RAM is still a significant expense).


I have an even bigger stretch of a question.

What pieces of tech would need to be invented to make it possible to carry a 1T model around in a device the size of an iPhone?



