In my understanding, the data centers are mostly for two things: scaling, so that many people can use an LLM service at the same time, and training, so that training a new LLM's weights doesn't take months to years because of GPU constraints.
It's already possible to run an LLM on consumer chips, depending of course on the LLM and the chip.
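For example, here's a minimal sketch of what local inference can look like, assuming you have Python with the Hugging Face transformers library (plus torch and accelerate) installed and enough RAM for a small model; the model name below is just an example of one small enough to run on ordinary hardware:

```python
# Minimal local-inference sketch: no data center needed for a small model.
# Assumes `pip install transformers torch accelerate` and enough RAM/VRAM
# for a ~0.5B-parameter model (example model name, swap in any small chat model).
from transformers import pipeline

# device_map="auto" uses a GPU if one is available, otherwise falls back to CPU.
pipe = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",
    device_map="auto",
)

out = pipe("Explain why data centers matter for LLM training.", max_new_tokens=100)
print(out[0]["generated_text"])
```

Bigger models just shift the constraint back to memory and compute, which is where the data centers come back into the picture.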