
Well, their huge GPU clusters have "insane VRAM". Once you can actually load the model without offloading, inference is mostly memory-bandwidth-bound rather than compute-bound, so it isn't all that computationally expensive for the most part.
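A rough back-of-envelope sketch of that point (my own numbers, not from the comment): in batch-1 autoregressive decoding, each generated token has to stream essentially all of the weights from VRAM, so the bandwidth ceiling on tokens/sec is far below the compute ceiling. The GPU figures below (80 GB, ~3 TB/s, ~1000 TFLOPS) are hypothetical placeholders.

```python
# Hypothetical back-of-envelope estimate: why batch-1 LLM decoding is
# memory-bandwidth-bound once the weights fit entirely in VRAM.

def decode_estimates(params_billions, bytes_per_param, vram_gb,
                     mem_bandwidth_gbps, peak_tflops):
    """Rough per-token ceilings for batch-1 autoregressive decoding."""
    weight_bytes = params_billions * 1e9 * bytes_per_param
    fits_in_vram = weight_bytes <= vram_gb * 1e9

    # Each generated token reads (roughly) every weight from VRAM once.
    tokens_per_sec_bandwidth = (mem_bandwidth_gbps * 1e9) / weight_bytes

    # Each token needs about 2 FLOPs per parameter (one multiply-accumulate).
    flops_per_token = 2 * params_billions * 1e9
    tokens_per_sec_compute = (peak_tflops * 1e12) / flops_per_token

    return fits_in_vram, tokens_per_sec_bandwidth, tokens_per_sec_compute


# Example: a 70B-parameter model at 8-bit weights on a hypothetical GPU
# with 80 GB VRAM, ~3 TB/s memory bandwidth, and ~1000 TFLOPS of compute.
fits, bw_limit, compute_limit = decode_estimates(70, 1, 80, 3000, 1000)
print(f"weights fit in VRAM:        {fits}")
print(f"bandwidth-limited ceiling: ~{bw_limit:.0f} tok/s")
print(f"compute-limited ceiling:   ~{compute_limit:.0f} tok/s")
```

With these placeholder numbers the compute ceiling is two orders of magnitude above the bandwidth ceiling, which is the sense in which inference "isn't computationally expensive" once the model loads without offloading.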




