With 128 GB strix halo, you can't do as big of a model as you would think. You c...

smilekzs · 2026-06-12T02:33:32 1781231612

A key consideration in favor of running an LLM locally, despite all the trouble: the commercial serving endpoint may not exist tomorrow, or at least not at the same price.

hedora · 2026-06-12T03:51:49 1781236309

My current rule of thumb is that 1GB gets you 1B parameters with a big context (Qwen 32B fits in 32GB with 200K+ contexts).

That’s with heavy compression of the weights and the context, of course.
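
A rough back-of-envelope sketch of that rule of thumb, for anyone who wants to plug in their own numbers. The architecture figures below (64 layers, 8 KV heads, head dimension 128) and the 4-bit quantization for both weights and KV cache are assumptions chosen for illustration, not something confirmed above, and the estimate ignores activations, quantization metadata, and runtime overhead.

```python
def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Memory for the model weights at a given quantization width."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bits_per_value: float) -> float:
    """Memory for the KV cache: one key and one value vector
    per layer, per KV head, per token."""
    values_per_token = 2 * n_layers * n_kv_heads * head_dim
    return values_per_token * context_tokens * bits_per_value / 8


def total_gb(n_params: int, bits_per_weight: float, n_layers: int,
             n_kv_heads: int, head_dim: int, context_tokens: int,
             bits_per_kv_value: float) -> float:
    """Weights plus KV cache, in decimal gigabytes (1 GB = 1e9 bytes)."""
    total = (weight_bytes(n_params, bits_per_weight)
             + kv_cache_bytes(n_layers, n_kv_heads, head_dim,
                              context_tokens, bits_per_kv_value))
    return total / 1e9


if __name__ == "__main__":
    # Assumed, illustrative configuration for a 32B-parameter model.
    estimate = total_gb(
        n_params=32_000_000_000,
        bits_per_weight=4,       # assumed 4-bit weight quantization
        n_layers=64,             # assumed
        n_kv_heads=8,            # assumed (grouped-query attention)
        head_dim=128,            # assumed
        context_tokens=200_000,
        bits_per_kv_value=4,     # assumed 4-bit KV cache quantization
    )
    print(f"estimated footprint: {estimate:.1f} GB")
```

Under those assumptions the weights take about 16 GB and the 200K-token cache about 13 GB, so the total lands just under 32 GB. Leaving the cache at 16 bits would push the cache alone past 50 GB, which is why the context has to be compressed along with the weights.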

I haven’t gone through model evaluation + shoehorning at 128GiB yet.