744B params at FP16 is ~1.5TB of VRAM. Even at int4, you need ~372GB just to load the weights (MoE sparsity saves FLOPs, not VRAM capacity).
That's not a workstation, that's a rack with 5x H100s or a cluster of 8x RTX 6000 Adas.
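Quick sanity check on that arithmetic (decimal GB, weights only; this ignores KV cache, activations, and runtime overhead, so real usage is higher):

```python
def weight_vram_gb(params_b: float, bits_per_param: float) -> float:
    """Dense weight storage in decimal GB: params * bits / 8 bits-per-byte."""
    return params_b * 1e9 * bits_per_param / 8 / 1e9

print(weight_vram_gb(744, 16))  # FP16: 1488 GB, i.e. ~1.5 TB
print(weight_vram_gb(744, 4))   # int4: 372 GB
```

At 80GB per H100 that int4 figure still needs five cards before you serve a single token.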
The only real use cases here are strict data sovereignty (can't use US APIs) or using it as a teacher for distillation. Otherwise, the ROI on self-hosting is nonexistent.
Also, the disconnect between SOTA on Terminal-Bench and ~30% on Humanity's Last Exam suggests it overfit on agent logs rather than learning deep reasoning.