744B params at FP16 is ~1.5TB of VRAM. Even at int4, you need ~372GB just to load the weights (MoE sparsity saves FLOPs, not VRAM capacity).
That's not a workstation, that's a rack with 5x H100s or a cluster of 8x RTX 6000 Adas.
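Quick sanity check on that arithmetic (decimal GB, weights only; this ignores KV cache, activations, and runtime overhead, so real usage is higher):

```python
def weight_vram_gb(params_b: float, bits_per_param: float) -> float:
    """Dense weight storage in decimal GB: params * bits / 8 bits-per-byte."""
    return params_b * 1e9 * bits_per_param / 8 / 1e9

print(weight_vram_gb(744, 16))  # FP16: 1488 GB, i.e. ~1.5 TB
print(weight_vram_gb(744, 4))   # int4: 372 GB
```

At 80GB per H100 that int4 figure still needs five cards before you serve a single token.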
The only real use cases here are strict data sovereignty (can't use US APIs) or using it as a teacher for distillation. Otherwise, the ROI on self-hosting is nonexistent.
Also, the disconnect between SOTA on Terminal-Bench and ~30% on Humanity's Last Exam suggests it overfit on agent logs rather than learning deep reasoning.