Hacker Newsnew | past | comments | ask | show | jobs | submit | vafaeim's commentslogin

744B params is ~1.5TB VRAM (FP16). Even at int4, you need ~372GB just to load the weights (MoE sparsity saves FLOPs, not VRAM capacity). That's not a workstation, that's a rack with 5x H100s or a cluster of 8x RTX 6000 Adas.

The only real use cases here are strict data sovereignty (can't use US APIs) or using it as a teacher for distillation. Otherwise, the ROI on self-hosting is nonexistent.

Also, the disconnect between SOTA on Terminal bench and ~30% on Humanity's Last Exam suggests it overfitted on agent logs rather than learning deep reasoning.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: