Thanks, this is very helpful for planning out a local LLM buy. Sounds like we are still at least one generation out (DDR6, 500-700 GB/s memory) from getting to that magic ~25-30 tok/s token generation (TG). The Nemotron 3 Super architecture sounds promising.
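For anyone who wants to sanity-check the bandwidth math, here is a rough back-of-envelope sketch. It assumes decode is purely memory-bandwidth-bound and that every active weight is read once per generated token. The function name, the 40B active-parameter count, the 0.5 bytes/param quantization, and the 0.7 efficiency factor are all placeholders I made up for illustration, not specs for any real model or machine.

```python
def decode_tokens_per_sec(bandwidth_gb_s, active_params_b, bytes_per_param, efficiency=1.0):
    """Rough ceiling on decode speed if memory bandwidth is the only limit.

    bandwidth_gb_s  : memory bandwidth in GB/s
    active_params_b : parameters touched per token, in billions (hypothetical)
    bytes_per_param : weight size after quantization (0.5 ~ 4-bit, assumed)
    efficiency      : fraction of peak bandwidth actually achieved (assumed)
    """
    gb_read_per_token = active_params_b * bytes_per_param
    return efficiency * bandwidth_gb_s / gb_read_per_token


# Placeholder model: 40B active params at ~4-bit, 70% of peak bandwidth.
for bw in (256, 500, 700, 1200):
    rate = decode_tokens_per_sec(bw, active_params_b=40, bytes_per_param=0.5, efficiency=0.7)
    print(f"{bw:>5} GB/s -> ~{rate:.1f} tok/s")
```

Real numbers will come in lower once KV-cache reads and compute limits kick in, so treat this as a ceiling rather than a prediction.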
Medusa Halo is on my wishlist, but I'm hearing late 2027 :(
M5 Ultra may be a better near-term option, expected in June. Supposedly ~1.2 TB/s unified memory. Unsure whether Apple will revive the 512 GB SKU or cap it at 256 GB, but the new Neural Engine in every GPU core should help dramatically. These chips were always compute-limited rather than bandwidth-limited, even in the M3 Ultra era.
The big cost, of course, is that you're locked into Apple silicon and Apple's walled garden. You can still use macOS without creating an Apple account... for now...
At least Apple Silicon holds resale value remarkably well.