"But GC already provides better throughput than manual memory allocation in practically all circumstances"
You could drive a bus through the exceptions that the "practically" in that claim lets through. In the kinds of problems I solve, the biggest drivers of throughput are cache locality and branch prediction. Every time an access has to go one level further out in the memory hierarchy (L1 to L2, L2 to L3, L3 to main memory), my throughput drops.
Nothing says GC-based solutions couldn't eventually handle cache locality better than manual allocation, but they aren't there yet.
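To make the locality point concrete, here is a rough C++ sketch of the two layouts I have in mind: the same values stored contiguously, and stored as individually allocated nodes that you reach by chasing pointers. The names (`Node`, `sum_contiguous`, `sum_linked`, `build_list`) are made up for illustration. Where the nodes actually land in memory depends on the allocator, so treat this as a picture of the access patterns, not a benchmark.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical node type: every node is its own heap allocation.
struct Node {
    std::int64_t value;
    Node* next;
};

// Contiguous layout: neighbouring values share cache lines, and the
// access pattern is a simple forward walk the prefetcher can follow.
std::int64_t sum_contiguous(const std::vector<std::int64_t>& values) {
    std::int64_t total = 0;
    for (std::int64_t v : values) {
        total += v;
    }
    return total;
}

// Pointer-chasing layout: the address of the next value is only known
// after the current node has been loaded, so a cache miss stalls the walk.
std::int64_t sum_linked(const Node* head) {
    std::int64_t total = 0;
    for (const Node* n = head; n != nullptr; n = n->next) {
        total += n->value;
    }
    return total;
}

// Builds a singly linked list holding the same values in the same order.
// `owner` keeps the nodes alive; each node is a separate allocation.
Node* build_list(const std::vector<std::int64_t>& values,
                 std::vector<std::unique_ptr<Node>>& owner) {
    Node* head = nullptr;
    for (auto it = values.rbegin(); it != values.rend(); ++it) {
        owner.push_back(std::make_unique<Node>(Node{*it, head}));
        head = owner.back().get();
    }
    return head;
}
```

Both functions compute the same sum; the only difference is how memory is touched on the way there. If you want to see the effect, time the two over a data set much larger than your last-level cache, ideally after the nodes have been allocated in a scattered order.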