The benchmark didn't explain it, but the only reason I can imagine is that the Iris chip worked as an L4 cache, since the benchmark wasn't doing any graphics work. That is what the Iris chip does: it sits right there on the package with a whole bunch of memory that serves the iGPU, or works as an L4 cache when it's available.
It's also a great way to do (almost) zero-cost transfers from main memory to (i)GPU memory: you'd do them at the latency of the L3/L4 boundary. With Intel, that unlocks a few GFLOPs of processing power, at least in theory; your code would, of course, have to be adapted to take advantage of this in a reasonable way.
To sum up, I agree with you: memory is where the big speedups for processors lie. I don't know if "the Iris way" is the best path, but it did show promise. It's a shame that Intel decided to reserve it mostly for ultrabook processors.
My new ThinkPad has NVMe, and the difference is huge compared to my very fast desktop at work, which has SATA-connected SSDs.