I don't want to be a sourpuss, but there is a limit beyond which MPI stops scaling: when you look at the program's runtime, 60% of the time is spent in MPI_Wait. Past 80k nodes you can't go much faster no matter how much money you've got. I believe this is precisely the problem they address by intelligently grouping similar rays.
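A rough way to see why that caps out is Amdahl's law. The sketch below assumes, hypothetically, that the 60% spent in MPI_Wait stays constant while the remaining compute shrinks linearly with more nodes. Under that assumption, even unlimited nodes top out at about 1/0.6 ≈ 1.67x.

```python
def amdahl_speedup(wait_fraction: float, node_multiplier: float) -> float:
    """Speedup when only the non-waiting part scales with extra nodes.

    Assumption (illustrative only): time in MPI_Wait does not shrink as
    nodes are added, while compute time divides evenly across them.
    """
    compute_fraction = 1.0 - wait_fraction
    return 1.0 / (wait_fraction + compute_fraction / node_multiplier)


if __name__ == "__main__":
    for mult in (2, 4, 16, 1_000_000):
        print(f"{mult:>9}x nodes -> {amdahl_speedup(0.6, mult):.2f}x faster")
```

With 2x the nodes this gives 1.25x, with 16x it gives 1.60x, and a million times the nodes still only reaches about 1.67x.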
We do it occasionally for stupidly big (>64K) frames.
Doesn't work for deep though :)
I've never dealt with the new OpenEXR 2.0 deep stuff, but looking at the spec (http://www.openexr.com/openexrfilelayout.pdf, page 13), the Deep Tiled stuff seems totally ready for this. I would have guessed there might be some global compression option that wouldn't be workable, but the design clearly does all of this per tile instead.
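To illustrate why per-tile compression matters for splitting a huge frame, here is a toy container. It is a hypothetical sketch, not the OpenEXR format or its API. Each tile is compressed on its own (zlib stands in for whatever codec the file actually uses), and tiles are located through an offset table. Tiles rendered on different machines can therefore be packed together without a global compression pass. The helper names and the 4096-pixel tile size are made up for the example.

```python
import zlib


def split_into_tiles(width, height, tile):
    """Yield (x, y, w, h) for each tile, clipping the last row and column."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)


def pack_tiles(tile_payloads):
    """Compress each tile independently and record where each chunk starts."""
    chunks = [zlib.compress(p) for p in tile_payloads]
    offsets, pos = [], 0
    for chunk in chunks:
        offsets.append(pos)
        pos += len(chunk)
    return offsets, b"".join(chunks)


def read_tile(offsets, blob, index):
    """Decompress one tile without touching any other tile's data."""
    end = offsets[index + 1] if index + 1 < len(offsets) else len(blob)
    return zlib.decompress(blob[offsets[index]:end])


if __name__ == "__main__":
    tiles = list(split_into_tiles(70_000, 40_000, 4096))
    print(len(tiles), "tiles")  # 18 columns x 10 rows
    # Fake per-tile pixel data standing in for separately rendered tiles.
    payloads = [bytes([i % 256]) * 1000 for i in range(len(tiles))]
    offsets, blob = pack_tiles(payloads)
    assert read_tile(offsets, blob, 5) == payloads[5]
```

Because each chunk decodes on its own, a renderer can write or replace any single tile and only the offset table needs updating.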