They also mention that "this result was driven by a combination of culling techniques including camera frustum, facing angles and distance, creating a variance of dense to coarse patches of sand for optimum efficiency."
So while they had grains of ~5000 polygons each available to them, and positions that those grains could occupy, they cut down on that polygon count using a combination of techniques (many of them common in games too) to make it all work. The polygons of any grains outside the camera frustum were ignored. Faces of the grains pointing away from, I presume, any light source, and therefore not contributing to the path tracer, were ignored as well. On top of that there was some sort of LOD to reduce the polygon count with distance.
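To make those three culls concrete, here is a rough Python sketch of how I imagine them fitting together. Everything in it is my own assumption for illustration: the function names, the field-of-view values, the camera sitting at the origin looking down +Z, and the inverse-square LOD falloff. None of it comes from the studio's description. The facing test takes a `target` point, so it can be run against a light position (my guess above) or against the camera.

```python
import math

FULL_POLYS = 5000  # per-grain polygon count quoted above


def in_frustum(p, h_fov_deg=60.0, v_fov_deg=40.0, near=0.1, far=100.0):
    """Frustum cull in camera space (assumed: camera at origin, looking down +Z)."""
    x, y, z = p
    if not (near <= z <= far):
        return False
    # Half-extents of the view volume at depth z.
    half_w = z * math.tan(math.radians(h_fov_deg) / 2)
    half_h = z * math.tan(math.radians(v_fov_deg) / 2)
    return abs(x) <= half_w and abs(y) <= half_h


def faces_toward(normal, surface_point, target=(0.0, 0.0, 0.0)):
    """Facing cull: keep a face only if its normal points toward `target`."""
    to_target = tuple(t - s for t, s in zip(target, surface_point))
    return sum(n * d for n, d in zip(normal, to_target)) > 0.0


def lod_polys(distance, full=FULL_POLYS, ref_distance=1.0, floor=8):
    """Distance LOD: assumed inverse-square falloff, clamped to a minimum."""
    if distance <= ref_distance:
        return full
    return max(floor, int(full * (ref_distance / distance) ** 2))


def grain_budget(center):
    """Polygon budget for one grain: 0 if culled, otherwise its LOD count."""
    if not in_frustum(center):
        return 0
    return lod_polys(math.dist(center, (0.0, 0.0, 0.0)))
```

With these made-up settings, a grain two units in front of the camera gets `grain_budget((0, 0, 2)) == 1250` polygons instead of 5000, and a grain behind the camera gets 0. The facing test would then be applied per face to whatever polygons survive.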
As an amateur Blender user, I'm reminded of microdisplacement as a way for us mere mortals to achieve something in this space: