The trick is that the world is described by an algorithm, so nothing needs to be stored until it is modified, and even then the stored changes are limited to a certain view distance. On the CPU side, a much lower-resolution voxel data set (also local to the camera) is used for collision, pathfinding, etc.
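To make the idea concrete, here is a rough Python sketch of one way this could work. It is not the actual engine's code: the `ProceduralWorld` class, the sine-based terrain rule, and the `prune` and `coarse_solid` helpers are all made up for illustration, and dropping edits outside the view distance is only one possible reading of "limited to a certain view distance".

```python
import math


class ProceduralWorld:
    """Voxels come from a deterministic function; only edits are stored."""

    def __init__(self, view_distance):
        self.view_distance = view_distance
        self.edits = {}  # sparse overrides: (x, y, z) -> solid?

    @staticmethod
    def generate(x, y, z):
        # Hypothetical terrain rule: solid below a wavy height field.
        height = 8 + 4 * math.sin(x * 0.1) * math.cos(z * 0.1)
        return y < height

    def is_solid(self, x, y, z):
        # A stored edit wins; otherwise recompute from the algorithm.
        key = (x, y, z)
        if key in self.edits:
            return self.edits[key]
        return self.generate(x, y, z)

    def set_voxel(self, x, y, z, solid):
        key = (x, y, z)
        if solid == self.generate(x, y, z):
            # Matches the generated value again, so nothing needs storing.
            self.edits.pop(key, None)
        else:
            self.edits[key] = solid

    def prune(self, camera):
        # Assumption: edits farther than view_distance from the camera are dropped.
        self.edits = {
            key: solid
            for key, solid in self.edits.items()
            if math.dist(key, camera) <= self.view_distance
        }

    def coarse_solid(self, cx, cy, cz, cell=4):
        # Low-resolution lookup for collision/pathfinding:
        # sample one voxel at the centre of a cell x cell x cell block.
        half = cell // 2
        return self.is_solid(cx * cell + half, cy * cell + half, cz * cell + half)
```

An untouched world takes no memory beyond the object itself, since `edits` stays empty until something is changed. The coarse lookup trades accuracy for speed by sampling a single voxel per block; a real implementation might instead test whether any voxel in the block is solid.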

