It seems like they've done two things here: the physical optical apparatus setup and the algorithm.
I'm wondering why the physical alignment is so important. Are software approaches (camera distortion models and mapping, view projection, etc.) just too slow or too low-quality to use instead?
I suppose I'll have to take a look at their paper later.