(I don't have any idea what I'm talking about, but) it occurs to me that if a robot does this for the frames of its video input from a regular camera, then in static environments the output of SLAM would be great.
Also, just use the CAD scene to predict what you will see after you move, then move, compare the actual new image with the predicted one, and dedicate most computing resources to what differs the most. Now you have a robot with attention to unrecognized objects!
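To make the predict/compare/attend idea a bit more concrete, here is a rough sketch. It assumes the predicted and actual frames are already available as equally sized grayscale grids (lists of lists of numbers); rendering the prediction from the CAD scene is not shown. The names `tile_differences` and `allocate_budget` and the fixed tile size are made up for illustration, and mean absolute difference is just one simple way to measure "what differs the most".

```python
def tile_differences(predicted, actual, tile=2):
    """Return (mean_abs_diff, top, left) for each tile, largest difference first."""
    if len(predicted) != len(actual) or any(
        len(p) != len(a) for p, a in zip(predicted, actual)
    ):
        raise ValueError("images must have the same shape")
    height, width = len(actual), len(actual[0])
    scores = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            total = 0
            count = 0
            # Accumulate the absolute pixel error inside this tile.
            for y in range(top, min(top + tile, height)):
                for x in range(left, min(left + tile, width)):
                    total += abs(actual[y][x] - predicted[y][x])
                    count += 1
            scores.append((total / count, top, left))
    scores.sort(key=lambda s: s[0], reverse=True)
    return scores


def allocate_budget(scores, budget):
    """Split a compute budget across tiles in proportion to their difference."""
    total = sum(score for score, _, _ in scores)
    if total == 0:
        # Prediction matched everywhere: nothing needs extra attention.
        return {(top, left): 0.0 for _, top, left in scores}
    return {(top, left): budget * score / total for score, top, left in scores}


# Hypothetical 4x4 frames: the prediction is empty, but something
# unexpected shows up in the bottom-right corner of the actual image.
predicted = [[0] * 4 for _ in range(4)]
actual = [row[:] for row in predicted]
for y in (2, 3):
    for x in (2, 3):
        actual[y][x] = 10

scores = tile_differences(predicted, actual)
budget = allocate_budget(scores, budget=100)
```

Tiles where the prediction holds get little or no budget, so the expensive recognition step would only run on the regions the model failed to explain. A real system would also have to tolerate lighting changes and pose error, which this sketch ignores.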
I had great fun reconstructing various parts of my high school. I learnt about the importance of scale and how to make levels "interesting" by making some desks fall over, and I especially enjoyed reconstructing the swimming pool, gym, and metalworking facilities.
When I showed people, they were always concerned because there was a gun on the screen and I was walking around my high school. I lacked the ability to get rid of the gun, so I had to explain it every time. Some of the more stupid teachers couldn't get past the gun and see it for what it was (a student interested in programming, not a cry for help). In the end I just stopped telling them anything and took it to people with brains.
Later on I started reconstructing the Titanic from the original schematics, though I never got very far with this, because my machine only had 64 MB of RAM, and I used it up pretty quickly making complicated geometry.