Why would you want to limit yourself to passive cameras and make your life harder? This is like limiting yourself to flapping bird wings to make airplanes.
No, it's like limiting yourself to using skis to move down a ski slope. He's right: the roads are designed to be navigated using vision. Signage, regulations, paint, curbs, etc. There's no proof that you could safely navigate the roads with LIDAR, but we prove every time we drive that you can do it with vision.
And sure, there might be a better way to get down a ski slope, but skis would be a pretty good starting point. And they guarantee you don't end up in an impossible situation because you're doing things a fundamentally different way than the system expects.
They're designed to be navigated using human vision, which has very different characteristics than machine vision in terms of dynamic range, resolution, processing pipeline, inferring details about the scene based on past experiences, etc.
Because not everyone can afford to spend $20k on extra sensors that make the car 1% safer. And holding back autonomous cars until they're perfect can kill more people than deploying near-perfect autonomous cars would. It's an economic tradeoff like any other.
His thesis is that relying on cameras makes it easier, since the entire preexisting road network is literally designed around optical navigability.
Adding other sensors isn't free. Every minute you spend on developing techniques to process inputs from other sensors, not to mention integrating their conclusions with those of the rest of the sensors, is time, money, and energy you could have used to improve your optical system.
I'm not saying I necessarily agree (though I find his position intuitively compelling), but he clearly thinks that it's easier, faster, and cheaper to bring an optical-only system to a point of reliability than it is to bring a mixed-sensor system to the same point.