Yeah, I find it odd that they bring up Elon's statement about LiDAR but then completely ignore that they spoke about creating 3D models from video. They even showed how good a 3D model they could create from the data from their cameras. So they could just as well annotate in 3D.
An interesting intermediate case between a pure video system and lidar is a structured light sensor like the Kinect. In those you project a pattern of features onto an object in infrared and recover depth from how the pattern shifts. It doesn't work so well in sunlight, but I'd be interested to learn if anyone has ever tried to use that approach with ego motion.
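If anyone wants to play with the idea, here's a rough sketch of the Kinect-style triangulation (all the constants are my guesses for illustration, not real Kinect specs):

```python
import math

# Structured-light triangulation sketch: a projector and an IR camera sit a
# known baseline apart. A projected dot's horizontal shift (disparity) from
# its expected position on a calibration plane encodes depth.

baseline_m = 0.075   # projector-to-camera baseline (assumed)
focal_px = 580.0     # IR camera focal length in pixels (assumed)
ref_depth_m = 2.0    # depth of the calibration reference plane (assumed)

def depth_from_shift(shift_px: float) -> float:
    """Depth from a dot's pixel shift relative to the reference plane."""
    # Disparity the dot would have at the reference depth:
    ref_disparity = baseline_m * focal_px / ref_depth_m
    # Standard triangulation: z = b * f / disparity
    return baseline_m * focal_px / (ref_disparity + shift_px)

print(depth_from_shift(0.0))   # 2.0 m: dot exactly where the reference expects
print(depth_from_shift(5.0))   # larger disparity -> closer than the reference
print(depth_from_shift(-5.0))  # smaller disparity -> farther away
```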
Aren't those the types of walls, barriers, and truck rears that Teslas keep ramming into? :S
Then you'd get all that sweet, sweet depth data that lidar provides, but cheaper and at a much higher resolution.
> One approach that has been discussed recently is to create a pointcloud using stereo cameras (similar to how our eyes use parallax to judge distance). So far this hasn’t proved to be a great alternative since you would need unrealistically high-resolution cameras to measure objects at any significant distance.
Doing some very rough math: assuming a pair of 4K cameras with 50 degree FOV mounted on opposite sides of the vehicle (for maximum stereo separation), and assuming you could perfectly align the pixels from both cameras, you could theoretically measure depth to within about +/-75 cm for an object 70 meters away (a typical braking distance at highway speeds). In practice, I imagine most of the difficulty is in matching up the pixels from both cameras precisely enough.
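For anyone who wants to plug in their own numbers, here's the back-of-the-envelope version of that calculation (all parameters are the assumptions above, plus an assumed +/-1 px matching error, not real camera specs):

```python
import math

# Stereo depth-precision estimate under the assumptions in the comment above.
width_px = 3840          # 4K sensor, horizontal pixel count
fov_deg = 50.0           # horizontal field of view
baseline_m = 2.0         # cameras on opposite sides of the car (assumed)
disparity_err_px = 1.0   # pixel-matching error (assumed)
range_m = 70.0           # typical highway braking distance

# Focal length in pixels from the pinhole model: f = (w/2) / tan(FOV/2)
focal_px = (width_px / 2) / math.tan(math.radians(fov_deg / 2))

# Depth error grows quadratically with range: dz ~= z^2 * dd / (b * f)
depth_err_m = range_m**2 * disparity_err_px / (baseline_m * focal_px)

print(f"focal length: {focal_px:.0f} px")
print(f"depth error at {range_m:.0f} m: +/-{depth_err_m:.2f} m")
# ~ +/-0.6 m with these numbers, the same ballpark as the +/-75 cm figure
```

The z^2 term is the killer: with the same setup, the error at 140 m is four times worse than at 70 m, which is why stereo gets dismissed for long-range sensing.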