
> taking consecutive frames (2D images) we can estimate per pixel depth

Yeah, I find it odd that they bring up Elon's statement about LiDAR but then completely ignore that Tesla also spoke about creating 3D models based on video. They even showed [0] how good a 3D model they could create from their camera data. So they could just as well annotate in 3D.

0: https://youtu.be/Ucp0TTmvqOE?t=8217

Egomotion is very useful but relies on being able to reliably extract features from objects, which isn't always possible. Smooth, monochromatic walls do exist and it's imperative a car be able to avoid them. It is possible for a human to figure out (almost always) their shape and distance from visual cues, but our brains are throwing far more computational horsepower at the task than even Tesla's new computer has available. But perhaps knowing when it doesn't know is sufficient for their purposes, and probably an easier task.

An interesting intermediate case between a pure video system and a lidar is a structured light sensor like the Kinect. In those you project a pattern of features onto an object in infrared. It doesn't work so well in sunlight, but I'd be interested in learning whether anyone has ever tried that approach with egomotion.

"Smooth, monochromatic walls do exist and it's imperative a car be able to avoid them."

Aren't those the types of walls, barriers, and truck rear ends that Teslas keep ramming into? :S

Maybe I missed it, since I only watched part of that 4-hour video, but why don't they do as humans do and geometrically construct a Z-buffer representation from 2 or more cameras?

Then you'd get all that sweet, sweet depth data that lidar provides but cheaper and at a much higher resolution.

That was briefly touched on in the article:

> One approach that has been discussed recently is to create a pointcloud using stereo cameras (similar to how our eyes use parallax to judge distance). So far this hasn’t proved to be a great alternative since you would need unrealistically high-resolution cameras to measure objects at any significant distance.

Doing some very rough math, assuming a pair of 4K cameras with a 50 degree FOV on opposite sides of the vehicle (for maximum stereo separation) and assuming you could perfectly align the pixels from both cameras, it seems you could theoretically measure depth with a precision of +/-75 cm for an object 70 meters away (a typical braking distance at highway speeds). In practice, I imagine most of the difficulty is in matching up the pixels from both cameras precisely enough.
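For anyone who wants to check the rough math above, here is a minimal sketch using the usual first-order stereo relation, depth error ≈ Z² · Δd / (f · B). The numbers not stated in the comment are my assumptions: a 3840-pixel-wide sensor for "4K", a baseline of about 1.6 m for cameras on opposite sides of the vehicle, and a matching error of one pixel of disparity. The function names are made up for illustration.

```python
import math

def focal_length_px(width_px, hfov_deg):
    """Pinhole focal length in pixels from image width and horizontal FOV."""
    return (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)

def depth_error_m(depth_m, baseline_m, width_px=3840, hfov_deg=50.0,
                  disparity_err_px=1.0):
    """First-order stereo depth uncertainty: dZ ~= Z^2 * dd / (f * B).

    width_px, baseline_m and disparity_err_px are assumptions, not
    figures from the thread; only the 50 degree FOV and 70 m are.
    """
    f = focal_length_px(width_px, hfov_deg)
    return depth_m ** 2 * disparity_err_px / (f * baseline_m)

if __name__ == "__main__":
    # Try a couple of plausible baselines (vehicle widths), in meters.
    for baseline in (1.6, 1.8):
        err = depth_error_m(70.0, baseline)
        print(f"baseline {baseline:.1f} m -> depth error {err:.2f} m at 70 m")
```

With those assumptions the error at 70 m lands in the same ballpark as the +/-75 cm quoted above. Since it grows with the square of the distance and scales linearly with the disparity error, sub-pixel matching would help a lot and any misalignment would hurt a lot.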
