Camera vs. Lidar (scale.com)
137 points by 303space 66 days ago | 138 comments

Cruise Automation handling double-parked cars with LIDAR.[1] They show the scan lines and some of the path planning. Busy city streets, lots of obstacles.

Waymo handling city traffic with LIDAR.[2] They show the scan lines and some of the path planning. Busy city streets, lots of obstacles.

Tesla self-driving demo, April 2019.[3] They show their display which puts pictures of cars and trucks on screen. No difficult obstacles are encountered. Recorded in the Palo Alto hills and on I-280 on a very quiet day. The only time it does anything at all hard is when it has to make a left turn from I-280 south onto Page Mill, where the through traffic does not stop.[4] Look at the display. Where's the cross traffic info?

Tesla's 2016 self driving video [5] is now known to have been made by trying over and over until they got a successful run with no human intervention. The 2019 demo looks similar. Although Tesla said they would, they never actually let reporters ride in the cars in full self driving mode.

[1] http://gmauthority.com/blog/2019/06/how-cruise-self-driving-...

[2] https://www.youtube.com/watch?v=B8R148hFxPw

[3] https://www.youtube.com/watch?v=nfIelJYOygY

[4] https://youtu.be/nfIelJYOygY?t=353

[5] https://player.vimeo.com/video/188105076

> Look at the display. Where's the cross traffic info?

Tesla's display does not render all of the data that the computer knows about.

Additionally this article is assuming the camera based solution for Tesla will be single-camera. Last I checked the actual solution is going to be stereo vision of multiple cameras (think one on each side of windshield) and using ML to combine that data. The Model 3 does not have that capability though because its three cameras are center mounted.

It’s always better to have multiple sensor modalities available.

This is the main takeaway. Unsurprising but interesting nonetheless. I'm working in the field and it confirms my experience.

However they have a big bias that needs to be pointed out:

> [...] we must be able to annotate this data at extremely high accuracy levels or the perception system’s performance will begin to regress.

> Since Scale has a suite of data labeling products built for AV developers, [...]

Garbage in, garbage out; yes annotation quality matters. But they're neglecting very promising approaches that make it possible to leverage non-annotated datasets (typically standard RGB images) to train models, for example self-supervised learning from video. A great demonstration of the usefulness of self-supervision is monocular depth estimation: taking consecutive frames (2D images) we can estimate per pixel depth and camera ego-motion by training to warp previous frames into future ones. The result is a model capable of predicting depth on individual 2D frames. See this paper [1][2] for example.

By using this kind of approach, we can lower the need for precisely annotated data.
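To make the warping idea concrete, here is a toy sketch in Python. It is not the method from the linked paper: it assumes a single image row, a flat scene at one depth, and a known sideways camera shift, whereas the real approaches learn depth and ego-motion jointly with networks. All names and constants are made up for illustration. The point is only that the photometric error of the warped frame is lowest at the correct depth, and that error is the training signal that stands in for manual labels.

```python
import random


def sample_linear(row, x):
    """Linearly interpolate `row` at fractional index x, clamping at the borders."""
    x = min(max(x, 0.0), len(row) - 1.0)
    left = int(x)
    right = min(left + 1, len(row) - 1)
    frac = x - left
    return row[left] * (1.0 - frac) + row[right] * frac


def synthesize_target(source_row, depth_m, focal_px, shift_m):
    """Warp the source row into the target view for a hypothesized depth.

    For a sideways camera shift, every pixel moves by f * shift / depth pixels.
    """
    disparity_px = focal_px * shift_m / depth_m
    return [sample_linear(source_row, i + disparity_px) for i in range(len(source_row))]


def photometric_loss(source_row, target_row, depth_m, focal_px, shift_m):
    """Mean absolute difference between the warped source and the real target."""
    predicted = synthesize_target(source_row, depth_m, focal_px, shift_m)
    return sum(abs(p - t) for p, t in zip(predicted, target_row)) / len(target_row)


# Hypothetical setup: random texture, scene at 10 m, camera moved 0.2 m sideways.
rng = random.Random(0)
source = [rng.random() for _ in range(256)]
TRUE_DEPTH_M, FOCAL_PX, SHIFT_M = 10.0, 500.0, 0.2
target = synthesize_target(source, TRUE_DEPTH_M, FOCAL_PX, SHIFT_M)

# No depth label is used below: only the two frames and the motion.
candidates = [5.0, 8.0, 10.0, 15.0, 30.0]
losses = {z: photometric_loss(source, target, z, FOCAL_PX, SHIFT_M) for z in candidates}
best_depth = min(losses, key=losses.get)
print(best_depth, losses)
```

In a real system the depth would be a per-pixel network output and the loss would be minimized by gradient descent rather than by trying a handful of candidates.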

[1] https://arxiv.org/abs/1904.04998

[2] more readable on mobile: https://www.arxiv-vanity.com/papers/1904.04998/

Edit: typo

> taking consecutive frames (2D images) we can estimate per pixel depth

Yeah, I find it odd that they're bringing up Elon's statement about LiDAR, but then completely ignore that they spoke about creating 3D models based on video. They even showed [0] how good of a 3D model they could create based on data from their cameras. So they could just as well annotate in 3D.

0: https://youtu.be/Ucp0TTmvqOE?t=8217

Egomotion is very useful but relies on being able to reliably extract features from objects, which isn't always possible. Smooth, monochromatic walls do exist and it's imperative a car be able to avoid them. It is possible for a human to figure out (almost always) their shape and distance from visual cues but our brains are throwing far more computational horsepower at the task than even Tesla's new computer has available. But perhaps knowing when it doesn't know is sufficient for their purposes and probably an easier task.

An interesting intermediate case between a pure video system and a lidar is a structured light sensor like the Kinect. In those you project a pattern of features onto an object in infrared. It doesn't work so well in sunlight, but I'd be interested in learning if anyone has ever tried to use that approach with ego motion.

"Smooth, monochromatic walls do exist and it's imperative a car be able to avoid them."

Aren't those the types of walls, barriers, truck behinds that Teslas keep ramming into? :S

Maybe I missed it, I only watched part of that 4 hour video, but why don't they do like humans do and geometrically construct a Z-buffer representation from 2 or more cameras?

Then you'd get all that sweet, sweet depth data that lidar provides but cheaper and at a much higher resolution.

That was briefly touched on in the article:

> One approach that has been discussed recently is to create a pointcloud using stereo cameras (similar to how our eyes use parallax to judge distance). So far this hasn’t proved to be a great alternative since you would need unrealistically high-resolution cameras to measure objects at any significant distance.

Doing some very rough math, assuming a pair of 4K cameras with 50 degree FOV on opposite sides of the vehicle (for maximum stereo separation) and assuming you could perfectly align the pixels from both cameras, it seems you could theoretically measure depth with a precision of +/-75 cm for an object 70 meters away (a typical braking distance at highway speeds.) In practice, I imagine most of the difficulty is in matching up the pixels from both cameras precisely enough.
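For anyone who wants to check that rough math, here is a small Python sketch using the standard stereo relation (depth error grows with depth squared, divided by focal length times baseline). The 1.6 m baseline and the one-pixel matching error are my assumptions, not numbers from the comment; with them the result lands close to the quoted +/-75 cm.

```python
import math


def stereo_depth_error(depth_m, baseline_m, image_width_px, hfov_deg,
                       disparity_error_px=1.0):
    """Approximate depth uncertainty of a rectified stereo pair.

    Uses dZ ~= Z^2 * d_disparity / (f * B), with the focal length f in pixels
    derived from the image width and the horizontal field of view.
    """
    focal_px = (image_width_px / 2.0) / math.tan(math.radians(hfov_deg / 2.0))
    return depth_m ** 2 * disparity_error_px / (focal_px * baseline_m)


# Assumed values: 4K-wide sensor (3840 px), 50 degree FOV, cameras 1.6 m apart.
error_at_70m = stereo_depth_error(70.0, 1.6, 3840, 50.0)
error_at_140m = stereo_depth_error(140.0, 1.6, 3840, 50.0)

print(f"70 m:  +/-{error_at_70m:.2f} m")   # roughly 0.74 m
print(f"140 m: +/-{error_at_140m:.2f} m")  # four times worse at twice the range
```

The quadratic growth with distance is the reason resolution and pixel-accurate matching matter so much more for far objects than for near ones.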

This completely neglects the fact that humans can build near perfect 3D representations of the world with 2D images stitched together with the parallax neural nets in our brain. This blogpost briefly mentions it in one line as a throwaway and says you'd need extremely high resolution cameras?? Doesn't make sense at all. Two cameras of any resolution spaced a regular distance apart should be able to build a better parallax 3D model than any one camera alone.

The first thing we need to remember is that self-driving systems don't work like our brains. If they did, we wouldn't need to train them with billions of images. So the main problem is not just building the 3d models. For example, we don't crash into a car just because we have never seen that car model or that kind of vehicle before. Check https://cdn.technologyreview.com/i/images/bikeedgecasepredic... we would never think that there is a bike in front of us.

Humans do a lot more than just identifying an image or doing 3d reconstruction. We have context about the roads, we constantly predict the movement of other cars, we know how to react based on the situation and most importantly we are not fooled by simple image occlusions. Essentially we have a gigantic correlation engine that makes decisions based on comprehending different things happening on the road.

The AI algorithms we train do not work in the same way as we do. They depend too heavily on identifying the image. Lidar provides another signal to the system. It provides redundancy and allows the system to make the right decision. Take the above linked image for an example.

We may not need a lidar once the technology matures but at this stage it is a pretty important redundant system.

> So the main problem is not just building the 3d models

That's not relevant when discussing which technology to use to build the 3d models. Everything you said is accurate until the last few sentences. Lidar provides the same information (line of sight depth) as stereo cameras, just in a different way. The person you're responding to is talking about depth from stereo, not cognition.

> Lidar provides the same information (line of sight depth) as stereo cameras, just in a different way.

This is incorrect: the amount of parallax you need to get the same kind of accurate depth using cameras is infeasible. Velodyne and other common lidars now get you accurate points at 150m+. Cameras can't do that, and if you use nets to guess you'll still make mistakes.

> The person you're responding to is talking about depth from stereo, not cognition.

You miss the point; saying human 3D reconstruction works because of sensors without world context is naive. The response was trying to capture that; human perception systems utilize context / background knowledge extensively.

> the amount of parallax you need to get the same kind of accurate depth using cameras is infeasible. Velodyne and other common lidars now get you accurate points at 150m+

I meant they both just provide line of sight depth.

The point being made by the first comment is that human eyeballs placed a couple of inches apart are currently the gold standard for the actual looking part. So the right set of cameras is by definition sufficient for the looking part of driving. The cameras just have to replace eyes well enough. The brain replacement is farther down the chain.

From the OP:

> humans can build near perfect 3D representations of the world with 2D images stitched together with the parallax neural nets in our brain

This is a statement about cognition. And the response addresses this.

Your response:

> The person you're responding to is talking about depth from stereo, not cognition.

I think this is the disconnect. The person _is_ talking about cognition. OP makes a claim about how humans see, connected to how the human brain works. Response explains why camera-based image recognition right now is a lot worse than your eyes (a big piece of the answer is your brain).

> The cameras just have to replace eyes well enough

So yes this is nice in theory. But I also get the sense most people don't realize just how large the chasm is today between cameras and human eyes. They don't "just provide line of sight depth." Dynamic range, field of view, reliability even under conditions like high heat -- there are many other dimensions where they just aren't analogous yet.

> The first thing we need to remember is that self-driving systems don't work like our brains. If they did, we wouldn't need to train them with billions of images.

I had always assumed that the first few years of infancy were effectively a period of training a neural net (the brain) against a continuous series of images (everything seen).

Where is the bike example from? All these instances of recognition error are meaningless when they don’t come from actual production systems by auto makers. They don’t just slap OpenCV into a car.

Having a redundant system is the key here.

It also provides a reliable source of data. If humans had a LiDAR in our system, we would use it to improve our decisions.

I don’t see why we should limit the AV.

The human brain is horrible at building truly accurate 3D representations of the world. Our mental maps are constantly missing a multitude of details while tricking us and creating approximations to fill in the blanks.

Easy examples of this are optical illusions, ghosts, and UFOs. There are also "selective attention tests" where a majority of people miss glaringly obvious events right in front of them when they're focusing on something else. Regular people also tend to bump into things, spill things, and trip, even when going 3 miles an hour (walking speed).

Exactly. We don't build detailed accurate 3D maps. We build fuzzy semantic 2.5-ish-D maps that are 99% metadata. And they work incredibly well.

But at the same time people don't think much about getting in their cars and driving to work or the grocery store.

So it seems that truly accurate 3D representations of the world are not necessary, at least for driving. Perhaps it's the resolution? Looking at the samples in the article, they are just terribly fuzzy, with a narrow field of view. If I had to drive and only see the world through that kind of view, I don't think I would be doing very well.

People also crash all the time. I'd be OK with AI crashing even slightly less than humans. Rabid shock-media and various luddites aren't.

We don't just have 2D data though.

We learn object representations by interacting with them over years in a multi modal fashion. Take for example a simple drinking glass: we know its material properties (it is transparent, solid, can hold liquids), its typical position (sits on a tabletop, upright with the open side on top), its usage (grab it with a hand and bring to mouth)...

We also make heavy use of the time dimension, as over a few seconds we see the same objects from different view points and possibly in different states.

Only after learning what a glass is can we easily recover its properties on a still 2D image.

So at least for learning (might be skippable at inference), it makes a lot of sense to me to have more than 2D still images.

You're not responding to what they said. The person you're responding to is talking about depth from stereo, not cognition. Lidar _also_ doesn't know what the glass feels like.

I am, I was not writing about cognition here.

All I'm saying is that even with stereo inputs, we're doing more than computing depth from the baseline between left/right images. Close one eye and you can still estimate relative object positions, because you learned that roads are mostly planar and cars don't float but stand on the road. You know what the expected size of a car is compared to, say, a human, and if the car is visually smaller than the human, it must be farther away.
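The known-size cue is easy to write down with the pinhole camera model: apparent height in pixels is focal length times real height divided by distance. A minimal Python sketch, where the focal length and the object heights are illustrative assumptions:

```python
def distance_from_apparent_size(real_height_m, pixel_height, focal_px):
    """Pinhole model: pixel_height = focal_px * real_height / distance."""
    return focal_px * real_height_m / pixel_height


FOCAL_PX = 1000.0        # assumed focal length in pixels
CAR_HEIGHT_M = 1.5       # assumed prior: typical car height
PERSON_HEIGHT_M = 1.7    # assumed prior: typical adult height

# The car looks smaller on screen (30 px) than the person (85 px)...
car_distance = distance_from_apparent_size(CAR_HEIGHT_M, 30.0, FOCAL_PX)
person_distance = distance_from_apparent_size(PERSON_HEIGHT_M, 85.0, FOCAL_PX)

# ...so, given the size priors, it has to be farther away.
print(car_distance, person_distance)
```

The whole estimate hangs on the size prior being right, which is exactly the learned knowledge being discussed here.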

> Lidar _also_ doesn't know what the glass feels like.

Yes I agree with you, lidar and most current vision sensors also suffer from this.

People who have good vision in one eye can usually get their driver's licence without problems. So depth from stereo is not a necessary part of driving for humans.

It doesn't matter how you estimate depth, but you do have to estimate it to drive, and the first step before you can estimate is that your eyes (eye in your example) need to see pictures. Light entering the eye is an entirely different stage in the process than reasoning about said light.

Others have commented about the human aspect.

> Two cameras of any resolution spaced a regular distance apart should be able to build a better parallax 3D model than any one camera alone.

This is true if the platform isn't moving.

If you have the time dimension and you have good knowledge of motion between frames (difficult), you can use the two views as a virtual stereo pair. This is called monocular visual/inertial-SLAM. You can supplement with GPS, 2D lidar, odometry and IMU to probabilistically fuse everything together. There have been some nice results published over the years.
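As a rough illustration of what "probabilistically fuse" means in the simplest case, here is inverse-variance weighting of independent estimates of the same distance in Python. This is only a sketch under the assumption of independent Gaussian errors; real SLAM pipelines use Kalman filters or factor graphs over full poses, and the numbers below are invented.

```python
def fuse_estimates(estimates):
    """Fuse (mean, variance) pairs by inverse-variance weighting.

    Assumes the errors are independent and Gaussian.
    """
    weights = [1.0 / variance for _, variance in estimates]
    fused_variance = 1.0 / sum(weights)
    fused_mean = fused_variance * sum(mean / variance for mean, variance in estimates)
    return fused_mean, fused_variance


# Hypothetical readings of the same obstacle distance, in metres.
visual_depth = (21.0, 4.0)    # noisy estimate from the virtual stereo pair
lidar_depth = (20.0, 0.25)    # tighter estimate from a 2D lidar

fused_mean, fused_variance = fuse_estimates([visual_depth, lidar_depth])
print(fused_mean, fused_variance)
```

The fused variance is always smaller than the best single sensor's variance, which is the formal version of "more modalities help".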

But in general yes, you'll always be better off if you have a proper stereo pair with a camera either side of the car.

> humans can build near perfect 3D representations of the world

The idea that the human brain has a "near perfect" 3D representation of one's surroundings seems inaccurate to me. There's a difference between near perfection and good enough that people don't often get hurt, when all of their surroundings are deliberately constructed to limit exposure to danger.

I write code for industrial equipment and often get the request to fix a problem with software. The question "Can a computer do X" is too easy to answer in the affirmative - "Yes, but less accurately and only most of the time, and with a lot of time and money" gets condensed to "Yes" quickly.

And it is indeed an impressive and heroic piece of work when you can fix sensor problems with clever filtering, or fix mechanical problems with clever control algorithms. But when designing new equipment or deciding a path to fix a bad design, you never want to hamstring yourself from the start with poor quality input data and output actuators. That approach only leads to pain.

Once you have lots of experience with a particular design - dozens of similar machines running successfully in production for years - then you can start looking for ways to be clever and improve performance over the default or save a little money.

I understand Elon's desire to get lots of data. But there will be a much greater chance of success if it starts with Lidar + cameras, and a decade down the road you can work on camera-only controls and compare what they calculated and would have done to what the Lidar measured and how the car actually responded. Only when these are sufficiently close should you phase out the Lidar.

Remember, you're comparing bad input data going to the best neural net known in the universe (the human brain), with millennia of evolution and decades of training data, against sensor inputs going to brand new programming. Help out the computer with better input data.

For human level driving a human level understanding of the scene from purely visual information is quite good enough. The first problem, though, is that the human brain has far more processing power than any computer that can fit in a car and probably more than any single computer yet constructed (estimating even to a single order of magnitude is hard). We're also leveraging millions of years of evolution though I'm not entirely sure how much of a difference that makes given how different our ancestral environment was from driving a car.

The other thing is that we, ideally, want a computer to drive a car better than a human can. There's a lot to be gained from having precise rather than approximate notions of other objects' distances and speeds in terms of driving both safely and efficiently. Now, Tesla has also got that radar which when fused with visual data will help somewhat but I'm not sure how far that can get them.

Yes, we can. We can do it with one eye too.

but it takes at least 10 years to train.

But most of the time we are not building a 3d map from points. We are building it from object inference.

There are many advantages that we have over machines:

o The eye sees much better in the dark
o It has a massive dynamic range, allowing us to see both light and dark things
o It moves to where the threat is
o If it's occluded it can move to get a better image
o It has a massive database of objects in context
o Each object has a mass, dimension, speed and location it should be seen in

None of those are 3d maps, they are all inference, where one can derive the threat/advantage based on history.

We can't make machines do that yet.

You are correct that two cameras allow for better 3d pointcloud making in some situations, but a moving single camera is better than a static multiview camera.

however even then the 3d map isn't all that great, and has a massive latency compared to lidar.

I think most of our ability to judge relative distance is based on our brain's judgement of lighting, texture, inference, and sound. While having two eyes helps a lot, you can still navigate a complex office environment with one eye closed. It just takes a bit more care.

When I was younger I remember hearing about how we can do all these things because we have 2 eyes. And that depth perception is what gives us the ability to not walk into walls, and do other things including driving.

I have thought about this many times and often wondered why when closing one eye I am still able to function.

Since then I have thought strongly that depth perception is used for training some other part of our brain, and then only used to increase the accuracy of our perception of reality.

Further proof of this is TV. Even on varying sized screens humans tend to do well figuring out the actual size of things displayed.

Take one class on perception, read one textbook, you'll immediately find that stereo perception isn't very important. Your brain uses a host of depth cues, and stereo vision is just one of them.

Some of them translate trivially to photos/TV/etc, like convergent lines or texture gradient. Some of them are surprisingly physical, like feedback from your eyes about vergence or focal distance.

Stereo is highly effective up close, say within 10 meters (yards). And it works faster than many modes. It's absolutely fantastic for catching things out of the air. Given our interocular distance, it's basically garbage past, I dunno, 30m or something? (obviously it degrades smoothly across distance)
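A back-of-envelope version of that falloff in Python, using the small-angle relation between disparity threshold and depth resolution. The 65 mm eye separation and the 20 arcsecond stereoacuity are assumed round numbers for illustration, not measurements from any source in this thread.

```python
import math


def stereo_depth_resolution(distance_m, baseline_m=0.065, acuity_arcsec=20.0):
    """Smallest depth difference resolvable from binocular disparity alone.

    Small-angle approximation: dZ ~= Z^2 * d_theta / baseline.
    """
    d_theta = math.radians(acuity_arcsec / 3600.0)
    return distance_m ** 2 * d_theta / baseline_m


for distance in (1.0, 10.0, 30.0, 100.0):
    resolution = stereo_depth_resolution(distance)
    print(f"{distance:6.1f} m -> +/-{resolution:.3f} m")
```

With these assumptions the resolution is around a millimetre or two at arm's length, about 15 cm at 10 m, and well over a metre at 30 m, which fits the "degrades smoothly" description.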

I've heard more than one academic (evolutionary cognitive psychologists, etc) speculate that the single biggest evolutionary advantage of having two eyes is to have a spare in the event of damage. That might well be just whimsy and exaggeration, but I think it puts a helpful alternate perspective on it (pun!).

I'm skeptical of the claim that a major reason for having two eyes is depth perception.

One reason why you're still able to function is that you don't rely on your sense of depth that much these days. i.e. You don't need to gauge where a spear or arrow will land. Even in a car, you are effectively on a one dimensional track and only have to decide to go left or right.

If you only had one eye, then in situations where there is lots of pressure to perceive depth I think you'd have to move your head around a lot.

Which makes me wonder, which human activities demand the best depth perception?

See my sibling comment: with respect to stereo vision, its greatest strength is nearby fast-moving things, great for stuff like dodging or catching or punching.

If you wanna launch spears or arrows, depth perception is incredibly important, but stereo vision will not help. Not with this interocular distance, anyway.

Humans can determine the size of objects because we look for references in the scene and we understand the context.

If a person is standing next to a bush then we roughly know their height since we know the range of sizes that a bush could grow to. Likewise the size of someone like Thanos from Avengers would look odd in a documentary but because it's a superhero movie we assume that's normal.

Self driving cars to my knowledge do none of this.

Stereo depth perception is not that important. People born without it end up being able to navigate pretty well, dodge walls, climb stairs, etc. It just takes practice.

Fun trick: Look at a photograph with one eye closed. Your brain will do ... something and the picture will look 3d

That would be the same trick it does when you look at it with both eyes open...

About 10 years ago I went to an eye doctor with a small object in my eye, and she had to cover it after removing the small object.

Driving back home with 1 eye was scary even though I was going much slower. It is possible to drive with 1 eye, but much much harder than with 2 eyes.

Did you drive any further than just the way home? I would bet most people would adapt quite quickly.

No, I wasn’t experimenting, but I haven’t had any car accidents in my life, and I find that more valuable

In these modern times yes, there's little selective pressure keeping depth perception sharp. That doesn't mean most of our ability to judge depth is from monocular cues (though that could be true).


There are also depth cues from https://en.wikipedia.org/wiki/Vergence#Convergence, right? As in focusing on the object itself?

Wikipedia lists 18 different types of depth cues that humans use!


This seems like a bit of a double-edged sword. On the one hand, it means there's more than one way to achieve a 3D model of the world with cameras. On the other hand, it means that if what machines can do with cameras is going to match what we humans can do with our eyes, they will need to either advance along 18 different fronts or take some of those cues further than we can.

The most rudimentary life forms are little factories that build themselves. I think we should concentrate on making cars that build themselves and maybe then our technology will be sophisticated enough to consider looking into giving our cars human-like optical processing faculties.

Otherwise we'll just have to figure out how to build autonomous vehicles with the technology we have, which is pretty crappy in comparison to biology in a lot of ways still.

When a tree falls over a river, it creates a rudimentary bridge, as has happened for longer than humans have existed. Yet, while we can create huge suspension bridges from steel, we can't create wood.

This is getting into grey goo territory.

You cannot have false negatives. Ever. You cannot have a situation where the system doesn't see a pedestrian and runs over them without noticing. So you need to make a very convincing argument that it can't happen.

With cameras and computer vision there's no way to prove it. There is always a chance that it will glitch out for a second and kill someone.

Autonomous vehicles don't need to be perfect drivers -- far from it, they just need to be better than humans.

No, we accept humans as being imperfect but we do not accept machines as being imperfect. Yes this means that companies that have autonomous vehicles that have a lower accident rate than humans may still be completely unable to sell them because of legal issues and market perception.

We don’t know yet what the acceptance rate is for autonomous accidents - but I can guarantee it’s not the rational value of 1:1 or “as safe as humans”. They’ll need to do a lot better.

Yes and there's also a kind of lying with statistics that goes on. That human accident rate includes drunk drivers, very young drivers, very old drivers, etc.

The average accident rate is not your expected accident rate, if you are an average person who is not in those categories.
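A quick Python sketch of why the fleet-wide average overstates the risk of a typical driver. The group shares and crash rates are invented purely to show the arithmetic, not real statistics.

```python
def average_rate(groups):
    """Mileage-weighted average crash rate across driver groups."""
    total_share = sum(share for share, _ in groups.values())
    return sum(share * rate for share, rate in groups.values()) / total_share


# Hypothetical numbers: (share of miles driven, crashes per million miles).
groups = {
    "typical drivers": (0.9, 1.0),
    "high-risk drivers": (0.1, 11.0),  # drunk, very young, very old, etc.
}

fleet_average = average_rate(groups)
typical_rate = groups["typical drivers"][1]

print(fleet_average, typical_rate)
```

With these made-up numbers the fleet average is about twice the typical driver's rate, so a system that is "twice as safe as the average human" would only match the typical sober adult.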

A million people die per year in road accidents, about 40,000 of those in the USA.

If what you say is true then a future where robot cars kill 500,000 per year and 20,000 in the USA would be considered acceptable.

Yet we know this is absolutely not the case, no society will ever stand for such a massive death toll due to robot usage. Are there any industries today where robots are allowed to kill so many?

We accept deaths because of human failing as there is no other way, the alternative is no cars.

So for us to hand over the reins to robots they need to be near perfect, think the accident rates of the airline industry as the only acceptable goal.

> near perfect 3D representations of the world with 2D images

This is ridiculous.

I am sitting in front of a monitor right now. Please explain how I can perfectly determine the depth of it even though I can't see behind it? I can move my head all around it to capture hundreds of different viewpoints but a car can't do that.

Nobody made a rule that says cars can't have cameras in more than one location.

When moving, cars can compare hundreds of different viewpoints. Multiple cameras provide for depth perception when stationary.

it’s too bad that cars can’t move to get additional points of view.

Not like a human can, actually. Fixed cameras are fixed relative to the bodywork, necks are not. OTOH, if I move my head it's usually to get around a blind spot, something cameras have less of an issue with.

Cameras do not perform saccades, for starters... The hardware isn't as analogous as it might seem.

They are refuting a claim that wasn't made. If they need Lidar to do better annotations, fine. You'd only need the lidar on data collection/R&D cars though, and could just use cameras on production cars.

The point Musk and others are making though is that the lidar on the market today has poor performance in weather. The cameras will struggle to a degree in weather as well, so the time when your dev car is driving through rain, and lidar is not giving you good annotations, is exactly the time when you need the ground truth to be as clean as possible.

With respect, that's not what they are claiming

They are saying that lidar enhances the perception system to get more accurate dimensions and rotations of objects to a greater distance.

this means that you can predict far better, allowing you, for example, to drive at night full speed.

Weather affects visual systems as well. The "ooo rain kills lidar" is noise at best. Visual cameras are crap at night.

There is a reason that the radar augmented depth perception demo is in bright light, no rain. Because it almost certainly doesn't work as well at night, and will probably need a separate model.

>>Visual cameras are crap at night.

Mitigated somewhat with headlights

Not really. Most cameras don't have the dynamic range needed to either fully see where the beam is pointing, or cope with near objects.

Infra-red cameras work, but RGB not so much (well not in the <$400 per CCD price point)

Not fully mitigated for humans, it's more dangerous to drive at night.

Some of this feels very cherry-picked. They’re comparing lidar vs camera on snapshots, when a model will always be continuously built as the scene changes.

There’s also one instance where it gives lidar the advantage because it’s mounted on top of the car and can see over signs. What?!

I also feel that they make the 2D annotator's job very hard. I wore an eye patch yesterday (having fun with kids) and reality became extremely confusing. Our brain does not annotate on static 2D images. We annotate on stereoscopic video of moving objects.

Are they only utilizing one camera? Doesn't Tesla use multiple cameras and radar?

Seems like those shots were from just one camera.

This article is only considering static images. Lidar static "images" necessarily contain depth information so yeah obviously they'll have better depth estimates.

But that's really beside the point because the world is not static and any system attempting self-driving will need to take that into account.

Using parallax measurements, which is what Tesla says they are doing, you can dramatically improve the accuracy of depth estimates by comparing multiple frames of 2D images.

Also, just a reminder that Tesla is also using radar in conjunction with the cameras.

This was my question as well. How good are systems over a stream of data?

I am not an expert in this field: how does tracking actually work with a time dimension? There must be some sort of "state" carried over frame by frame? What is the "size" of this state? Objects do not just disappear and reappear for certain frames? This latter effect you can often see on many automatic labeling demos you find on GitHub.
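One common textbook answer, sketched in Python as an assumption and not as a description of any production system: each tracked object carries a small state (here just position, velocity and a missed-frame counter along one axis), predicts forward every frame, and coasts on the prediction when the detector misses it. The class name, gains and thresholds are all hypothetical.

```python
class Track:
    """Per-object state carried across frames (alpha-beta filter, one axis)."""

    def __init__(self, position, velocity, alpha=0.5, beta=0.1, max_misses=3):
        self.position = position
        self.velocity = velocity      # units per frame
        self.alpha = alpha            # how strongly a detection corrects position
        self.beta = beta              # how strongly a detection corrects velocity
        self.max_misses = max_misses
        self.misses = 0

    def step(self, detection):
        """Advance one frame; `detection` is a measured position or None."""
        predicted = self.position + self.velocity  # constant-velocity prediction
        if detection is None:
            # Detector dropped the object: keep it alive on the prediction.
            self.position = predicted
            self.misses += 1
        else:
            residual = detection - predicted
            self.position = predicted + self.alpha * residual
            self.velocity += self.beta * residual
            self.misses = 0
        return self.position

    @property
    def alive(self):
        """A track is dropped only after too many consecutive misses."""
        return self.misses <= self.max_misses


# Object moving one unit per frame; the detector misses it for two frames.
detections = [1.0, 2.0, 3.0, None, None, 6.0, 7.0]
track = Track(position=0.0, velocity=1.0)
positions = [track.step(d) for d in detections]
print(positions, track.alive)
```

Per-frame labeling demos that flicker typically run the detector on each frame independently with no such state, which is why objects pop in and out.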

I think it's completely clear at this point that Tesla (or more specifically, Elon Musk) is simply lying about what their cars will be able to do in the future with existing hardware. Don't get me wrong - the existing "autopilot" is fantastically good. But it's not going to jump from where it is now to full self driving, no matter how many years or millions are poured into it.

Elon is an eternal optimist that truly believes that the tech will catch up with his goals.

To be fair, though: Elon is shipping actual vehicles. I mean, I think the GM and Waymo offerings look great too, but I can't buy one. Even granting the premise that LiDAR is going to be a firm requirement, it's not clear to me that Tesla is actually behind the curve on real products. Adding a LiDAR system (or buying one from Waymo) to a Tesla seems naively simpler than finishing and shipping a whole new system from scratch.

Shipping vehicles has been a solved problem for a century, shipping autonomous vehicles has not. Although they are taking the money for it, Tesla are not shipping autonomous vehicles.

>Elon is shipping actual vehicles. I mean, I think the GM and Waymo offerings look great too, but I can't buy one.

GM is shipping millions of cars annually, with the same FSD capability as any Tesla on the road. That is: zero FSD capability.

Right. All the major automakers have Level 2 capability now - lane keeping and auto-braking in traffic, plus maybe some extra stuff like auto-park or lane change. That's all Tesla's "autopilot", as shipped, does.

In that situation, what do you think would happen to customers who've paid $6000 for the 'Full Self-Driving Capability' ?

They get the upgrade for free, I'd hope? Kind of like they seem to feel the need to do regarding their new chip [0].

Less optimistically, people will probably be less eager to upgrade once those cars are several generations old by the time FSD is actually available.

[0] https://www.theverge.com/2019/7/8/20685873/tesla-fsd-chip-up...

Well, the problem is that, as I understand it, you need to get the FSD package right now to get the current advanced cruise control, which is kind of the party trick of a Tesla and a reason why I'd want one (that and the whole electric drive thing, I guess).

And is the full self driving promise attached to the vehicle or the purchaser? In other words, if full self driving actually happens in 10 years, will the only people able to collect be original purchasers that still have that original car?

Seems like a great bet for Tesla either way. If they get to self driving soon, they'll make a mint and upgrading or even fully replacing those cars will be a drop in the bucket. If it takes them a decade, there probably won't be that many still around to make a claim.

It does make me sad that Tesla didn't just throw all the (affordable?) tech they could into their cars, and then figure it out later.

Eg, it seems like they are taking the "figure it out later" approach, but they limited what they can work with to just camera information. Which to me is a shame. I'd like to see Tesla's model with lots of inputs.

Then again, I don't ship these cars, so I'm probably being ignorant :)

To be fair, it seems like there's been a lot of innovation and miniaturization in the LIDAR space in the past year or two. Even if Tesla wanted to preemptively install LIDAR for future proofing, I'm guessing that addition to the car would be an eyesore.

There are lots of LIDAR startups. And there's Continental, the big European auto parts maker. They bought Advanced Scientific Concepts, which has a good but expensive flash LIDAR. (I saw the prototype on an optical bench 15 years ago.) They showed a prototype on a car in 2017.[1]

There are about 100 companies involved with automotive LIDAR.[2] Making LIDAR units cheaper looks within reach. Arguments are over which technology of several that work will be cheapest. Not whether it can be done. There are the rotating machinery guys. The flash LIDAR guys, divided into the "we can make CMOS do it" and "we can get InGaAs fabbed at reasonable cost" camps. There are the MEMS mirror people. All have working demo units.

But no car maker is prepared to order in quantity. Continental is an auto parts maker - when some auto manufacturer wants to order a few hundred thousand units, they'll crank up a production line and get the price down. There's no demand yet beyond the prototype level. The startups mostly want to get bought out by someone who can manufacture in volume. In the end, it's an auto part.

Once the units get cheaper, they can be better integrated into cars. The top 2cm or so of the windshield can be dedicated to sensors. Additional sensors near the headlights, looking sideways from the fenders, and backwards will complete the circle. The top-mounted rotating thing is a temporary measure until the price comes down.

[1] https://www.continental-automotive.com/en-gl/Landing-Pages/C...

[2] http://www.automotivelidar.com/

> Tesla didn't just throw all the (affordable?) tech they could into their cars

But they did. Since the late 2016 Model S, each Tesla comes with 1 radar unit, 8 cameras, 12 ultrasonic sensors and a replaceable computer system.

At the start, these were not all used, but they have been used for more functions over time.

The Model X got the same cameras, and my friend who has one said his car uses the side cameras to prevent the self-opening driver and passenger doors from dinging nearby cars.

They've designed a new faster computer (Hardware 3.0). The folks who paid for full self driving will get one of these swapped in when full self driving features require it.

That's a truly optimistic view of Elon; the pessimistic view is that he's P. T. Barnum.

But that's a clearly inaccurate view. The cars work, very well. They have sold in large numbers despite opposition from car dealers, the conservative oil and gas industry, and the rest of the automotive industry. They need little maintenance (far less than regular cars). They seem pretty safe overall. I've owned a number of high quality cars, such as an Audi S4, and my Tesla since 2012 (7 years ago they had all this stuff!) has worked well. These cars are made and driven by humans, so there will be some mechanical problems, people will crash them occasionally, and they are filled with energy and can catch on fire occasionally.

It's easy to sell a product, when you charge less than it costs to produce. That's the silicon valley way.

I honestly can't say if this is sarcasm or not.

This is ignoring the elephant in the room: the AI is not good enough often enough for general-purpose AVs. In restricted settings it will be great (container terminals, warehouses...), but from everything I have seen, from the outside as I am not an insider, the last little bit of safety seems unobtainable with neural networks. I so want to be wrong, so please tell me why I am. I want my next car to have a cocktail cabinet and drive smoothly enough to balance my champagne flute on the arm rest.

Cars will reach 50% and 75% and 95% autonomy but they won’t reach 100% unless we change the infrastructure to be controlled. So long as they are driving among humans on roads made for humans they will never be 100% autonomous. 100% autonomy might sound like just a little more than 95% but it’s not. 100% is the point where a car can be built without a driver. Its passengers can be drunk or not know how to drive. It’s a huge difference from 95% or 99%.

I think when cars are 95 or 99% autonomous they will be sold with human remote control, so there will be centers where manufacturers have hundreds of remote drivers ready to intervene and handle the last 5% or 1% of situations. The race to AV profitability will be won by the manufacturer with the smallest army of backup drivers.
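
As a back-of-the-envelope illustration of why the size of that army depends so heavily on how often cars get stuck, here is a small sketch. Every number in it (fleet size, stop rates, handling time) is invented for the example, and `operators_needed` is a hypothetical helper; it only computes the average number of operators busy at once, using Little's law (average load = arrival rate x handling time).

```python
def operators_needed(fleet_size: int,
                     interventions_per_vehicle_hour: float,
                     minutes_per_intervention: float) -> float:
    """Average number of remote operators busy at any moment."""
    # Requests arriving at the control center per hour, across the fleet.
    requests_per_hour = fleet_size * interventions_per_vehicle_hour
    # Time one operator spends on one request, in hours.
    hours_per_request = minutes_per_intervention / 60.0
    return requests_per_hour * hours_per_request


# Two hypothetical fleets of the same size at different autonomy levels.
scenarios = [
    ("one stop per 10 vehicle-hours", 0.1),
    ("one stop per 1000 vehicle-hours", 0.001),
]
for label, rate in scenarios:
    print(label, operators_needed(10_000, rate, 3.0))
```

The load is linear in the stop rate, so with these made-up numbers the second fleet needs a hundredth of the operators. This is only the average; peaks such as rush hour or bad weather would need headroom on top.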

How do you expect human remote control to work reliably enough for safety critical situations when our existing cellular data network fails so frequently? What happens when a construction crew accidentally cuts through the backhaul fiber?

The handover will be after the car stops because it’s confused. If there is no cell network or no operator available the car is simply stranded on the side of the road, just as after a mechanical failure. Operators can’t help “unknown situations” while moving.

In many places like bridges, hills, and congested city streets there is literally no road shoulder, no safe place to stop. When existing cars break down in those locations they end up blocking a traffic lane and frequently get hit from the rear by a drunk or distracted driver.

Yes, the autonomous car will need to make a judgement whether to “limp” out of a situation it is unsure of, or whether to stay where it is. It’s weighing two risks against each other. This happens whether it’s 95 or 99.99% autonomous - it just happens with different frequency.

It could also be possible for the occupants of the car (if it has any!) to pick up a smartphone and guide the car to safety if needed. Part of the attraction of autonomous vehicles is that it can operate without occupants, however.

I had to drive once on a punctured tire that was deflating. No shoulders due to construction, and in fact I had to speed through the work zone to make sure I got to an exit while I still had some air.

The real world is a very very messy place.

Let's just drive multi-ton vehicles instead [1]. Highway driving might be easier to visually parse but higher speeds and probably less controlled kinematics (i.e. does the software know how to adjust for the cargo) give one pause:

[1] https://www.theverge.com/2019/8/15/20805994/ups-self-driving...

But remember that accuracy of drawing bounding boxes around objects in still frames is only very slightly related to actual self-driving ability, even if intuition suggests otherwise.

This is basically just an ad for Scale and Scale's services, which include... drawing bounding boxes around objects in still frames.

I think all of this discussion about whether LIDAR or cameras are better misses the point that really matters: will cameras actually be good enough to get the job done? If they are, then it doesn't matter which is better. You can always add additional sensors and get more information, but engineering has always been a cost vs benefit problem. If adding LIDAR doesn't give a significant benefit in scenarios where cameras are not already good enough, then it might not be worth the additional expense.

I'm going to go with the guy who seems to be successfully launching StarLink

The weird thing about this article is that it's only comparing annotation performance, which is important, but not what you should ultimately care about. If you trained a visual model using annotated lidar for ground truth, then you might expect better performance from the model than from human annotations of the image alone, and certainly better than a model trained on those annotations.

Seems like they are either incompetent or cooking the data. When converting from the image to a top view shape / outline, one would design and/or train the system to adjust for perspective. Clearly they have not bothered to do that.
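
To illustrate what "adjusting for perspective" can mean in the simplest case, here is a sketch of back-projecting an image pixel onto a flat road. It is not the article's method; it assumes an ideal pinhole camera with a horizontal optical axis at a known height, and the function name and the camera numbers in the usage loop are invented for the example.

```python
from typing import Optional, Tuple


def pixel_to_ground(u: float, v: float, focal_px: float,
                    cx: float, cy: float,
                    camera_height_m: float) -> Optional[Tuple[float, float]]:
    """Map an image pixel to (lateral, forward) metres on a flat road.

    Assumes a pinhole camera looking horizontally, with image rows
    increasing downwards and (cx, cy) as the principal point.
    Returns None for pixels at or above the horizon row.
    """
    rows_below_horizon = v - cy
    if rows_below_horizon <= 0:
        return None  # the ray never hits the ground plane
    # The ray through the pixel meets the ground at this forward distance.
    forward = camera_height_m * focal_px / rows_below_horizon
    # Lateral offset scales with the same depth.
    lateral = (u - cx) * camera_height_m / rows_below_horizon
    return lateral, forward


# Hypothetical camera: 1000 px focal length, 1280x720 image, 1.5 m high.
for row in (660, 460, 370):
    print(row, pixel_to_ground(740.0, row, 1000.0, 640.0, 360.0, 1.5))
```

The closer a pixel is to the horizon row, the more ground a single pixel covers, so an outline drawn in image space stretches badly in the top view unless this kind of correction is applied.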

And the title is inflammatory. Nobody who understands the discussion is talking only about camera versus lidar. It’s more about camera+radar versus camera+radar+lidar, and other comparisons between other hybrid or standalone sensor combinations. It’s not as simple as one versus the other... surprised we still have to point this out to them.

What if we have sensors/cameras/etc along the road and they feed data to whichever car is there?

And if we also have cars share their sensor data?

Would that speed things up in terms of achieving full autonomy?

Yes, drastically. But with it you bring new fears of how that data will be used by the government. You could also achieve something similar by forcing all vehicles to have embedded sensors that real time share data.

So we could have highways offering 100% autonomy and small village roads offering 20% autonomy.

Investment wise it wouldn't be impossible since roads are already expensive to build.

Is there any active research exploring this path? E.g. having cars report about themselves, or having a centralized control system (similar to air traffic control?)

Cooperative strategies open up a lot of new attack vectors. I don't trust those companies to design systems robust enough, especially if they'd have to build a standard together with competitors.

And in the end, your car has to be able to come to a safe stop and avoid dangers no matter the situation. Even with no other cars around or communication interrupted. To reliably achieve this will probably get you most of the way to "real" self driving, with humans/remote operators manually taking care of the few remaining cases.

I suspect this is yet another story sponsored by the Tesla shorts. Just saw an excellent two hour interview by Lex Fridman of George Hotz and he goes into details why he thinks camera will win over lidar.

But he also admits that presently Google is ahead of everyone in the race for level 5, but raises the question of whether they can ever do it economically enough to make money on it?

https://www.youtube.com/watch?v=iwcYp-XT7UI 2 hours!

Money quote is when Lex tells him, "Some non-zero part of your brain has a madman in it."

I'd argue that is true of many of the greatest inventors of our time.

I believe Teslas also have radar.

I also listened to the podcast. George made it sound like the Lidar wasn’t being used for much. It augments the maps to help determine more precise location?

I noticed that too. My understanding of what he said was most AV companies were using the lidar purely for localization (on an HD map) and not for object detection. This was the opposite of my understanding, so his statement was very confusing to me. Anyone able to comment?

You can absolutely use lidar pointclouds for object detection. It can be hard with low resolution lidar in a cluttered environment, though.

Of course you can use it. The claim Hotz seemed to be making was that the major AV stacks weren't doing it, which would have been major news to me.

They have a fixed position radar which sees the distances and radial speeds of objects in a cone extending in front of the car. Because it can't tell you which direction an object is in they have to filter out objects that are radially stationary so as not to be triggered by things on the side of the road, overhead signs, etc. At least until these objects are close enough that the cone's width doesn't exceed the width of the car by much. Now, it's very useful when dealing with other moving cars which probably aren't hanging in the air above your direction of travel. But it doesn't help with everything.
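
A rough sketch of the filtering rule described above, for anyone who wants it spelled out. This is only an illustration of the idea in the comment, not Tesla's actual logic; the `RadarReturn` type, the `keep_return` function and every threshold (cone half-angle, car width, tolerances) are assumptions picked for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class RadarReturn:
    range_m: float           # distance to the reflecting object
    radial_speed_mps: float  # negative means the object is closing on us


def keep_return(ret: RadarReturn, ego_speed_mps: float,
                half_angle_deg: float = 10.0, car_width_m: float = 2.0,
                width_factor: float = 1.5, speed_tol_mps: float = 1.0) -> bool:
    """Decide whether a radar return should be passed on to the planner."""
    # Something standing still appears to close at roughly our own speed.
    looks_stationary = abs(ret.radial_speed_mps + ego_speed_mps) <= speed_tol_mps
    if not looks_stationary:
        return True  # moving objects are most likely other vehicles

    # Without direction information, a stationary return could be anywhere
    # across the cone at that range (roadside, overhead sign, or our lane).
    cone_width_m = 2.0 * ret.range_m * math.tan(math.radians(half_angle_deg))

    # Only trust it once the cone is not much wider than the car itself.
    return cone_width_m <= car_width_m * width_factor
```

At 30 m/s, a stationary return 100 m out is discarded under these assumed thresholds because the cone there is roughly 35 m wide, while the same return at 5 m is kept. That matches the limitation in the comment: a stopped obstacle is only believed once it is already close.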

Yes you’re right, they do.

For anyone else who skipped the article - this story likely isn't sponsored by Tesla as it takes a very critical view of Camera-only self driving sensors.

Edit: misread the parent comment

The author said "tesla shorts"

> sponsored by the Tesla shorts

i.e. people who are betting against Tesla.

Regardless of which side of the 'Efficacy of Lidar' argument one is on, it's impressive how much impact Elon Musk's statement has on the industry.

>If your perception system is weak or inaccurate, your ability to forecast the future will be dramatically reduced.

This reasoning is exactly backwards. If your perception system can forecast accurately, it simply must not be weak or inaccurate.

The question here is, what is important information for a system to perceive to make accurate forecasts? Lidar might help a bit... But we know it simply is not required.

I strongly recommend Lex Fridman’s recent interview with George Hotz where they covered this topic in some detail: https://youtu.be/iwcYp-XT7UI

This completely fails to address Musk's argument: that for a L5 car you need to be able to drive in inclement weather where LIDAR does not work reliably.

Musk may be right or wrong, but this article is a non-sequitur.

Musk has made a lot of arguments over the years. His argument was that lidar is a crutch because people do it without lidar.

Except people don't drive reliably in inclement weather at all, so you don't really want that as the gold standard.

Training a car to be as good as average people driving in the rain/snow would be horrible.

Cameras don't work reliably in inclement weather either.

So what was Musk's point?

There's 200-300k cars out there with just cameras + radar, and their bet is that the software catches up to the rhetoric. Adding lidar is next to impossible on the current fleet and adds to the cost if they start now. It is an early design bet which the coders now have to meet.

But no car sold yet (or in the coming decade) will have close to “full autonomy”. Not even close.

He’s been promising otherwise.

Hopefully no one bought a car hoping it will magically be “fully autonomous” in the future, given that no one knows whether it will even be possible with a research vehicle in 10 years.

If they weren't hoping that, it would seem odd for them to pay several thousand dollars for a "Full Self Driving" option.

Yeah, I mean I hope people realize that a few thousand dollars and “Full self driving” doesn’t imply “Fully autonomous” in the sense that people can use their Tesla as an Uber and be drunk in the back seat! It just means that they’ll have the best autonomy that Tesla can provide, which still means they have to pay attention. “Fully autonomous” to me is the point where my kid who can’t drive can use it as a taxi to school with no other driver etc. Musk doesn’t imagine that yet, I hope.

Now I'm confused. At one point, Tesla literally had the following description for "Full self driving": "in the future, Model 3 will be capable of conducting trips with no action required by the person in the driver's seat".

Isn't that pretty much what you said "Full self driving" does NOT imply?

Conducting (some, specific/easy) trips with no action is easy. Even conducting most trips is likely doable within the near-ish future.

“Full autonomy” (at least to me) is being able to do any trip. Not just some or most. Because the key benefit is that the car can be empty, or the passenger drunk/blind/...

That’s what I think the crucial difference is between their marketed “full self driving” and true full autonomy.

I wasn't claiming Musk was right, just that the article is a non-sequitur

This is answered in the FAQ: https://news.ycombinator.com/newsfaq.html.

Suspicious of what? None of those got any comments. Sometimes stories slip off the radar before being noticed, the first time, or first several times. And HN even posts 'dupes' themselves sometimes, to give undercommented stories another go. see https://news.ycombinator.com/item?id=11662380

I submitted a blog post the other day that got 150 comments - I only noticed afterwards it had already been submitted 6 or 7 times before in the months preceding, each without attracting any comments.

No need to be suspicious. AFAIK, HN encourages folks to reshare if they think that a quality post slipped off without much engagement. I even got an email from someone at HN encouraging me to reshare my couple of years' old post again to see if it sticks this time around.

I side with Elon on this. Except he's kind of a cheap bastard, and my idea of using Lytro cameras instead of the cheap ones used by Tesla won't actually fly with him. So yeah, use Lytro and you can forget about Lidar altogether.

I really like this idea, and had come up with it independently a few days ago. Light-field is very processor intensive, but oh so powerful.
