
Camera vs. Lidar - 303space
https://scale.com/blog/is-elon-wrong-about-lidar
======
Animats
Cruise Automation handling double-parked cars with LIDAR.[1] They show the
scan lines and some of the path planning. Busy city streets, lots of
obstacles.

Waymo handling city traffic with LIDAR.[2] They show the scan lines and some
of the path planning. Busy city streets, lots of obstacles.

Tesla self-driving demo, April 2019.[3] They show their display which puts
pictures of cars and trucks on screen. No difficult obstacles are encountered.
Recorded in the Palo Alto hills and on I-280 on a very quiet day. The only
time it does anything at all hard is when it has to make a left turn from
I-280 south onto Page Mill, where the through traffic does not stop.[4] Look
at the display. Where's the cross traffic info?

Tesla's 2016 self-driving video [5] is now known to have been made by trying
over and over until they got a successful run with no human intervention. The
2019 demo looks similar. Although Tesla said they would, they never actually
let reporters ride in the cars in full self-driving mode.

[1] [http://gmauthority.com/blog/2019/06/how-cruise-self-
driving-...](http://gmauthority.com/blog/2019/06/how-cruise-self-driving-cars-
deal-with-double-wide-parking-video/)

[2]
[https://www.youtube.com/watch?v=B8R148hFxPw](https://www.youtube.com/watch?v=B8R148hFxPw)

[3]
[https://www.youtube.com/watch?v=nfIelJYOygY](https://www.youtube.com/watch?v=nfIelJYOygY)

[4] [https://youtu.be/nfIelJYOygY?t=353](https://youtu.be/nfIelJYOygY?t=353)

[5]
[https://player.vimeo.com/video/188105076](https://player.vimeo.com/video/188105076)

~~~
693471
> Look at the display. Where's the cross traffic info?

Tesla's display does not render all of the data that the computer knows about.

Additionally, this article assumes Tesla's camera-based solution will be
single-camera. Last I checked, the actual solution is going to be stereo
vision from multiple cameras (think one on each side of the windshield),
using ML to combine that data. The Model 3 doesn't have that capability,
though, because its three cameras are center-mounted.

------
m3at
_It’s always better to have multiple sensor modalities available._

This is the main takeaway. Unsurprising but interesting nonetheless. I'm
working in the field and it confirms my experience.

However, they have a big bias that needs to be pointed out:

 _[...] we must be able to annotate this data at extremely high accuracy
levels or the perception system’s performance will begin to regress._

 _Since Scale has a suite of data labeling products built for AV developers,
[...]_

Garbage in, garbage out; yes, annotation quality matters. But they're
neglecting very promising approaches that leverage non-annotated datasets
(typically standard RGB images) to train models, for example self-supervised
learning from video. A great demonstration of the usefulness of
self-supervision is monocular depth estimation: taking consecutive frames (2D
images), we can estimate per-pixel depth and camera ego-motion by training to
warp previous frames into future ones. The result is a model capable of
predicting depth on individual 2D frames. See this paper [1][2] for an example.

By using this kind of approach, we can lower the need for precisely annotated
data.
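
For the curious, the core of the idea fits in a few lines. Here is a minimal
PyTorch sketch of the view-synthesis loss, assuming a predicted depth map, a
predicted relative camera pose, and known intrinsics; the shapes and names are
illustrative, not the paper's actual code:

```python
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    """Lift every pixel to a 3D point using the predicted depth."""
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().reshape(3, -1)
    rays = K_inv @ pix                       # camera rays, 3 x HW
    return depth.reshape(b, 1, -1) * rays    # 3D points, b x 3 x HW

def project(points, K, T):
    """Apply the relative pose T (b x 4 x 4) and project back to pixels."""
    b, _, n = points.shape
    homog = torch.cat([points, torch.ones(b, 1, n)], dim=1)
    cam = (T @ homog)[:, :3]
    pix = K @ cam
    return pix[:, :2] / pix[:, 2:].clamp(min=1e-6)

def photometric_loss(target, source, depth, pose, K, K_inv):
    """Warp `source` into the target view; the mismatch is the training signal."""
    b, _, h, w = target.shape
    pix = project(backproject(depth, K_inv), K, pose)
    gx = 2 * pix[:, 0] / (w - 1) - 1         # grid_sample wants [-1, 1] coords
    gy = 2 * pix[:, 1] / (h - 1) - 1
    grid = torch.stack([gx, gy], dim=-1).reshape(b, h, w, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (warped - target).abs().mean()
```

Minimizing this loss over large amounts of unlabeled video is what trains the
depth and pose networks; no human annotation is involved.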

[1] [https://arxiv.org/abs/1904.04998](https://arxiv.org/abs/1904.04998)

[2] more readable on mobile: [https://www.arxiv-
vanity.com/papers/1904.04998/](https://www.arxiv-
vanity.com/papers/1904.04998/)

Edit: typo

~~~
donkeyd
> taking consecutive frames (2D images), we can estimate per-pixel depth

Yeah, I find it odd that they bring up Elon's statement about LiDAR but then
completely ignore that Tesla also spoke about creating 3D models from video.
They even showed [0] how good a 3D model they could create from their camera
data. So they could just as well annotate in 3D.

0: [https://youtu.be/Ucp0TTmvqOE?t=8217](https://youtu.be/Ucp0TTmvqOE?t=8217)

~~~
Symmetry
Egomotion is very useful but relies on being able to reliably extract features
from objects, which isn't always possible. Smooth, monochromatic walls do
exist, and it's imperative that a car be able to avoid them. A human can
(almost always) figure out their shape and distance from visual cues, but our
brains are throwing far more computational horsepower at the task than even
Tesla's new computer has available. Then again, perhaps knowing when it
doesn't know is sufficient for their purposes, and that's probably an easier
task.
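
A quick way to see the failure mode (an illustrative sketch, not anyone's
actual stack): run a feature detector on a textured image and on a blank one.

```python
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=500)

# stand-in for a busy street scene vs. a smooth monochromatic wall
textured = (np.random.rand(480, 640) * 255).astype(np.uint8)
blank = np.full((480, 640), 128, dtype=np.uint8)

print(len(orb.detect(textured, None)))  # many keypoints
print(len(orb.detect(blank, None)))     # zero: nothing for egomotion to track
```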

An interesting intermediate case between a pure video system and lidar is a
structured-light sensor like the Kinect, which projects a pattern of infrared
features onto objects. It doesn't work so well in sunlight, but I'd be
interested to learn whether anyone has ever tried combining that approach with
egomotion.

~~~
hrghfdbdsbg
"Smooth, monochromatic walls do exist and it's imperative a car be able to
avoid them."

Aren't those the types of walls, barriers, truck behinds that tesla's keep
ramming into? :S

------
tgog
This completely neglects the fact that humans build near-perfect 3D
representations of the world from 2D images stitched together by the parallax
neural nets in our brains. The blog post mentions this in one throwaway line
and says you'd need extremely high-resolution cameras? That doesn't make sense
at all. Two cameras of any resolution, spaced a fixed distance apart, should
be able to build a better parallax 3D model than any one camera alone.
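
For reference, the textbook version of that two-camera parallax model is only
a few lines with OpenCV; a minimal sketch, assuming a rectified image pair and
made-up calibration values:

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical rectified pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype("float32") / 16.0  # fixed-point -> pixels

# depth (m) = focal length (px) * baseline (m) / disparity (px)
f_px, baseline_m = 700.0, 0.3   # assumed calibration
depth = f_px * baseline_m / disparity.clip(min=0.1)
```

Note that disparity is measured in pixels, so focal length, resolution, and
baseline set how finely depth can be resolved.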

~~~
sairahul82
The first thing to remember is that self-driving systems don't work like our
brains. If they did, we wouldn't need to train them with billions of images.
So the main problem is not just building the 3D models. For example, we don't
crash into a car just because we've never seen that model or kind of vehicle
before. Check
[https://cdn.technologyreview.com/i/images/bikeedgecasepredic...](https://cdn.technologyreview.com/i/images/bikeedgecasepredictions1.jpg?sw=959&cx=0&cy=63&cw=1280&ch=720)
-- we would never think there is a bike in front of us.

Humans do a lot more than just identifying an image or doing 3D
reconstruction. We have context about the roads, we constantly predict the
movement of other cars, we know how to react to the situation, and, most
importantly, we are not fooled by simple image occlusions. Essentially, we
have a gigantic correlation engine that makes decisions by comprehending the
different things happening on the road.

The AI algorithms we train don't work the way we do; they depend too heavily
on identifying the image. Lidar provides another signal to the system: it
provides redundancy and helps the system make the right decision. Take the
image linked above as an example.

We may not need lidar once the technology matures, but at this stage it is a
pretty important redundant system.

~~~
ebg13
> _So the main problem is not just building the 3D models_

That's not relevant when discussing which technology to use to build the 3D
models. Everything you said is accurate until the last few sentences. Lidar
provides the same information (line-of-sight depth) as stereo cameras, just in
a different way. The person you're responding to is talking about depth from
stereo, not cognition.

~~~
kajecounterhack
> Lidar provides the same information (line-of-sight depth) as stereo cameras,
> just in a different way.

This is incorrect; the amount of parallax you'd need to get comparably
accurate depth from cameras is infeasible. Velodyne's and other common lidars
now get you points accurate at 150m+. Cameras can't do that, and if you use
nets to guess, you'll still make mistakes.
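
To put rough numbers on that: the standard stereo error model has depth error
growing with the square of distance, ΔZ ≈ Z²·Δd / (f·B). With assumed round
figures:

```python
f_px = 1000.0     # focal length in pixels
baseline = 0.3    # camera separation in meters
disp_err = 0.25   # sub-pixel disparity matching error

for z in (10, 50, 150):
    err = z ** 2 * disp_err / (f_px * baseline)
    print(f"at {z:>3} m: depth error ~ {err:.2f} m")
# at  10 m: ~0.08 m; at  50 m: ~2.08 m; at 150 m: ~18.75 m
```

Centimeter-level lidar returns at 150 m versus meters of stereo uncertainty is
the gap being described.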

> The person you're responding to is talking about depth from stereo, not
> cognition.

You miss the point; saying human 3D reconstruction works because of sensors
without world context is naive. The response was trying to capture that; human
perception systems utilize context / background knowledge extensively.

~~~
ebg13
> _the amount of parallax you'd need to get comparably accurate depth from
> cameras is infeasible. Velodyne's and other common lidars now get you points
> accurate at 150m+_

I meant that they both just provide line-of-sight depth.

The point the first comment is making is that human eyeballs placed a couple
of inches apart are currently the gold standard for the actual looking part.
So the right set of cameras is by definition sufficient for the looking part
of driving. The cameras just have to replace eyes well enough. The brain
replacement is farther down the chain.

~~~
kajecounterhack
From the OP:

> humans build near-perfect 3D representations of the world from 2D images
> stitched together by the parallax neural nets in our brains

This is a statement about cognition. And the response addresses this.

Your response:

> The person you're responding to is talking about depth from stereo, not
> cognition.

I think this is the disconnect. The person _is_ talking about cognition. OP
makes a claim about how humans see, connected to how the human brain works.
Response explains why camera-based image recognition right now is a lot worse
than your eyes (a big piece of the answer is your brain).

> The cameras just have to replace eyes well enough

So yes this is nice in theory. But I also get the sense most people don't
realize just how large the chasm is today between cameras and human eyes. They
don't "just provide line of sight depth." Dynamic range, field of view,
reliability even under conditions like high heat -- there are many other
dimensions where they just aren't analogous yet.

------
cameldrv
They are refuting a claim that wasn't made. If they need lidar to do better
annotations, fine. You'd only need the lidar on data-collection/R&D cars,
though, and could use just cameras on production cars.

The point Musk and others are making is that the lidar on the market today
performs poorly in bad weather. Cameras will struggle in bad weather to a
degree as well, so when your dev car is driving through rain is exactly when
you need the ground truth to be as clean as possible.

~~~
KaiserPro
With respect, that's not what they are claiming.

They are saying that lidar enhances the perception system, giving more
accurate dimensions and rotations for objects at greater distance.

This means you can predict far better, allowing you, for example, to drive at
full speed at night.

Weather affects visual systems as well. The "ooo rain kills lidar" line is
noise at best. Visual cameras are crap at night.

There is a reason the radar-augmented depth perception demo was shot in bright
light with no rain: it almost certainly doesn't work as well at night and will
probably need a separate model.

~~~
omilu
> Visual cameras are crap at night.

Mitigated somewhat by headlights.

~~~
KaiserPro
Not really. Most cameras don't have the dynamic range needed to either fully
see where the beam is pointing or cope with near objects.

Infrared cameras work, but RGB not so much (at least not at the <$400-per-CCD
price point).

------
noneckbeard
Some of this feels very cherry-picked. They're comparing lidar vs. cameras on
snapshots, when in practice a model is built continuously as the scene
changes.

There’s also one instance where it gives lidar the advantage because it’s
mounted on top of the car and can see over signs. What?!

~~~
WhompingWindows
Are they only utilizing one camera? Doesn't Tesla use multiple cameras and
radar?

~~~
noneckbeard
Seems like those shots were from just one camera.

------
loourr
This article only considers static images. Lidar static "images" necessarily
contain depth information, so obviously they'll yield better depth estimates.

But that's really beside the point, because the world is not static and any
system attempting self-driving will need to take that into account.

Using parallax measurements across multiple frames of 2D images, which is what
Tesla says it is doing, you can dramatically improve depth estimates; see the
sketch below.
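
A minimal sketch of that idea: triangulate a tracked feature from two frames
with known ego-motion (all numbers here are made up for illustration):

```python
import numpy as np
import cv2

K = np.array([[1000., 0., 640.],
              [0., 1000., 360.],
              [0., 0., 1.]])

# Projection matrices for two frames: the origin, then 1 m of forward motion.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[0.], [0.], [-1.]])])

# The same tracked feature seen in both frames (pixel coordinates).
pt1 = np.array([[740.], [360.]])
pt2 = np.array([[745.26], [360.]])

X = cv2.triangulatePoints(P1, P2, pt1, pt2)
X /= X[3]
print(X[:3].ravel())  # ~(2.0, 0.0, 20.0): about 20 m ahead, 2 m to the side
```

The motion between frames plays the role of the stereo baseline, which is why
more frames (and more motion) improve the estimate.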

Also, just a reminder that Tesla uses radar in conjunction with the cameras.

~~~
miohtama
This was my question as well. How well do these systems do over a stream of
data?

I'm not an expert in this field: how does tracking actually work with a time
dimension? There must be some sort of "state" carried over frame by frame?
What is the "size" of that state? Surely objects shouldn't just disappear and
reappear for certain frames? You can often see that latter effect in the
automatic labeling demos on GitHub.
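
(For anyone else wondering, one common answer: each tracked object carries a
small Kalman-filter state that is predicted every frame and corrected whenever
a detection matches, so a few missed detections don't make the object vanish.
A minimal sketch with illustrative noise values; real trackers add data
association, track scoring, and so on:)

```python
import numpy as np

dt = 0.1                          # frame interval in seconds
F = np.array([[1, 0, dt, 0],      # constant-velocity model, state = [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],       # detections only measure position
              [0, 1, 0, 0]], dtype=float)
Q, R = np.eye(4) * 0.01, np.eye(2) * 0.5

x, P = np.zeros(4), np.eye(4)     # the whole per-object "state": 4 numbers + a 4x4 covariance

def step(x, P, z=None):
    x, P = F @ x, F @ P @ F.T + Q            # predict: coast even with no detection
    if z is not None:                        # update only when a detection matched
        S = H @ P @ H.T + R
        Kg = P @ H.T @ np.linalg.inv(S)
        x = x + Kg @ (z - H @ x)
        P = (np.eye(4) - Kg @ H) @ P
    return x, P

x, P = step(x, P, z=np.array([1.0, 2.0]))    # frame with a detection
x, P = step(x, P)                            # occluded frame: the track coasts
```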

------
gambiting
I think it's completely clear at this point that Tesla (or, more specifically,
Elon Musk) is simply lying about what their cars will be able to do in the
future with the existing hardware. Don't get me wrong: the existing
"autopilot" is fantastically good. But it's not going to jump from where it is
now to full self-driving, no matter how many years or millions are poured into
it.

~~~
heyflyguy
Elon is an eternal optimist who truly believes the tech will catch up with his
goals.

~~~
ajross
To be fair, though: Elon is shipping actual vehicles. I mean, I think the GM
and Waymo offerings look great too, but I can't buy one. Even granting the
premise that LiDAR is going to be a firm requirement, it's not clear to me
that Tesla is actually behind the curve on real products. Adding a LiDAR
system (or buying one from Waymo) to a Tesla seems naively simpler than
finishing and shipping a whole new system from scratch.

~~~
asdkhadsj
It does make me sad that Tesla didn't just throw all the (affordable?) tech
they could into their cars, and then figure it out later.

E.g., it seems like they are taking the "figure it out later" approach, but
they've limited what they can work with to just camera information, which to
me is a shame. I'd like to see Tesla's model with lots of inputs.

Then again, I don't ship these cars, so I'm probably being ignorant :)

~~~
MrOwen
To be fair, there has been a lot of innovation and miniaturization in the
LIDAR space in the past year or two. Even if Tesla had wanted to preemptively
install LIDAR for future-proofing, I'm guessing the addition would have been
an eyesore.

~~~
Animats
There are lots of LIDAR startups. And there's Continental, the big European
auto parts maker. They bought Advanced Scientific Concepts, which has a good
but expensive flash LIDAR. (I saw the prototype on an optical bench 15 years
ago.) They showed a prototype on a car in 2017.[1]

There are about 100 companies involved with automotive LIDAR.[2] Making LIDAR
units cheaper looks within reach. Arguments are over which technology of
several that work will be cheapest. Not whether it can be done. There are the
rotating machinery guys. The flash LIDAR guys, divided into the "we can make
CMOS do it" and "we can get InGaAs fabbed at reasonable cost" camps. There are
the MEMS mirror people. All have working demo units.

But no car maker is prepared to order in quantity. Continental is an auto
parts maker - when some auto manufacturer wants to order a few hundred
thousand units, they'll crank up a production line and get the price down.
There's no demand yet beyond the prototype level. The startups mostly want to
get bought out by someone who can manufacture in volume. In the end, it's an
auto part.

Once the units get cheaper, they can be better integrated into cars. The top
2cm or so of the windshield can be dedicated to sensors. Additional sensors
near the headlights, looking sideways from the fenders, and backwards will
complete the circle. The top-mounted rotating thing is a temporary measure
until the price comes down.

[1] [https://www.continental-automotive.com/en-gl/Landing-
Pages/C...](https://www.continental-automotive.com/en-gl/Landing-
Pages/CAD/Automated-Driving/Enablers/3D-Flash-Lidar)

[2] [http://www.automotivelidar.com/](http://www.automotivelidar.com/)

------
worik
This is ignoring the elephant in the room: the AI is not good enough, often
enough, for general-purpose AVs. In restricted settings it will be great
(container terminals, warehouses...), but from everything I have seen, from
the outside as I am not an insider, the last little bit of safety seems
unobtainable with neural networks. I so want to be wrong, so please tell me
why I am. I want my next car to have a cocktail cabinet and to drive smoothly
enough to balance my champagne flute on the armrest.

~~~
alkonaut
Cars will reach 50% and 75% and 95% autonomy, but they won't reach 100% unless
we change the infrastructure to be controlled. As long as they are driving
among humans on roads made for humans, they will never be 100% autonomous.
100% autonomy might sound like just a little more than 95%, but it's not: 100%
is where a car can be built with no driver at all, where its passengers can be
drunk or not know how to drive. That's a huge difference from 95% or 99%.

I think when cars are 95% or 99% autonomous they will be sold with human
remote control, so there will be centers where manufacturers have hundreds of
remote drivers ready to intervene and handle the last 5% or 1% of situations.
The race to AV profitability will be won by the manufacturer with the smallest
army of backup drivers.

~~~
nradov
How do you expect human remote control to work reliably enough for safety
critical situations when our existing cellular data network fails so
frequently? What happens when a construction crew accidentally cuts through
the backhaul fiber?

~~~
alkonaut
The handover will happen after the car stops because it's confused. If there
is no cell network or no operator available, the car is simply stranded at the
side of the road, just as after a mechanical failure. Operators can't help
with "unknown situations" while the car is moving.

~~~
nradov
In many places like bridges, hills, and congested city streets there is
literally no road shoulder, no safe place to stop. When existing cars break
down in those locations they end up blocking a traffic lane and frequently get
hit from the rear by a drunk or distracted driver.

~~~
alkonaut
Yes, the autonomous car will need to make a judgement call on whether to
"limp" out of a situation it is unsure of or stay where it is (it's weighing
two risks against each other). This happens whether the car is 95% or 99.99%
autonomous; it just happens with different frequency.

It could also be possible for the occupants of the car (if it has any!) to
pick up a smartphone and guide the car to safety if needed. Part of the
attraction of autonomous vehicles is that they can operate _without_
occupants, however.

------
trevyn
But remember that accuracy of drawing bounding boxes around objects in still
frames is only very slightly related to actual self-driving ability, even if
intuition suggests otherwise.

This is basically just an ad for Scale and Scale's services, which include...
drawing bounding boxes around objects in still frames.

------
eatporktoo
I think all this discussion about whether LIDAR or cameras are better misses
the point that really matters: will cameras actually be good enough to get the
job done? If they are, then it doesn't matter which is better. You can always
add additional sensors and get more information, but engineering has always
been a cost-versus-benefit problem. If adding LIDAR doesn't give a significant
benefit in the scenarios where cameras are not already good enough, it might
not be worth the additional expense.

~~~
asdf21
I'm going to go with the guy who seems to be successfully launching Starlink.

------
glalonde
The weird thing about this article is that it only compares annotation
performance, which is important but not what you should ultimately care about.
If you trained a visual model using annotated lidar as ground truth, you might
expect better performance from that model than from human annotations of the
image alone, and certainly better than from a model trained on those
annotations.

------
natch
Seems like they are either incompetent or cooking the data. When converting
from the image to a top-view shape/outline, one would design and/or train the
system to adjust for perspective (see the sketch below). Clearly they have not
bothered to do that.
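
For concreteness, the standard fix is an inverse-perspective (bird's-eye) warp
of the road plane; a minimal OpenCV sketch with made-up corner points and a
hypothetical input frame:

```python
import cv2
import numpy as np

# four points on the road plane in the image, and where they land in the top view
src = np.float32([[560, 460], [720, 460], [1100, 700], [180, 700]])
dst = np.float32([[300, 0], [340, 0], [340, 720], [300, 720]])

M = cv2.getPerspectiveTransform(src, dst)
frame = cv2.imread("dashcam_frame.png")          # hypothetical input
top = cv2.warpPerspective(frame, M, (640, 720))  # top view with perspective removed
```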

And the title is inflammatory. Nobody who understands the discussion is
talking only about camera versus lidar. It’s more about camera+radar versus
camera+radar+lidar, and other comparisons between other hybrid or standalone
sensor combinations. It’s not as simple as one versus the other... surprised
we still have to point this out to them.

------
airnomad
What if we have sensors/cameras/etc along the road and they feed data to
whichever car is there?

And if we also have cars share their sensor data?

Would that speed things up in terms of achieving full autonomy?

~~~
govg
Yes, drastically. But with it come new fears about how that data will be used
by the government. You could also achieve something similar by requiring all
vehicles to have embedded sensors that share data in real time.

~~~
airnomad
So we could have highways offering 100% autonomy and small village roads
offering 20% autonomy.

Investment-wise it wouldn't be impossible, since roads are already expensive
to build.

------
rmason
I suspect this is yet another story sponsored by the Tesla shorts. I just saw
an excellent two-hour interview of George Hotz by Lex Fridman in which he goes
into detail about why he thinks cameras will win over lidar.

But he also admits that Google is presently ahead of everyone in the race for
level 5, while raising the question of whether they can ever do it
economically enough to make money on it.

[https://www.youtube.com/watch?v=iwcYp-
XT7UI](https://www.youtube.com/watch?v=iwcYp-XT7UI) 2 hours!

The money quote is when Lex tells him, "Some non-zero part of your brain has a
madman in it."

I'd argue that is true of many of the greatest inventors of our time.

~~~
melling
I believe Teslas also have radar.

I also listened to the podcast. George made it sound like the lidar wasn't
being used for much; it augments the maps to help determine a more precise
location?

~~~
cameldrv
I noticed that too. My understanding of what he said was most AV companies
were using the lidar purely for localization (on an HD map) and not for object
detection. This was the opposite of my understanding, so his statement was
very confusing to me. Anyone able to comment?

~~~
Symmetry
You can absolutely use lidar pointclouds for object detection. It can be hard
with low resolution lidar in a cluttered environment, though.

~~~
cameldrv
Of course you _can_ use it. The claim Hotz seemed to be making was that the
major AV stacks weren't doing it, which would have been major news to me.

------
sidcool
Regardless of which side of the 'efficacy of lidar' argument one is on, it's
impressive how much impact Elon Musk's statements have on the industry.

------
jamesrom
>If your perception system is weak or inaccurate, your ability to forecast the
future will be dramatically reduced.

This reasoning is exactly backwards: if your perception system can forecast
accurately, then by definition it is not weak or inaccurate.

The real question is what information a system needs to perceive in order to
make accurate forecasts. Lidar might help a bit... but we know it simply is
not required.

------
mark_l_watson
I strongly recommend Lex Fridman's recent interview with George Hotz, where
they covered this topic in some detail: [https://youtu.be/iwcYp-
XT7UI](https://youtu.be/iwcYp-XT7UI)

------
aidenn0
This completely fails to address Musk's argument: that for an L5 car you need
to be able to drive in inclement weather, where LIDAR does not work reliably.

Musk may be right or wrong, but this article is a non sequitur.

~~~
threeseed
Cameras don't work reliably in inclement weather either.

So what was Musk's point?

~~~
yo-scot-99
There are 200-300k cars out there with just cameras + radar, and Tesla's bet
is that the software catches up to the rhetoric. Adding lidar to the current
fleet is next to impossible, and it adds to the cost if they start now. It was
an early design bet which the coders now have to meet.

~~~
alkonaut
But no car sold yet (or in the coming decade) will have close to “full
autonomy”. Not even close.

~~~
macintux
He’s been promising otherwise.

~~~
alkonaut
Hopefully no one bought a car hoping it will magically become "fully
autonomous" in the future, given that no one knows whether that will even be
possible with a research vehicle in 10 years.

~~~
microtherion
If they weren't hoping that, it would seem odd for them to pay several
thousand dollars for a "Full Self Driving" option.

~~~
alkonaut
Yeah, I mean I hope people realize that a few thousand dollars and "Full Self
Driving" doesn't imply "fully autonomous" in the sense that people can use
their Tesla as an Uber and be drunk in the back seat! It just means they'll
have the best autonomy Tesla can provide, which still means they have to pay
attention. "Fully autonomous" to me is the point where my kid who can't drive
can use it as a taxi to school with no other driver, etc. I hope Musk doesn't
imagine that yet.

~~~
microtherion
Now I'm confused. At one point, Tesla literally had the following description
for "Full self driving": "in the future, Model 3 will be capable of conducting
trips with no action required by the person in the driver's seat".

Isn't that pretty much what you said "Full self driving" does NOT imply?

~~~
alkonaut
Conducting (some, specific/easy) trips with no action is easy. Even conducting
most trips is likely doable within the near-ish future.

“Full autonomy” (at least to me) is being able to do _any_ trip. Not just some
or most. Because the key benefit is that the car can be empty, or the
passenger drunk/blind/...

That’s what I think the crucial difference is between their marketed “full
self driving” and true full autonomy.

------
m463
I'm suspicious: this article has been posted several times in the last few
days. Does Hacker News accept dupes?

[https://news.ycombinator.com/item?id=20677720](https://news.ycombinator.com/item?id=20677720)

[https://news.ycombinator.com/item?id=20680495](https://news.ycombinator.com/item?id=20680495)

[https://news.ycombinator.com/item?id=20683288](https://news.ycombinator.com/item?id=20683288)

[https://news.ycombinator.com/item?id=20686791](https://news.ycombinator.com/item?id=20686791)

[https://news.ycombinator.com/item?id=20705890](https://news.ycombinator.com/item?id=20705890)

~~~
dang
This is answered in the FAQ:
[https://news.ycombinator.com/newsfaq.html](https://news.ycombinator.com/newsfaq.html).

------
unnouinceput
I side with Elon on this. Except he's kind of a cheap bastard, and my idea of
using Lytro light-field cameras instead of the cheap ones Tesla uses won't
actually fly with him. So yeah, use Lytro and you can forget about lidar
altogether.

~~~
rpmisms
I really like this idea, and I came up with it independently a few days ago.
Light-field is very processor-intensive, but oh so powerful.

