Hacker News | ra7's comments

> The insight driving the program, Naga said, is that the limiting factor for AV development is no longer the underlying technology. “The bottleneck is data,” he said. “[Companies like Waymo] need to go around and collect the data, collect different scenarios. You may be able to say: in San Francisco, ‘At this school intersection, I want some data at this time of day so I can train my models.’ The problem for all these companies is access to that data, because they don’t have the capital to deploy the cars and go collect all this information.”

You can’t be the CTO of Uber wanting to do AVs, and get the data collection requirement shockingly wrong.

Waymo’s bottleneck has never been data. When they want data about a school intersection in SF at a certain time of day, they just... synthetically generate it and simulate: https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-f...

Waymo is able to deploy with less (but targeted and high quality) data collection by having world class simulation capabilities. Not that they haven't collected huge amounts of data as it's no doubt important (I've heard their onboard storage is transferred and emptied every few days), it's just not a bottleneck. They have the most efficient operation in the AV industry.

The best example of why data collection isn’t the bottleneck is Tesla. They boast about billions of miles of data, yet they’re struggling to put out fully autonomous vehicles.


> When they want data about a school intersection in SF at a certain time of day, they just... synthetically generate it and simulate

I think it's more about detecting changes to the world. You need boots on the ground, so to speak, to see that new speed limit sign or the new lane paint. The Waymo vehicle can no doubt react to changes in the world when it encounters them, relaying them back to the mothership, but it's better to know about them in advance.


Most AVs, definitely Waymo vehicles, are self mapping. They can detect environment changes and relay them to the entire fleet. That's because they map using the same vehicles as the fleet.


>You need boots on the ground, so to speak, to see that new speed limit sign or the new lane paint.

It'll shock you to know that you can simply get this from governments. Some even provide this in API form.


It probably won't shock you to know that those sources of data can be months to even years delayed from what's actually out in the world.


> or the new lane paint.

I'd be surprised if this is a thing outside the biggest US (and European, for that matter) cities. Judging from Google StreetView, there are lots of streets in US cities/towns with almost no paint lines at all.


Do you mean in the API? I live in a European country and I don't think I ever saw an asphalt road without paint lines. This varies a lot between countries though.


Small countryside roads routinely lack a central line in Sweden. Even smaller roads can lack the side lines too. And I'm talking asphalt roads here still. The same happens on many residential streets in towns and cities.

But sure, it would be rare to have a large road or street without markings. But most roads aren't large. Most travelled kilometers happen on large roads, but that is not the same thing as most roads. And many individual journeys would involve at least a little bit of small roads at the beginning, end or both.

And of course, if they are covered with snow and ice during the winter you can't see the markings anyway.


Many American roads don't have lines. Residential roads, parking lots, many business driveways have limited markings.

Then there's roads with just the center line markers and no road shoulder markings.

Then there's a whole class of roads with new lines painted over "demarked" old lines that weren't removed well, or with fading lines that should've been repainted a long time ago.

I'm surprised you've never seen a non-perfect road?



Here in Bucharest there are quite a lot of big boulevards that do not have them, either because they haven’t been repainted in a long time or because they laid new asphalt without bothering to repaint the lines (this happens a lot, unfortunately, and is very frustrating).


No visual data; you need picture data for that. Companies like NC tech do it for about $1m a city.


That’s dumb then. It shows it’s just brute force rather than AI.

A human doesn’t need to be shown every single road that exists in order to drive.


That's true, but the human can do a much better job planning for the journey if they know what to expect along the way.

One example, from the end of the journey: knowing in advance where the actual entrance to the business is, or the specific curb cut that leads to the residence, makes it easier and far less error prone to decide exactly where the journey should end. Even humans have a hard time figuring out the right access point for a business or residence. This is a job for an offline process, fed by as many data sources as possible.


Just a bunch of sophisticated if statements, I guess.


Yeah I'm not so sure this CTO is on the mark here, but to be fair, I do think some of this IRL long tail/edge case data is important for Waymo. The simulation software is super interesting to me - the real world can be so chaotic, and even if they could generate every possible real life case, there needs to be validation on whether the Waymo driver is responding in the optimal way. They certainly haven't solved this problem, you can see some of their growing pains in all of these articles - floods in Austin, more and more interactions with emergency vehicles that first responders seem to believe are getting worse, etc.

Tesla, on the other hand, has billions of miles of data, yet because there is a limit to camera-only techniques, that data isn't that useful, is it? They have no ground truth data to evaluate their camera system on, which is why sometimes you see those Teslas driving around with lidar rigs mounted on them. Going camera-only is just asking for trouble.


I agree real world data is important for Waymo. I didn't mean to say it wasn't, so I've edited my comment to reflect that. It's just that data is not some magic bullet to achieve self driving like Tesla and others suggest.

Of course, Waymo still has much more room for improvement. But it's much more efficient to supplement less but higher quality IRL data with large amounts of synthetic data, than to run a million data collection vehicles 24x7 because most IRL data is boring and useless.

Waymo said 6 years ago they simulate 20 million miles every single day [1]. Clearly, it's working for them given their scale of deployment right now.

[1] https://waymo.com/blog/2020/04/off-road-but-not-offline--sim...


Although most of the real-world data is probably boring, collecting more of it likely makes discovering rare edge cases more likely. But since they happen rarely, I imagine that after discovering them, they would then need to figure out how to simulate them.


> The best example of why data collection isn’t the bottleneck is Tesla.

Exactly. Plus, any delivery or dashcam company can provide a bunch of data wherever there is any sizeable population.

About 8 years ago, that data would have been really valuable, but now it's at best nice to have.

The only thing that is valuable is the breadth of different cars, but even then it's not that much of a differentiator.


The biggest difference is that Uber has vehicles around the world. So there's more data from countries with different rules from the US. Signage is definitely different between the US and Europe.


I.. am amused by the confidence on display, but I can't say that I am not concerned that people are confidently stating that real world data is not useful because it can just be simulated. One would think that, by now at least, we know that simulation is at best an imperfect copy.

And I don't like the idea of even more data being harvested and used.. I just find the dismissal.. odd.


“Real world data is not the bottleneck” != “Real world data is not useful”

No one is suggesting the latter.


Parent's post noted that it is not a bottleneck, because it can be readily simulated ( and thus not useful ). I am not sure if QED is too much in this case, but I stand by my amusement. Or are you arguing that real world data is somehow less useful than simulated data? It is very confusing. I would accuse you of nitpicking, but I just noticed you are the parent :D You can certainly speak for yourself.


> and thus not useful

Again, I’m not suggesting this. Bottleneck has a specific meaning. It means Waymo is limited by not having the ability to collect data. Well, clearly that’s not true because Waymo already has a reasonably scaled deployment across a dozen cities that no one else has and can handle millions of scenarios.

Real world data is absolutely required, but more of it doesn’t give you magical self driving ability as Uber’s CTO suggests. If it were the case, you’d have seen Tesla achieve fully autonomous driving years ago.


I accept your argument. I may have been a little too nitpicky and you do have several good points.


> The best example of why data collection isn’t the bottleneck is Tesla. They boast about billions of miles of data, yet they’re struggling to put out fully autonomous vehicles.

Well, TBF, the Tesla data was complete garbage with earlier vehicles. They had cheap and somewhat bad cameras in the earlier vehicles that were only somewhat recently updated. And even then, I don't think Tesla is at the end of their hardware journey. I think they don't think that either, which is why they've gone to a subscription only model for self driving vehicles.

Waymo, on the other hand, has gathered less data, but higher quality data. They do the expensive mapping of a city, which is a big part of why their vehicles have early on been able to do some pretty impressive feats. The drawback is that getting that high quality data takes a lot of time and resources.


> And even then, I don't think Tesla is at the end of their hardware journey.

I dunno about that. Tesla seems completely adrift, pretending to pivot with random forays into humanoid robotics or whatever, to the point that I wouldn't be surprised if they exited the consumer vehicle space altogether within the next decade. They have no answer for Chinese competitors.


I recently watched some videos related to the production of cybercab, which has now started public testing. They’ve still done some great engineering, to the point that the car is now assembled like a matchbox car. All the drive components are contained in a single package for a FWD configuration that the body just drops down on. The car now has no controls besides the screen and door pulls. The materials are all lower cost and they even found a way to skip painting the cars. All of this should help them cut costs significantly.

As far as the self driving, they may be far off still, it’s hard for me to get a read on that and this vehicle is a bet that they will be able to achieve it - right down to the braille in the cabin, so maybe that’s why they still fail. The thing I will say is that despite the PR disaster that the CEO is, which gives us that feeling that the company has lost its mind, it seems they are still quietly doing some advanced engineering.


Well, let me rephrase: Tesla's previously stated goals around self driving cars aren't achievable with the current hardware.


Didn't they need the data from the 200 million miles or so from actual driving before they could get to the generative model though? Data isn't everything, as you point out with Tesla (mainly because they decided to forego using lidar, it would seem), but it is pretty fundamental.


IIRC, they had clocked 20 million real world miles before starting to scale their deployment. But they were also driving 20 million miles in the simulator every day: https://waymo.com/blog/2020/04/off-road-but-not-offline--sim...
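
As a rough sanity check on how those two figures compare (20 million real-world miles in total vs. 20 million simulated miles per day, both taken from the comment above), here is a minimal Python sketch. The 365-day year, the constant daily rate, and all names are assumptions made for illustration:

```python
# Figures quoted in the comment above; the names are illustrative only.
REAL_MILES_TOTAL = 20_000_000    # real-world miles before scaling (an "IIRC" figure)
SIM_MILES_PER_DAY = 20_000_000   # simulated miles per day (2020 Waymo blog post)


def simulated_miles_per_year(per_day: int, days: int = 365) -> int:
    """Simulated miles accumulated over a year, assuming a constant daily rate."""
    return per_day * days


sim_per_year = simulated_miles_per_year(SIM_MILES_PER_DAY)
ratio = sim_per_year / REAL_MILES_TOTAL

print(f"{sim_per_year:,} simulated miles/year, {ratio:.0f}x the real-world total")
```

Under those assumptions, a year of simulation covers a few hundred times the real-world mileage, which is the point about where the volume comes from.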


> before they could get to the generative model though?

Is that the right kind of model for this particular application?


Waymo might very well be missing specific kinds of data (e.g. more incidents/accidents, near-collisions, etc.)

Also, Uber’s data might be useful for eval, not training (e.g « here is how Waymo would behave vs human drivers therefore it is safer »)


> Waymo might very well be missing specific kinds of data (e.g more incidents/accidents, near-collisions etc)

Accidents and near-collisions are exactly the kind of scenarios perfect for simulation. You don't test them out in the real world and risk injuries/deaths. You need to have confidence they're handled before you deploy.


Again, how do you know you've handled it correctly without ground truth? Simulation without ground truth is a garbage in garbage out situation.


I find the idea of learning from simulated data so unintuitive. How can you radically improve your model with just your model? I take it people do it, so it must work, but I just don’t understand it at all.


Well there's a world simulation model and then the driving model.

You can imagine improving, e.g., a specialized math model (problem in, theorem out) with a normal LLM that knows lots of problems and theorems generally.


I think people are skipping over the fact that Google has had cars driving around taking photos for 20 years. I imagine that was used to build the world model in the first place.


They're two different models - you can use the world model to train (or test like Wayve) a different car-driving model.

The world model is basically intended as a more true-to-life simulator.


"You can’t be the CTO of Uber wanting to do AVs, and get the data collection requirement shockingly wrong."

Problem 1: Cost and privacy constraints limit data collection.

Problem 2: It makes not much sense to collect and store data that you already have. Yet you don't know while collecting whether it is useful or not.

Problem 3: P2P in urban settings fails at edge cases, which by definition are rare to collect.

All of these problems limit AV scaling.


Yes, the way to make these things safer is to make up data and simulate on that.

Do you hear yourself?


That’s literally how it works right now, so yeah.


> Mapping out every intersection, sign, and signal: Before our Waymo Driver begins operating in a new area, we first map the territory with incredible detail, from lane markers to stop signs to curbs and crosswalks. Then, instead of relying solely on external data such as GPS which can lose signal strength, the Waymo Driver uses these highly detailed custom maps, matched with real-time sensor data and artificial intelligence (AI) to determine its exact road location at all times.

https://waymo.com/waymo-driver/

That AI part is doing a lot of heavy lifting. They're using real data. We already know synthetic data is dangerous. Explains a lot if you think it's more reliant on that than real data.


Mapping and simulation have very different purposes. Doesn’t look like you’re familiar with the basics of AV technology. Explains a lot why you’re confused about how real world and simulated data is used.


The word synthetic says it doesn't exist in reality.


Do you know anything about engineering?


This is fascinating. I feel like this is converging into the concept of a traditional "IDE". So much of your setup reminds me of IDEs indexing, doing static analysis, building ASTs, etc. before a developer starts writing code.


Yes, there is a parallel here. Now, some of those "indexing" steps can be performed by an LLM.

And that does not prevent mixing and matching the two, as some comments in this thread suggest.

Anyway, it's a great time for production coding.


> Before Waymo deploys in a new city, it deploys a huge fleet of cars that spend months of driving completely supervised, presumably to construct a detailed LIDAR map of the city.

Not entirely true. From their recent "road trips" last year, the trend is they just deploy fewer than 10 cars in a city for a few weeks (3-4 weeks from what I recall) for mapping and validating. Then they come back after a few months to set up infrastructure for ride hailing (depot, charging, maintenance, etc.) and start service.


Waymo drives 4 million miles every week (500k+ miles each day). The vast majority of those collisions happened when Waymos were stationary (they don’t redact narrative in crash reports like Tesla does, so you know what happened). That is an incredible safety record.
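
For what it's worth, the weekly-to-daily conversion is easy to check. A minimal Python sketch, assuming the 4 million miles/week figure above and an even spread across all seven days (the function name is made up for illustration):

```python
MILES_PER_WEEK = 4_000_000  # weekly mileage figure quoted in the comment


def miles_per_day(miles_per_week: float, days_per_week: int = 7) -> float:
    """Average daily mileage, assuming driving is spread evenly over the week."""
    return miles_per_week / days_per_week


daily = miles_per_day(MILES_PER_WEEK)
print(f"about {daily:,.0f} miles per day")
```

That lands comfortably above the "500k+" shorthand used above.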


> Tesla beating Waymo

Heard this for a decade now, but I’m sure this year will be different!


I didn't say this year, but let's bet on it?


Nothing says confidence like a prediction with an unspecified timeline.


Propose a bet with concrete details and resolutions so we can bet.

For instance, would you like to bet 1000 dollars that Tesla has more unsupervised self driving robotaxis than Waymo at the start of 2027?


We all know Tesla likes to play a smoke-and-mirrors game with vehicle numbers: they have 300+ "robotaxis" but only 7 of them are unsupervised [1], and they shut down when it rains [2].

So let's use a metric that unequivocally shows who is 'winning'. I'm confident Waymo will have more paid rides per week than Tesla at the start of 2027 (I'll give you 2028 if you want). No other metric indicates scale better than passenger trips. If you have more robotaxis or you are in more cities, it will show up in the trip count.

I'll give $1000 to a charity of your choosing if Tesla beats Waymo in this metric. Fully unsupervised trips only, does not include trips with a safety driver or a monitor in a passenger seat, none of the usual games they like to play.

[1] https://robotaxitracker.com/?provider=tesla

[2] https://x.com/ethanmckanna/status/2022803049551372395


OK. Let's do 2028. My charity is myself. I'll also send you 1k if I'm wrong


There’s also one where Tesla hit a parked truck:

“13781-13644 Street, Heavy truck, No injuries, Proceeding Straight (Heavy truck: parked), 4mph, contact area: left”


They don’t have remote drivers. Your own link says that.

> The Waymo Driver does not rely solely on the inputs it receives from the fleet response agent and it is in control of the vehicle at all times.

> The Waymo Driver evaluates the input from fleet response and independently remains in control of driving.


Pay close attention to the wording: "The Waymo Driver ... remains in control of driving". That means it applies the controls needed to go from point A to point B on its own. However, it does not choose point A and point B on its own: a human chooses them. That's autonomous path planning, but not autonomous navigation, and certainly not "fully autonomous" anything.

Waymo prevaricates about the "influence" the human operator has on the path taken by the Waymo Driver [1] but it is clear there are situations that the Waymo Driver cannot choose point A and point B on its own, at least safely, otherwise Waymo would not be paying for humans to do it. They'd let the system do it on its own. It can't. It's not "fully autonomous".

We can play with words and accept whatever terminological obfuscation Waymo wants to impose in order to pimp its wares, or we can accept that current systems have limitations, and choose to understand the real SOTA over marketing.

_____________

[1] Fleet response can influence the Waymo Driver's path, whether indirectly through indicating lane closures, explicitly requesting the AV use a particular lane, or, in the most complex scenarios, explicitly proposing a path for the vehicle to consider (ibid.).


These videos from Waymo show what kind of guidance they provide:

https://youtube.com/watch?v=T0WtBFEfAyo

https://youtube.com/watch?v=elpQPbJXpfY

Notice how the system itself reasons about the scene and asks for help with possible options.

This whole story is a nothingburger. The only “news” here is that the operators are in the Philippines.


> Tesla is executing the strategy that most quickly scales to 100% of the population.

So, uh… where is this “scale” then? This “strategy” has been bandied about for the better part of a decade. Why are they still in a tiny geofence in Austin with chase cars?

Waymo is doing it right now. Half a million rides every week, expansion to a dozen new cities. Tesla does a few hundred in a tiny area.

Scale is assessed by looking at concrete numbers, not by “strategies” that haven’t materialized for a decade.


We know Waymo reduced their LiDAR price from $75,000 to ~$7500 back in 2017 when they started designing them in-house: https://arstechnica.com/cars/2017/01/googles-waymo-invests-i...

That was 2 generations of hardware ago (4th gen Chrysler Pacificas). They are about to introduce 6th gen hardware. It's a safe bet that it's much cheaper now, given that mass produced LiDARs cost ~$200.
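
To put those price points side by side, a quick Python sketch of the ratios. The three dollar figures are the ones quoted above. Treating ~$200 as a comparison point is an assumption for illustration only, since it is a price for mass produced LiDARs in general and not a confirmed Waymo unit cost:

```python
ORIGINAL_COST = 75_000   # per-unit LiDAR cost before the in-house design (from the comment)
IN_HOUSE_2017 = 7_500    # approximate cost after the 2017 in-house design
MASS_PRODUCED = 200      # rough price of mass produced LiDARs (not a Waymo figure)


def cost_ratio(old: float, new: float) -> float:
    """How many times cheaper `new` is compared with `old`."""
    return old / new


print(f"2017 in-house design: {cost_ratio(ORIGINAL_COST, IN_HOUSE_2017):.1f}x cheaper")
print(f"2017 price vs. mass produced units: {cost_ratio(IN_HOUSE_2017, MASS_PRODUCED):.1f}x")
```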

