
> In effect, every Lyft vehicle in operation today, with a smartphone on the dashboard, could be commandeered to become a “camera” watching, surveying and mapping the roads that those cars drive on, and how humans behave on them, using that to help Lyft’s autonomous vehicle (AV) platform learn more about driving overall.

Another instance of the "more data is better" fallacy. Humans can drive cars safely after only fifteen years of intermittent sensory input. Lyft could collect that "data" within just one year by employing ten collectors. That still doesn't give you brains.

Is Techcrunch an outlet for PR pieces? I'm asking because that article reads like one of those.




> Another instance of the "more data is better" fallacy.

More data is better. The fact that humans have better algorithms that need less data does absolutely nothing to negate this.


Depends on whether it’s the right data, or meaningful data. The idea that Lyft needs this acquisition in order to implement video capture from its drivers’ cell phones is laughable, so something else is going on with this acquisition. But to your point, the assumption that this is even a problem of “not enough data” is questionable at this point. How to turn that data into results is something no one has come close to figuring out yet.


> Depends on whether it’s the right data, or meaningful data.

Street level mapping data isn't relevant or meaningful? Basically every company working on this problem seems to pretty strongly disagree with you.

> But to your point, the assumption that this is even a problem of “not enough data” is questionable at this point. How to turn that data into results is something no one has come close to figuring out yet.

This is trivially false. Given infinite data, all possible situations would be represented in the data, and the solutions applied in those situations could be copied exactly, something that existing algorithms are completely capable of doing.
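
To make that concrete, here's a toy sketch of the "copy the solution" idea (everything in it is hypothetical): a nearest-neighbour lookup over recorded situations. With exhaustive data, the nearest neighbour is an exact match and you just replay the recorded action.

    import numpy as np

    class SituationMemory:
        def __init__(self):
            self.situations = []  # feature vectors for observed scenes
            self.actions = []     # action a human driver took in each one

        def record(self, situation, action):
            self.situations.append(np.asarray(situation, dtype=float))
            self.actions.append(action)

        def act(self, situation):
            # Copy the solution from the closest recorded situation.
            dists = [np.linalg.norm(s - situation) for s in self.situations]
            return self.actions[int(np.argmin(dists))]

    memory = SituationMemory()
    memory.record([0.0, 1.0], "brake")     # e.g. pedestrian ahead
    memory.record([1.0, 0.0], "continue")  # e.g. clear road
    print(memory.act([0.1, 0.9]))          # -> "brake"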


>> This is trivially false. Given infinite data, all possible situations would be represented in the data, and the solutions applied in those situations could be copied exactly, something that existing algorithms are completely capable of doing.

In principle. In practice, you'd need infinite time and infinite storage.

Btw, do you have to add stuff like "This is trivially false" to your comments? It doesn't make your comments sound more right, only less well considered.


> In principle. In practice, you'd need infinite time and infinite storage.

That is irrelevant.

> Btw, do you have to add stuff like "This is trivially false" to your comments? It doesn't make your comments sound more right, only less well considered.

Trivial in the mathematical sense. As in, there is a trivial counter-example to your point. Citing infinity is a 'trivial' case. I'm using 'trivial' to describe my counter-example, not his error.


If I may be frank: don't use language in the mathematical sense if you are not in a maths classroom or don't know maths.


Given infinite data, infinite storage and infinite computing, you would be right. In practice it means you are wrong. Feeding more data does not necessarily help given a finite amount of computing power.


More data is necessary with current technology, in the sense that modern statistical machine learning algorithms are very bad at generalising to unseen data, and the only way to overcome this is to give them more examples.

There are machine learning techniques that generalise well from few data, but they are not very well known in the industry.

Also, though more speculatively, I think the idea of "lots of data" is attractive to marketing departments. There's something about algorithms that need huge amounts of data and compute, that only a select few companies can use. I guess it gives bragging rights, of a sort: "we got the biggest data around. Buy our stuff!".

But like I say, that's speculation on my part.


> More data is necessary with current technology, in the sense that modern statistical machine learning algorithms are very bad at generalising to unseen data, and the only way to overcome this is to give them more examples.

Precisely.

> There are machine learning techniques that generalise well from few data, but they are not very well known in the industry.

Sure, and we'd all love to be using those. But even if you generalize well from small datasets, you still generalize better from larger ones.
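
As a rough illustration (synthetic data, hypothetical sizes, scikit-learn purely for convenience), held-out accuracy for an ordinary classifier typically climbs as the training set grows:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Same model, increasing amounts of training data.
    for n in (50, 500, 5000, 15000):
        clf = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
        print(n, round(clf.score(X_te, y_te), 3))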

> Also, though more speculatively, I think the idea of "lots of data" is attractive to marketing departments. There's something about algorithms that need huge amounts of data and compute, that only a select few companies can use. I guess it gives bragging rights, of a sort: "we got the biggest data around. Buy our stuff!".

It may be attractive to marketing departments, but it is also essential to data science projects like this.


>> Sure, and we'd all love to be using those. But even if you generalize well from small datasets, you still generalize better from larger ones.

Not to my knowledge. What techniques did you have in mind that work like that?


Literally all of them? Linear regression, neural networks, KNN; I could just enumerate all ML methods here, but I think the foregoing is sufficient.


I'm sorry, I don't understand. Which of the above generalises well from small datasets?


Who said they do? I said they generalize better from larger datasets. The entire point of this discussion is that more data is better.


I was referring to this part of our exchange:

ME: There are machine learning techniques that generalise well from few data, but they are not very well known in the industry.

YOU: Sure, and we'd all love to be using those. But even if you generalize well from small datasets, you still generalize better from larger ones.

That is not how those techniques work to my knowledge, so I was asking which ones you had in mind.


My point is that they all generalize better from larger datasets. Size is relative and some techniques work better with more or less data. Linear regression, for instance, can work quite well with much less data than a neural net. It just depends on the complexity of the problem.
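
A quick sketch of that trade-off, with synthetic linear data (so the setup deliberately favours the linear model): with 30 samples the linear model already does well while the net lags, and the gap narrows with more data.

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    X, y = make_regression(n_samples=2000, n_features=10, noise=5.0,
                           random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for n in (30, 1000):
        lin = LinearRegression().fit(X_tr[:n], y_tr[:n])
        net = MLPRegressor(max_iter=2000, random_state=0).fit(X_tr[:n], y_tr[:n])
        # R^2 on held-out data; higher is better.
        print(n, round(lin.score(X_te, y_te), 3), round(net.score(X_te, y_te), 3))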


>> My point is that they all generalize better from larger datasets.

Like I say, this is not the case. There are learning algorithms that generalise so well from few data that their performance can improve only marginally with increasing amounts of data, or not at all.

I appreciate that you probably have no idea what I'm talking about. I certainly don't mean linear regression.


> Like I say, this is not the case. There are learning algorithms that generalise so well from few data that their performance can improve only marginally with increasing amounts of data, or not at all.

Erm, no. Not unless they are solving the problem perfectly.

> I appreciate that you probably have no idea what I'm talking about. I certainly don't mean linear regression.

I work in the field. I'm quite certain I'm familiar with whatever it is that you think you're talking about.

The category of algorithms that attempt to learn things from few examples is called 'one-shot learning'. It's usually in the context of image classification, but it applies equally well elsewhere. These algorithms still learn better from more data.
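
To give a flavour (hypothetical embeddings and labels): a one-shot classifier keeps a single support example per class and assigns a query to the nearest one. Even in this setting, more support examples per class give you better class estimates.

    import numpy as np

    def one_shot_classify(query, support):
        # support maps each label to the embedding of its single example
        return min(support, key=lambda lbl: np.linalg.norm(support[lbl] - query))

    support = {"stop_sign": np.array([1.0, 0.0]),
               "yield_sign": np.array([0.0, 1.0])}
    print(one_shot_classify(np.array([0.9, 0.2]), support))  # -> "stop_sign"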

Do feel free to share an example of an algorithm that generalizes better from less data. I'll wait.


>> Erm, no. Not unless they are solving the problem perfectly.

Well, yes, that's what I mean.

I gave an example here a while ago, of how a Meta-Interpretive Learning algorithm, Metagol, can learn the aⁿbⁿ grammar perfectly from 4 positive examples:

https://news.ycombinator.com/item?id=17837055

That's typical of Metagol, as well as other algorithms in Inductive Logic Programming, the broader sub-field of machine learning that MIL belongs to.
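
For anyone who doesn't want to follow the link: the aⁿbⁿ language corresponds to the grammar S → a b | a S b. Rendered as Python purely for illustration (Metagol itself outputs the hypothesis as Prolog clauses, not Python):

    def anbn(s):
        # Base case: the shortest string in the language is "ab".
        if s == "ab":
            return True
        # Recursive case: strip one leading 'a' and one trailing 'b'.
        return len(s) > 2 and s[0] == "a" and s[-1] == "b" and anbn(s[1:-1])

    assert anbn("ab") and anbn("aaabbb")
    assert not anbn("aab") and not anbn("ba")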

>> Do feel free to share an example of an algorithm that generalizes better from less data. I'll wait.

To clarify, my claim is that there are algorithms that learn adequately from few data and therefore don't "need" more data. Not that less data is better.

That said, there are theoretical results that suggest that a larger hypothesis space increases the chance of the learner overfitting to noise. So what is really needed in order to improve generalisation is not more data, but more relevant data. Then again, that is the subject of my current PhD so I might just be interpreting everything through the lens of my research (as is typical for PhD students).

You work in the field? What do you do?


A small amount of the right data is better than lots of the wrong data. Collecting a lot of some data because it's easy to collect isn't very helpful if it turns out to be the wrong data.

It would likely be more informative to instrument a few cars with an advanced sensor package and let well-ranked drivers drive them around than to try to gather data from smartphones in existing cars, but I suppose it depends on what the end use is.


This is mapping data, not driving data. You need both.


More data is not always better. It can be, for sure, but you need the analytical capabilities to turn it into useful information. Otherwise it's just hoarding.


They have those capabilities.


How is having recorded video of actual roads not valuable data? You're assuming the only option is training self-driving cars on it. They might be training something totally separate to recognize signs, spot damaged roads, or learn how quickly pedestrians react at different times of day, etc.

That's a very narrow viewpoint to take.


I agree that having detailed maps is an advantage. But they only make you a better driver in the places where they are correct. As such you can't rely on them to learn to drive, because you must be able to adapt to changes. Driving is not about the best case, but about the worst case.

Using uncalibrated and randomly positioned phone cameras to learn about driving a car that has better sensory equipment seems backwards to me. But point taken, the article says "learn more about driving overall." So that could be anything.


To reiterate, you cannot, ever, drive based on any map, regardless of how many smartphones collected that map's data. You might use a map for navigation, but even then you'll have to deal with closed roads or changed traffic flow that isn't yet on the map. You cannot use maps for driving.


Well, that's my sentiment too. But the article is wishy-washy on the use of the collected data. Having local knowledge can make you a better driver, after all.


Yes



