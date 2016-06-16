
Yes, only wealthy people's routes are being data-mined. However, if someone is going from P to U, you will still learn that R to T is a hot spot. Those hot spots are common to everyone.
[1] In cities with good public transit (e.g., Boston, NYC), Uber can seem more like a luxury, but in cities that are very car-centric (e.g., Detroit, Miami), Uber is actually cheaper than owning a car for many people. I live in Miami and use UberPool all the time, and I meet people from all walks of life. Car insurance here is very expensive, so for a lot of people, using Uber saves them money.
I mean don't get me wrong, the data is valuable. But this is a company that constantly says "fuck you" to the law and fair work practices. Let's keep that in mind.
Uber is engaged in civil disobedience against unjust laws that exist to protect political insiders. When did this become a bad thing?
"Movement makes all insights available under the Creative Commons, Attribution Non-Commercial license."
The experience in Boston was that Uber anonymized the data too much for it to be useful[1]. I would be very skeptical of this: the program is publicly touted as useful, but the e-mails obtained via FOIA requests show that it is not. I'm not sure what the solution is, since privacy should be a concern.
[1] https://www.boston.com/news/business/2016/06/16/bostons-uber...
> [Boston chief information officer Jascha Franklin-Hodge] said the data has been useful to show the volume of Uber rides in Boston and users’ typical wait times, but it has not done much to aid in city planning.
In other words, that was just a PR offensive. Maybe this new iteration will be something else, but I would be skeptical.
> What are the licensing terms?
Now, that's pretty interesting.
I wonder if we'll get API-level access so that we can build tools on this data.
NYC already releases this information for taxis. It might even include Uber rides since they all have TLC plates, though I'm not sure.
http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtm...
I do wonder if there are any holes in this idea, though: what if they released the data as the number of cars per hour per segment of road (a segment meaning a continuous piece of road between two intersections), rounded to the nearest multiple of 10?
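To make the proposal concrete, here is a minimal sketch of that release format. The input shape (one `(segment_id, hour)` tuple per car observed) and the function name are assumptions made up for illustration, not anything Uber has described.

```python
from collections import Counter

def rounded_segment_counts(observations, step=10):
    """Count cars per (segment_id, hour) and round to the nearest multiple of `step`.

    `observations` is a hypothetical input: one (segment_id, hour) tuple per car seen.
    """
    counts = Counter(observations)
    # Integer arithmetic rounds halves up (15 -> 20); Python's round() would
    # round halves to the nearest even number instead.
    return {key: (n + step // 2) // step * step for key, n in counts.items()}

obs = [("seg-A", 8)] * 14 + [("seg-A", 9)] * 15 + [("seg-B", 8)] * 3
released = rounded_segment_counts(obs)
print(released)
```

Note that any segment with fewer than 5 cars in an hour rounds to 0, so quiet streets drop out of the release entirely.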
Example technique: if you treat each record as a (src,dst) pair of (lat,lon) pairs or somesuch, you can then build a 4d grid whose cells you populate with (Laplace) noisy counts. This provides eps-differential privacy when the noise scale is roughly 1/eps.
Whenever counts are sufficiently large, you can refine the contents of the cell and ask again. If you do the refinement at most k times, you get k*eps-differential privacy. There are smarter ways that work even better.
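A rough sketch of the noisy 4d grid, using only the standard library. The bounding box, the number of bins, and all function names are assumptions for illustration; the refinement step is left out. Each trip lands in exactly one cell, so one trip changes one count by 1, and Laplace noise of scale 1/eps on every cell is the standard way to cover that.

```python
import itertools
import random
from collections import Counter

def laplace(scale, rng):
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def bin_index(value, lo, hi, bins):
    # Map a coordinate to a bin, clamping so out-of-range values hit an edge bin.
    i = int((value - lo) / (hi - lo) * bins)
    return min(max(i, 0), bins - 1)

def noisy_trip_grid(trips, lat_range, lon_range, bins, eps, rng):
    """trips: iterable of ((src_lat, src_lon), (dst_lat, dst_lon))."""
    def cell(trip):
        (slat, slon), (dlat, dlon) = trip
        return (bin_index(slat, *lat_range, bins), bin_index(slon, *lon_range, bins),
                bin_index(dlat, *lat_range, bins), bin_index(dlon, *lon_range, bins))

    true_counts = Counter(cell(t) for t in trips)
    # Noise every cell, including empty ones; publishing only occupied cells
    # would reveal which cells contain at least one trip.
    return {c: true_counts[c] + laplace(1.0 / eps, rng)
            for c in itertools.product(range(bins), repeat=4)}

rng = random.Random(0)
trips = [((25.76, -80.19), (25.79, -80.13))] * 3  # made-up coordinates
grid = noisy_trip_grid(trips, (25.7, 25.9), (-80.3, -80.1), 2, 1.0, rng)
print(len(grid))  # 2**4 cells
```

The grid has bins**4 cells, so this brute-force enumeration only works for coarse grids.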
These provide "trip privacy", meaning they mask the presence/absence of individual trips. Uber presumably has user identifiers, and could group all trips by one user together, and do the same counting where the weight of each trip is scaled down so that they sum to at most one for each user. This would then give "user privacy", meaning it masks the presence/absence of individual users.
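The per-user weighting could look something like the sketch below. The `(user_id, cell)` record shape, the `all_cells` argument, and the function names are assumptions for illustration. Each trip by a user with n trips gets weight 1/n, so a user's total contribution across all cells is at most one.

```python
import random
from collections import Counter, defaultdict

def laplace(scale, rng):
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def user_weighted_counts(records):
    """records: iterable of (user_id, cell). Returns the weighted count per cell."""
    records = list(records)
    trips_per_user = Counter(user for user, _ in records)
    weighted = defaultdict(float)
    for user, cell in records:
        # Scale each trip so that one user's weights sum to exactly 1.
        weighted[cell] += 1.0 / trips_per_user[user]
    return dict(weighted)

def noisy_user_counts(records, all_cells, eps, rng):
    # Noise every publishable cell, not just the occupied ones.
    weighted = user_weighted_counts(records)
    return {c: weighted.get(c, 0.0) + laplace(1.0 / eps, rng) for c in all_cells}

records = [("u1", "A"), ("u1", "A"), ("u1", "B"), ("u1", "B"), ("u2", "A")]
print(user_weighted_counts(records))
print(noisy_user_counts(records, ["A", "B", "C"], 1.0, random.Random(0)))
```

The cost is that heavy users are down-weighted, so the released counts no longer correspond to raw trip volume.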
Aggregation is a common method for anonymization. One approach is to only display trips that were made by at least 15 different people in a day.
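As a minimal sketch of that threshold rule, assuming hypothetical records of the form `(day, origin_zone, dest_zone, rider_id)`; the zone names and the function name are made up for illustration.

```python
from collections import defaultdict

def publishable_routes(records, min_people=15):
    """Keep only (day, origin, dest) routes travelled by >= min_people distinct riders."""
    riders = defaultdict(set)
    for day, origin, dest, rider in records:
        # A set counts each rider once, however many trips they made that day.
        riders[(day, origin, dest)].add(rider)
    return {route: len(people)
            for route, people in riders.items()
            if len(people) >= min_people}

records = [("2017-01-05", "tract-1", "tract-2", f"rider-{i}") for i in range(15)]
records += [("2017-01-05", "tract-1", "tract-3", "rider-0")] * 40  # one person, many trips
print(publishable_routes(records))
```

Counting distinct riders rather than trips matters here: the second route has 40 trips but only one person, so it is suppressed.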
EDIT: I think in the video they showed census tracts, which is one of many geographic units they could choose from.
Good idea, though not applicable everywhere.
https://www.bloomberg.com/news/articles/2017-01-05/uber-does...
I genuinely do not believe this. If they intended on making anonymized data open, they wouldn't have a marketing site up already.
