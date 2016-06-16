Hacker News
Uber Movement (uber.com)
170 points by ndirish1842 4 hours ago | 53 comments





Very interesting. I worry about a selection bias, though. Uber riders are wealthier than average, so the trips won't necessarily reflect all the commutes of those in the city. I hope this doesn't lead to just an improvement in bus routes, road construction, traffic, etc. between nicer sections of the city.

Traffic is traffic.

Yes, only wealthy people's routes are being data mined. However, if someone is going from P to U, you will still learn that R to T is a hot spot. Those hot spots are common to everyone.
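
To make the P-to-U point concrete, here is a toy sketch in Python. The route lists and the function name are invented for illustration, and real trip data would need to be matched to road segments first.

```python
from collections import Counter

def segment_counts(routes):
    """Count how many routes traverse each consecutive (from, to) segment."""
    counts = Counter()
    for route in routes:
        # zip(route, route[1:]) yields each pair of neighbouring waypoints
        counts.update(zip(route, route[1:]))
    return counts

# Hypothetical trips: different endpoints, shared middle stretch R -> S -> T.
routes = [
    ["P", "Q", "R", "S", "T", "U"],
    ["A", "R", "S", "T", "B"],
]
hot = segment_counts(routes)
```

Neither rider started or ended on the R-to-T stretch, yet it is the only part of the network both trips used.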

Not just wealthier [1], but also younger and more urban. People who have 25 mile commutes from the suburbs--even very wealthy suburbs--are more likely to own cars and less likely to use Uber.

[1] In cities with good public transit (e.g., Boston, NYC), Uber can seem more like a luxury, but in cities that are very car-centric (e.g., Detroit, Miami), Uber is actually cheaper than owning a car for many people. I live in Miami and use UberPool all the time, and I meet people from all walks of life. Car insurance here is very expensive, so for a lot of people, using Uber saves them money.

Well of course there's selection bias, but it doesn't affect this use case because cities already concentrate all their transportation efforts on the wealthiest areas first. How it may affect things is that traffic may get redirected to new areas the way Waze caused suburban neighborhoods in LA to become flooded with speeding rush hour traffic.

My guess would be that Uber uses it as a carrot for cities: let Uber operate, and we will give city officials data and city planning tools for free.

It's win–win–win. If cities make improvements to their transportation infrastructure based on this data, ultimately it helps Uber deliver better service to its customers.

Well except for the drivers that get paid less and less now. I mean we'll forget about all that once their self-driving vehicles start being deployed, right? Well until they kill people.

I mean don't get me wrong, the data is valuable. But this is a company that constantly says "fuck you" to the law and fair work practices. Let's keep that in mind.

I mean, don't get me wrong, this civil rights movement is valuable. But this is a movement that constantly says "fuck you" to the law (c.f. Rosa Parks) and fair work practices (the civil rights movement favored a "race to the bottom" between white and negro workers).

Uber is engaged in civil disobedience against unjust laws that exist to protect political insiders. When did this become a bad thing?

I think part of the problem is that the parallels between Rosa Parks and Travis Kalanick are not as obvious to everyone as they are to you.

Possibly also, better overall driving experiences are good for long-term business. It's in the same neighborhood as Facebook working to increase internet access worldwide, or Google Fiber forcing telcos to increase speeds and drop costs. Basically, encourage others to improve the infrastructure on which you rely.

A win-win for everyone but the drivers. Sure, they can quit, but that doesn't excuse Uber for using its negotiating power to exploit them wherever possible.

Don't worry. Having drivers is not the long term plan anyways.

This. People who drive automobiles for a living should start investing in new skills now.

Hope this is as fantastic as this sounds. Many cities are starting to make this data publicly available, and the potential impact to urban planning, traffic reduction, parking and more is enormous. That said, what's Uber's commercial angle? Licensing fees?

"Movement makes all insights available under the Creative Commons, Attribution Non-Commercial license."

Making friends. Uber is quite aware that their reputation within cities hasn't been the best. Internally, there is a loose federation of people who genuinely want to improve cities, and people who recognize the business value in not continuing to piss cities off.

Yep. Conversely, it's interesting to see some urban planning folks warm up to ride sharing as a way to extend the reach of existing transit networks. But still a lot of hostility, to be sure.

I think urban planning folks have been, in principle, favorable to new mobility options that don't involve private vehicle ownership. I think some folks in the cities have been conflicted about some of Uber's lawlessness / relationship to labor.

> Hope this is as fantastic as this sounds

The experience in Boston was that Uber anonymized the data too much for it to be useful[1]. I would be very skeptical of this as it seems to be a program publicly touted as useful, but the e-mails obtained via FOIA requests show it to not be. I'm not sure what the solution is since privacy should be a concern.

[1] https://www.boston.com/news/business/2016/06/16/bostons-uber...

An unsurprising quote from the article:

> [Boston chief information officer Jascha Franklin-Hodge] said the data has been useful to show the volume of Uber rides in Boston and users’ typical wait times, but it has not done much to aid in city planning.

In other words, that was just a PR offensive. Maybe this new iteration will be something else, but I would be skeptical.

From the FAQ:

> What are the licensing terms?

> Movement makes all insights available under the Creative Commons, Attribution Non-Commercial license.

Now, that's pretty interesting.

I wonder if we'll have API level access in a way that we can build tools on this data.

This is cool.

NYC already releases this information for taxis. It might even include Uber rides since they all have TLC plates, though I'm not sure.

http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtm...

Chicago also released their taxi trips dataset recently: http://digital.cityofchicago.org/index.php/chicago-taxi-data...

And I do wonder why Google Maps, with much more data, hasn't done this...

1. Privacy 2. Google doesn't have the same incentive to release data to local governments as Uber does (to improve regulatory relationships).

How is privacy an issue? This data is anonymised; they could do the same.

Provable privacy based on 'anonymized' published data sets that resists third-party information and linkage is really hard -- arguably impossible in general. When people claim 'anonymized' data, I tend to not be fully convinced.

I completely agree with you on the difficulty of truly anonymizing data.

I do wonder if there are any holes in this idea, though: what if they released the number of cars per hour per segment of road (a segment meaning a continuous piece of road between two intersections), rounded to the nearest multiple of 10?
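
For what it's worth, the rounding idea is easy to sketch in Python. The `(segment_id, hour)` input shape and the function name are assumptions made up for this example, not anything Uber has described.

```python
from collections import Counter

def rounded_segment_counts(observations, base=10):
    """observations: iterable of (segment_id, hour) tuples, one per car seen.

    Returns the count per (segment_id, hour), rounded to the nearest
    multiple of `base` (halves round up).
    """
    counts = Counter(observations)
    # Integer arithmetic avoids Python's round(), which rounds halves to even.
    return {key: ((n + base // 2) // base) * base for key, n in counts.items()}

# Hypothetical observations for three segments during hour 8.
obs = [("seg1", 8)] * 14 + [("seg2", 8)] * 15 + [("seg3", 8)] * 4
released = rounded_segment_counts(obs)
```

One possible hole this makes visible: the rounding is deterministic, so two releases that differ by a single car can land on opposite sides of a rounding boundary (14 vs. 15 above) and reveal that car.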

Google's trip data is an ingredient in their secret sauce.

They do, via Waze (https://www.waze.com/). Founded in Israel, acquired by Google in 2013.

Google makes driving time estimates in advance available to consumers through Google Maps. I use it all the time when planning events around rush hour.

Uber, however, caters to a specific class of people, and they don't seem to be the people who can only afford public transport. This may not be an issue in America, but I think if cities change their public transit planning to accommodate the class of people who can afford Uber, they will be disadvantaging those who really need it.

I'm curious what techniques they will use to anonymize the data. I would guess some sort of differential privacy technique.

I hope so, but I would be a little surprised if they used DP. I am more inclined to think they will take a 'traditional' (weaker in terms of provable privacy protection) approach.

It's totally reasonable to consider doing something like this with differential privacy. The techniques exist, but it would still be pretty brave of them. They are certainly aware of DP.

Example technique: if you treat each record as a (src,dst) pair of (lat,lon) pairs or somesuch, you can then build a 4d grid whose cells you populate with (Laplace) noisy counts. This provides eps-differential privacy when the noise is roughly 1/eps.
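
A minimal Python sketch of that noisy grid, under some assumptions of mine: cells are squares of `cell_deg` degrees, the set of released cells (`all_cells`) is fixed in advance and does not depend on the data, and trips falling outside it are dropped. All names are hypothetical.

```python
import math
import random
from collections import Counter

def laplace_noise(scale, rng):
    # The difference of two iid exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def cell_of(trip, cell_deg):
    """Map ((src_lat, src_lon), (dst_lat, dst_lon)) to a 4d grid cell index."""
    (slat, slon), (dlat, dlon) = trip
    return tuple(math.floor(v / cell_deg) for v in (slat, slon, dlat, dlon))

def noisy_grid_counts(trips, all_cells, eps, cell_deg=0.01, rng=None):
    rng = rng or random.Random()
    counts = Counter(cell_of(t, cell_deg) for t in trips)
    # Every released cell gets noise, including empty ones; publishing only
    # the occupied cells would itself reveal which cells contain a trip.
    return {
        cell: counts.get(cell, 0) + laplace_noise(1.0 / eps, rng)
        for cell in all_cells
    }
```

Adding or removing one trip changes exactly one cell count by 1, which is why a noise scale of 1/eps is enough here.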

Whenever counts are sufficiently large, you can refine the contents of the cell and ask again. If you do the refinement at most k times, you get k*eps-differential privacy. There are smarter ways that work even better.
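
The refinement step could look roughly like the Python sketch below. It assumes each trip is flattened to a 4d point `(src_lat, src_lon, dst_lat, dst_lon)` lying inside the starting box, and it halves every axis at each level; `levels` counts the rounds of noisy counting, so the total budget is about `levels * eps`. The names and the halving scheme are my own choices, not something from the comment.

```python
import itertools
import random

def laplace_noise(scale, rng):
    # The difference of two iid exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def split(box):
    """Halve a box [(lo, hi), ...] along every axis, giving 2**d children."""
    halves = [((lo, (lo + hi) / 2), ((lo + hi) / 2, hi)) for lo, hi in box]
    return [list(child) for child in itertools.product(*halves)]

def contains(box, point):
    # Half-open intervals, so each point falls in exactly one child.
    return all(lo <= x < hi for (lo, hi), x in zip(box, point))

def refine(points, box, eps, levels, threshold, rng):
    """Return [(box, noisy_count)] leaves; `points` must lie inside `box`."""
    noisy = len(points) + laplace_noise(1.0 / eps, rng)
    # The decision to refine uses only the noisy count, never the true one.
    if levels <= 1 or noisy < threshold:
        return [(box, noisy)]
    leaves = []
    for child in split(box):
        inside = [p for p in points if contains(child, p)]
        leaves += refine(inside, child, eps, levels - 1, threshold, rng)
    return leaves
```

Within one level the boxes are disjoint, so a single trip affects only one count per level; that is where the k*eps figure comes from.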

These provide "trip privacy", meaning they mask the presence/absence of individual trips. Uber presumably has user identifiers, and could group all trips by one user together, and do the same counting where the weight of each trip is scaled down so that they sum to at most one for each user. This would then give "user privacy", meaning it masks the presence/absence of individual users.
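
The per-user scaling is simple to sketch in Python. The `(user_id, cell)` input shape and the function name are assumptions for illustration; here every user's trips get equal weight so that they sum to exactly one.

```python
from collections import Counter, defaultdict

def user_weighted_counts(trips):
    """trips: list of (user_id, cell) pairs.

    Each trip is weighted by 1 / (number of trips by that user), so any
    one user contributes a total weight of 1 across all cells.
    """
    trips_per_user = Counter(user for user, _ in trips)
    weighted = defaultdict(float)
    for user, cell in trips:
        weighted[cell] += 1.0 / trips_per_user[user]
    return dict(weighted)

# Hypothetical data: user "a" took two trips, user "b" took one.
example = user_weighted_counts([("a", "x"), ("a", "y"), ("b", "x")])
```

Laplace noise of scale 1/eps would then be added to every cell's weighted sum, empty cells included, the same way as for plain trip counts.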

How does Uber anonymize the data? Does it only use a subset of a trip? If you can see where and when a trip started or finished, it's definitely not anonymous enough.

From the FAQ: "All data is anonymized and aggregated to ensure no personally identifiable information or user behavior can be surfaced through the Movement tool"

Aggregation is a common method for anonymization. One approach is to only display trips that were made by at least 15 different people in a day.
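
A rough Python sketch of that kind of threshold, assuming trips arrive as `(rider_id, day, origin_zone, dest_zone)` tuples. The field layout and function name are hypothetical, and 15 is just the figure from the example above.

```python
from collections import defaultdict

def publishable_routes(trips, min_riders=15):
    """Return {(day, origin_zone, dest_zone): distinct rider count} for
    routes used by at least `min_riders` different riders that day."""
    riders = defaultdict(set)
    for rider, day, origin, dest in trips:
        # A set counts each rider once, however many trips they took.
        riders[(day, origin, dest)].add(rider)
    return {
        route: len(who)
        for route, who in riders.items()
        if len(who) >= min_riders
    }
```

Counting distinct riders rather than trips matters: one person commuting the same route 15 times should not be enough to clear the bar.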

Watch the video again--geographic aggregation. Summary stats from geo-to-geo.

EDIT: I think in the video they showed census tracts, which is one of many geographic units they could choose from.

In their trial with Boston, it was limited to zip codes[1]. Fingers crossed for census tracts, as that would be far more useful.

[1] https://www.boston.com/news/business/2016/06/16/bostons-uber...

Yeah, one scenario I was thinking of was the case of a rural home that's isolated from other buildings. It's plainly obvious that any trip to or from that building involves the occupants. There probably has to be some threshold of users (say, > 100 users taking a route) before it's exposed through this service.

Maybe they could randomize the pickup / dropoff points within a small radius, say 1/4 mile.
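
Something like this Python sketch, assuming a flat-earth approximation (fine at this scale) and a uniform draw over the disc. The constant, the function name, and the default radius of 402.336 m (a quarter mile) are my own choices for illustration.

```python
import math
import random

METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude

def jitter(lat, lon, radius_m=402.336, rng=None):
    """Move a point to a uniformly random spot within `radius_m` meters."""
    rng = rng or random.Random()
    # sqrt() makes the draw uniform over the disc's area instead of
    # clustering points near the center.
    r = radius_m * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    dlat = (r * math.cos(theta)) / METERS_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = (r * math.sin(theta)) / (
        METERS_PER_DEG_LAT * math.cos(math.radians(lat))
    )
    return lat + dlat, lon + dlon
```

A caveat: if the same address is jittered independently on every trip, averaging many jittered points converges back toward the true location.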

Certainly, this data can be identifiable.

This is awesome, really happy to see them release such valuable data!

This won't help in Belgium, where trucks need to pay "road taxes" and have a device installed for that. The authorities can get vehicle-tracking data from those devices, so they won't need Uber.

Good idea, though not applicable everywhere.

Knowing "N people wanted to go from roughly here to roughly there enough to pay Uber for the ride" is information that wouldn't show up in "my truck moved from X to Y" data, and which is potentially hugely valuable to those adjusting public transit.

This is focused on urban traffic. That may be totally different from truck traffic, which avoids (or even isn't allowed in) urban areas.

Doesn't this run contrary to another recent story about Uber being against giving cities this data?

https://www.bloomberg.com/news/articles/2017-01-05/uber-does...

In that case, NYC is asking for addresses and timestamps. The data provided by Movement is aggregate. In other words, cities like NYC want Movement but without the anonymization.

This is probably their reaction to the government requests for data. They want to define the terms under which they share data, and "volunteering" is the way to do that.

Having read no further than the headline, I really wanted this article to be about on-demand piggyback rides.

> We believe that breakthrough insights and ideas can come from anywhere. In the coming months, we’ll be making this data open to all.

I genuinely do not believe this. If they intended on making anonymized data open, they wouldn't have a marketing site up already.

I don't understand why those have to be mutually exclusive. It's possible that they haven't finished cleaning the data for public use, but the data is ready for 'trusted' city planners.

I don't think it's mutually exclusive - I simply don't trust Uber to act in good faith for anything they do. They've yet to demonstrate that they're deserving of trust that they're going to make that data open.

Maybe they plan - for the right price - to make identifiable trip data available?

