
Uber NYC Trip Data from April to Sept. 2014 - brbcoding
https://github.com/fivethirtyeight/uber-tlc-foil-response
======
dotBen
What Social Good and Smart City projects could you produce with a dump of all
of Uber's trip data in a given city?

I think it's fair to say we (Uber) didn't intend for this data to be made
public, but it was produced under a Freedom of Information Law (FOIL) Request,
and now it is out.

So I'm curious how we at Uber could help improve cities and society with
anonymized dumps of this kind of data - if this was a formalized thing.

Let me know your thoughts here or directly (benm at the domain of uber)

~~~
vessenes
fivethirtyeight mentions one obvious usage: providing it to city planners for,
say, bus routing.

The other thing I think you could do that would be very useful for drivers is
show them areas where waits are typically long or rides are longer right now. I
know a lot of drivers do some planning, but they seem to be in the dark. I'm
sure a 'where to uber' app on Android / iOS for drivers, telling them when to
be where to take advantage of surge pricing from customers who want to be
taken to a desirable spot, would be VERY popular among drivers.

~~~
applecore
City planners already know the bus routes that would best serve the city.
Unfortunately, there's no incentive to improve service or increase ridership
since it already doesn't scale: for every 30 cents that bus riders contribute
in fares, taxpayers put in one dollar collectively.

~~~
manicdee
City planners do not necessarily know the best bus routes. They know which
buses are used more frequently, and if they have unified fare systems using
tap on/tap off cards, they know which trips are most often taken by bus.

What they do not know is where new bus services might be useful. Uber might be
able to show that there is demand for transport between X and Y that is not
currently served by buses (e.g., late-night riders are there but the buses
aren't, or there isn't even a bus that serves the route between A and B).

Data from services like taxis and Uber would be invaluable for planning mass
transit systems.
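
A minimal sketch of how the late-night case could be checked against pickup data, assuming pickups are available as (timestamp, lat, lon) tuples. The grid size, the 0:00 to 5:00 window, and the `served_cells` set (cells that already have a night bus) are all hypothetical inputs made up for illustration. Pickup-only data shows where riders start, not the X-to-Y pair.

```python
import math
from collections import Counter
from datetime import datetime

def cell_of(lat, lon, cell_deg=0.01):
    """Map a coordinate to a coarse grid cell (roughly 1 km at NYC latitudes)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def unserved_late_night(pickups, served_cells, start_hour=0, end_hour=5):
    """Count late-night pickups in grid cells that have no night bus service."""
    counts = Counter()
    for when, lat, lon in pickups:
        if start_hour <= when.hour < end_hour:
            cell = cell_of(lat, lon)
            if cell not in served_cells:
                counts[cell] += 1
    return counts

# Made-up sample pickups, not taken from the dataset.
pickups = [
    (datetime(2014, 4, 1, 1, 30), 40.7416, -73.9584),
    (datetime(2014, 4, 1, 2, 10), 40.7419, -73.9581),
    (datetime(2014, 4, 1, 14, 0), 40.7416, -73.9584),  # daytime, ignored
    (datetime(2014, 4, 1, 3, 0), 40.7516, -73.9584),   # cell already served
]
served_cells = {cell_of(40.7516, -73.9584)}  # hypothetical night-bus coverage

hotspots = unserved_late_night(pickups, served_cells)
for cell, n in hotspots.most_common():
    print(cell, n)
```

Cells that rank high here are only candidates for a closer look; where those riders were going would have to come from dropoff data.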

------
minimaxir
Having only Lat/Long of pickup data is disappointing (would have liked the
dropoff data too), but the date range can be correlated to the NYC Taxi data
set, which exists for all of 2014.
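
As a rough sketch of that correlation step: assuming the raw Uber files have a `Date/Time` column formatted like `4/1/2014 0:11:00` (an assumption about the layout, not something stated in this thread), pickups can be counted per day and lined up against per-day taxi counts. The sample rows and the `taxi_daily` numbers below are made up for illustration.

```python
import csv
import io
from collections import Counter
from datetime import date, datetime

# Assumed column layout: Date/Time, Lat, Lon, Base. Rows are illustrative.
SAMPLE = """\
"Date/Time","Lat","Lon","Base"
"4/1/2014 0:11:00",40.769,-73.9549,"B02512"
"4/1/2014 0:17:00",40.7267,-74.0345,"B02512"
"4/2/2014 8:05:00",40.7316,-73.9873,"B02512"
"""

def daily_pickups(csv_text):
    """Count pickups per calendar date from CSV text."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.strptime(row["Date/Time"], "%m/%d/%Y %H:%M:%S")
        counts[ts.date()] += 1
    return counts

def align(uber, taxi):
    """Keep only dates present in both series: date -> (uber, taxi)."""
    return {d: (uber[d], taxi[d]) for d in sorted(uber.keys() & taxi.keys())}

uber_daily = daily_pickups(SAMPLE)

# Hypothetical per-day taxi pickup counts for the same period.
taxi_daily = {date(2014, 4, 1): 500, date(2014, 4, 3): 450}

aligned = align(uber_daily, taxi_daily)
for day, (uber_n, taxi_n) in aligned.items():
    print(day, uber_n, taxi_n)
```

For the real files, `io.StringIO(csv_text)` would be swapped for an open file handle; the counting logic stays the same.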

That's enough for interesting visualizations and statistical analysis of
comparisons between the two, although the original 538 article
([http://fivethirtyeight.com/features/uber-is-serving-new-
york...](http://fivethirtyeight.com/features/uber-is-serving-new-yorks-outer-
boroughs-more-than-taxis-are/) ) is pretty good. (for my own visualizations of
the NYC Taxi dataset, see my blog post: [http://minimaxir.com/2015/08/nyc-
map/](http://minimaxir.com/2015/08/nyc-map/) )

The Aggregate_FHV_Data.xlsx contains data on Lyft as well. In September 2014,
Lyft did 115,999 total pickups in NYC, while Uber did 1,028,136 pickups.
(However, Lyft didn't have any activity in NYC until the end of July.)
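
To put those two September 2014 totals side by side, here is a small arithmetic sketch. It uses only the two figures quoted above; the "share" is of Uber plus Lyft pickups only, not of all for-hire vehicles.

```python
# September 2014 pickup totals quoted from Aggregate_FHV_Data.xlsx above.
uber = 1_028_136
lyft = 115_999

ratio = uber / lyft                 # Uber pickups per Lyft pickup
lyft_share = lyft / (uber + lyft)   # Lyft's share of the two combined

print(f"Uber did about {ratio:.1f}x as many pickups as Lyft")
print(f"Lyft's share of Uber+Lyft pickups: {lyft_share:.1%}")
```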

~~~
rawnlq
Why are you just a software QA engineer with your skills?

~~~
ploxiln
I'm not minimaxir but anyway... that's an odd way to put it. My guess is,
he/she is not the kind of software QA engineer you might be accustomed to,
depending on the kinds of companies you've worked at. Think of automated
custom static and dynamic code analysis, continuous integration, build
systems, deployment monitoring, etc. Places like Google and Stripe (and many
other technical organizations with good engineering culture) have very skilled
people doing these things.

I'm a normal/infrastructure engineer who reluctantly ends up focusing on test
systems every once in a while. I get all the tests running, from a single
script, with all results collated and machine readable, get CI set up
properly, make the system reliable enough to trust, etc. Because someone has
to do it! And yes I've done this in organizations where there was a QA
engineer or two, who did reasonably cheap semi-automated QA, but couldn't put
together a whole system and make it reliable enough to run on its own. But
there are definitely a few places where QA engineer is not the "lowest rank".

------
15thandwhatever
What. the. actual. fuck.

Trips outside of NYC (aka the 'burbs) have the full street address listed.
Those are addresses of people's single family homes, vs. the NYC addresses
which tend to be multi-family dwellings.

~~~
schrodinger
On mobile so I can't open it at the moment. Does it show names too? If not,
what's the problem with knowing which addresses took cabs?

~~~
littletimmy
It's an issue of privacy. The public shouldn't know who does or does not take
cabs.

------
henridf
Simple interactive timechart to see the number of rides over time per company:
[http://www.jut.io/play#gist/henridf/403a7b2e5d52a979eb28](http://www.jut.io/play#gist/henridf/403a7b2e5d52a979eb28)

Uber had a pretty sharp uptick in early September (end of summer?) which Lyft
didn't appear to have.

------
rburhum
Mmm... there are some large areas that have no pickups in April:
[http://i.imgur.com/FRnH4zM.jpg](http://i.imgur.com/FRnH4zM.jpg)

~~~
minimaxir
Using Google Fusion Tables can be a little misleading for this since you can't
accurately illustrate density.
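
One way to get at density without relying on overlapping map markers is to bin pickups into a lat/long grid and count per cell. This is a minimal sketch, not what either map above uses; the 0.01 degree cell size and the sample points are arbitrary choices for illustration.

```python
import math
from collections import Counter

def grid_density(points, cell_deg=0.01):
    """Count points per grid cell; cells are indexed by floor(coord / cell_deg)."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts

# Made-up pickup coordinates.
points = [
    (40.7416, -73.9584),
    (40.7419, -73.9581),  # same cell as the first point
    (40.7516, -73.9584),  # one cell to the north
]

density = grid_density(points)
for cell, n in density.most_common():
    print(cell, n)
```

The per-cell counts can then be fed to whatever renders the map as a heatmap or choropleth, so that areas with no pickups show up as empty cells and not as gaps between markers.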

~~~
rburhum
Not using Google Fusion Tables :)

[https://www.amigocloud.com/api/v1/users/22/projects/3153/dat...](https://www.amigocloud.com/api/v1/users/22/projects/3153/datasets/24717/preview#zoom=14,lat=40.741664577133506,lng=-73.95841598510742,blayer=AmigoStreet)

By the way, here is the data in many different formats (kml, shapefile,
geojson, etc) in case you want to play with it in other software:
[https://www.amigocloud.com/data_share/8850bab3c62141278ac5c4...](https://www.amigocloud.com/data_share/8850bab3c62141278ac5c41bbd7d8126/)

------
maxmcd
Why would the TLC have this data? Is this just for cab pickups through the
Uber app? If not, would those also be included in this dataset?

------
sehugg
It's a lucky thing for a few that the Ashley Madison lat/long is only an
approximation based on zip code, yes?

