

NYC's Taxi and Limousine Commission Trip Record Data for 2014-2015 - bko
http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

======
eck
I downloaded the first year dataset and didn't get very far in my analysis...
but this was the first result, which was kind of pretty:

[http://i.imgur.com/ov6K6mt.jpg](http://i.imgur.com/ov6K6mt.jpg)

~~~
stillmotion
I would love to buy a wall print like this.

~~~
verelo
This really is beautiful, and I agree. If you do not want to sell it, is there
any chance you could share a higher res image?

~~~
eck
The trouble with making a higher res image is that when you make the lat/lon
buckets smaller, you have fewer samples in each one, so the image gets
noisier. For the best possible image you'd want to download _all_ the years of
data.

The process of making it was quite simple. I zeroed a 2d array of integers,
then took all the pickup/dropoff points and incremented the nearest cell. The
pixel values are based on the logarithm of the counts, since otherwise
everything outside midtown would be pretty much black.
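
To make the steps above concrete, here is a minimal Python sketch of the same
idea, not the original C++. The function names, the bounding-box parameters and
the 0-255 pixel range are assumptions for illustration, and it increments the
cell that contains each point rather than strictly the nearest one.

```python
import math

def density_grid(points, lat_range, lon_range, rows, cols):
    """Count (lat, lon) points per cell of a rows x cols grid over a bounding box."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    grid = [[0] * cols for _ in range(rows)]  # zeroed 2D array of integers
    for lat, lon in points:
        if not (lat_min <= lat < lat_max and lon_min <= lon < lon_max):
            continue  # ignore points outside the bounding box
        # Row 0 is the northern edge, so the image is not drawn upside down.
        r = int((lat_max - lat) / (lat_max - lat_min) * rows)
        c = int((lon - lon_min) / (lon_max - lon_min) * cols)
        grid[min(r, rows - 1)][min(c, cols - 1)] += 1
    return grid

def log_pixels(grid):
    """Map counts to 0..255 using log(1 + count), so sparse cells stay visible."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    scale = 255 / math.log1p(peak)
    return [[round(math.log1p(n) * scale) for n in row] for row in grid]
```

With a linear scale, a cell holding 1 point next to a cell holding 1000 would
round to a pixel value of 0. The log scale keeps it visibly above black.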

There are some artifacts, like the thin vertical line down the East River. I
think that was because of how the data was rounded, i.e. the number of unique
longitude values that map to a certain image column.
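
As a rough illustration of that guess (I have not verified it against the
data), the Python sketch below assumes the longitudes were rounded to evenly
spaced values and counts how many distinct values land in each image column.
The function name and the numbers are made up for the example.

```python
def unique_values_per_column(n_values, cols):
    """Count how many of n_values evenly spaced longitudes fall in each column.

    Value i sits at fraction i / n_values of the image width, so integer
    arithmetic gives its column exactly, with no floating-point edge cases.
    """
    counts = [0] * cols
    for i in range(n_values):
        counts[i * cols // n_values] += 1
    return counts

# 100 distinct rounded longitudes spread over 30 columns: some columns get
# 4 distinct values and others 3, so some columns collect more points.
uneven = unique_values_per_column(100, 30)

# 100 distinct values over 25 columns divide evenly: every column gets 4.
even = unique_values_per_column(100, 25)
```

When the column count does not divide the number of distinct values evenly,
some columns collect about a third more samples than their neighbours, which
could show up as a faint vertical stripe.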

I wrote this myself with a few hundred lines of C++, though I'm sure there's
GIS software out there that will do all this for you with a few clicks.

~~~
zippzom
Can you explain why there is a line going to the airport? Shouldn't the
airport be more of an island of light (since I imagine most people aren't
getting picked up / dropped off a half mile from the airport)?

Also did you overlay it onto a map? How did you get the angled effect if it's
just a grid?

~~~
eck
> Can you explain why there is a line going to the airport?

I assume it is because there is a fixed fare to/from JFK so drivers have
little incentive to start/stop the meter at the exact pickup/dropoff location.

> Also did you overlay it onto a map?

No. If taxis did not pick up or drop off people on some street, that street
does not appear. For example, there is an area downtown where there _are_
streets, but they have had security barriers since 9/11, so no taxis go there.

> How did you get the angled effect if it's just a grid?

None of NYC's grids are exactly north/south/east/west aligned.

------
danso
FWIW:

\- 2013 data as FOILed by Chris Whong
[http://chriswhong.com/open-data/foil_nyc_taxi/](http://chriswhong.com/open-data/foil_nyc_taxi/)

\- 2008 to 2013 data as FOILed by me, on BigQuery
[https://bigquery.cloud.google.com/table/alien-
climber-851:ny...](https://bigquery.cloud.google.com/table/alien-
climber-851:nyc_taxi_redacted.trip_data?pli=1)

...note that after Whong's request, the TLC redacted the medallion numbers,
making it virtually impossible to analyze trips by cabbie.

~~~
aw3c2
> 2008 to 2013 data as FOILed by me

Is that data set available for people who do not use Google Accounts as well?
Maybe you could upload it to [https://archive.org](https://archive.org).

------
bko
I'm not able to review the data yet, but I wonder if the TLC acted on any of
the feedback about better anonymizing the data. [0]

Also, I would love to see an analysis of whether traffic is actually getting
worse than it was last year. Mayor de Blasio made this claim as a reason to
cap Uber rides.

[0] [http://research.neustar.biz/2014/09/15/riding-with-the-
stars...](http://research.neustar.biz/2014/09/15/riding-with-the-stars-
passenger-privacy-in-the-nyc-taxicab-dataset/)

~~~
minimaxir
Also: [https://medium.com/@vijayp/of-taxis-and-
rainbows-f6bc289679a...](https://medium.com/@vijayp/of-taxis-and-
rainbows-f6bc289679a1)

Downloading one of the CSVs to check it out. Each one is about 2GB.

EDIT: Per the BigQuery table schema, medallion is no longer a field.

------
minimaxir
Press Release:
[http://www.nyc.gov/html/tlc/downloads/pdf/press_release_08_0...](http://www.nyc.gov/html/tlc/downloads/pdf/press_release_08_03_15.pdf)

BigQuery tables for the data:
[https://www.reddit.com/r/bigquery/comments/3fo9ao/nyc_taxi_t...](https://www.reddit.com/r/bigquery/comments/3fo9ao/nyc_taxi_trips_now_officially_shared_by_the_nyc/)

------
paulsutter
Local governments should be demanding detailed data from companies like Uber
in exchange for legalization, even raw data on locations of available cars.
They have the leverage to get it now but they're wasting the chance. This data
could eventually be used to avoid a true monopoly.

[http://www.washingtonpost.com/news/wonkblog/wp/2014/10/30/ub...](http://www.washingtonpost.com/news/wonkblog/wp/2014/10/30/ubers-
data-could-be-a-treasure-trove-for-cites-but-theyre-wasting-the-chance-to-get-
it/)

~~~
Shivetya
We already have enough hoops for companies to jump through to prevent
competition, so we certainly do not need any more. New York is a perfect
example where regulation has so contorted the market that you can make money
selling your permission to run the business, to the point where it might be
more profitable than running the business.

From medallions to food cart and restaurant permits, regulation is keeping
competition out while rewarding those who merely sit on permits and rent their
use. It is nearly identical to how badly patents are managed and rewarded.

~~~
jdeibele
One of the reasons that taxis are licensed is that they're supposed to pick up
everyone and anyone. Uber and Lyft don't seem to have that problem with race
(AFAIK) but there are two other situations where people have trouble:

(1) small children, where you need a car seat (or two)

(2) people with disabilities - service animal, wheelchair, etc.

Analyzing the data would hopefully show whether people had to wait for 2
hours.

Also, as a substitute for race, it might be possible to see if certain areas
are under-served or not served at all. Perhaps drivers are avoiding picking up
in Harlem.

~~~
mlrtime
"One of the reasons that taxis are licensed is that they're supposed to pick
up everyone and anyone."

Which is clearly violated thousands of times per day. Sure, you can call 311
in NYC, but very few people do.

~~~
rayiner
The difference is, the city has the authority to crack down on it. I was just
in NYC, and cabs are running a video in the back seat informing passengers
that it's illegal for cabs to refuse to pick you up because of race or
disability. The city doesn't legally have that kind of leverage over Uber.

~~~
SilasX
Leverage that is rarely and reluctantly used, and that no one reasonably
expects to be used, and that does not translate into any observable
consequences in the lives of black people attempting to get a ride.

I once heard a quip that "You have no constitutional right to eat at a
restaurant, but you do have one to a speedy trial -- but which one feels more
secure?"

The city has "leverage" to stamp out discrimination against black people
wanting a cab ride, but "no leverage" to stamp out discrimination against
black people on Uber -- yet which one will more reliably secure a ride?

------
JoshTriplett
This is a disturbing dataset, in that it seems straightforward to extract
personal data from it. Consider the information revealed by a taxi ride
between a personal address and a workplace, a sensitive location, or another
personal address...

This kind of data was previously an issue with Uber, too.

