

NYC Taxi Trips Data from 2013 - Anon84
http://www.andresmh.com/nyctaxitrips/

======
SuperKlaus
Original torrent links from Chris' site for faster downloads:
[http://chriswhong.com/wp-content/uploads/2014/06/nycTaxiTrip...](http://chriswhong.com/wp-content/uploads/2014/06/nycTaxiTripData2013.torrent)
[http://chriswhong.com/wp-content/uploads/2014/06/nycTaxiFare...](http://chriswhong.com/wp-content/uploads/2014/06/nycTaxiFareData2013.torrent)

------
yincrash
Just to repeat my comment from a previous post: the hack license and medallion
fields appear to be unsalted one-way hashes.
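
To illustrate why the missing salt matters: when identifiers come from a small, predictable space, every candidate can be hashed in advance and matched against the published values. The following is a minimal sketch, not a confirmed description of the dataset. It assumes MD5 (the 32-hex-character digests are consistent with it, but nothing here confirms the algorithm) and a hypothetical six-digit numeric license format.

```python
import hashlib


def md5_hex(value: str) -> str:
    """Return the uppercase hex MD5 digest, matching the casing seen in the data."""
    return hashlib.md5(value.encode("ascii")).hexdigest().upper()


def build_lookup(candidates):
    """Map digest -> original value for every candidate identifier."""
    return {md5_hex(c): c for c in candidates}


# Hypothetical ID space: six-digit numeric licenses (an assumption for illustration).
candidates = (f"{n:06d}" for n in range(1_000_000))
lookup = build_lookup(candidates)

# Any published hash of an ID in that space can now be reversed by lookup.
published = md5_hex("012345")  # stand-in for a hashed value from the data
recovered = lookup.get(published)
```

A per-record salt, or a keyed hash with a secret key, would prevent this kind of table lookup.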

------
gk1
Will be interesting to see the comparison to 2014, when Boro Taxis[0] were
introduced to serve the outer boroughs.

[0]
[http://www.nyc.gov/html/tlc/html/passenger/shl_passenger.sht...](http://www.nyc.gov/html/tlc/html/passenger/shl_passenger.shtml)

------
iandanforth
If you clicked through to Chris Whong's site and wondered about the phrase
"hack license", this Wikipedia article may be of interest:
[http://en.wikipedia.org/wiki/Hackney_carriage](http://en.wikipedia.org/wiki/Hackney_carriage)

------
ImJasonH
I uploaded these datasets to Google BigQuery in case anybody prefers to use
that to go data-diving:
[https://bigquery.cloud.google.com/table/833682135931:nyctaxi...](https://bigquery.cloud.google.com/table/833682135931:nyctaxi.trip_data)
[https://bigquery.cloud.google.com/table/833682135931:nyctaxi...](https://bigquery.cloud.google.com/table/833682135931:nyctaxi.trip_fare)

~~~
livejake
This is great. Is there any way you could use data types other than string? The
way it's currently set up makes it hard to analyze.

~~~
ImJasonH
Sure, it wouldn't be hard to re-load the data in a new dataset/table with more
accurate types. Strings are just the default. I just wasn't sure how clean the
data was and didn't want to debug bad data before loading. I was lazy,
basically.

But if you wanted to do it, it's pretty straightforward.

In the meantime I haven't found it to be too hard to just INTEGER(field) and
FLOAT(field) wherever I need to.
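
For anyone working from the raw CSV files instead, the same cast-on-read idea can be sketched in Python. This is only an illustration: the column names `passenger_count` and `trip_distance` are assumptions about the file header, and unparseable values become `None` rather than failing the load.

```python
import csv
import io


def to_int(text):
    """Parse an integer field, returning None for blank or malformed values."""
    try:
        return int(text.strip())
    except (ValueError, AttributeError):
        return None


def to_float(text):
    """Parse a float field, returning None for blank or malformed values."""
    try:
        return float(text.strip())
    except (ValueError, AttributeError):
        return None


# Assumed column names; adjust to the real header of the file.
CASTS = {"passenger_count": to_int, "trip_distance": to_float}


def typed_rows(handle):
    """Yield dict rows with the columns listed in CASTS converted from strings."""
    for row in csv.DictReader(handle):
        for column, cast in CASTS.items():
            if column in row:
                row[column] = cast(row[column])
        yield row


# Made-up sample input, including one row with bad values.
sample = io.StringIO(
    "medallion,passenger_count,trip_distance\n"
    "A1,1,0.90\n"
    "B2,,not-a-number\n"
)
rows = list(typed_rows(sample))
```

Counting the `None` values per column afterwards gives a rough idea of how clean the data is.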

------
danso
The data fields kind of confuse me...I assumed that there would be one
medallion per operating cab, right? And I assume that one cab, i.e. medallion,
can make one pickup in a given _second_...therefore, pickup_datetime and
medallion should result in a unique key, right?

But I get dupes. In the January 2013 dataset, medallion
"DD336C4ADA65CCBD2284F63BF348A4F0" makes 3 pickups at 7:09AM on Jan 13, 2013.
Each dropoff time is recorded as the exact same second as its pickup. Yet the
pickup and dropoff coordinates differ across the three rows. There are all
kinds of wonky real-life situations (a driver accidentally starting the fare
and shutting it off during a trip, perhaps), but this seems to indicate an
error in how the data is recorded: even if the machine is doing something that
breaks business rules, the timestamp should still be consistent with real
time, right? Anyway, it's something to be wary of when doing analysis.

And it also looks like the hack licenses are all different...which I'm not
even sure how that works, but it doesn't seem to be a case simply of 3
different drivers being registered to the same medallion...

The three data rows in question:

    
    
        DD336C4ADA65CCBD2284F63BF348A4F0  689153C038F51E19A47257E15DF838BD  VTS 1   2013-01-13 07:09:00 2013-01-13 07:09:00 1 0 0.000000  -73.937492  40.758327 -73.937492  40.758327
        DD336C4ADA65CCBD2284F63BF348A4F0  0200F6738B90178B8B75EEF9E3C1988E  VTS 1   2013-01-13 07:09:00 2013-01-13 07:09:00 1 0 0.050000  -73.973656  40.738422 -73.973656  40.738422
        DD336C4ADA65CCBD2284F63BF348A4F0  B8C7594C95BB86A228C785305287583B  VTS 1   2013-01-13 07:09:00 2013-01-13 07:09:00 1 0 0.900000  -73.979225  40.747093 -73.929260  40.850357
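
A check like this can be sketched by grouping rows on the assumed key and keeping any key that appears more than once. The rows below are the three quoted above, reduced to the relevant fields, plus one made-up row that should not be flagged. The `hack_license` column name is an assumption.

```python
from collections import defaultdict


def find_duplicate_keys(rows):
    """Group rows by (medallion, pickup_datetime); keep keys seen more than once."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["medallion"], row["pickup_datetime"])].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


MEDALLION = "DD336C4ADA65CCBD2284F63BF348A4F0"
PICKUP = "2013-01-13 07:09:00"

rows = [
    {"medallion": MEDALLION, "pickup_datetime": PICKUP,
     "hack_license": "689153C038F51E19A47257E15DF838BD"},
    {"medallion": MEDALLION, "pickup_datetime": PICKUP,
     "hack_license": "0200F6738B90178B8B75EEF9E3C1988E"},
    {"medallion": MEDALLION, "pickup_datetime": PICKUP,
     "hack_license": "B8C7594C95BB86A228C785305287583B"},
    # Made-up row for contrast; not from the dataset.
    {"medallion": "EXAMPLE", "pickup_datetime": PICKUP,
     "hack_license": "EXAMPLE"},
]

dupes = find_duplicate_keys(rows)
licenses = {r["hack_license"] for r in dupes[(MEDALLION, PICKUP)]}
```

Here `dupes` holds a single key with three rows, and `licenses` contains three distinct values, matching the observation that each duplicate carries a different hack license.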

~~~
andresmh
I think a medallion represents a group of taxi cabs. Each cab has its own
hack license.

~~~
yincrash
One medallion per cab. One license per cab driver.

------
blaincate
A few interesting facts from the July data:

22% of rides are less than 1 mile long

85% of rides have 1 passenger

More at: [http://www.twikstik.com/blog/2](http://www.twikstik.com/blog/2)
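
Percentages like these can be reproduced with a simple share calculation. A minimal sketch follows; the field names `trip_distance` and `passenger_count` are assumptions, and the four rows are made up for illustration, so the results do not reflect the real July figures.

```python
def share(rows, predicate):
    """Return the percentage of rows for which predicate is true (0.0 if empty)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return 100.0 * sum(1 for r in rows if predicate(r)) / len(rows)


# Made-up rows for illustration only; not taken from the dataset.
trips = [
    {"trip_distance": 0.4, "passenger_count": 1},
    {"trip_distance": 2.5, "passenger_count": 1},
    {"trip_distance": 0.9, "passenger_count": 2},
    {"trip_distance": 5.0, "passenger_count": 1},
]

short_pct = share(trips, lambda r: r["trip_distance"] < 1.0)
solo_pct = share(trips, lambda r: r["passenger_count"] == 1)
```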

~~~
blaincate
I was playing around with Google Maps and the data: you can see a heatmap of
taxi pickups by day for the first 6 days of July. It could also be useful for
showing Uber drivers where to find passengers, based on last year's data.

Passenger pickups for July 4th:
[http://akuchlous.github.io/NYC_CAB_ANALYTICS/July/4/](http://akuchlous.github.io/NYC_CAB_ANALYTICS/July/4/)

Passenger pickup code:
[https://github.com/akuchlous/NYC_CAB_ANALYTICS](https://github.com/akuchlous/NYC_CAB_ANALYTICS)
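
The data behind a heatmap of this kind can be approximated by counting pickups per grid cell. The sketch below is not the code from the linked repository; it is a plain grid-binning illustration with a hypothetical cell size of 0.01 degrees and made-up coordinates.

```python
import math
from collections import Counter


def grid_counts(points, cell_deg=0.01):
    """Count pickups per grid cell, keyed by integer (lat, lon) cell indices."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Made-up (latitude, longitude) pickup points for illustration.
pickups = [
    (40.7583, -73.9375),
    (40.7589, -73.9371),
    (40.7384, -73.9737),
]

counts = grid_counts(pickups)
busiest_cell, busiest_count = counts.most_common(1)[0]
```

Running this once per day of data gives one count grid per day, which a mapping library can then render as a heat layer.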

------
aw3c2
checksums please :)

