

Passenger Privacy in the NYC Taxicab Dataset - rgrzywinski
http://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset/

======
cjbprime
Amazing. If you said to someone "Hey, I wanted to know where you went after
the cab picked you up last year, so I called up the cab company and asked them
where they dropped you off and they told me", they would be outraged at (your
behavior and) the breach of privacy shown by the cab company. But the city
released a dataset that allows exactly this query. What were they thinking?

Something else that could be mentioned in the linked article: if someone you
were with got in a cab in 2013, and they told you where they were going, and
you remember the approximate time and location, you can tell whether it was
their true destination _regardless of how many other people were being picked
up at the time_ , because you don't have to find the exact ride they took, you
only have to see whether any rides went to the place they told you.
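
A minimal sketch of that existence check, assuming the trip records have already been parsed into objects with a pickup time and pickup/dropoff coordinates. The `Trip` class, the function names, the 150 m radius, and the 15-minute window are all illustrative choices, not anything taken from the dataset or the article:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Trip:
    # Hypothetical record layout; the real files use their own column names.
    pickup_time: datetime
    pickup: tuple[float, float]   # (latitude, longitude)
    dropoff: tuple[float, float]  # (latitude, longitude)


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(h))


def any_trip_matches(trips, pickup, when, claimed_dropoff,
                     radius_m=150, window=timedelta(minutes=15)):
    """True if at least one trip left near `pickup` around `when`
    and ended near `claimed_dropoff`. No single ride is identified."""
    return any(
        abs(t.pickup_time - when) <= window
        and haversine_m(t.pickup, pickup) <= radius_m
        and haversine_m(t.dropoff, claimed_dropoff) <= radius_m
        for t in trips
    )
```

A `False` result is the informative one here: it says no ride from that corner in that window ended at the stated destination, however many other pickups happened nearby.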

This search is even extremely resistant to the differential privacy suggested
by the post's authors. I'd be much happier simply stating that location data
is not de-identifiable, and no-one should use a cab in a city that logs
location data if they aren't happy with an adversary knowing where they went.

------
eck
What I wondered about that data set is: if two people living/working at two
locations consistently take taxis to meet at various other locations at the
same times, could that pattern be identified in the data?

That is, are there locations A and B such that there are matching trips to
locations M1,M2,... at times T1,T2,... i.e.
(A,M1,T1),(B,M1,T1),(A,M2,T2),(B,M2,T2) and perhaps reverse trips (M1,A,T1+x)
etc?
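
One rough way to look for that pattern, as a sketch only: snap coordinates to a coarse grid, bucket dropoffs by time, and count how often the same two pickup cells send trips to the same destination cell in the same bucket. The input tuple layout, the grid precision, the bucket size, and the function names are assumptions made up for illustration:

```python
from collections import defaultdict
from itertools import combinations


def grid_cell(lat, lon, precision=3):
    """Snap a coordinate to a coarse grid (3 decimals is roughly 100 m)."""
    return (round(lat, precision), round(lon, precision))


def recurring_meetings(trips, bucket_minutes=10, min_meetings=3):
    """trips: iterable of
    (pickup_lat, pickup_lon, dropoff_lat, dropoff_lon, dropoff_epoch_seconds).

    Returns {(origin_a, origin_b): [(destination, time_bucket), ...]} for
    origin pairs that arrive together at least `min_meetings` times.
    """
    # (destination cell, time bucket) -> set of origin cells arriving there
    arrivals = defaultdict(set)
    for plat, plon, dlat, dlon, ts in trips:
        key = (grid_cell(dlat, dlon), int(ts // (bucket_minutes * 60)))
        arrivals[key].add(grid_cell(plat, plon))

    # (origin A, origin B) -> set of (destination, bucket) where both arrived
    meetings = defaultdict(set)
    for (dest, bucket), origins in arrivals.items():
        for a, b in combinations(sorted(origins), 2):
            meetings[(a, b)].add((dest, bucket))

    return {pair: sorted(m) for pair, m in meetings.items() if len(m) >= min_meetings}
```

Fixed buckets miss arrivals that straddle a bucket boundary, and busy destinations will produce many coincidental pairs, so this is a starting point rather than a detector.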

Further classification of M* -- hotels, for example -- could classify the
nature of the meetings. You might be able to identify the addresses of people
having affairs, or other deliberately secret rendezvous.

~~~
yaur
This would be relatively difficult for Manhattan; some parts of the outer
boroughs, though, are a different matter.

I was concerned when this first hit HN because I have a friend who lives in a
fairly sparsely populated part of town, and his (now ex) girlfriend has a
possessive ex-husband who doesn't like her seeing other men. He isn't going to
be able to make sense of the data himself, but if someone weaponizes it the way
you are talking about, it could be a real problem for people with
stalkers/psycho-exes.

------
0898
Interesting read. The frequent visitors to gentlemen's clubs are probably
dancers rather than patrons.

~~~
benihana
I'd be very surprised if dancers were taking cabs to work

~~~
mpeg
My good friend owns one of those places; his girls all either book a taxi or
get picked up by their partners.

As an attractive girl, you do not want to be walking / taking public transport
in certain areas of town at 6am. It's a sad reality.

~~~
newman314
Even more terrifying is that it is going to be trivial to determine the pickup
location just by reversing the trip.

Please show this as an example to people that say "Why should I care? I have
nothing to hide".

------
ben1040
In fairness to the celebrities accused of being cheapskates, I thought it was
the case that the trip record in the dataset didn't include a tip amount if it
was paid in cash.

~~~
gelstudios
Number of trips by tip percentage where payment_type='CSH':
[http://i.imgur.com/EJh1B2d.png](http://i.imgur.com/EJh1B2d.png)

~85M rides with 0% tip is "interesting".
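
For anyone wanting to reproduce a count like this, a rough sketch follows. It assumes each row is a dict (for example from `csv.DictReader`) with `payment_type`, `fare_amount`, and `tip_amount` keys; those column names are an assumption and may need adjusting to match the actual file headers. Computing the tip as a percentage of the fare is also a guess at how the chart was built:

```python
from collections import Counter


def tip_histogram(rows, payment_type="CSH"):
    """Count trips per whole-number tip percentage for one payment type."""
    counts = Counter()
    for row in rows:
        if row["payment_type"].strip() != payment_type:
            continue
        fare = float(row["fare_amount"])
        if fare <= 0:
            continue  # skip rows where a percentage is undefined
        # Truncate to a whole percent so 15.9% lands in the 15 bucket.
        pct = int(100 * float(row["tip_amount"]) / fare)
        counts[pct] += 1
    return counts
```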

~~~
jfim
I don't know how the data is obtained, but it's probably more likely that the
driver was tipped and the tip went unreported than that the tip was actually
zero.

------
scintill76
As a general fan of open data like this, I've been a little worried these
analyses would lead to the data not being released in the future. Hopefully,
if they change anything, it will still be useful and interesting.

------
tjoepie
This is an excellent (if troubling!) piece which deserves a wide audience. As
we say in Sauf Effrica: bleddy lekker stuff!

------
RankingMember
Anyone know of an existing online front-end connected to Google Maps for this
data?

~~~
comrh
It's pretty big (>5 GB). It is available to download here, though:
[https://archive.org/details/nycTaxiTripData2013](https://archive.org/details/nycTaxiTripData2013)

------
elazarus
Outstanding, groundbreaking article.

