EDIT: There are other ways to anonymize data than simply removing the name associated with data.
If one looks at a stream of location data over time, and sees the recurrence of a particular location in a residential area, particularly at night, then it can be pretty well surmised that this is your home. And from that, it's a trivial step to get your identity. And bingo, the anonymized data is now re-identified.
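To make the "recurring night-time location" step concrete, here is a minimal sketch. It assumes the anonymized stream is a list of (timestamp, lat, lon) pings for one unnamed track; the function name `infer_home`, the 22:00 to 06:00 window, and the coordinates are all made up for illustration.

```python
from collections import Counter
from datetime import datetime

def infer_home(pings, night_start=22, night_end=6, precision=3):
    """Guess a home location from (datetime, lat, lon) pings.

    Returns the most frequent night-time grid cell, or None if there
    are no night-time pings. Rounding to 3 decimals gives cells of
    roughly 100 m in latitude (an assumed, arbitrary resolution).
    """
    cells = Counter()
    for ts, lat, lon in pings:
        # Night window wraps past midnight: 22:00-23:59 or 00:00-05:59.
        if ts.hour >= night_start or ts.hour < night_end:
            cells[(round(lat, precision), round(lon, precision))] += 1
    if not cells:
        return None
    return cells.most_common(1)[0][0]

# Hypothetical pings for a single nameless track.
pings = [
    (datetime(2011, 9, 20, 23, 15), 42.3312, -83.0458),
    (datetime(2011, 9, 21, 2, 40), 42.3314, -83.0461),
    (datetime(2011, 9, 21, 13, 5), 42.3601, -83.0700),
    (datetime(2011, 9, 21, 23, 50), 42.3313, -83.0459),
]
print(infer_home(pings))
```

No name is needed anywhere in the input; the cell it returns is what you would then look up against an address.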
That would still be a very valuable dataset (for me at least), and almost completely free of PII.
Then again, I'm not an expert in these things; am I missing some way that this could be deanonymized?
In a city, that is probably anonymous. If you are in a rural area or drive along a route where your car makes up the majority of the data points, it still isn't.
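One rough way to check the city-versus-rural point is to count how many distinct vehicles contribute to each road segment and flag the segments that fall below some threshold k. This is only a sketch: the (vehicle_id, segment) record layout, the name `segments_below_k`, and the choice of k are assumptions, not anything the dataset is known to contain.

```python
from collections import defaultdict

def segments_below_k(observations, k=5):
    """Return {segment: distinct_vehicle_count} for segments seen by
    fewer than k distinct vehicles.

    observations: iterable of (vehicle_id, segment) pairs, where
    vehicle_id is any per-vehicle pseudonym (hypothetical layout).
    """
    vehicles = defaultdict(set)
    for vehicle_id, segment in observations:
        vehicles[segment].add(vehicle_id)
    return {seg: len(ids) for seg, ids in vehicles.items() if len(ids) < k}

# Hypothetical data: a busy city street and a rural road with one regular.
observations = [(f"v{i}", "main_st") for i in range(6)]
observations += [("v0", "farm_rd")] * 40
print(segments_below_k(observations, k=5))
```

Data on a flagged segment is effectively one person's data, even if it is only ever published as an aggregate.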
But in a nutshell his point is that by its very nature GPS data collected over a constant time period cannot be anonymized. If your car is located >50% of the time in one of two places, chances are one is your home and one is your office. I now know where you live (and thus your identity) and I know where you work.
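The ">50% of the time in one of two places" heuristic is easy to sketch. This assumes the car is sampled at a constant interval, so each sample is one equal slice of time, and that positions have already been snapped to some grid cell or place label; `top_two_places` is a hypothetical helper name.

```python
from collections import Counter

def top_two_places(cells):
    """Return (two most frequent cells, their combined share of samples).

    cells: one cell label per equally spaced sample for a single track.
    """
    counts = Counter(cells)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    top = counts.most_common(2)
    share = sum(n for _, n in top) / total
    return [cell for cell, _ in top], share

# Hypothetical track: 100 samples spread over three cells.
cells = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
places, share = top_two_places(cells)
if share > 0.5:
    print("likely home/work pair:", places)
```

Telling home from work would then just be a matter of which of the two cells shows up at night.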
Everyone here is assuming anonymize means to remove name but keep everything else intact. I see no indication that this is the case. If there is reason to believe otherwise, point me in that direction.
for any purpose, at any time, provided that following collection
of such location and speed information identifiable to your Vehicle
He gave no evidence that they were not anonymizing the data properly,
he just assumed they were not.
EDIT: In response to parent edit and below comments
I have no proof of these, but here are some factoids I believe to be true (so feel free to base a research paper on them :D)
1) To identify commuters: (Highway-Entrance-Location, Average-Highway-Entrance-Time, Highway-Exit-Location, Average-Highway-Exit-Time) -> some derived values: approximate (home,work), average speed, average driving aggression
2) Really, now that I think about it, any dataset where multiple gps tracks (for a single person) are tied together is out. If you can get any single Average-Location-at-Specific-Time data point, (plus point #3 below) you've reduced the unique set to quite small. Then you just stand on that street corner at that time (or, for the police, use the red light cameras...) and you're done.
3) This is an OnStar dataset we're talking about, so you're looking for GM-manufactured cars, made in the last ~10 years (or whenever OnStar started going into cars). I'm willing to bet that just that data point is enough to reduce any other lukewarm/weak de-anonymization to a solid match.
4) Anyone who buys OnStar as an option is quite concerned with their safety at all costs (... my bias, I guess, since I consider it a waste of time), so look for e.g. families with small kids or other dependents.
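For what it's worth, point 1 above can be sketched in a few lines. Everything here is an assumption for illustration: the trip record layout (track_id, entrance, entrance minute of day, exit, exit minute of day), the 15-minute bucketing, and the function names. It builds the commuter tuple per track and then lists the tracks whose tuple nobody else shares.

```python
from collections import Counter, defaultdict
from statistics import mean

def commuter_signatures(trips, bucket_minutes=15):
    """Map track_id -> (entrance, avg entrance bucket, exit, avg exit bucket).

    trips: iterable of (track_id, entrance, t_in, exit_ramp, t_out),
    with times given as minutes since midnight (hypothetical layout).
    """
    by_track = defaultdict(list)
    for track_id, entrance, t_in, exit_ramp, t_out in trips:
        by_track[track_id].append((entrance, t_in, exit_ramp, t_out))
    signatures = {}
    for track_id, rows in by_track.items():
        # Use only the track's most frequent entrance/exit pair.
        pair, _ = Counter((r[0], r[2]) for r in rows).most_common(1)[0]
        usual = [r for r in rows if (r[0], r[2]) == pair]
        avg_in = mean(r[1] for r in usual)
        avg_out = mean(r[3] for r in usual)
        signatures[track_id] = (
            pair[0], int(avg_in // bucket_minutes),
            pair[1], int(avg_out // bucket_minutes),
        )
    return signatures

def unique_tracks(signatures):
    """Return the sorted track_ids whose signature no other track shares."""
    counts = Counter(signatures.values())
    return sorted(t for t, sig in signatures.items() if counts[sig] == 1)

# Hypothetical trips for three nameless tracks.
trips = [
    ("a", "exit_12", 450, "exit_30", 480),
    ("a", "exit_12", 456, "exit_30", 484),
    ("b", "exit_12", 452, "exit_30", 482),
    ("c", "exit_7", 390, "exit_30", 425),
]
signatures = commuter_signatures(trips)
print(unique_tracks(signatures))
```

Any track that comes back from `unique_tracks` is singled out by its commute alone, before the extra narrowing in points 3 and 4 is even applied.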
I'm running out of steam for this single comment, but name is certainly not necessary for unique ID. Ongoing research is cracking this stuff wide open. When the Netflix dataset came out, who would have thought that movie ratings could uniquely identify a person?