TL;DR: It's fairly easy to deanonymize datasets like this, provided they are somewhat complete.
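To make the "fairly easy" part concrete, here is a rough sketch of the usual first step: checking how many pseudonymous users are singled out by nothing more than their most-visited venues. Everything in it is an assumption for illustration: the check-ins are made up, and the names `top_venues` and `unique_fraction` are hypothetical, not taken from the dataset or any tool mentioned in this thread.

```python
from collections import Counter, defaultdict

# Made-up check-ins for illustration only: (pseudonymous user id, venue id).
checkins = [
    ("u1", "cafe"), ("u1", "cafe"), ("u1", "gym"), ("u1", "office_a"),
    ("u2", "cafe"), ("u2", "office_b"), ("u2", "office_b"),
    ("u3", "cafe"), ("u3", "gym"), ("u3", "gym"),
    ("u3", "office_a"), ("u3", "office_a"),
]


def top_venues(rows, k=2):
    """Map each user to the set of their k most-visited venues."""
    counts = defaultdict(Counter)
    for user, venue in rows:
        counts[user][venue] += 1
    fingerprints = {}
    for user, venue_counts in counts.items():
        # Sort by descending count, then by name so ties break deterministically.
        ranked = sorted(venue_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        fingerprints[user] = frozenset(venue for venue, _ in ranked[:k])
    return fingerprints


def unique_fraction(rows, k=2):
    """Fraction of users whose top-k venue set is shared with no other user."""
    fingerprints = top_venues(rows, k)
    if not fingerprints:
        return 0.0
    frequency = Counter(fingerprints.values())
    unique = [u for u, fp in fingerprints.items() if frequency[fp] == 1]
    return len(unique) / len(fingerprints)


if __name__ == "__main__":
    print(unique_fraction(checkins, k=2))
```

A user whose fingerprint is unique only needs one outside source that ties those venues to a name (a public profile, a home or work address) to be re-identified. The more complete the check-in history, the more users tend to end up with a unique fingerprint.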
We already learned that Warhol's 15 minutes of fame should read 15 megabytes, but, to cut the "it's the user's choice to post that data" apologists short: almost no one I speak to understands the implications of all the possible interpretations, classifications and groupings that their online traces allow.
Giving it to 4sq for data mining is different than giving it to UMN and/or the whole internet for data mining and/or deanonymization.
We'll be contacting this researcher to ask where they got this data and whether it conforms to our policies.
Data is not "publicly and freely accessible" if accessing it requires you to agree to separate terms of service for it that restrict your ability to access and redistribute it.
(Whether or not one believes the data should be freely and publicly accessible is a separate matter, but given the above, it's hard to make the case that it is.)
Amusingly, this data still isn't "freely accessible", because these people have attached their own, separate terms to reusing and redistributing the data.
Rules by themselves don't restrict anything; enforcement does.
(The direct link is not working, but this confirmed that it was freely available.)
The number of check-ins seems to be low compared to other numbers.
I'm thinking this data set seems like a fun way to fill a rainy weekend, going for a dive into these worlds :)
Edit: Not removed, just inaccessible. 403.
EDIT: I would love to get my hands on this data... anyone? :)
But if you have any questions about it, I can try to answer them; my username here also corresponds to a gmail account I use ;)