However, this thread highlights a fundamental property of a networked life: privacy is dead, there is only identity management.
You only need whois and google/facebook to find the data in this case. Both are instances of seemingly disconnected and "local" data points (as in nobody without a real interest would ever look for these things), which, chained, intersected, etc. can reveal a ton of information about a person, all of which was volunteered by the person in question.
The NYT had a piece on "human-flesh search" recently, and 4chan raids have been well-known for a while. The point being that once you release something into the wild, it's there, and accessible. Ethically, is it fair game to mine it (like people search engines, for example, do)?
It should give us collective pause to think - why do we get all those web services like facebook, gmail, twitter, etc. for free? Without even noticing it we are whoring out our privacy and intimate patterns. Google has two customer groups: users and advertisers - and advertisers bring in the money, users are there to maximize advertiser and google ROI - e.g. Google allows pure-ad parked pages, they give tips on "blending" ads to increase CTRs etc.
If harnessed properly, these things can be useful, but it requires a mindset and workflow not entirely dissimilar to those of spies or high-end criminals - controlling information by selective disclosure, identity segmentation, disinformation, anonymization, etc. - not for sinister purposes, mind you, but simply to guard what we traditionally call privacy.
written with a one time account
Smart move, but I'll bet you I can still identify you.
One really big eye-opener for me about how subtle this problem really is was this article; I've just posted it as a separate submission:
Thanks for the linked article. Interesting indeed.
It's a pretty large sample of text and I'm fairly sure that it is large enough that the match will be a close one.
Then there is 'stylistic' analysis, another angle of attack, mostly to do with the way you write, not just the words themselves.
Thanks for the exchange. :)
comma followed by etc with a period in middle of sentence
, etc. can
use of e.g. with a period after each letter.
well-known with a hyphen
high-end with a hyphen
likes to use colons correctly.
no misspellings. odd. proper use of punctuation. strange. I'd say the writer fails the Turing test and is a robot.
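For anyone curious how the marker-spotting above could be automated, here is a rough sketch. It assumes a hand-picked set of regex markers loosely based on the habits listed in this thread; the marker names, the `style_markers` and `overlap` helpers, and the Jaccard scoring are all made up for illustration, and real stylometry uses far richer features than this.

```python
import re

# Hypothetical marker set, loosely based on the writing habits listed above.
MARKERS = {
    "comma_before_etc": re.compile(r",\s*etc\."),
    "dotted_eg": re.compile(r"\be\.g\."),
    "hyphenated_compound": re.compile(r"\b[a-z]+-[a-z]+\b"),
    "spaced_hyphen_dash": re.compile(r"\s-\s"),
    "double_space_after_period": re.compile(r"\.  \S"),
}

def style_markers(text):
    """Return the set of marker names that occur at least once in text."""
    return {name for name, pattern in MARKERS.items() if pattern.search(text)}

def overlap(a, b):
    """Jaccard similarity between the marker sets of two texts (0.0 to 1.0)."""
    ma, mb = style_markers(a), style_markers(b)
    if not ma and not mb:
        return 0.0
    return len(ma & mb) / len(ma | mb)

throwaway = "Chained, intersected, etc. can reveal a lot - e.g. well-known raids."
known = "Tools, etc. are high-end -- eg see this.  Next."
print(sorted(style_markers(throwaway)))
print(overlap(throwaway, known))
```

With only a handful of binary markers the score is crude, so it would be more useful for ruling candidates out than for pinning one down.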
People who commented shortly before the throwaway account are also good candidates, as are people who seem to comment around the same time on other days.
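A minimal sketch of that timing idea, assuming you already have (user, timestamp) pairs scraped from somewhere. The function names, the 30-minute window, and the hour-of-day scoring are my own assumptions, not anything the site provides.

```python
from datetime import datetime, timedelta

def recent_commenters(comments, throwaway_time, window=timedelta(minutes=30)):
    """Users who commented within `window` before the throwaway's post."""
    return sorted({user for user, ts in comments
                   if timedelta(0) <= throwaway_time - ts <= window})

def hour_overlap(times_a, times_b):
    """Fraction of a's comments that fall in hours of day where b also comments."""
    hours_b = {t.hour for t in times_b}
    if not times_a:
        return 0.0
    return sum(t.hour in hours_b for t in times_a) / len(times_a)

# Made-up sample data, purely for illustration.
comments = [
    ("alice", datetime(2010, 3, 1, 14, 50)),
    ("bob", datetime(2010, 3, 1, 9, 0)),
    ("carol", datetime(2010, 3, 1, 15, 5)),
]
throwaway_time = datetime(2010, 3, 1, 15, 0)
print(recent_commenters(comments, throwaway_time))
```

Neither signal proves anything on its own; the point is that each one shrinks the candidate pool, and they can be intersected with the stylistic clues.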
Unless they use multiple browsers.
Heh. I see what you did there.
(Updated: "&" usage, "ain't", starting a sentence with a conjunction.)
Meanwhile, "[space]-[space]" rather than "--" for "—" is idiosyncratic, and since HN preserves original spaces in comments, ".[space]" rather than ".[space][space]" rules a few more of us out.
It’s not idiosyncratic, it’s British.
BTW, I strongly suspect stylistic analysis would work a lot better given the small corpus available for each user, though that idea goes out the window when someone deliberately tries to mask themselves.
Imagine buying groceries at Safeway. People do not seem to have trouble with that. However, if a PE/police/ex-GF/stalker came there and wanted to track you down, they could. The cashier would remember what you look like. Security would point out what you wore. Fellow shoppers could be interrogated about your presence. And people do not have any problem with that.
Now, why can't I track a flamer on a message board, neo-nazis on twitter, hate-sayers on IRC, perverts on dating sites, and paedophiles on MMORPGs?
Anonymity does make the internet go round, but it also encourages stupid and illegal behaviour. It also destroys the credibility of the web as a trusted source of information, which is why you can't quote random sites or rants in an academic paper.
I believe the future of the internet is one in which identity is not publicly shown, but managed. We'll finally be able to find out which company edited its own Wikipedia article.