
I fully agree with the sentiment that inspires your statement.

However, this thread highlights a fundamental property of a networked life: privacy is dead, there is only identity management.

You only need whois and google/facebook to find the data in this case. Both are instances of seemingly disconnected and "local" data points (as in nobody without a real interest would ever look for these things), which, chained, intersected, etc. can reveal a ton of information about a person, all of which was volunteered by the person in question.

The NYT had a piece on "human-flesh search" recently, and 4chan raids have been well-known for a while. The point being, that once you release something in the wild, it's there, and accessible. Ethically, is it fair game to mine it (like people search engines, for example, do)?

It should give us collective pause to think - why do we get all those web services like facebook, gmail, twitter, etc. for free? Without even noticing it we are whoring out our privacy and intimate patterns. Google has two customer groups: users and advertisers - and advertisers bring in the money, users are there to maximize advertiser and google ROI - e.g. Google allows pure-ad parked pages, they give tips on "blending" ads to increase CTRs etc.

If harnessed properly, these things can be useful, but it requires a mindset and workflow not entirely dissimilar to those of spies or high-end criminals - controlling information by selective disclosure, identity segmentation, disinformation, anonymization, etc. - not for sinister purposes, mind you, but simply to guard what we traditionally call privacy.

written with a one time account

> written with a one time account

Smart move, but I'll bet you I can still identify you.

One really big eye-opener for me about how subtle this problem really is was this article; I've just posted it as a separate submission:


I was just trying to emphasize my point. Just out of curiosity, how would you go about identifying me?

Thanks for the linked article. Interesting indeed.

By taking the vocabulary (with frequency) of your comment and matching it against the vocabulary of all HN users.

It's a pretty large sample of text and I'm fairly sure that it is large enough that the match will be a close one.

Then there is 'stylistic' analysis, another angle of attack, mostly to do with the way you write, not just the words themselves.
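For the curious, the vocabulary-matching idea can be sketched roughly like this. It is only a minimal illustration, not the analysis anyone in this thread actually ran: the function names (`word_counts`, `cosine`, `rank_authors`), the tokenizer, and the choice of cosine similarity are all assumptions, and a real attempt would need a proper per-user corpus and probably some frequency weighting.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Return a lowercased word-frequency vector for a piece of text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-frequency Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_authors(unknown_text, known_texts):
    """Rank known users by vocabulary similarity to the unknown text.

    known_texts is a hypothetical dict mapping a user name to the
    concatenated text of that user's comments.
    """
    target = word_counts(unknown_text)
    scores = {
        user: cosine(target, word_counts(text))
        for user, text in known_texts.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The top of the ranked list is only a guess; with short comments many users will score close together.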

Thanks, that would be my first angle of attack as well. The assumption that I have at least one other hn account is reasonable enough. Even when I wrote my first post, I was considering faking a different style/vocabulary (which is harder than it seems), but obviously, I was just using a throwaway account as a rhetorical device for emphasis.

Thanks for the exchange. :)

OK, so I've run my analysis. My first guess was wrong; we'll see how many guesses it takes to get it, if ever.

Quirks I noticed in the text to help with identification:

"fully agree"

a comma followed by "etc." with a period in the middle of a sentence: ", etc. can"

use of "e.g." with a period after each letter

"well-known" with a hyphen

"high-end" with a hyphen

likes to use colons correctly

no misspellings. odd. proper use of punctuation. strange. I'd say the writer fails the Turing test and is a robot.

Also, odds are that someone wouldn't bother logging out and in between two accounts. I'd weight your search toward people who commented in between the throwaway account's posts.

People who commented shortly before the throwaway account are also good candidates, as are people who seem to comment around the same time on other days.
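A rough sketch of that timing heuristic, assuming you already have a list of `(user, timestamp)` pairs for recent comments. The function name `candidates_near`, the data layout, and the 30-minute default window are all made up for illustration; it only covers the "commented close to the throwaway's posts" part, not the same-time-on-other-days idea.

```python
from collections import Counter
from datetime import datetime, timedelta


def candidates_near(throwaway_times, comments, window=timedelta(minutes=30)):
    """Count, per user, comments posted within `window` of any throwaway post.

    throwaway_times: list of datetimes when the throwaway account posted.
    comments: iterable of (user, datetime) pairs for everyone else.
    Returns (user, hit_count) pairs, most hits first.
    """
    hits = Counter()
    for user, when in comments:
        if any(abs(when - t) <= window for t in throwaway_times):
            hits[user] += 1
    return hits.most_common()
```

Users with more hits are better candidates, but on a busy thread this mostly narrows the field rather than picking a winner.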

> odds are that someone wouldn't bother logging out and in between two accounts.

Unless they use multiple browsers.

Well I fully agree with this comment i.e. I recognize the "quirks" too. It looks to me like the results of a U.K. education. As a Brit-born male, in my past I have had some non-Brit ladies comment about the robot...

I disagree on the UK edu unless they're also a non-native - http://news.ycombinator.com/item?id=1199898

Except for "fully agree" and the hyphen bit, that would generally characterize my writing, if only because bad grammar & spelling bug me. But it sure ain't me :-)

> if only because bad grammar & spelling bug me. But it sure ain't me

Heh. I see what you did there.

(Updated: "&" usage, "ain't", starting a sentence with a conjunction.)

> no misspellings. odd. proper use of punctuation. strange. I'd say the writer fails the Turing test

That's depressing.

Meanwhile, "[space]-[space]" rather than "--" for "—" is idiosyncratic, and since HN preserves original spaces in comments, ".[space]" rather than ".[space][space]" rules a few more of us out.
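Quirks like these are easy to count mechanically. Here is a small sketch, with the caveat that the feature names and regular expressions are my own assumptions about how one might measure them, not anything taken from an existing tool:

```python
import re


def style_quirks(text):
    """Count a few punctuation habits that could help tell writers apart."""
    return {
        # " - " used as a dash, e.g. "think - why"
        "spaced_hyphen": len(re.findall(r" - ", text)),
        # "--" used as a dash
        "double_dash": text.count("--"),
        # period followed by exactly one space, then more text
        "one_space_after_period": len(re.findall(r"\.[ ](?=\S)", text)),
        # period followed by exactly two spaces, then more text
        "two_spaces_after_period": len(re.findall(r"\.[ ]{2}(?=\S)", text)),
    }
```

Comparing these counts between the throwaway's comment and each candidate's history would rule some people out, as long as nobody is deliberately changing their habits.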

> Meanwhile, "[space]-[space]" rather than "--" for "—" is idiosyncratic

It’s not idiosyncratic, it’s British.

I had an interesting HN corpus a while back but is there something newer/cooler you're using? Or did you just put it together yourself?

BTW, I strongly suspect stylistic analysis would work a lot better given the small corpus available for each user, though that idea goes out the window when someone deliberately tries to mask themselves.

I am not playing the devil's advocate here, but I think that as the internet matures it is the way to go.

Imagine buying groceries at Safeway. People do not seem to have trouble with that. However, if a PE/police/ex-GF/stalker came there and wanted to track you down, they could. The cashier would remember what you look like. Security would point out what you wore. Fellow shoppers could be interrogated about your presence. And people do not have any problem with that.

Now, why can't I track a flamer on the message board, neo-nazis on twitter, hate-sayers on IRCs, perverts on dating sites, and paedophiles on MMORPGs?

Animosity does make the internet go round, but it also encourages stupid and illegal behaviour. It also destroys the credibility of the web as a trusted source of information, which is why you can't quote random sites or rants for an academic paper.

I believe the future of the internet is one in which identity is not publicly shown, but managed. We'll finally be able to find out which company edited their own Wikipedia article.
