
A challenge, identify this HN user, I tried twice and failed - jacquesm
http://jacquesmattheij.com/A+challenge%2C+identify+this+HN+user%2C+I+tried+twice+and+failed
======
adrianwaj
It could well be CitizenParker.
<http://news.ycombinator.com/user?id=citizenparker>

He submitted the Human Flesh story from NYTimes that he mentioned in his
comment. <http://news.ycombinator.com/item?id=1167615>

And he's only submitted 2 stories since starting his account 341 days ago. So
the story must have meant a lot to him, 1st to submit it, 2nd to mention it in
that comment.

Also, (s)he's only made 18 comments in total and his last comment was 104 days
ago yet the flesh story was submitted 13 days ago: so he's careful with his
comments: probably careful with his identity and privacy too, so much so that
CitizenParker has no bio info: and look at the name CitizenParker (sort of
like calling yourself John Smith on a sample credit card) - so generic naming
could be important to CitizenParker: something he's conscious about, and will
write about it: whilst also doing that anonymously.

But mainly, his style seems similar, which was what got me thinking.

<http://news.ycombinator.com/threads?id=citizenparker>

<http://searchyc.com/user/citizenparker?only=comments>

Does it really matter?

edit: <http://citizenparker.com/> Scott Parker:
<http://citizenparker.com/page/About-Scott-Parker.aspx>

------
lkrubner
Is the question serious, or is it meant as a joke? As a riposte to the earlier
thread, it is fantastic. The user writes "privacy is dead" and here, on Hacker
News, and also on jacquesmattheij.com, you have a thread with a lot of
intelligent people trying to figure out the person's identity, and failing.
Therefore, the user's original point is disproven simply by starting this new
thread. If this was deliberate, then this was genius.

This earlier thread:

<http://news.ycombinator.com/item?id=1197027>

Contains this sentence:

"this thread highlights a fundamental property of a networked life: privacy is
dead, there is only identity management."

but then this contradicts the original thesis:

"If harnessed properly, these things can be useful, but it requires a mindset
and workflow not entirely dissimilar to those of spies or high-end criminals -
controlling information by selective disclosure, identity segmentation,
disinformation, anonymization, etc. - not for sinister purposes, mind you, but
simply to guard what we traditionally call privacy."

I'd say the current thread offers proof that privacy can be defended. After
all, here we have all these smart people, failing to identify the earlier
user.

~~~
jacquesm
Exactly! That's the whole idea here, I was quite surprised that my first
solution (with a very high correlation) failed, even more surprised when the
second one failed as well (especially since that person had been commenting in
the same thread _and_ had a very high correlation as well).

Elsewhere in this thread someone is jumping up and down to stop trying to
identify the poster, the funny thing is I think he/she is in no danger at all
of being identified, at least not without his/her cooperation.

The only person that could identify this user is PG, and maybe alaskamiller,
and I'm pretty sure that our secrets are safe there.

edit: and if that person is the original poster then they're not helping
themselves by increasing the sample size :)

~~~
jimmyjim
_However_ , as I tried pointing out (in a seemingly dead thread:
<http://news.ycombinator.com/item?id=1200091>), identity management is
something that only the technically savvy can pull off. And even they are
likely to stumble at some point, because being perfect in every way is
inhumanly hard. And so, as it stands, privacy _is_ dead in this age.

~~~
jacquesm
On /. posting anonymously is as simple as checking a box, even if you have an
account.

Plenty of people have done so over the years, checking a box is nothing that
only the 'technically savvy' can do.

If you do that rarely then I think your anonymous words are reasonably safe.
If you do it regularly then you are open to the kind of attack that I
attempted, and then it will have a better chance of success.

Privacy is dead in a general sense, companies like facebook, google and
twitter facilitate identified communication and in that sense every letter you
wrote using the old postal system was just as revealing, it just wasn't open
to be read by the public.

People are slowly coming around about all this stuff being visible online. I
can see that with the 'reocities' project, on average two people every day ask
for their old account to be wiped because of privacy reasons. That's not much,
but it still means that 1,000 non-technical users per year that I happen to
have backed up a few pages for realize this. So if you extrapolate that to the
internet at large I think that the number of users that are wising up to this
is much larger than you'd expect at first glance.

Time will tell if there will be enough support for this, the 'think of the
children' and 'war on terror' people seem to have the advantage for now, but
laws that are enacted can in due course be repealed.

I've never bothered to hide my identity, there is nothing that I have to say
that I wouldn't put my name to, even if not all of it is received equally
well, that doesn't bother me (maybe it should).

There are people in positions that are sensitive that have stuff to tell us,
in such cases (which are rare) anonymity really serves a purpose and I think
this little experiment shows that without at least access to some log files
these exercises get a lot harder.

edit: your thread definitely isn't 'dead'.

------
duhprey
I'm not sure I'll have the time to do this, but I've had some good results
running Latent Semantic Analysis and Latent Dirichlet Allocation on a similar
problem. In my case, I have data from people playing a negotiation game and
having a conversation with a human actor. I have scores from a human judge
going from 1 - 5. Using LDA on the transcriptions of the dialog I can predict
the results of the human judge to a correlation of .5. There was a previous
study with essays graded by a teacher that got .8 with LSA. The LSA study used a
much larger training corpus outside the individuals.

For slightly more details, here's a sketch of the algorithm: Treat each
comment as a "document" input to LDA. Use the theta matrix that represents the
distribution of topics over each document. Then use the inverse dot product
between two document theta vectors and perform k Nearest Neighbors to predict
IDs. You should be able to tune the rank and k values from all the labelled
data.

When it comes time to infer I suggest running the whole set through
LDA instead of reusing the discovered alpha and beta. For some reason (which
I'm not entirely sure of), my results seem much better that way.
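
The nearest-neighbour step of the sketch above could look roughly like this in Python. It is a minimal sketch under stated assumptions: the per-comment topic vectors (rows of theta) are taken as already computed by some LDA implementation, "inverse dot product" is read as `1 - dot`, and the author labels and vectors are hypothetical toy data.

```python
from collections import Counter


def dot(u, v):
    """Plain dot product of two equal-length topic vectors."""
    return sum(a * b for a, b in zip(u, v))


def predict_author(query_theta, labelled, k=3):
    """Guess the author of a comment by majority vote of its k nearest neighbours.

    labelled is a list of (author, theta_vector) pairs.
    Assumption: distance is 1 - dot product, so a larger dot product means closer.
    """
    ranked = sorted(labelled, key=lambda pair: 1.0 - dot(query_theta, pair[1]))
    votes = Counter(author for author, _ in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical topic distributions for already-labelled comments.
labelled = [
    ("user_a", [0.8, 0.1, 0.1]),
    ("user_a", [0.7, 0.2, 0.1]),
    ("user_b", [0.1, 0.8, 0.1]),
    ("user_b", [0.2, 0.7, 0.1]),
    ("user_c", [0.1, 0.1, 0.8]),
]

guess = predict_author([0.75, 0.15, 0.10], labelled, k=3)
print(guess)
```

The number of topics and `k` are the two knobs mentioned above; both would be tuned on the labelled comments.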

------
nagrom
Simple psychology would lead me to guess that it is you Jacques. If I wanted
to tell how easy it was to identify an anonymous comment, then I'd make one.
I'd then publicise it, and challenge other people to crack it.

Maybe I'm being too clever for my own good...

~~~
username3
Or it is you nagrom.

~~~
nagrom
Sadly, in this case, I am not Spartacus. That would require a mind so cunning
that it makes mine hurt just to think of it ;-)

------
windsurfer
Is it marketer?

<http://news.ycombinator.com/threads?id=marketer>

This is merely a naive guess. He's the only other user on hacker news
(according to Google) to use the term "people search engines". He also seems
to have been working in the data mining business.

~~~
andreyf
From a quick look at his comments, he seems to match the other heuristics
seasoup mentions in that thread: meticulously correct spelling, grammar, and
punctuation, use of semicolons, and use of dashes.

He also seems to comment heavily on technical issues - programming languages,
database technologies, so it might make sense for him to feel heavy non-tech
opinions deserve a onetimetoken. Similarly reasoned, the sentiment of "privacy
is dead, there is only identity management" seems to be a realization
appropriate for someone who recently started working on YC-funded companies.
Seem convincing to me, but since they're all reverse-justifications, probably
best to take them with a grain of salt: you might be able to draw similar
conclusions combing through many other comment histories.

Interesting that this "human [powered] search engine" style of identification
might have been faster than devising a machine heuristic.

~~~
barmstrong
Yeah it's almost crowd sourced. Reminds me of when the cops put a letter or
riddle from a serial killer in the newspaper figuring someONE out there will
recognize it, as opposed to a computer recognizing it. Or maybe that just
happens in the movies.

~~~
staunch
The unabomber was caught by his brother recognizing his writing.

"...his brother recognized Ted's style of writing and beliefs from the
manifesto, and tipped off the FBI." from
<http://en.wikipedia.org/wiki/Theodore_Kaczynski>

------
kyro
It'd be really interesting if we had challenges, both social and technical,
posted here on HN on a weekly basis. Some of the solutions and discussions
would be pretty brilliant, I think.

~~~
DanielBMarkham
I'd like to propose a tag:

Challenge HN:

~~~
kyro
Sounds good. Anyone with a challenge, email me at kyro@kyrobeshay.com with
title/text of the submission. I'll post them on a weekly, or even bi-weekly,
basis and credit the author.

Edit: The intention behind this was to keep it structured and organized,
contest-like, and not for karmic purposes, which I take is the reason for the
downvotes.

~~~
eru
How about donating small prizes for the winners? Similar to ICFP programming
contest bragging rights [1].

[1] See <http://en.wikipedia.org/wiki/ICFP_Programming_Contest#Prizes>

~~~
jacquesm
That would certainly spice things up.

Hacking and puzzling are intricately interwoven anyway, especially debugging.
It's no wonder that plenty of hackers have hobbies like lockpicking.

~~~
eru
Also donating prizes would give a different metric than pure karma-per-
submission to order the challenges. (Though it might be hard to order bragging
rights. But we should be able to find a (corporate?) sponsor who hands out 50
dollars to the charity of choice of the winner every week. (Hey, I might even
be able to get the money out of my employer, if I asked to--or I just do it
myself.))

Enough parentheses. I'll just go ahead and pledge 10 Pounds per week to it.
Perhaps we should discuss more by email?

(More later, I'll have to go to bed now.)

~~~
jacquesm
Ok, I'll match your 10 pounds, whatever that works out to in my currency
(euros).

------
ericb
I think the post was by "eru"

I base this on eru's phrasing and use of "dissimilar" in the post in question,
which can also be noted here:

<http://news.ycombinator.com/item?id=1159200>

My strategy was to look for unusual words and phrases and do a google site
search for those phrases.

Additionally, eru's posts in this thread indicate an interest in privacy, and
eru's activity pattern is both frequent and recent, which I would expect to be
true for the poster.

edit: Here, eru even taunts us a bit:
<http://news.ycombinator.com/item?id=1200060>

~~~
eru
<http://en.wikipedia.org/wiki/Non-denial_denial>

------
gruseom
Could be fun, but are there answers to these two questions?

    
    
      1. Does this user object to being identified?
      2. How will you know you succeeded?
    

I just skimmed the thread (already 107 comments), so perhaps I missed it, but
I didn't see anything definitive.

Also, where's randomwalker when you need him?

~~~
jacquesm
1) based on his writing ('a one time account as a rhetorical device') I don't
think he'd mind, also there is nothing in the comment itself that you would
have to be ashamed of

2) you can't be sure, unless the person will confirm using the original 'one
time' account.

------
onetimetype
Stop trying to identify this user.

They did not issue a challenge to be identified--in fact they agree with the
notion that privacy is dead, which seems to be what you're trying to prove
with this exercise. They may have serious reasons for using a one time
account.

If your name is one of the (very random) guesses in this post, please neither
confirm nor deny that the user is you, since this could identify that user by
elimination.

This item should not have so many points. The post is rubbish. A 275 word
sample is long, but likely insufficient given the pool of candidates. The post
did not explain what methods were used, what work in authorship identification
influenced his approach, nor did he provide his ranked findings. The tries are
actually failed guesses, rather than, say, different algorithms attempted.
This item has now devolved into a guessing game, rather than a coding
exercise.

Again, stop trying to identify this user.

 _written with a one time account_

~~~
jacquesm
> The post did not explain what methods were used,

The post didn't but the original thread did, I tried matching the vocabulary
of the samples to the corpus of HN comments.

> what work in authorship identification influenced his approach

This is not a scientific paper.

> nor did he provide his ranked findings.

I'm _not_ giving my ranked results because I think two attempts from me is
enough.

> The tries are actually failed guesses, rather than, say, different
> algorithms attempted.

They were the #1 and #2 outputs of my code.

> This item has now devolved into a guessing game, rather than a coding
> exercise.

No-one said that you had to guess, but human guesses are also powered by
computation at some level, even if it would be very hard to figure out exactly
what went on.

> Again, stop trying to identify this user.

If that request were posted by 'onetimetoken', who posted three times, then
it would have some credibility.

If you are not him/her why does this upset you ?

The 'one time account used as a rhetorical device' says fairly clearly that it
is just a gimmick, not some kind of terrible secret.

And if you are 'onetimetoken' you are increasing the sample size ;)

~~~
whimsy
I assume this upsets the user because using a one time account indicates a
desire not to be identified or associated with the posted content, and the
user wants this preference to be honored.

~~~
Freebytes
They find it amusing, and they do not mind. If you look at the comments of the
user, they authorize their own identification if we can do so.

~~~
whimsy
Ah, by "the user" I meant the user that was upset, not the target of the ID
hunt.

My apologies for being ambiguous.

------
mmelin
I bet it's tokenadult. I do not have any other proof than the fact that I
immediately thought of that username when I saw onetimetoken. :-)

~~~
aresant
The word "token" is used in two completely different contexts, so I don't
think that's right.

EG:

"tokenadult" = the included minority adult

"onetimetoken" = account used once, like putting a disposable token into a
machine

~~~
tokenadult
You are correct. I first used the screen name on a forum, and then another
forum, where the majority of users are teenagers. The screen name doesn't fit
well here on HN (where almost everyone is an adult, even though I am older
than most participants), but I like to minimize my use of distinct screen
names. However, I am sure by screen name searches that other people now use
this same screen name.

~~~
smokinn
Maybe you could switch it up and use commonadult along with tokenadult
depending on the perceived demographic?

Though I'm not one to speak, I've used "smokinn" (or "Smokinn" which I
generally prefer) for, I believe, 16 years now.

------
yumraj
I just searched for "google-facebook" and "identity management" and saw a blog
by the title "Google-Facebook: Identity Management in a Brave New Internet"

Link:
[http://blogs.oracle.com/clayton/2008/05/googlefacebook_ident...](http://blogs.oracle.com/clayton/2008/05/googlefacebook_identity_manage.html)

But, don't know if Clayton Donley is on HN or not..

His Bio, at Oracle:

Clayton Donley, Sr. Director, Development

Currently run the dev organization for some of Oracle's security and identity
management products. Landed here after selling OctetString in 2005. Before
that held various roles at IBM, Motorola, and as an independent consultant.
Also wrote LDAP Programming in 2001.

~~~
josh33
I think everything matches here except that Mr. Donley capitalizes
"F"acebook. The comment in question has these as lowercase, which has already
been mentioned.

~~~
yumraj
Well, you're forgetting that most people, well at least I am, are more careful
when writing formal blog posts vs simple HN comments. So, the minor
differences can be attributed to that.

For example, the comment has "/" in google/facebook, while the blog has
"Google-Facebook".

What made me curious was not just the topic, and the identity management,
google-facebook, but the fact that he is in the field of security/identity
management.

But as I said, I don't know if he is on HN or not :)

------
mds
How about "martythemaniak"?

Quote:

> 1 point by onetimetoken 23 hours ago | link

> I was just trying to _empasize_ my point. Just out of curiosity, how would
> you go about identifying me?

<http://searchyc.com/empasize>

~~~
pbhjpbhj
Only two posts - <http://searchyc.com/%2522identity+management%2522+roi>

Edit: ignore, just realised SearchYC doesn't respect "" and looks for near
match words.

~~~
dmoney
From that search, what about tptacek, based on the non-standard use of "ROI":
<http://news.ycombinator.com/item?id=1024825>

~~~
pbhjpbhj
The styles are similar.

------
eagleal
I used the rarest word combinations (combinations of words used only once or a
few times [ORG]):

(i) The first search[1] revealed _"gstar"_ , but although they both have
similar writing style, gstar doesn't have an active participation on privacy
discussions (based on the query only [2])

( _ii_ ) The second search[3] revealed _"astine"_ : now this is interesting
because this user has a very active participation on privacy discussions[4],
especially I think he was inspired by _why[5].

[1] <http://searchyc.com/sentiment+that+inspires>

[2] <http://searchyc.com/user/gstar>

[3] <http://searchyc.com/Ethically%252C+is+it+fair>

[4] <http://searchyc.com/user/astine>

[5] <http://news.ycombinator.com/item?id=774337>

EDIT: [ORG] Based on the original comment at
<http://news.ycombinator.com/item?id=1197027>

~~~
JesseAldridge
strlen is a match for [1]: <http://searchyc.com/sentiment+that+inspires>

He's my prime suspect: <http://news.ycombinator.com/item?id=1200739>

It's gotta be him.

~~~
eagleal
I noticed strlen, too. But I don't think it's strlen. Some say it could also
be randomwalker.

------
rjett
Didn't someone come up with a "Which HN User Are You Most Like" app a couple
months back? I'm trying to find it now but no luck so far.

~~~
ErrantX
[http://swimwithoutgettingwet.com/hnusers/?user=onetimetoken&...](http://swimwithoutgettingwet.com/hnusers/?user=onetimetoken&weight1=0.4&weight2=-0.15)

Smart idea; but I just tried it and no results.

------
vaksel
it's pretty much impossible to identify a user from just 1 anonymous post on a
website (without the logs). I mean sure you can compare a person's typing
style...but unless they always add "jambalaya" to their posts, it'll be next
to impossible to be 100% sure.

The way it works in real life, is that you find a person's email address or a
long term account on a forum, and then use that info to build up a full
profile about that person. The longer the person is on the web, the more
personal information they've revealed in the past.

i.e. 6 months ago they might have mentioned their phone #...so you can use
whitepages to see their address. Or maybe they posted a link to their
site..where they didn't have privacy enabled, so you can get the full name and
address using whois. Or maybe they are using the same username on all sites,
so you can use google to see all the forums they've ever posted on. etc

~~~
noodle
<http://news.ycombinator.com/item?id=1199756>

yes? no?

~~~
vaksel
not me

------
smokinn
Do you keep a corpus of all HN comments?

I doubt PG would appreciate all of us hammering the HN server to collect the
data.

~~~
jrockway
searchyc has this.

~~~
jacquesm
As well as google.

------
JesseAldridge
I guess randomwalker, cuz this sort of thing seems to be his specialty. And
the writing style seems to match up fairly well.

<http://news.ycombinator.com/user?id=randomwalker>

------
pbh
Obviously, everyone loves a good challenge, but is there any evidence that the
find-ee wants to be found?

They seemed interested in the thread as to how they might be found, but don't
seem to have given any permission for a site-wide (wo-)manhunt. (This might
have happened out of band, though.) I guess they do say that they were just
using a one time account for rhetorical emphasis.

Further, if this were to really be a contest, it seems like there should be
some sort of rules, such that the result isn't determined just by exhaustion
of currently in use usernames by guessers.

~~~
jacquesm
Please read the whole original thread, that's exactly how we got to that
point, and the response I got to my 'I bet I can identify you' and his/her
admission that they thought of obfuscating the text made it pretty clear they
would not mind an attempt, but that does not guarantee that there will be a
resolution.

------
JesseAldridge
I picked several suspicious words and ran them through searchyc. The four
rarest words in the post are: pure-ad, CTRs, disinformation, and dissimilar.

Walked through the results for each word, pulling in all the usernames:

    
    
        intersection of:  pure-ad dissimilar
        ['noodle']
        intersection of:  CTRs disinformation
        ['jacquesm']
        intersection of:  CTRs dissimilar
        ['ivankirigin', 'patio11', 'strlen']
    

Of those, strlen's writing style seems to be the closest match. So I'm
changing my guess from randomwalker to strlen :)
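
The intersection step described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the per-word user lists have already been collected from a search; the usernames and the `users_by_word` mapping below are hypothetical placeholders, not the actual search results.

```python
from itertools import combinations

# Hypothetical search results: rare word -> set of users who have used it.
users_by_word = {
    "pure-ad": {"user_a", "user_d"},
    "CTRs": {"user_b", "user_c"},
    "dissimilar": {"user_a", "user_b", "user_c"},
}


def pairwise_intersections(users_by_word):
    """Return {(word1, word2): sorted users who used both}, skipping empty results."""
    hits = {}
    for w1, w2 in combinations(sorted(users_by_word), 2):
        common = users_by_word[w1] & users_by_word[w2]
        if common:
            hits[(w1, w2)] = sorted(common)
    return hits


hits = pairwise_intersections(users_by_word)
for pair, users in hits.items():
    print("intersection of:", *pair)
    print(users)
```

Users who show up in several intersections would be the first ones worth reading by hand.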

~~~
patio11
It wasn't me. Nice approach, though. Just intersecting word choices has very
little to recommend it for industrial scale author identification but for a
small-ish community like HN it might work, and of course it is trivial to
implement if you already have the data source lying around.

------
joshu
Heh. I identified a Reddit IAMA once this way.

------
roundsquare
Okay, so stupid question time... but oh well.

Is there an easy way to download the corpus of comments?

------
Freebytes
"What a surprise to find a whole thread and blog post dedicated to the search
for my identity. I consent to a benevolent search for my identity or
identities. I was quite surprised to see the speed and scale of this
development - another symptom of networked life."

This person has said that we can find out his identity... if at all possible.
Therefore, you are welcome to search based on his own permission.

------
chanux
I tried Gender Guesser. The words in the post were not enough, so I used all
the text he wrote.

<http://www.hackerfactor.com/GenderGuesser.html>

The Output

    Total words: 365

    Genre: Informal
    Female = 326
    Male = 616
    Difference = 290; 65.39%
    Verdict: MALE

    Genre: Formal
    Female = 348
    Male = 450
    Difference = 102; 56.39%
    Verdict: Weak MALE

Weak emphasis could indicate European.
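
The percentages in that output are consistent with the male score taken as a share of the combined score. The short Python check below is only a sketch of that arithmetic; whether the tool computes it this way is an assumption, and `male_share` is a hypothetical helper name.

```python
def male_share(female, male):
    """Male score as a percentage of the combined score, rounded to 2 places."""
    return round(100.0 * male / (female + male), 2)


informal = male_share(326, 616)  # scores from the "Informal" genre
formal = male_share(348, 450)    # scores from the "Formal" genre
print(informal, formal)
```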

------
petercooper
_So I wrote a bit of code to compare against other HN comments_

Do you have a corpus?

Further, and this is an open question, is there an archive/downloadable corpus
of HN in part or entirety anywhere? It would be fascinating and I'd love to
keep a copy to look back at in years to come.

~~~
jacquesm
> Do you have a corpus?

Yes, that's why I thought it would be an easy challenge.

My bad :)

> Further, and this is an open question, is there an archive/downloadable
> corpus of HN in part or entirety anywhere?

Yes, you can query the google cache. It's fairly easy to do.

The only things you don't get that way is the stuff you can see as a logged in
user.

~~~
petercooper
_Yes, you can query the google cache. It's fairly easy to do._

You could, but I'm not trying that again. It's easy to get your IP busted
depending on what random algorithm Google decides to run each week.. <g>

------
mrcharles
Do you have any proof that the guy would acknowledge you are correct even if
you identified him?

Seems like a good thing to verify first. Maybe you already got the guy and he
just said "no it's not me."

~~~
jacquesm
I'm a big proponent of fair play and I think the author would identify himself
when asked, but at the same time only PG can be sure.

Of course you can be paranoid, but I think the bigger chance is the author
seeing a chance here at sowing some disinformation. Such as participating in
this thread and giving false pointers and / or confusing the issue.

For the _really_ paranoid, of course the last person to participate in this
thread is 'the one'...

It wasn't me, that's for sure :)

------
stse
First I want to say that I don't agree with publicly disclosing the "identity"
of people who don't want to be found. I also don't think doing so
"originates" from any good personal quality. That being said, I do remember
this [1] talk from last CCC to be interesting from a technical standpoint.

[1]
[http://events.ccc.de/congress/2009/Fahrplan/events/3468.en.h...](http://events.ccc.de/congress/2009/Fahrplan/events/3468.en.html)

~~~
jacquesm
Obviously it would not have to be public, an email with a confirmation and a
request to keep it quiet would be fine.

The original poster could log in and disclose who found him first.

------
DanielBMarkham
Sample size way, way too small

~~~
windsurfer
The sample is actually pretty huge considering the number of words.

~~~
DanielBMarkham
_For online messages with such short length, when the full set of features are
used, a sample size of about 30 messages per author is necessary to predict
authorship with an accuracy of 80~90%_

[http://ai.eller.arizona.edu/COPLINK/publications/CACM_From%2...](http://ai.eller.arizona.edu/COPLINK/publications/CACM_From%20Fingerprint%20to%20Writeprint.pdf)

------
Freebytes
One of the strong indicators is the use of italics in the post. Many users
will ignore formatting within their posts. I am confident this user has used
italic formatting before for emphasis and has done it often within their HN
posts. It also indicates a comfort level within HN, which means they have
likely posted frequently. (At least once a month)

------
duck
I think it is a trick and it's jacquesm.

------
simon_
My googling turned up "neilc" as a user of dashes and of at least one of the
obvious digrams from that post.

~~~
jimmyjim
I thought google ignored dashes (as well as all other punctuation
characters), how did you deal with that?

------
maxklein
Run the following phrases on your thingy and filter by the users who use them:

"The point being,"

", mind you,"

"I fully agree with"

"highlights a fundamental"

":" some text "," some text "."

", etc."

" - e.g."

"entirely dissimilar"

Whatever user has the most instances of these signature phrases is likely your
man.
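
Counting those phrases per user could be sketched as follows in Python. This is a minimal sketch, assuming the comments are already available as a mapping from username to a list of comment texts; the usernames and comments are hypothetical, and the punctuation-pattern entry (":" some text "," some text ".") is left out because it would need a regular expression rather than a literal match.

```python
from collections import Counter

SIGNATURE_PHRASES = [
    "The point being,",
    ", mind you,",
    "I fully agree with",
    "highlights a fundamental",
    ", etc.",
    " - e.g.",
    "entirely dissimilar",
]


def rank_by_signature_phrases(comments_by_user, phrases=SIGNATURE_PHRASES):
    """Return (user, hit_count) pairs, most literal phrase hits first."""
    scores = Counter()
    for user, comments in comments_by_user.items():
        for text in comments:
            scores[user] += sum(text.count(p) for p in phrases)
    return scores.most_common()


# Hypothetical corpus for illustration only.
comments_by_user = {
    "user_a": [
        "I fully agree with that, mind you, it is not entirely dissimilar.",
        "Logs, caches, etc. are enough.",
    ],
    "user_b": ["The point being, nobody checks."],
    "user_c": ["Nothing distinctive here."],
}

ranking = rank_by_signature_phrases(comments_by_user)
print(ranking)
```

Raw counts favour prolific commenters, so dividing by each user's total word count might be a fairer comparison.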

~~~
jacquesm
That's exactly what I did and failed...

~~~
pbhjpbhj
There are some short "googlewhacks" (though they are multiple words) in there:

* collective pause to think

* pure-ad parked

These suggest to me a [highly proficient] non-native speaker too. "pause for
thought" and "pure ad-parked" are correct versions.

* "intimate patterns" is an unusual turn of phrase in this context, would probably be "personal usage patterns"

* "high-end criminals" looks like an unusual hyphenation

This search gives a name -
[http://www.google.com/search?q=%22identity+management%22+roi...](http://www.google.com/search?q=%22identity+management%22+roi+%22networked+life%22).

~~~
petercooper
"high-end criminals" isn't unusual. Certainly not to these British eyes. If
you Google for "high end criminals" even without the hyphen, about half of the
results use the hyphenated version.

The other things you point out encourage me to share your opinion, however.

~~~
pbhjpbhj
Google's regular search doesn't handle hyphenation but Trends appears to:
[http://www.google.com/trends?q=%22high-end%22%2C%22high+end%...](http://www.google.com/trends?q=%22high-end%22%2C%22high+end%22%2C%22highend%22).
Google searches give me results which make me suspect this Trends search is not
sound, however.

I'm from the UK too.

~~~
petercooper
The problem is context, a common issue with tracking things with Google Trends
in particular. Tracking programming language usage with it, for example, has
been a nightmare ("ruby" and "python" having far too many meanings, but few
write "ruby programming").

"high end" has more uses than "high-end." For example, "I bought a car at the
high end of my budget." In that case, "high-end" wouldn't make sense. In
"datacenters have been targeted by high-end criminals," however, "high-end" is
a compound adjective.

Alternatively, you could drop the hyphen and/or form an entirely new word:
"highend." The word "highend" doesn't seem to have caught on yet, though. I
suspect that's because "upmarket" covers the same meaning already and is less
susceptible to these morphological mishaps.

(On seeing what OS X had to suggest as a correction for "highend," it
suggested both "high end" and "high-end.")

~~~
pbhjpbhj
It's a blunt tool, agreed. But it was supposed to be a simple indicator only,
not a measure.

------
gort
chime fits a few of the patterns: use of etc. mid-sentence, occasional use of
hyphens - in this very pattern - and moderate use of slashes when "or" would
do. Also American spelling.

Still, it's easy to get into a sort of confirmation bias looking at this stuff
manually, and seeing things that fit while missing things that don't.

------
eru
A different (and probably much easier) challenge: Disprove that it was me [or
insert any other user here].

~~~
jacquesm
I don't think that's an easier challenge at all.

Disprove that pink elephants exist vs prove that gray ones do.

Which one is the easier challenge ?

~~~
eru
Yes, and still people seem to take the denials of the other users at face value.

~~~
sorbus
I think that we're all treating this as a game, under the assumption that if
we figure out who onetimetoken is he'll tell us, as it supports his point that
privacy is dead. Also, it wouldn't be fun if we assumed that he will deny it
if identified. Never underestimate the importance of having fun.

------
visitor4rmindia
I'm going with SwellJoe because he seems to have some patterns in common.

------
pmikal
Privacy is the new sharing.

------
noodle
i believe it is vaksel.

~~~
jacquesm
Definitely a possibility... but based on what ?

~~~
noodle
i tossed a few of the stylistic quirks into a search and took a look at some
writing samples. i think his stuff looks the most similar. i'm going purely on
my own arbitrary judgment. it just _feels_ right. :) the method itself isn't
much different than what has already been talked about.

edit: oh right, i also looked at the fact that he was posting comments on HN
around the time that the one in question was posted. a lot of my other
candidates didn't meet that data point.
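
That timing filter can be sketched in Python as below. It is a minimal sketch, assuming comment timestamps per candidate are available; the usernames, the timestamps, and the one-hour window are all hypothetical choices.

```python
from datetime import datetime, timedelta


def active_near(post_times_by_user, target, window=timedelta(hours=1)):
    """Return users with at least one comment within `window` of `target`."""
    return sorted(
        user
        for user, times in post_times_by_user.items()
        if any(abs(t - target) <= window for t in times)
    )


# Hypothetical timestamps for illustration only.
target = datetime(2010, 1, 1, 14, 0)
post_times_by_user = {
    "user_a": [datetime(2010, 1, 1, 13, 30)],
    "user_b": [datetime(2009, 12, 31, 9, 0)],
    "user_c": [datetime(2010, 1, 1, 8, 0), datetime(2010, 1, 1, 14, 45)],
}

candidates = active_near(post_times_by_user, target)
print(candidates)
```

This only narrows the candidate list; being active at the same time is weak evidence on its own.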

~~~
jacquesm
> i also looked at the fact that he was posting comments on HN around the time
> that the one in question was posted. a lot of my other candidates didn't
> meet that data point.

Ah, very clever, another angle of attack. Never thought of that one.

------
grandalf
Out of curiosity, why does it matter who wrote it?

~~~
jackowayed
I don't think the reason he wants to find out is because he wants to know. I
think he wants to prove that it's possible to find out.

He thought it was long enough that he could pretty trivially have a program
compare the writing style to other HN comments and determine who it was, but
he failed. So it's a challenge for other hackers--can you write a program that
can determine who said something simply based on the writing style and knowing
that he's a member of a reasonably small sample (HN users).
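
One simple baseline for such a program is to compare word-frequency vectors with cosine similarity. The Python below is a minimal sketch of that baseline, not the code actually used in the failed attempts; the tokenizer, the usernames, and the tiny corpus are assumptions for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text and count its words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return num / den if den else 0.0


def rank_candidates(sample, corpus_by_user):
    """Return (user, similarity) pairs, most similar vocabulary first."""
    target = word_counts(sample)
    scores = {
        user: cosine(target, word_counts(" ".join(comments)))
        for user, comments in corpus_by_user.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical corpus for illustration only.
corpus_by_user = {
    "user_a": ["privacy is dead, there is only identity management"],
    "user_b": ["ruby and python are fun languages"],
}

ranking = rank_candidates("privacy is dead", corpus_by_user)
print(ranking)
```

Common words dominate raw counts, so a real attempt would likely down-weight them (for example with TF-IDF) before comparing.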

~~~
jacquesm
You got it. Sorry for not being more clear, I thought it was an interesting
challenge, and since I've used up my 'two guesses' I think it is more
appropriate to admit failure rather than to keep on hammering away at it until
I hit the right user.

PG would have an easy time of it (log files) :)

~~~
blogimus
Well, that's the difference between a black box and a white box.

------
Estragon
Why is this a worthwhile endeavour?

~~~
jacquesm
It's not 'worthwhile' in a sense that you can't take it to the bank (though it
may come to that, see elsewhere in this thread), but I think it is what
hackers do, solve puzzles.

This is effectively a puzzle, a reasonably hard one (my attempt failed, but
not for lack of trying, I spent a fair number of hours on it before making my
guesses, of course it could simply be that I'm stupid), and one that seems fun
to solve.

It is _exactly_ the kind of thing that I enjoy doing when it comes to
programming in the first place, figure out how stuff works and/or solving
reasonably hard problems. One step above my current competence is my favorite,
that way I'm reasonably sure I can solve the problem, if the difference is too
big then I tend to get stuck.

It's of course a bit like the question why people climb Mount Everest, the
answers are: because they can and because it's there.

edit: Funny, I thought your downmod for asking a valid question was unfair, in
return I get downmodded for answering :)

------
pclark
website down :/

~~~
username3
I fully agree with the sentiment that inspires your statement.

However, this thread highlights a fundamental property of a networked life:
e.g., etc. <http://news.ycombinator.com/threads?id=onetimetoken>

------
shorbaji
vanelsas

------
yourdaddotmom
<http://news.ycombinator.com/item?id=1162135>

------
ohashi
I am Spartacus

~~~
gwern
No, _I_ am Spartacus!

