

Similar Hacker News Users - riffer
http://www.swimwithoutgettingwet.com/hnusers/

======
riffer
People asked for a similar users tool back in the early days of HN:

<http://news.ycombinator.com/item?id=701>

And recently, too:

<http://news.ycombinator.com/item?id=1036247>

How does this particular tool work? It's based on the threads a given person
comments on, who else comments on those threads, and how the topics and
terminology of threads and comments relate to each other. Karma on the
relevant threads is used as a subtle authority metric. Incorporating voting
relationship histories would almost certainly make the tool better,
particularly at finding interesting (and not merely similar) stuff.

That said, it'll be interesting to hear what folks think.
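
For readers who want something concrete, here is a minimal sketch of how a blend of co-commenting and terminology could be scored. It is not the tool's actual code: the function names, the choice of Jaccard overlap for threads, cosine similarity for word counts, and the single `semantics_weight` knob are all assumptions made for illustration, and the karma-based authority signal is left out.

```python
from math import sqrt

def jaccard(a: set, b: set) -> float:
    """Overlap of two thread-id sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two term-count dictionaries."""
    dot = sum(count * v.get(term, 0) for term, count in u.items())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def blended_similarity(threads_a, threads_b, terms_a, terms_b, semantics_weight=0.5):
    """Hypothetical blend: 0.0 uses only co-commenting, 1.0 only word choice."""
    co_commenting = jaccard(threads_a, threads_b)
    word_choice = cosine(terms_a, terms_b)
    return (1 - semantics_weight) * co_commenting + semantics_weight * word_choice

# Made-up data: two users sharing 2 of 4 threads, with identical word counts.
score = blended_similarity({1, 2, 3}, {2, 3, 4},
                           {"lisp": 2, "startup": 1}, {"lisp": 2, "startup": 1})
```

Sliding `semantics_weight` toward 0 or 1 mimics moving a slider between "co-commenting" and "word choice".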

~~~
sarosh
I commend you on an elegant UI. Would you mind explaining a bit how exactly
you anticipate using voting relationship histories in future revisions? Also,
am I correct in assuming that there is some sort of distance function involved
in computing 'similarity'?

------
idoh
I recently changed my user name from antiismist to idoh. When I moved the
karma setting to Diamonds in the Rough, it listed idoh as the #2 most similar
user to antiismist. Nice!

[http://www.swimwithoutgettingwet.com/hnusers/?user=antiismis...](http://www.swimwithoutgettingwet.com/hnusers/?user=antiismist&btn=Search+for+this+HN+User&weight1=0.4&weight2=0.2)

------
ambition
This is awesome. Having computed a reasonable distance function between users,
you should be able to use this distance function as edge weights in a big
graph. Rendering this graph with a force-directed layout algorithm like
Fruchterman-Reingold might create visually appealing results by clustering
related users.

I've done this before on different datasets and would love to cooperate with
you on it...
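
A rough sketch of the idea, assuming pairwise distances are already available. This is a simplified, weighted variant of Fruchterman-Reingold written in plain Python (no frame clamping, naive all-pairs repulsion); the `1 / (1 + distance)` conversion from distance to edge weight and all parameter values are assumptions, not anything taken from the tool.

```python
import math
import random

def force_directed_layout(distances, iterations=200, seed=0):
    """Place users in 2-D so that pairs with a small distance tend to sit close.

    distances: {(user_a, user_b): distance}, one entry per unordered pair.
    Returns {user: (x, y)}.
    """
    rng = random.Random(seed)
    nodes = sorted({user for pair in distances for user in pair})
    if not nodes:
        return {}
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))      # ideal spacing in a unit square
    # Assumption: closer users get a heavier edge and therefore pull harder.
    weight = {pair: 1.0 / (1.0 + d) for pair, d in distances.items()}
    temperature = 0.1                    # cap on movement per iteration

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: k^2 / d.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[a][0] += dx / dist * push
                disp[a][1] += dy / dist * push
                disp[b][0] -= dx / dist * push
                disp[b][1] -= dy / dist * push
        # Attraction along weighted edges: w * d^2 / k.
        for (a, b), w in weight.items():
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = w * dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull
        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.98              # cool down so the layout settles
    return {n: (x, y) for n, (x, y) in pos.items()}

# Made-up distances between four placeholder users.
layout = force_directed_layout({
    ("a", "b"): 0.1, ("c", "d"): 0.1,
    ("a", "c"): 5.0, ("a", "d"): 5.0, ("b", "c"): 5.0, ("b", "d"): 5.0,
})
```

The all-pairs repulsion loop is quadratic in the number of users, so a real run over many users would probably want a library implementation or a spatial grid.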

------
profquail
Interestingly, there seems to be a local minimum (or maximum) in your
algorithm that I found when searching for myself:

Start here:
[http://www.swimwithoutgettingwet.com/hnusers/?user=profquail...](http://www.swimwithoutgettingwet.com/hnusers/?user=profquail&btn=Search+for+this+HN+User&weight1=0.5&weight2=-0.15)

The next two 'clicks' to the left (towards co-commenting) don't have 'spolsky'
in my list:
[http://www.swimwithoutgettingwet.com/hnusers/?user=profquail...](http://www.swimwithoutgettingwet.com/hnusers/?user=profquail&btn=Search+for+this+HN+User&weight1=0.3&weight2=-0.15)

But one more click to the left, and he reappears in the list:
[http://www.swimwithoutgettingwet.com/hnusers/?user=profquail...](http://www.swimwithoutgettingwet.com/hnusers/?user=profquail&btn=Search+for+this+HN+User&weight1=0.2&weight2=-0.15)

~~~
riffer
This is interesting, thanks for pointing this out. The top result is actually
getting filtered out from the displayed results (don't ask). spolsky is going
from being the second result, to the first result, to the second result in
your examples.

~~~
po
You keep mentioning filtering… are you sure we can't ask? I'm curious.

------
llimllib
excellent choice setting the defaults to give very flattering results...
there's a lesson in that.

~~~
patio11
I could probably turn this into a blog post (and will later), but there are a
lot of microoptimizations of default settings for B2C apps that make huge
differences in conversion:

- Defaults should be almost sufficient to use the app. (i.e. if it has some
workflow, you should be able to pretty much hit "Next next next" and get it to
work. Bonus points for making the number of nexts as low as possible.)

- Pick something that results in visually impressive output rather than a
blank page. See Balsamiq for inspiration here -- they start you with a mockup
in progress that demonstrates most of the highlights of the software.

- Ever seen Firefly? I really like how they use the word "shiny". Ideally,
your defaults should show the shiny in your app. In Hollywood they have a
saying: make sure your budget makes it onto the screen. In B2C apps, make sure
the stuff you did all the work on makes it into the user experience most of
your users will see.

- Assume your user is a novice at both your software and the problem domain
until you have evidence otherwise. A lot of people ask the user "Hey, are you
a novice?" That is one way to do things, but it makes your core workflow one
stage longer and every stage costs you conversion. I prefer "Assume they are
and give them a discrete 'skip ahead' button" or "Assume they are and watch
them for evidence that they are not".

- If your app is supposed to make the user feel like they just killed an
effing lion, then your default settings better have a lion bound and gagged
sitting under a forty-ton weight suspended by a weak string which passes
through an open pair of scissors next to a sign saying "Snip this."

- (Do this if nothing else.) Track actual usage of the app and modify your
defaults based on actual usage. Bonus points if you can do it dynamically, if
that makes sense for your app. For example, if you pick A as the default and
25% of your users go out of their way to change it to B, then that probably
should have been B. (You can split test and see how many people would have
changed it to A if it had defaulted to B.)
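
The last point can be sketched in a few lines, assuming the app logs the setting each user ended up with. The function name and the 25% threshold are illustrative assumptions (the threshold simply echoes the example above); the result is a candidate to split test, not a verdict.

```python
from collections import Counter

def default_worth_testing(final_settings, current_default, switch_threshold=0.25):
    """Return the setting that looks like the better default, given what
    users actually ended up with. Hypothetical helper, not a real API."""
    total = len(final_settings)
    if total == 0:
        return current_default
    counts = Counter(final_settings)
    # Most popular setting that users had to go out of their way to pick.
    challenger, challenger_count = max(
        ((s, c) for s, c in counts.items() if s != current_default),
        key=lambda item: item[1],
        default=(current_default, 0),
    )
    if challenger_count / total >= switch_threshold:
        return challenger
    return current_default

# Made-up usage log: a quarter of users switched from A to B.
candidate = default_worth_testing(["A"] * 75 + ["B"] * 25, current_default="A")
```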

~~~
llimllib
While all of these are true, I find that they usually derive from a simpler
rule: Have a laser focus on the users you're trying to reach, and everything
you do should make their lives easier in some way.

When I find people breaking these sorts of rules, it's usually because they're
thinking of themselves or some non-customer stakeholder.

edited to add a corollary: until you've done the sort of testing patio11
advocates, you _don't know_ who that customer is.

~~~
patio11
I think what you are saying is good advice, but don't think it is necessarily
co-extensive with what I'm suggesting.

For example, say you're targeting teachers. Keeping a laser focus on what
teachers want is important. However, I think you need to put _extra_ focus on
making their first five minutes absolutely amazing. (And their first 30
seconds. And their first 5 seconds.) My reason for this is simple: almost all
apps are going to leak an amazing number of their customers between first and
second use. I don't have my report in front of me at the moment, but I think
something like 40% of BCC users never complete their first bingo card and
never log in again. Essentially none of these people buy the software. On the
other hand, roughly 2.4% of trialers convert, or roughly 4% of users who
succeed in their first interaction with the app. Increasing my bottom line by
5% requires converting 5% more of that second group -- which, let me tell you,
is _hard freaking work_ -- or, in the alternative, improving the first run
experience of _three_ out of every 40 users who fail to complete Task #1.
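
The arithmetic behind "three out of every 40" can be checked with a short calculation. It uses the figures quoted above (40% never finish, 4% of successful users convert) and assumes, as the paragraph implicitly does, that rescued users convert at the same 4% rate; the function name and the 1,000-trialer cohort are just for illustration.

```python
def rescue_needed_per_forty(never_finish=0.40, success_conversion=0.04,
                            lift=0.05, trialers=1000):
    """Return (overall conversion rate, failing users to rescue per 40)
    needed to raise sales by `lift`."""
    failures = trialers * never_finish             # users who never finish task #1
    succeeders = trialers - failures               # users who do
    sales = succeeders * success_conversion        # current purchases
    extra_sales = sales * lift                     # purchases needed for the lift
    extra_succeeders = extra_sales / success_conversion
    return sales / trialers, extra_succeeders / failures * 40

overall_rate, per_forty = rescue_needed_per_forty()
```

With these inputs the overall rate comes out at 2.4% and the rescue target at 3 of every 40 failing users, matching the numbers in the comment.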

It is really hard to optimize your entire application, user experience, value
proposition, etc to get +5% conversion. However, polishing your first five
minutes until it freaking shines is not nearly as difficult. Do you present
people with a blank page currently? Spend an hour and put something on it. I
will put money on that helping. Does it take a critical mass of
friends/input/lions slain to get fun? Do the work for them. Fake it if
necessary.

And, yeah, instrument _everything_. Can I plug Mixpanel here? _plug_ Every
time I think "You know I should really build more instrumentation into my
app..." I remember "Oh, wait, it takes a twentieth of the time to just throw
it on Mixpanel -- no visualization code or complicated controls to refine the
data range required, praise be."

~~~
TheSOB88
What does instrument mean?

------
alex_c
Quite flattering - no matter what I set the sliders to, the names I recognize
are people I respect.

I think that only really says something about HN, not about my comments :)

~~~
randallsquared
I had the same result. Perhaps this is not measuring similarity after all. :)

------
rms
Aww, I was hoping to get amichail.

~~~
amichail
I think I was filtered out.

~~~
riffer
You're correct. I feel terrible. There is no question that you're the
posterchild for controversial-interesting. I will fix this.

~~~
rms
LOL! There was an actual amichail filter?

------
Zak
Does moving the karma slider to the right look for people with low karma, or
does it ignore karma? I want the option to ignore karma.

~~~
riffer
Excellent question. Middle ignores karma in the quantitative ranking. Far to
the right gets lower karma users. Users who consistently get almost all 1s
and/or negative points are separately filtered out since they generally don't
make for interesting recommendations. Users who sometimes have negative
comment scores but also make lots of 2+ comments are included (these users are
rare but are arguably the most interesting).
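
One way such a filter could look, as a sketch only: the 90% cutoff and the function name are assumptions, since the actual thresholds are not stated.

```python
def is_recommendable(comment_scores, dull_share=0.9):
    """Keep a user unless almost all of their comments scored 1 or lower.

    comment_scores: point totals for one user's comments.
    dull_share: hypothetical cutoff for "consistently almost all 1s/negatives".
    """
    if not comment_scores:
        return False
    dull = sum(1 for score in comment_scores if score <= 1)
    return dull / len(comment_scores) < dull_share

# Made-up score histories.
consistently_flat = [1, 1, 1, 0, -2, 1, 1, 1, 1, 1]
controversial_but_interesting = [-3, 5, 8, 2, 1, 12]
```

Under this rule a user with occasional negative scores but plenty of 2+ comments stays in, which matches the description above.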

------
apu
Based on what other people here are reporting, I wonder if there's a bias
towards matches with a large number of comments (e.g., pg, patio11, edw519,
etc.). Perhaps there's some normalization needed?

~~~
silentbicycle
There's a bias towards people on the leaderboard, though one of the knobs
adjusts it.

~~~
mahmud
I was happy to see you in my list, silentbicycle :-)

~~~
silentbicycle
:)

------
silentbicycle
A few weeks ago, I tried searching through old threads to find where PG had an
archive of old comments and server stats. Does anybody have link(s)? Thanks in
advance. (This post is meta enough that it's probably as good a time as any.)

I'm guessing either such a corpus was used here, or it's based on a cache of
recent comments.

------
chaosmachine
Based on the default settings, I'm up there with PG and Patio11. I like your
algorithm ;)

~~~
run4yourlives
I'm up there with patio11, but not pg. (I get nostrademons)

Not sure what that means though.

~~~
randallsquared
Interestingly, nostrademons and pg are the top two for me, so I wonder what
would produce one but not the other. I'd love to see if comments could be
separated into [user1]-like comments and [user2]-like comments.

~~~
riffer
Yeah, one potential solution is lists of users.

Another route might be clustering ('technical', 'political', etc.)

------
brk
I got quite a few people I respect (mixmax, edw519, mattmaroon) and one I
consider a personal friend, dennykmiu.

Neat little utility, thanks.

~~~
mixmax
hey thanks!

------
dolinsky
Am I missing something or does this not work for the majority of us who are
casual commentators on here?

~~~
silentbicycle
It probably needs more data points.

~~~
riffer
Yeah, there are really two issues here.

Most of the techniques for this sort of thing don't work that well for sparse
datasets. So given a choice between showing bad results for users with
relatively few comments, and not showing any results ... especially when users
can search not just for themselves but also for folks they know and enjoy ...
Also, scraping the full history of HN is not cool.

~~~
silentbicycle
A while ago, pg posted an archive of HN comments, specifically so that nobody
would have to scrape them. (It was coupled with comments on how the arc server
was holding up, IIRC.) I haven't had any luck searching for it, though - there
are too many discussions about the ethics of scraping comments, archive file
formats, and the like.

~~~
slackenerny
I remember that too. It was, as you say, a while ago, so it's long outdated.
And I do recall it being posted by pg, but all I can find is this non-pg
release with all links broken: <http://news.ycombinator.com/item?id=173045>. I
looked through all the links for tar and zip archives, but could've missed
something.

~~~
silentbicycle
Yeah, they're 404. Thanks for looking!

------
edw519
I love it! My results were very encouraging:

    
    
      S W G W                      einstein
    
                                   newton
      +------------------------+ 
      | edw519                 |   liebniz
      +------------------------+ 
                                   turing
      ===========||============= 
                                   carnegie
      Co-Commenting..SEMANTICS.. 
      Word Choice                  tesla
    
                                   godel
      =====||===================
                                   escher
      Leaderboard.....KARMA.....
      Diamonds in the Rough.....   bach
    
                                   edison
      This site has no
      affiliation with Hacker      galois
      News or Y Combinator
                                   patio11

~~~
bugs
I think your leaderboard dial was set to 11.

~~~
csuper
Yeah - he should just make 10 louder.

~~~
josefresco
I don't understand why ... his goes to 11

~~~
brettnak
For $2,000 I'll build you one that goes to 12.

------
mixmax
Interesting. After the obligatory vanity search I tried searching for people I
know have different (more technical, less startup/strategy (marketing)) taste
than me. And it seems to work quite well. I have nothing in common with those
guys :-)

For instance the top pick for tptacek is cperciva, which seems natural.
Doesn't work the other way around though, so there's still some work to be
done...

~~~
riffer
Thanks for the feedback, you bring up a very interesting point: reciprocity.
If I am closer to you than anyone else, does it follow that you are closer to
me than anybody else? I'm not sure the right answer to that is yes ...

~~~
sparky
Definitely not in the general case of points in n-dimensional Cartesian space.
For instance, in the 1-d case, consider points at 0, 10, and 12. 10 is the
closest point to 0, but not vice-versa.

Put another way, it seems as though the algorithm considers tptacek more
distinct (from all other HN users) than cperciva.
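
The 1-d example can be checked directly. This small sketch uses the same three points; the helper name is made up.

```python
def nearest(points, p):
    """Return the point closest to p, excluding p itself."""
    return min((q for q in points if q != p), key=lambda q: abs(q - p))

points = [0, 10, 12]
nearest_of = {p: nearest(points, p) for p in points}
# 0's nearest is 10, but 10's nearest is 12, so "nearest" is not reciprocal.
mutual_pairs = [(p, q) for p, q in nearest_of.items() if nearest_of[q] == p and p < q]
```

Only 10 and 12 pick each other; 0 has a nearest neighbour that does not return the favour, even though the underlying distance is perfectly symmetric.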

~~~
Xichekolas
I was wondering the same thing. I don't show up in the lists of any of the
people that show up in my list.

I take it that means I am the cheese.

------
SapphireSun
Hmm, small bug(?) When I typed my name in all lowercase it didn't come up.
That might be by design though. Cool tool ;-)

~~~
riffer
That's a good point. Let me see what I can do. The catch is that HN is case
sensitive as well.

works: <http://news.ycombinator.com/user?id=SapphireSun>

doesn't work: <http://news.ycombinator.com/user?id=sapphiresun>
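
A possible approach, sketched under the assumption that the tool holds a set of known usernames: try the exact name first, then fall back to a case-insensitive match only when it is unambiguous. The function names are hypothetical.

```python
def build_case_index(usernames):
    """Map each lowercased name to every real spelling that shares it."""
    index = {}
    for name in usernames:
        index.setdefault(name.lower(), []).append(name)
    return index

def resolve_username(query, usernames, index):
    """Exact match wins; otherwise accept a unique case-insensitive match."""
    if query in usernames:
        return query
    candidates = index.get(query.lower(), [])
    # Since names are case sensitive, two accounts could differ only by
    # case; refuse to guess in that situation.
    return candidates[0] if len(candidates) == 1 else None

known = {"SapphireSun", "RiderOfGiraffes", "riffer"}
index = build_case_index(known)
resolved = resolve_username("sapphiresun", known, index)
```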

------
tokenadult
I like my company. I don't know if they feel the same way about me.

One newer participant whose posts make me think "I wish I had posted that"
doesn't show up on my list of associated participants. But I show up on his.
Maybe that is because of the karma setting in the default operation of the
search. Interesting.

~~~
riffer
If you don't mind telling me the user you're thinking of, I can take a look as
well.

------
riffer
Thanks for the positive response.

If three things were to get added to this, what should they be?

~~~
ratsbane
1) If I understand correctly the first slider ranks greater similarity in
commenting on the same threads the farther left you go and similar word choice
the farther right you go. Why not make this two separate sliders?

2) For each of the matches could you show some of the data used to compute the
matches... e.g., for semantic matches, show the top X (maybe five?) common
words or phrases you matched on. For threads, show the parent or OP of the
five most recent threads... something like that.

3) Really neat. I like this. It's quick too. How did you do it? Did you
replicate the entire HN database? Third suggestion: post the source or just
post an explanation of how it works.

~~~
ratsbane
... and 4: apply the same thing to twitter (!)

~~~
riffer
Yeah, that's one of things to consider. All of the logic for this was
originally developed for another purpose, and it got applied to HN for fun.
The human connection element makes it very cool, and so do the possibilities
for applying it to so many different applications.

------
jacquesm
I think there is still a subtle bug in there somewhere.

When I set the sliders 'semantics' all the way to the right ('word choice')
and leaderboard all the way to the left, then check I get this:

    
    
      - patio11
        - mahmud
        - nostrademons
        - tptacek
    
      - mahmud
        - edw519
        - swelljoe
        - davidw
    

Shouldn't the relationships be symmetrical, so 'edw519' would get 'mahmud' as
the first match and 'mahmud' would get 'edw519' ?

edit: also, your 'match' is case sensitive, so 'riderofgiraffes' won't work
but 'RiderOfGiraffes' does.

~~~
astine
It filters on karma as well.

All of those guys show up in my default results, but I can't get any of them to
pick me up. Clearly I don't post enough.

~~~
jacquesm
If that were the case then I would show up, and I don't either.

I've received some email from David (the guy that built it), he's going to fix
this and the lowercase issue as soon as things quiet down a bit.

------
ramidarigaz
Really fun app! Great for boosting one's ego.

Slight note about the page formatting: My screen resolution is 800x480, and
the text by the sliders wraps in a very confusing manner.

It looks like this to me

    
    
      ===============||===================
      co-commenting......SEMANTICS.....word
      choice
    

A little confusing at first, until I realized that it was wrapping. It's the
same for the other slider.

Great app though!

------
matt1
Very cool.

My matches using the default settings: pg, swombat, mahmud, wheels, SwellJoe,
edw519, mattmaroon, gojomo, davidw, mixmax, unalone, tptacek

Did you get permission to scrape the data? (I tried once without asking with
mediocre results:
[http://www.mattmazur.com/2008/08/the-wrong-way-to-get-notice...](http://www.mattmazur.com/2008/08/the-wrong-way-to-get-noticed-by-yc/))

~~~
jacquesm
As someone in the comments on your pages also notes, you could use the google
cache. But it would still be nicer to ask first.

------
ax0n
Awesome! I knew SwellJoe and I seemed to end up in the same threads, and IIRC,
agree on things. Even though I'm a relative neophyte. I love it!

------
NathanKP
Very interesting. Depending on how I adjusted it I was judged similar to pg,
unalone, and a couple other people on the leaderboard. I guess that means my
comments are on the right track....

Are there any other details about the algorithm or how it works? I'm curious
about what exactly the different weightings mean.

~~~
laktek
Same story here. Is pg added to all users by default?

------
jcl
How is "word choice" similarity calculated? If you have a high similarity with
someone, does that mean your range of words is the same (perhaps because you
write on the same topics) or that your word frequency is the same (because you
have similar patterns of speech)?

Are quoted sections filtered out? URLs?
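
To make the two readings concrete, here is a sketch contrasting them. Nothing here reflects how the tool actually works: vocabulary overlap is modelled as Jaccard similarity on word sets, frequency similarity as cosine similarity on word counts, and quoted lines (a leading `>`) and URLs are stripped first. All of those choices, and the function names, are assumptions.

```python
import math
import re
from collections import Counter

URL = re.compile(r"https?://\S+")

def tokens(comment):
    """Lowercased words, skipping quoted lines (leading '>') and URLs."""
    kept = [line for line in comment.splitlines()
            if not line.lstrip().startswith(">")]
    text = URL.sub(" ", " ".join(kept))
    return re.findall(r"[a-z']+", text.lower())

def vocabulary_overlap(a, b):
    """Same range of words? Jaccard similarity on the sets of words used."""
    words_a, words_b = set(tokens(a)), set(tokens(b))
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

def frequency_similarity(a, b):
    """Same word frequencies? Cosine similarity on word counts."""
    counts_a, counts_b = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(n * counts_b[word] for word, n in counts_a.items())
    norm = (math.sqrt(sum(n * n for n in counts_a.values()))
            * math.sqrt(sum(n * n for n in counts_b.values())))
    return dot / norm if norm else 0.0

# Made-up comments: identical vocabulary, different emphasis.
first = "lisp lisp lisp macros"
second = "> quoted text\nlisp macros macros macros http://example.com/x"
```

The two made-up comments share their entire vocabulary yet differ clearly in frequency, which is the distinction the question is drawing.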

------
alanthonyc
Cool. Now we can play six degrees of hn.

------
andrewljohnson
My girlfriend and I have almost the exact same lists... smokey_the_bear and
andrewljohnson.

------
10ren
So, some great users are on your similarity list... but are you on their
similarity list?

~~~
trunnell
For exactly zero of the twelve users listed, no.

------
rinich
I approve of all the people I've been grouped with, except that bizarre
swombat fellow.

------
adrianwaj
How does colins_pride compare with riffer? Aren't they the same person?!?

~~~
riffer
This is a cool question. Two things: the first is that I stopped using the
colins_pride account and subsequently started using the riffer account, so the
co-commenting commonality is not particularly high. The terminology score is
substantially closer. The other thing is that because the dataset is tilted
towards the recent, the colins_pride user is only lightly represented.

------
tlrobinson
Interesting: both of my co-founders (tolmasky and boucher) showed up on most of
my lists. I guess that means it works, since we tend to talk about similar
things.

------
Vivtek
Apparently tumult is most similar to me no matter what I select, so clearly I
can just stop posting entirely. (Thanks, tumult - more time in the day!)

------
rdtsc
Interesting, similar to:

    
    
        edw519
    
        btilly
    
        patio11
    
        pg
    
        tptacek

------
MikeTLive
websense doesn't like you.

Security risk blocked for your protection Reason: This Websense category is
filtered: Potentially Damaging Content. URL:
<http://www.swimwithoutgettingwet.com/hnusers/>

------
mgrouchy
hrm, I seem to have gotten tptacek as my top result, if the results are so
ordered.

I'd say that's a good thing. In general the whole list is people I would happen
to be even somewhat similar to.

------
auston
[http://www.swimwithoutgettingwet.com/hnusers/?user=auston...](http://www.swimwithoutgettingwet.com/hnusers/?user=auston&btn=Search+for+this+HN+User&weight1=0.9&weight2=-0.2)

------
tibbon
It didn't return any results when I tried from the iPhone

------
Sukotto
I don't show. Guess I'm not "one of us" yet :-/

~~~
PebblesRox
Did you capitalize your name? It seems to be case-sensitive...
http://www.swimwithoutgettingwet.com/hnusers/?user=Sukotto...

------
peregrine
I got a great list. Neat.

------
pclark
got swombat. lame. :)

------
whalesalad
pg was number one for me :D

