
Why HN Should Use Randomized Algorithms - luu
http://danluu.com/randomize-hn/
======
spez
At reddit we had two tactics for the frontpage problem.

The first thing we added was the "rising" page, which used to be reddit's
default "new" page. The rising page was a weighted new page. It was a little
hacky, but worked for a while. I see they've changed this, which I think may
have been a mistake, but I haven't thought about it in forever.

The second thing we did, which worked really well, was to have an
up-and-coming link placed at the top of the list on the frontpage. This helped those
borderline posts get more visibility. I had a reddit simulator I used to use
to test things like this. That space appears to be a subreddit search box for
me, so it seems they've moved onto another solution.

Randomness is an interesting idea, though it might be a bit difficult to fit
into the way we cached things.

~~~
highCs
_The first thing we added was the "rising" page_

I think the problem the randomized algorithm solves is not that good content
gets missed. It is giving equally good content the same chance, or at least
minimizing the error due to the page limit.

 _to have an up-and-coming link placed at the top of the list on the
frontpage_

Two things: 1) either the up-and-coming links are a few borderline posts taken
in order, in which case you have just extended the front page with a few more
posts (i.e. it changes nothing), or 2) you carefully chose a heuristic for
those posts, which is pretty much what is suggested here.

Anyway, thank you for sharing your experience with this problem, it's very
interesting.

Another idea would be to fix the content quality and let the number of posts
on the front page vary.

------
obituary_latte
I'd consider the title renaming posse quite random already. Some days I get to
be excited about the same article multiple times. Neat.

~~~
krapp
I think the submitted site's (or article's) own title should be treated as
canonical. A text field for a byline description by the submitter (or maybe
some metadata with the domain in the header) could balance the need to
accurately reflect the content and... whatever it is that determines changing
the titles around here. I understand the reasoning against it, though: among
other things, the competitive nature of submissions to HN would mean gaming
the system with your own metadata would be unavoidable.

~~~
Steko
I've hypothesized a few times that: when many people submit the same url, x>1
of them are very likely to submit the actual title of the article while people
who editorialize will tend to do so uniquely. Because of this HN can rely on
an algo to update the title.

I realize there are moderators for HN and they are confirmed in some
cases to have changed titles but I don't think this necessarily means they are
responsible for all the title updates.
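
A minimal sketch of the kind of algorithm hypothesized above, assuming the site keeps every title submitted for a given URL. The function name `consensus_title` and the "at least two submitters agree" rule are illustrative assumptions, not anything HN is known to run.

```python
from collections import Counter

def consensus_title(submitted_titles):
    """Return the title at least two submitters agree on, else None.

    Titles are compared case-insensitively with surrounding whitespace
    stripped; the first spelling seen is the one returned.
    """
    first_spelling = {}
    counts = Counter()
    for title in submitted_titles:
        key = title.strip().lower()
        first_spelling.setdefault(key, title.strip())
        counts[key] += 1
    if not counts:
        return None
    key, n = counts.most_common(1)[0]
    # Editorialized titles tend to be unique, so a repeated title is
    # treated here as the likely original.
    return first_spelling[key] if n > 1 else None

titles = [
    "Why HN Should Use Randomized Algorithms",
    "HN's front page is unfair",
    "why hn should use randomized algorithms ",
]
print(consensus_title(titles))
```

If nobody agrees, the sketch returns `None` and the submitter's title would simply stand.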

~~~
ChuckMcM
Well there is some evidence that people vote on the title only without reading
the article, sometimes even commenting just on the title. So if you were
really interested in maximizing karma you would put an outrageous title on it,
get some rage-comments to force it to the front page and then the mods switch
it to the real title. Not saying this has ever happened of course but it did
strike me that blogs use link baitey titles, why not karma-baitey titles?

~~~
jonnathanson
I have no concrete proof that this is happening, but it sure feels like it
from time to time. It's not uncommon to see articles chart to the front page
with provocative headlines that, upon clicking through, bear little
resemblance to the headlines or body of the linked content.

Gut-level guesstimate, but the following titling strategies seem to work
disproportionately well:

1) Pointed, rhetorical questions (as much as we all claim to hate them)

2) "How I..." titles (usually some legitimate merit to these posts, but if one
were so inclined to game the system...)

3) Contrarian declaratives, usually about popular topics. (Hypothetical
example: "Facebook is not a social network." This will generate a lot of blind
upvotes, plus at least a few knee-jerk comments in opposition).

In fairness to HN, the content actually matters here. I can't say the same for
a lot of the subreddits I browse, where blind upvoting based on title alone is
a lot more rampant. It's pretty hard to crack the front page with lousy but
well-clickbait-titled content here, though we've all seen it done before (and
it seems to happen at least once a week).

~~~
hayksaakian
I wouldn't say it's too much harder than reddit. The key is long form content.

Legitimate commentary will take longer to flow in, and allow for knee jerk /
blind influence to last longer from original submission time.

IMO the best would be a clickbait title for a long article that starts
contrarian, but ends on a neutral note.

Rational/logical people will take less offense b/c they're more likely to read
the neutral perspectives and others will blind vote/comment based on title and
first couple of sentences of the article.

------
jere
The 'new' page depresses the hell out of me. I read all of these articles
about the best time to post to get maximum exposure, but you know it makes
very little difference if you're going to end up with 1 vote anyway.

I looked at the 'new' page this morning while thinking about the "optimal"
time (it was around 9am EST, which is supposedly one of the best times
according to several articles). Almost every post had a single point. Since
your post starts with one point, that means most posts have received
absolutely no support whatsoever. I suppose this is inevitable. Only a small
percentage of submitted content can get on the front, but nonetheless it's
still frustrating to put effort into something and then... crickets.

~~~
DanBC
...but you voted up the good content, right?

~~~
jere
I usually do when I visit the new page, yea, but it's not a place I go to that
often.

------
acjohnson55
This is a valid approach. This is actually exactly what dithering [1] is in
signal processing. You spend a little additive noise, but gain a much more
regular distribution. The pagination is adding a quantization error of sorts
in the amount of exposure each article is getting.

[1] http://en.wikipedia.org/wiki/Dither
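
A rough sketch of the dithering analogy, under the assumption that ranking is just "sort by score, cut at the page size". The scores, the uniform noise, and the names `front_page` and `exposure` are all made up for illustration.

```python
import random

def front_page(scores, page_size, noise=0.0, rng=random):
    """Return the top `page_size` ids after adding uniform noise to each score."""
    jittered = {a: s + rng.uniform(-noise, noise) for a, s in scores.items()}
    return sorted(jittered, key=jittered.get, reverse=True)[:page_size]

def exposure(scores, page_size, noise, views=10_000, seed=0):
    """Fraction of simulated page views in which each article is on the page."""
    rng = random.Random(seed)
    hits = dict.fromkeys(scores, 0)
    for _ in range(views):
        for article in front_page(scores, page_size, noise, rng):
            hits[article] += 1
    return {a: h / views for a, h in hits.items()}

# Hypothetical scores: "c" is almost as good as "b" but sits just below the cutoff.
scores = {"a": 10.0, "b": 9.0, "c": 8.9, "d": 3.0}
print(exposure(scores, page_size=2, noise=0.0))  # hard cutoff: "c" is never shown
print(exposure(scores, page_size=2, noise=1.0))  # with noise: "c" gets a share
```

Without noise the cutoff acts like a quantizer and "c" gets no exposure at all. With a little noise its exposure becomes roughly proportional to how close it is to "b", while the clearly weaker "d" still never appears.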

------
rkuykendall-com
Could this be used to get rid of "new" altogether?

Instead of blocking them from the homepage, just give them a very very small
chance of appearing. Might make the "Knights of New" which are considered
self-sacrificing users a thing of the past.

~~~
crazygringo
It's always going to be obvious which ones are new, because they've got 1 vote
and are on the front page.

I read HN all the time, but confess I never visit "new".

I'd definitely be a fan of including 1 random article from "new" for each
pageview of the front page -- but make it explicit. Keep it at the top, or in
the middle. It seems like that should produce a much "fairer" result.

I'd be curious if there are reasons why this wouldn't work -- why HN, reddit,
etc. keep "new" on its own page...

~~~
eksith
I always visit /new and it's quite astonishing how many great posts never make
it to the front page and instead, quietly drop off. This really has a lot to
do with what time of day they post as well.

E.G. I've posted around 3AM - 6AM EST (which is usually when I have more free
time) and it never makes it. But if I post a bit later, say around 9AM, then
it has a much higher chance of coming to the front page.

There are a great many good stories that are several pages into /new and I
think (with the exception of a handful that are already on the front page)
those are barely seen by the vast majority of HN visitors.

~~~
moron4hire
I've found a very strange case of getting better results from posting at
off-peak hours. My theory is that my post stays on the first page of "new" long
enough for enough people to vote for it. If I post at peak hours, then my post
gets pushed off the front page within an hour, meaning too few people have
seen it to get it to the front page.

------
_piyush
How ironic would it have been if this never made the first page :D

------
null_ptr
You have some parenthesis issues in "if rand(2^x == 0) x++" that my tired,
C-laden mind had problems parsing just now.
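
Assuming the intended reading is `if rand(2^x) == 0: x++` (my guess at the parenthesization, which resembles Morris-style approximate counting), a sketch might look like this. The names `approx_increment` and `approx_count` are hypothetical.

```python
import random

def approx_increment(x, rng=random):
    """Bump the stored exponent x with probability 1 / 2**x."""
    if rng.randrange(2 ** x) == 0:
        x += 1
    return x

def approx_count(n_events, seed=0):
    """Feed n_events through the counter; return (x, estimated count)."""
    rng = random.Random(seed)
    x = 0
    for _ in range(n_events):
        x = approx_increment(x, rng)
    # For this scheme the usual estimate of the true count is 2**x - 1.
    return x, 2 ** x - 1

x, estimate = approx_count(1000)
print(x, estimate)
```

The point is that `x` only needs about log2(n) distinct values, at the cost of a noisy estimate.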

------
hawkharris
The bottom line is that posting your content to a social news website is
usually the wrong way to gain exposure.

With sites like Facebook and Reddit, you're simply shouting from a mountain
top and hoping someone listens.

To be fair, your odds of gaining exposure for a tech-type story are greater on
sites like HN and subreddits devoted to technology. In these places, you can
count on listeners sharing some of your interests.

Tweaking the ranking algorithms may improve the situation, but the fact is,
many of us have become complacent; we resort to throwing our content into
social news forums and expecting a lot of exposure instead of doing our
homework to determine who the appropriate point people are.

For example, if you're trying to pitch a new web app, maybe you should forget
about Hacker News and try strategically emailing / calling a few friends /
colleagues. When it comes to early exposure, quality can trump quantity.

~~~
hayksaakian
When the difference between shouting off a mountaintop and not doing so is 30
seconds AND the upside can be so great (thousands of page views), why NOT
submit your content?

> quality can trump quantity

Maybe in general, but submitting a link online is so damn easy that why
wouldn't you spend 30 seconds doing it?

~~~
solistice
And lottery tickets are a dollar each, and the upside can be so great
(thousands of dollars), why not buy one?

But online submissions are worse than lottery tickets, because you buy a
lottery ticket, and then check it from time to time. No, they're seductive
because there seems to be some pattern behind it, because they're so easy to
test, and before you know it, those 30 seconds have turned into 30 hours of
testing and reading into the material for a paltry 600 views.

Technically, working at Costco and spending your earnings on AdWords would
have brought you more exposure with the added upside that those are more
qualified leads (for business sites), or on Facebook, where that money could
buy you thousands of impressions (I found some numbers putting average CPM for
a sponsored check-in story at $6 with a CTR of 3.2%, so spending your $192
from Costco will get you ~960 clicks @ 30k impressions, if you're doing a
mediocre job).

So yeah, if it were 30 seconds it'd be a great deal. But unless you have the
self control of a buddhist monk, it's not going to be 30 seconds. Never.
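
The ad-spend estimate above can be checked with a few lines, taking the quoted $192 budget, $6 CPM and 3.2% CTR at face value. The helper name `ad_reach` is just for illustration.

```python
def ad_reach(budget, cpm, ctr):
    """Return (impressions, clicks) for a budget at a given CPM and click-through rate."""
    impressions = budget / cpm * 1000  # CPM is the cost per 1,000 impressions
    clicks = impressions * ctr
    return impressions, clicks

impressions, clicks = ad_reach(budget=192, cpm=6, ctr=0.032)
print(round(impressions), round(clicks))
```

Unrounded, $192 at a $6 CPM buys 32,000 impressions, and 3.2% of that is 1,024 clicks, so the "~960 clicks @ 30k impressions" figure is the same calculation with impressions rounded down to 30k.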

~~~
hayksaakian
I'm not saying you should ;-) but you could set up a simple automation for
'submit URL and title on creation'.

That way you do it once, and since you don't have to revisit it every time it
posts, you can remove the element of self control.

------
rxl
Great suggestion. A system like this would also work well to:

1) increase the variability of information consumed by the community

2) combat group think (which is all too prevalent these days on HN)

3) provide a more level playing field for people who don't make an effort to
abuse the voting system and exceed the front page threshold

------
ryderm
It's amazing how much we have done, yet how horrible even the best commenting
systems are.

~~~
potatolicious
Mostly because we keep hitting the reset button.

People continue to complain that easy one-liner jokes consistently dominate
the top ranks in Reddit threads, but Slashdot already came up with contextual
upvotes (Funny, Insightful, etc) years ago.

Of course every time someone comes and tries to build a community we end up
reinventing online comments from square one...

~~~
minimaxir
I remember when YouTube tried contextual upvotes for their comments.

Somehow, that made things ten times worse.

------
jessaustin
The suggestion appears to be to randomly adjust the scores of articles, but
ISTM it would be more democratic to randomly adjust the scores of articles
separately for each user. There's no reason we all have to see the _same_
random posts.
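
One way to sketch per-user randomization is to derive the noise from a hash of the user and article, so each user sees a different but stable ordering. Everything here (`user_noise`, `personal_ranking`, the choice of SHA-256, the `scale` parameter) is an assumption made for illustration.

```python
import hashlib

def user_noise(user_id, article_id, scale=1.0):
    """Deterministic pseudo-random offset in [-scale, scale) for a (user, article) pair."""
    digest = hashlib.sha256(f"{user_id}:{article_id}".encode()).digest()
    unit = int.from_bytes(digest[:8], "big") / 2 ** 64  # value in [0, 1)
    return (2 * unit - 1) * scale

def personal_ranking(user_id, scores, scale=1.0):
    """Sort article ids by score plus that user's own noise, best first."""
    return sorted(
        scores,
        key=lambda a: scores[a] + user_noise(user_id, a, scale),
        reverse=True,
    )

# Hypothetical scores for four articles.
scores = {"a": 10.0, "b": 9.5, "c": 9.4, "d": 3.0}
print(personal_ranking("alice", scores))
print(personal_ranking("bob", scores))
```

Hashing instead of calling a random generator on each request means a user's page does not reshuffle on every reload, though per-user pages are presumably harder to cache than one shared front page.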

------
DanBC
There are some ideas here for getting different content to the front page.

Perhaps someone, after they've made a submission, could "pay to promote" -
paying with a lockout of their account for 48 hours, or loss of downvoting, or
somesuch, to give their post a smidgen of front page attention.

That means that the people just churning submissions at the rate of 6 a day
have to carry on with their scattergun approach, and other people who really
think they have an interesting article could give it more attention but with
some pain to themselves.

------
kineticfocus
I tend to use browser tabs as a 'to-do list'; and as a result don't often get
back to the listing to upvote it. A different setup might get better ranking
data... at least from me.

------
wtpiu
Or, even better, make a "second page" of HN that's more prominent (like a link
in the nav), so that instead of a front page followed by several other pages,
there is a front page AND a second page, and then the rest. Articles that are
in this limbo could then get some extra attention, and I would assume that,
with a "second page" differentiated from the following pages, people would be
at least as likely (if not more) to check out what's not on the front page.

------
malandrew
Why not just try interlacing the top posts with randomly chosen new posts?

i.e.

    
    
        1) Top #1 post
        2) Random New Post
        3) Top #2 post
        4) Random New Post
        5) Top #3 post
        6) etc.
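
The layout above could be sketched roughly like this, assuming the top posts arrive already ranked and the new posts are a plain list to sample from. The function name `interlace` and its parameters are hypothetical.

```python
import random

def interlace(top_posts, new_posts, slots, rng=random):
    """Fill `slots` positions, alternating ranked top posts with random new posts."""
    # Sample without replacement so the same new post is not shown twice.
    picks = rng.sample(list(new_posts), min(len(new_posts), slots // 2))
    tops = list(top_posts)
    page = []
    while len(page) < slots and (tops or picks):
        want_new = len(page) % 2 == 1  # odd positions hold new posts
        if (want_new and picks) or not tops:
            page.append(picks.pop())
        else:
            page.append(tops.pop(0))
    return page

top = ["Top #1", "Top #2", "Top #3"]
new = ["New A", "New B", "New C", "New D"]
for position, post in enumerate(interlace(top, new, slots=6), start=1):
    print(f"{position}) {post}")
```

If either list runs out, the remaining slots are filled from the other one. The sketch does not handle a post that appears in both lists.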

------
Steko
Vary the number of items on the first page between 20 and 30 depending on
screen size and a random +/- 1.

------
kumarski
(pseudo)randomized

~~~
moron4hire
poindexter

------
bsullivan01
Great idea. I think 5 random submissions should get a front page chance. It
increases variety and as the other person said, it also combats group-think.
Five out of thirty, even in the worst case scenario (not even one of them is
interesting to a person) is not enough to degrade HN.

Also something has to be done about the fact that many news stories that are
bad PR for certain companies or good about their competitors are quickly
dispatched to page 2 and 3.

