Reverse Engineering the Hacker News Ranking Algorithm (sangaline.com)
349 points by foob on Mar 14, 2017 | 49 comments



Or you could just search for Paul Graham's posts[0] :)

> (= gravity* 1.8 timebase* 120 front-threshold* 1 nourl-factor* .4 lightweight-factor* .17 gag-factor* .1)

    (def frontpage-rank (s (o scorefn realscore) (o gravity gravity*))
      (* (/ (let base (- (scorefn s) 1)
              (if (> base 0) (expt base .8) base))
            (expt (/ (+ (item-age s) timebase*) 60) gravity))
         (if (no (in s!type 'story 'poll))  .8
             (blank s!url)                  nourl-factor*
             (mem 'bury s!keys)             .001
                                            (* (contro-factor s)
                                               (if (mem 'gag s!keys)
                                                    gag-factor*
                                                   (lightweight s)
                                                    lightweight-factor*
                                                   1)))))
[0]: https://news.ycombinator.com/item?id=1781417
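
For anyone who doesn't read Arc, here is a rough Python transcription of the formula quoted above. It is only a sketch of that old snippet, not of whatever HN runs today. Assumptions: `item-age` is measured in minutes, `contro-factor` and `lightweight` (which the quote doesn't define) are passed in as plain fields, and the `Item` class is made up for illustration. `front-threshold*` is left out because the quoted function never uses it.

```python
from dataclasses import dataclass

# Constants copied from the quoted Arc snippet.
GRAVITY = 1.8
TIMEBASE = 120            # assumed to be minutes, like item-age
NOURL_FACTOR = 0.4
LIGHTWEIGHT_FACTOR = 0.17
GAG_FACTOR = 0.1


@dataclass
class Item:
    """Hypothetical stand-in for an Arc story record."""
    score: int
    age_minutes: float
    type: str = "story"
    url: str = "https://example.com"
    keys: frozenset = frozenset()
    contro_factor: float = 1.0   # not defined in the quote; supplied by the caller
    lightweight: bool = False    # likewise an input here


def frontpage_rank(s: Item, gravity: float = GRAVITY) -> float:
    # Numerator: (score - 1), damped by a 0.8 exponent when positive.
    base = s.score - 1
    numerator = base ** 0.8 if base > 0 else base
    # Denominator: age in hours (plus a two-hour head start) raised to gravity.
    denominator = ((s.age_minutes + TIMEBASE) / 60) ** gravity
    # Penalty multiplier, mirroring the branches of the Arc `if`.
    if s.type not in ("story", "poll"):
        penalty = 0.8
    elif not s.url:
        penalty = NOURL_FACTOR
    elif "bury" in s.keys:
        penalty = 0.001
    else:
        if "gag" in s.keys:
            inner = GAG_FACTOR
        elif s.lightweight:
            inner = LIGHTWEIGHT_FACTOR
        else:
            inner = 1
        penalty = s.contro_factor * inner
    return numerator / denominator * penalty


fresh = Item(score=101, age_minutes=60)
stale = Item(score=101, age_minutes=24 * 60)
print(frontpage_rank(fresh) > frontpage_rank(stale))  # True: same score, older item ranks lower
```

Note that nothing in here mentions flags, which is consistent with the point that the quoted code can't be the whole story.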


That's almost seven years out of date. The algorithm has changed significantly since then.

Example: Flags affect a story's position, but that algorithm doesn't mention flags at all.


> Here's the code running now:

It's from 2345 days ago, which is < 7 years, and I'm guessing it had a reasonable lifetime after that anyway. How come you're thinking that it's > 7 years old?

And my point wasn't so much about the actual algorithm either, just that sometimes a pinch of Google = a dollop of calculus :)
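
The day count is easy to sanity-check. A quick sketch, assuming the count is taken from this thread's date (Mar 14, 2017), which is my assumption and not something stated in the comment:

```python
from datetime import date, timedelta

DAYS_AGO = 2345

# Convert days to years using the average Julian year length.
years = DAYS_AGO / 365.25

# Assumed reference date: the day this thread was submitted.
reference = date(2017, 3, 14)
posted = reference - timedelta(days=DAYS_AGO)

print(f"{years:.2f} years")   # 6.42 years
print(posted.isoformat())     # 2010-10-12
```

So "almost seven years" and "< 7 years" are both fair descriptions of roughly six and a half years.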


I mean, it's cool to see. I wish dang would show the latest version.

Maybe if all of us ask him nicely?


Not a chance if I'm any judge of this. For a simple reason: HN is already being gamed, no need to hand the other side a freebie.

Security by obscurity is bad practice if that's your only layer of defense but it certainly doesn't hurt as one of the layers in your onion.


From the point of view of someone gaming HN, there's nothing to be gained by knowing the ranking algorithm. It doesn't matter if you know five flags will bump you off the front page, or that mods can adjust the position of your story, because that's not something that you can use to your advantage. The only thing you'd be able to do with that info is to use it maliciously, like flag a story off the front page. But everyone already knows that you can remove a story via flags, and it's pretty easy to figure out how many flags are required.

Honestly the argument "there are bad people, we need to worry about them" is getting tired. It was cool when pg was showing off Arc to the community. It felt like we were all discovering something new, and how to build a community together.


That's not correct.

A person trying to game the front page could get all kinds of useful info from there. For instance how the flame detector and voting ring detector work and what penalties exist and when they trigger. The ranking algorithm is a lot more than a simple formula.


There was a flame detector (and presumably a voting ring detector) six years ago, but it wasn't revealed by pg showing off that algorithm. Dan could just redact whatever he's not comfortable showing.


They effectively have redacted whatever they're not comfortable showing.


Reddit's is open source and available for anyone to see. Knowing the algorithm doesn't help you game it in this case -- it's knowing the spam controls that does, and those are secret.


To be fair, the ranking code is included in the article :-).


Yeah, I'm not sure why this is even a thing. I thought everyone knew that algorithm at this point? Hell, I'm using the same thing on a couple of projects.


From the article:

> I find this to be a particularly interesting question, not because I actually care about the answer, but because it feels like the data should be able to tell us the answer... My main goal was to tease out the ranking algorithm from the data in a simple and elegant fashion. This made it a little more interesting as an endeavor and hopefully makes it a more interesting read as well.

You're right, the details of the algorithm aren't hugely interesting and are generally available. The point here was to use the data to uncover it in a somewhat novel way. Figuring things out can be fun in and of itself, even if the answers are already available.


Anyone interested in the topic of HN's ranking algorithm should look through HN's submission archives:

https://hn.algolia.com/?query=How%20Hacker%20News%20ranking%...

For example: https://medium.com/hacking-and-gonzo/how-hacker-news-ranking...


I wrote a piece a while back around implementing your own ranking algorithm. Thought this audience might find it relevant/interesting.

https://jkchu.com/2016/02/17/designing-and-implementing-a-ra...


This is the article that was discussed in yesterday's "The stories that Hacker News removes from the front page" [1]. After speaking with @dang, it sounds like what happened with the original submission was that a moderator accidentally put "(2010)" in the title and users flagged it because they incorrectly thought it was old. He invited me to resubmit the article today to allow for real discussion and to demonstrate that what happened to the first submission was accidental.

I know that this analysis will get less attention than the one from yesterday, but I personally find it far more interesting and hope that it can stand on its own merits. I'll be around to answer any questions that might come up.

[1] - https://news.ycombinator.com/item?id=13857086


Being old is not a reason to flag a submission; at most, appending the year to the title is all that would be required.


Not by itself, but if the age suggests that the content is stale, it could be. How the HN algorithm worked 7 years ago (assuming it has changed) isn't of that much interest, even though the analysis might be.


Flags are for content that's spam, off-topic, inappropriate. While you might claim that the content is dated, it would likely be better to post a comment providing a more recent source for the topic, not upvote the submission, and allow the community to decide for themselves what the best source of information is on the topic.


Flags are downvotes, given the lack of an actual downvote, and hence have much broader uses. What they should be used for is irrelevant to what they're actually used for.


I realize they are used as downvotes, which is why using them as part of the ranking system makes no sense. Adding a downvote button also makes no sense. If you see something you do not want to see, but others do want to see it, click "hide" and move on.


That entirely depends on how you see yourself and other users as members of the community. Do users have a way to influence what content the community has besides what they individually submit and upvote? Yes, flagging. Similarly with comments: we have more than one's own comments and upvotes, we have downvotes. Downvotes and flags both serve as useful signals to others of what the community wants to discourage. Hiding / ignoring sends no signal, or at least no signal different from never having seen the item at all.


This is a ridiculous position to take. The flagging operation is for things that violate rules and it severely punishes posts. It's not meant to be used as a tool for you to shove your opinions of what is worthy onto the community.

A post will die on its own if it fails to receive upvotes. If you think something else should be on the front page, you go and upvote something else or leave a comment on the article explaining why it is crappy (comments down-weight an article). If you can't find something else that's better, then move on and stop acting like some gatekeeper of worthy content.

If everyone behaved the way you're suggesting, the front page would just be a bland pile of lowest common denominator content that displeased the fewest people.


The front page often is a bland pile of lowest common denominator crap. We couldn't even go a week without dumb political stories which are covered everywhere else (though charitably more because of miscommunication that it was meant to be an experiment and only a week long, still). Lots of people don't even look at it because they gave up, though I'm not exactly close to that point.

This is all beside the point that flagging is (no matter what it ought to be) an extra signal that has broader uses than merely spam/rule violating. I could also argue that "off topic", a use mentioned earlier and on the guidelines, is sufficiently broad and subjective that "something I think the HN community would be better off not discussing" fits "off topic". In any case the flagging mechanism is still there. The site does remove flagging privileges if you use it too often, so there is clearly a sense of how flags ought not to be used too often (or else you lose them) but that hardly influences why flags ought to be used.


Hey foob. The level of detail in your post is great. Any plans to research things like optimal posting times?


Thanks! That's a good question. I feel like I've seen analyses like that before a few times but I'm having trouble finding one right now. I would guess that submitting around 8 EST is probably best in terms of getting the most views because you'll catch most of the US audience during the day. The probability of making it to the front page is another question though and it would definitely require looking at the data there.

I can actually think of some other relevant metrics here that I don't think I've seen quantified before. I'll probably play around with this a bit at some point and if the results are interesting then I'll write them up. If I do though then it will most likely be a few months down the line. I've already written twice as many HN meta articles as I was planning on and need to take a little break :-).


I did a brief analysis a few years ago after shoving the HN Algolia results into Postgres. IIRC, the optimal time (for highest median score) was on a Sunday afternoon or something like that. I figured that meant that you're not competing as much for views on Sunday afternoon and will get higher-on-average points. Then you may still be around on page 1 or 2 for the Monday morning rush and get a lot more traffic.
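
A minimal sketch of that kind of analysis, in Python instead of Postgres: bucket submissions by (weekday, hour) and take the median score per bucket. The function names and the sample data are made up for illustration; real input would come from the Algolia results mentioned above, which I haven't reproduced here.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import median


def median_score_by_slot(stories):
    """stories: iterable of (unix_timestamp, score) pairs.

    Returns {(weekday, hour): median score}, with weekday 0 = Monday
    and 6 = Sunday, all in UTC.
    """
    buckets = defaultdict(list)
    for ts, score in stories:
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        buckets[(t.weekday(), t.hour)].append(score)
    return {slot: median(scores) for slot, scores in buckets.items()}


def best_slot(stories):
    """Slot with the highest median score."""
    medians = median_score_by_slot(stories)
    return max(medians, key=medians.get)


def ts(day, hour):
    # Helper for the fabricated sample: a timestamp in March 2017, UTC.
    return datetime(2017, 3, day, hour, tzinfo=timezone.utc).timestamp()


# Fabricated sample: three Sunday-afternoon posts, three Monday-morning posts.
sample = [
    (ts(12, 15), 40), (ts(12, 15), 8), (ts(12, 15), 12),   # Sunday 15:00
    (ts(13, 9), 3), (ts(13, 9), 5), (ts(13, 9), 150),      # Monday 09:00
]

print(best_slot(sample))  # (6, 15): Sunday 15:00 has the higher median
```

Using the median instead of the mean matters here: the single 150-point Monday post would dominate an average, but it barely moves the median.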


> it sounds like what happened with the original submission was that a moderator accidentally put "(2010)" in the title and users flagged it because they incorrectly thought it was old

mhm, I'm sure that's what really happened


Which bit? The 2010 thing is precisely what happened. It was a case of sleep deprivation, which is one lesson of how trying too hard to make this place good can mess with a person.

The other bit was just my attempt to explain why users might have flagged the post. User flags were what demoted its rank, and it isn't obvious why people flagged it. There's also the issue that meta posts aren't great for HN in the first place, but those rarely lack for upvotes.


It would be nice if there were a non-filtered view of HN available for users with a rep above 500, much like the "show dead" option for user comments that have been hidden. Basically, it would place submissions in the ranking as if they had not been dinged by users flagging them.


I like this idea, but it'd be nice to have it not based on karma (says a person that doesn't comment very often and has low karma).


Understood, though I could easily see spammers using the data, and getting to 500 really should not take more than a month if you post a handful of comments a day; for example: 25 days, 5 comments a day, average of 4 upvotes per comment.

Worth noting: with a rep of 500+, HN users are able to downvote comments.
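
The back-of-the-envelope numbers above check out. A tiny sketch, assuming each upvote is worth exactly one karma point and ignoring downvotes (both simplifications of mine); `days_to_reach` is a hypothetical helper:

```python
import math

days, comments_per_day, upvotes_per_comment = 25, 5, 4

# Total karma under the simplifying assumptions above.
karma = days * comments_per_day * upvotes_per_comment
print(karma)  # 500


def days_to_reach(threshold, comments_per_day, upvotes_per_comment):
    """Whole days needed to reach a karma threshold at a steady rate."""
    per_day = comments_per_day * upvotes_per_comment
    return math.ceil(threshold / per_day)


print(days_to_reach(500, 5, 4))  # 25
```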


Yeah, and I think the karma system works well overall. It isn't ideal for people that are significantly more consumer than producer of content, but I would guess that is the minority.


I feel there are a lot more consumers than you might imagine, myself included. I'm on HN constantly but I rarely ever comment. The karma system is not for everyone.


I think there are a few people with > 500 karma, more with an account but less karma, and even more without an account.

Pareto principle and all.


I'm not sure why that information would be valuable to spammers. It would only tell them that their posts are being flagged, but it wouldn't really help them get to the front page.


From the point a spammer is identified until their interaction with the system becomes a burden to deal with, it's better not to let them know that they've been identified as a spammer. Craigslist does this by "ghosting" postings that are flagged as spam; that is, the spammer sees the posting as live, and maybe even gets an automated reply to see how/if they respond, but the general public does not see the posting.
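
The ghosting idea boils down to a per-viewer visibility check. A minimal sketch, with a made-up `Posting` class and function names; this is not how Craigslist or HN actually implement it, just the general shape:

```python
from dataclasses import dataclass


@dataclass
class Posting:
    """Hypothetical posting record."""
    author: str
    text: str
    ghosted: bool = False   # set once the posting is flagged as spam


def visible_to(posting: Posting, viewer: str) -> bool:
    # A ghosted posting stays visible to its own author only,
    # so the author gets no hint that anything changed.
    return not posting.ghosted or viewer == posting.author


def feed_for(postings, viewer: str):
    """Postings as a given viewer would see them."""
    return [p for p in postings if visible_to(p, viewer)]


postings = [
    Posting("alice", "bike for sale"),
    Posting("spammer", "cheap pills", ghosted=True),
]

print([p.text for p in feed_for(postings, "spammer")])  # ['bike for sale', 'cheap pills']
print([p.text for p in feed_for(postings, "bob")])      # ['bike for sale']
```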


I consume HN via an RSS feed, and it is an excellent experience. I don't need a hivemind to do my filtering and thinking for me.


If you don't need the filtering from the hivemind, you can read https://news.ycombinator.com/newest (I'm not sure if it has an RSS feed). Warning: it's full of crap.

And you can enable the showdead flag. Warning: The worst crap is [dead] so this version has even more crap.

Anyway, there are from time to time some interesting articles that are unlucky and don't get even a vote, so after reading the front page please go to the newest page and try to find a hidden jewel there.

PS: There are a few alternative unofficial HN UIs. They use alternative orderings, probably without the official penalties. For example http://www.daemonology.net/hn-daily/ and https://hckrnews.com/


To be fair, "the feed" (I assume you mean the new feed) is populated by the hive; it's just not sorted by votes or any other filter, unless, of course, the mods remove/hide a submission for some reason.


I'm using /rss - I see a lot of stuff on there that subsequently gets flagged.


It was asked by someone yesterday, but the question got lost in the noise, whether the voting ring detection extends to flagging rings (whether on posts or comments)?

It would be naive to assume that it doesn't happen...


I suspect that in part this is because having HN's actual ranking algorithm as implemented today (including the secret sauce) would not benefit HN.


But this is not just some random meta post; it's about ranking algorithms, and I think those are doomed to be popular, whether it is HN or Reddit.


Hanlon's razor: "Never attribute to malice that which is adequately explained by stupidity"


Nice. Summarizes my longstanding argument with conspiracy theorists.


On the one hand, I'm also skeptical. On the other hand, I could see someone glancing at the date and accidentally reading "Mar 10, 2017" as Mar '10.


[innocent yet incisive comment removed due to excessive down-voting]


That's... not how the internet works. But it doesn't matter; we invited foob to repost it as a way of owning our own mistake.



