How reddit tried to solve the "new link" problem. Why HN doesn't need a new algo
198 points by jedberg on Dec 2, 2013 | 67 comments
This morning I saw two articles on the front page about how HN should change their algorithms. I would contend that an algo change is not the right solution.

Here is what we did to try and solve the problem on reddit.

First, there is the "organic" box at the top of the page. The first link in that box is always an ad, but after that, it shows pseudo-random links from the new page (more on that in a second):

https://github.com/reddit/reddit/blob/89f6f1ad9c1babbf520b83c49fa27f509bb5d0ef/r2/r2/lib/organic.py

What this does is give exposure to up and coming links to a lot of people all at once, which helps overcome the luck factor of who is looking at the new page at any given time.
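A minimal sketch of the box-filling step described above. It assumes the candidate links have already been chosen (for example, from the new page) and that a plain uniform sample is good enough. The function name and the `size` and `seed` parameters are hypothetical. The linked organic.py is reddit's actual implementation, and its details differ.

```python
import random

def build_organic_box(ad, candidates, size=5, seed=None):
    """Return the organic box: the ad in the first slot, then up to
    size - 1 pseudo-randomly chosen links from `candidates`."""
    rng = random.Random(seed)
    # Sample without replacement so a link never appears twice in the box.
    k = max(0, min(size - 1, len(candidates)))
    return [ad] + rng.sample(candidates, k)
```

Usage: `build_organic_box("AD", [f"link{i}" for i in range(10)], seed=7)` returns the ad followed by four distinct links. Each page load can show a different slice of the new page, which spreads exposure across many viewers.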

The second solution is the "rising" sort on the new page:

https://github.com/reddit/reddit/blob/89f6f1ad9c1babbf520b83c49fa27f509bb5d0ef/r2/r2/lib/rising.py

The rising sort accounts for how many times the link has been shown in its ranking algo, which helps better new links rise to the top.

The organic box on the front page uses this rising ranking to choose what is in the box, and also contributes to the view counts.
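A rough sketch of a view-aware "rising" score. It assumes each link tracks its upvotes, the number of times it has been shown, and its age. The formula used here (upvotes per view, damped by age) is an illustrative assumption and is not a transcription of the linked rising.py. The `Link` fields and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Link:
    name: str
    ups: int
    views: int        # times the link has been shown, e.g. in the organic box
    age_hours: float

def rising_score(link: Link) -> float:
    # Votes per impression, so a link few people have seen is not punished
    # for its small raw vote count; age acts as a mild decay.
    return link.ups / (max(link.views, 1) * (link.age_hours + 1))

def rising_sort(links):
    return sorted(links, key=rising_score, reverse=True)

def record_impressions(shown):
    # Called whenever links are displayed (e.g. by the organic box),
    # so the box also feeds the view counts the score depends on.
    for link in shown:
        link.views += 1
```

Because the score is normalized by views, a link with two upvotes from five impressions outranks one with two upvotes from two hundred. Each impression without a vote nudges the score down, so links that don't earn votes drift out of the box.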

So I would humbly suggest that HN should do as it has done often in the past, and copy reddit's solution here by implementing the rising sort and the organic box.




Why was my title edited?

It said "How reddit tried to solve the "new link" problem/ why HN doesn't need a new algo"

How is this new title an improvement? I would at least expect a comment here explaining why a title was edited, so I can learn for next time.


Not an admin, so I can't tell you for sure. Lots of times they edit titles they feel are designed to "game" the HN audience. Posting "why hn doesn't need a new algo" if you aren't part of the HN team likely was viewed as either misrepresentation, or an attempt to game.

I don't have any direct insight, just my experience from the past.


In that case, it seems odd to allow the original post's title, "Why Hacker News should use a different Deterministic Algorithm, Not a Random One", but not jedberg's.


At least they didn't censor you when you posted the note saying they edited your content. If they wanted to they could have just deleted that too. I think any sort of censorship at all, once the word gets out, will be damaging to HackerNews, and they should stop doing it before it becomes their "reputation". Communities like this can form elsewhere. Hopefully they know they have no monopoly, and should err on the side of NON-censorship at all times.


It wasn't censorship. Only governments can censor.

HN is a private organization and has every right to do anything they want with the content.

My only request was for feedback.


> Only governments can censor.

Bit of a side issue, but that's an interesting statement, I've never heard that before. A quick google doesn't seem to back that up, so I wonder what I'm missing, or why you said that?


When Americans use the word 'censorship' they seem to explicitly refer to censorship in the context of the First Amendment to the US Constitution. By the general meaning of the word, of course anyone can censor anyone else but in America it is meant to be "censorship that violates the First Amendment".

I'm not American. It's just the best explanation I can come up with why this view is so common.


Thanks, yeah that satisfies my curiosity and seems like a reasonable explanation.


He's correct that only governments can censor.

On a forum like HN, freedom of speech does not apply, because you don't have a right to post here. The First Amendment only applies to the government, as it relates to your property, public property, or other private property that you've been granted the right to express yourself through (e.g. the NY Times).

You don't have a right, for example, to walk into your neighbor's house and exclaim your position on communism.

For another private party to stop you from exercising your freedom of speech, e.g. in your yard / on your property, they'd have to initiate violence, and that is probably already illegal. It's the government's job to protect you from that force, and by doing so it would be upholding your right to free speech (if it fails to do so, it's not protecting your rights, and that can be government censorship by proxy).

And in regards to private context, it also is dependent on what you've agreed to, such as in signing a contract. If you join a community, whether Hacker News or a housing development, and have agreed that you may not fly a flag in your front yard or use verbal abuse (on a forum), then that's a choice you've voluntarily signed on to.


"you don't have a right to post here"

But you may not be able to avoid it. If a forum is the popular forum, you either post, or never be heard. You have to post there to get heard. So the "private company" can choose what is heard or not. That is censorship. Not illegal, as many said, but censorship. With all the bad results.


arjie covered it well. I accidentally put on my Amero-centric hat, where censorship by the government is illegal, whereas by private organizations it is not.


I think that's in the legal context: the US government is bound to the 1st Amendment, which has usually been interpreted broadly to apply to (almost) everything the government does. So when the US government does the censoring, it's often questionably legal. However, private organizations aren't usually bound by the same rules, so there's no law against a private organization doing the same thing.

I suppose we can call it "censoring" in both cases, but usually only one of the cases is an actual violation of the law.


What would be the word for "private organization removes or edits content that they do not like"? Can't we just use the word "censor", or do we have to write half a sentence every time we want to get that idea across? Being verbose just because we don't want a word to have two or more possible meanings doesn't seem very useful.


You're right, it's perfectly acceptable. I just have an aversion to the word because people would often accuse us (reddit) of "illegal censorship", in which we'd have to point out that it is only unlawful for the government to do it -- regardless of the fact that reddit wasn't actually censoring anyone.


Right, there certainly is a difference between the two concepts, particularly legally, but I think they overlap enough that we can use the same word so long as we are careful to remain aware of the distinction. Your aversion is well-founded.


I really miss a downvote now =8-/


> I think any sort of censorship at all, once the word gets out, will be damaging to HackerNews, and they should stop doing it before it becomes their "reputation"

HN censors people every day, as is their right. My account right now is under the influence of a slowbanning for what reason I have no idea.


Really no idea? Maybe it's that glowing pink target you have taped on your back that reads "300bps"? :)


As a reader, I'd say that only one title is necessary, so maybe they just took the first one. Also, sorry, but your post is pretty poorly written so it's kind of hard to figure out which of the titles better applies.


HN uses the editorial approach to content, figuring that having some people edit user-supplied content will make it more standard and enjoyable, like a newspaper.

I have some doubts about this as well.


That's great and all, but in a newsroom, when the editor changes your title, they tell you why so you can do better next time.

That is what is missing here.


> That's great and all, but in a newsroom, when the editor changes your title, they tell you why so you can do better next time.

I agree that would be nice. However, as you know from running a busy site, it's not always possible to give such feedback, or you'd drown under emails. I'm thinking of /r/redditrequest or any of the methods of reporting things like doxxing to the mods, which usually don't get any feedback for the user.

My interest is this: I think content needs to be edited for maximal reading potential. That in turn leads to the question as to whether Reddit/HN are more like links lists (MetaFilter, blogs) or more like discussion forums (phpBB, vbulletin, etc). By "more like" I mean as communities, not as software.


You know how we avoided this problem at reddit? We didn't edit the user's submissions. :)


I understand, but for the sake of playing devil's advocate, that leads to other problems for the content reader, such as:

* Duplicates. Endless duplicates.

* Rejection of good content because the headline is bad.

* Essentially outsourcing this function to subreddit mods, who have only rejection at hand.

If someone mods a subreddit, and they see a submission that's perfect except the person misspelled "intimacy" in the title, they have two options only:

* Accept it as it is.

* Reject/delete it and get the user to submit again.

That's fairly high overhead, and a less powerful experience for the reader. I think that matters less with Reddit's mostly 15-to-25-year-old audience, but it is more important here.


That's not true at all. What newsroom did you work in that had journalists writing their own titles and expecting even a modicum of influence over the final product once it has been handed in?


I sat next to the Wired newsroom for a few years, and they certainly did it.


Doesn't history tell us that it's very common for reporters not to write their own headlines?


Getting feedback on your work is not the same as having influence over it.


A good editor will do that. It's about teaching and preparing journalists to be great.


I work in a newsroom, albeit a small one--a public radio station--and we do that.

I also have the freedom to tell my bosses "no" so I expect my newsroom is probably in the minority.

Interesting side note: We're growing in a small market (Vermont) at a time when most newsrooms are shrinking or even collapsing, especially the small markets. Sure, correlation is not causation, but I would argue that the culture plays no small part in our success.


I think the techniques Reddit uses to give exposure to deserving links are nice.

Here are just a few other ways you can fine-tune a "link recommendation" algorithm beyond just the standard "show highly rated links at the top" technique:

1) Devote a portion of prime real estate (e.g. homepage) to new or trending links, as Reddit does.

2) Give higher placement to submissions that come from someone whose previous submissions the user has upvoted.

3) Give higher placement to submissions that come from the same source as previous submissions the user has upvoted.

4) Give higher placement to submissions on which a person has commented whom the user has previously upvoted.

One way I think HN, Reddit, and other link-recommendation sites can put power into their users' hands is to allow each user to tweak the recommendation algorithm to suit their own preferences.

For instance, one user might want half their homepage to be filled with trending stories, rather than popular stories. Another user might find Technique 2 above to be useful but might not want to enable Technique 4.
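One way techniques 2 to 4 and per-user tuning could fit together, sketched under some assumptions: each user has a record of whose submissions, which sources, and which commenters they've upvoted, plus a set of weights where 0 turns a technique off. All class, field, and function names here are hypothetical, and nothing in this sketch counters echo-chamber effects.

```python
from dataclasses import dataclass, field

@dataclass
class Submission:
    id: int
    base_score: float                # the site-wide ranking score
    submitter: str
    domain: str
    commenters: set = field(default_factory=set)

@dataclass
class UserPrefs:
    # Per-user boost weights; 0.0 disables a technique.
    same_submitter: float = 0.0      # technique 2
    same_source: float = 0.0         # technique 3
    liked_commenter: float = 0.0     # technique 4

@dataclass
class UserHistory:
    upvoted_submitters: set
    upvoted_domains: set
    upvoted_commenters: set

def personalized_score(s: Submission, prefs: UserPrefs, hist: UserHistory) -> float:
    boost = 1.0
    if s.submitter in hist.upvoted_submitters:
        boost += prefs.same_submitter
    if s.domain in hist.upvoted_domains:
        boost += prefs.same_source
    if s.commenters & hist.upvoted_commenters:   # any liked commenter present
        boost += prefs.liked_commenter
    return s.base_score * boost

def personalized_front_page(subs, prefs, hist):
    return sorted(subs, key=lambda s: personalized_score(s, prefs, hist), reverse=True)
```

With all weights at their default of 0, the page is identical to the shared front page. That keeps the non-personalized view as the baseline, and personalization becomes opt-in.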


Your suggestions sound great on the surface, but I suspect 2-4 would increase the echo chamber problem.

Also, it's computationally difficult to compute 2-4 in real time (reddit used to do a similar calculation a long time ago, under the now defunct recommended section).


Yeah, just throwing stuff out there.

One other technique that springs to mind:

5) Set a minimum number of views or clicks a link must get before "falling off." So, if a ton of links are submitted around the same time, sprinkle them back into the mix -- perhaps using a version of Technique 1 -- until they've hit the minimum, then let them die a happy death.
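A minimal sketch of technique 5, assuming the site keeps a per-link impression count in a dict. The `min_views` and `slots` values are arbitrary placeholders, and the function name is hypothetical. Appending the picks after the regular front page is only one possible placement.

```python
import random

def mix_in_underexposed(front_page, new_links, views, min_views=50, slots=3, rng=None):
    """Return the front page plus up to `slots` new links that have not yet
    reached `min_views` impressions. Once a link hits the minimum it is no
    longer sprinkled in and has to rank on its own."""
    rng = rng or random.Random()
    pool = [l for l in new_links
            if views.get(l, 0) < min_views and l not in front_page]
    picks = rng.sample(pool, min(slots, len(pool)))
    for link in picks:
        views[link] = views.get(link, 0) + 1   # count this impression
    return list(front_page) + picks
```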


The goals of HN and reddit may be different. Reddit seems to want to be everything to absolutely everyone. I don't know what the goal of HN is; is it known?

If the goal is to draw in those who are most likely to make good founders and convince them to join ycombinator, then what you want is somewhere between echo chamber and reddit, but it is closer to echo chamber.


Would a graph help with the computationally difficult part? Graphs seem suited to recommendation problems. (Assuming the echo chamber thing is not an insurmountable issue.)


I personally would not want #2-#4. One of the reasons I like Hacker News is that its system is not customized to my personal behavior. I like that the front page looks the same to me when I'm logged in and when I'm not (as far as I'm aware). I prefer the wisdom (or madness) of the crowd to the bubble of myself.

I also think #2-#4 would cause the diversity of topics & sources on my front page to erode.


If my HN frontpage were that user-dependent, I'd want to see both. I'd read my HN, but I will definitely browse in Incognito mode just to see the 'real' HN.

One has to wonder what draws all these people to sites like HN. I don't think they know for sure. At first, the site was great without me. There were lots of interesting links without me asking for them. As it becomes more amplified, with the front page being hotly contested & measured, mechanisms getting more complicated, etc., it seems we may get what we never wanted.


>"One way I think HN, Reddit, and other link-recommendation sites can put power into their users' hands is to allow each user to tweak the recommendations algorithm to suit their own preferences." //

This.

When scores were removed, this was my reaction: if you don't want scores, why does that mean I can't have them?

Diverse algos for ranking would also work against gaming of the system IMO.


This issue isn't what needs solving. The much larger problem is the "unknown or expired link" page.

What year is this? Why are we still accepting an implementation detail as an excuse for an awful user experience?


Yeah this is the biggest problem with HN.

Go to the front page, click an article or two, then click to go to the next page, and it's already "expired" within minutes. For a site that caters to startups/developers, it's pretty embarrassing.


Yes. I hit that page almost every time I hit the 'more' link.


What I resort to doing on bursts of high HN reading (such as now, at lunch) is to command+click each link that looks mildly interesting and go through a few pages.

This site is too frustrating to use any other way, and for a community with so many engineers, I'm still shocked that so many (including myself) put up with it.


The only reason any of us put up with it is the community.

It doesn't matter how many engineers there are here, it's Paul Graham's forum and he seems unwilling to fix these sorts of issues. There have been several threads where I've seen offers to fix it.


HN still uses tables for layout. There's a mix of CSS and <font> tags. The login/register page has no styling at all. Search is handled by a third party (HNSearch). Your top-bar color doesn't persist on some pages (e.g. submit).

HN has a lot of issues but yes, by far the most annoying is the link expiration.


all the ones you listed are trivial though.


tables still > css


Well what's the solution?

Do you simply redirect all "expired link" GETs to the home page? Should we have a fixed set of 10 first pages (like some imageboards)?


The solution is to avoid capturing the user's state on the server and pass it in the URL instead. That way nothing about the link needs to expire.
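A minimal sketch of URL-carried pagination. It assumes the server can recompute (or cache) the ranked story list on each request. The `/news?p=N` URL shape, `PAGE_SIZE`, and the function names are hypothetical. Nothing is stored per user, so there is nothing for the "More" link to expire.

```python
from urllib.parse import urlencode, urlparse, parse_qs

PAGE_SIZE = 30

def page_url(page: int) -> str:
    return "/news?" + urlencode({"p": page})

def render_page(ranked_ids, url):
    """Everything needed to render the page comes from the URL itself,
    so no server-side closure has to be looked up (or can go missing)."""
    query = parse_qs(urlparse(url).query)
    page = max(int(query.get("p", ["1"])[0]), 1)
    start = (page - 1) * PAGE_SIZE
    items = ranked_ids[start:start + PAGE_SIZE]
    has_more = start + PAGE_SIZE < len(ranked_ids)
    return items, (page_url(page + 1) if has_more else None)
```

The trade-off is that the ranking can shift between requests, so a story may repeat or be skipped at a page boundary. Encoding the last-seen item in the URL instead of a page number would reduce that.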


The solution is for someone else to create a better site that's actually maintained and start migrating people away from HN. A more neutral foundation would be better anyway, imo.


User state is currently captured in a closure in the single-process, single-threaded HN server (running a custom Lisp). The solution is to not do that.


Maybe it should redirect to the New page.


I think the biggest issue is that the Hacker News admins want to have as few moving pieces as possible. It's why Hacker News doesn't have collapsible comments, why it doesn't have a mobile layout, etc. There have been some tweaks here and there, but the biggest changes I've noticed over the ~3 years I've been here are that they removed the karma count from comments and made the upvote triangle high-resolution.

They seem feature-averse, and I assume it is because A) adding more features leads to more points of failure, B) front-end/back-end code additions lead to larger page sizes and more computation on the backend (meaning higher costs for them to deliver content), and C) the K.I.S.S. principle.


I would have to put my money on the way simpler answer D) A few users might complain, but the traffic keeps growing so why bother and E) YC is eating up all my time and is more interesting and F) The site clearly works (the traffic keeps growing) so it isn't as interesting to hack on anymore.


I would like it to have fewer moving parts. I'm not fond of 'unknown or expired link' - that link shouldn't 'move' :)


HN did have a mobile layout for a little while, but it was pretty awful https://news.ycombinator.com/item?id=6253835 and seems to have been discontinued. Maybe they just don't have anybody to make these changes properly.


How hard would it be for them to simply make the site completely responsive? That's not a huge task & still keeps things pretty simple.


I understand the endless desire to tweak, I really do, but HN simply doesn't have anywhere near the volume of posts Reddit does.

This place has the same feel as: http://www.reddit.com//r/depthhub

Most of the action is in the comments, and a lot of the traffic is from lurkers. It's a slow roll, in other words, and frankly there is a very finite amount of good-quality new posts to be had.

That is the real problem. Try to rank this place more optimally by hand as an experiment; it simply won't take that long. Where is all the great content you are trying to algorithmically float?


I don't actually think HN has a problem, I was just suggesting that if I were wrong about that, here is an alternate solution.


What I think they should at least do is remove the penalty for 40+ comments. I see where it comes from: the desire to keep a "quality discussion" feel by favoring comment sections with very few comments, which are usually quite insightful because they come from people interested enough in an often obscure topic. But it also kills discussion. At the very least they could make the penalty grow gradually, rather than kicking in suddenly as soon as there are 40 comments.
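To make the step-versus-gradual difference concrete, here is a sketch of two multiplicative penalty functions. The 40-comment threshold comes from the comment above. The 0.5 factor and the 40-comment ramp are made-up values, and neither function is HN's actual code.

```python
def step_penalty(num_comments, threshold=40, factor=0.5):
    # Nothing until the threshold, then the full penalty all at once.
    return factor if num_comments >= threshold else 1.0

def gradual_penalty(num_comments, threshold=40, floor=0.5, ramp=40):
    """No penalty up to `threshold`, then a linear fade that reaches
    `floor` after `ramp` more comments."""
    if num_comments <= threshold:
        return 1.0
    progress = min((num_comments - threshold) / ramp, 1.0)
    return 1.0 - (1.0 - floor) * progress
```

Either value would be multiplied into a story's ranking score. With the gradual version, a story at 41 comments keeps about 99% of its score instead of losing half of it at once.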


That place is the worst of the self-congratulatory, narcissistic, Reddit hive-mind.


I'm afraid I have to stand by the analogy. ;)


Does the "rising" page on reddit actually work? On subreddits with millions of subscribers it seems to show one or two submissions at best, and often the votes cast on those submissions don't reveal why they're considered to be rising. For example, many submissions on the rising page have scores of (1,0) or (1,1), in (upvotes,downvotes) format.

Is this the intended behavior?


Yes. Remember, it is a function of both votes and views, so something with two votes that's only been viewed a few times will still be higher than something with more views and two votes.

It's not perfect, but it is better than just straight chronological.


The solutions presented in those posts were better and more mathematically sound. This might work, but that algorithm maximizes the exposure of deserving posts and optimizes the trade-off between testing new posts and sticking with proven ones. This is just arbitrarily throwing new links into a top box that everyone ignores.

And reddit's solution doesn't seem to be working much better IMO, tons of posts get buried with little exposure. The rising section seems to be usually empty or just 2 or 3 totally random posts.

Reddit's algorithm is often criticized for heavily favoring quickly consumed content like images, because it gets votes more quickly. It's also easily manipulated by bots/sock puppets.


Not sure a rising sort is really needed. A link only needs a few points to get on the front page of HN. In my experience, if a link gets 3 votes in the first 15-20 minutes or 5 in an hour, it'll get some front-page time.

I think it may almost be easier to just show a random new link from the past hour rather than doing anything fancy. I'm sure a ton of good content misses the frontpage just because of the sheer lack of visibility that links on the new page get.
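The "random new link from the past hour" idea is simple enough to sketch. This assumes new links are available as `(link_id, submitted_at)` pairs with Unix timestamps in seconds. The function name and the `window_secs` parameter are hypothetical.

```python
import random
import time

def random_recent_link(new_links, now=None, window_secs=3600, rng=None):
    """Pick one link submitted within the last `window_secs` seconds,
    or return None if there are none."""
    now = time.time() if now is None else now
    rng = rng or random.Random()
    recent = [link_id for link_id, ts in new_links if now - ts <= window_secs]
    return rng.choice(recent) if recent else None
```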


I think the real problem is that most of us come to the site too often... take a stretch and go outside :)


I think just giving a small portion of the front page to new stories could help a lot.



