This is an effect of the normalization algorithm, which biases toward having posts from as many different subreddits as possible on the front page.
Edit: At least we're consistent! Ketralnis said the same thing about an hour ago. :) https://news.ycombinator.com/item?id=8568021
The code is the theory (bottom-up), and his research (which is great) is a top-down analysis.
>r/funny posts simply never appear on the bottom half of page one or most of page two
Note the "never" part: not just rarely, never. Within this 1.2-million-point dataset, anyway.
So the alpha-funny link makes it, and the beta-funny link has to sit on page 3, suppressed, unable to take its rightful spot... until suddenly alpha-funny is too old, and is kicked off the front page, at which point beta-funny immediately ascends to take its place.
On a tangential note, though, I was a bit surprised by the hypothesis. 'Meritocracy' to begin with is a dubious fiction, but especially so when mapped onto the vote distribution of a given post in a semi-decentralized, human-moderated environment that is constantly being gamed. "Merit" just seems like a big value judgement over such a noisy channel.
You can customize ggplot2 for even more beautiful charts, such as the ones used in my analysis of HN comments: http://minimaxir.com/2014/10/hn-comments-about-comments/
Note that reproducing this analysis for the HN front page is harder since there is a lot that happens behind the scenes that cannot be generalized. Here's an attempt at performing such an analysis (note dang's comments): https://news.ycombinator.com/item?id=8533757
- They tried too hard to do a Tufte*, but it's actually way too flashy.
- Basically, the issue is that it's not generic enough. Whenever you see a ggplot2 plot, it's always screaming at you "I WAS MADE IN ggplot2!!" Just KISS...
- The grey background is completely unusable if you want to print out your charts!
- The default pastel colors (i.e., PowerPoint 2012) are always a disaster for readability. In meetings people constantly can't tell them apart, and they are definitely not colorblind-friendly.
It's a shame, because every script I write ends up needing an obnoxiously long theme declaration to make it look "normal".
* I think most people miss Tufte's point. He presents a way of thinking about data presentation. Instead, people just look at it and think "oooo! That looks pretty. Let me copy what he did." The guy has his own personal style; it's definitely nice, but the point isn't to copy it.
EDIT: For those thinking about learning ggplot2, maybe hold off for a bit. It seems like it'll soon be deprecated and replaced by ggvis.
I would recommend using theme_bw(), which helps solve some of the problems with the gray background.
EDIT: This is ggvis: http://blog.rstudio.org/2014/06/23/introducing-ggvis/
This is news to me, so I'll look into it to see how it differs from ggplot2.
EDIT 2: It seems the main difference is that ggvis is aimed at interactive charts, but that requires a dependency on Shiny, which is not ideal for blog posts.
I'm definitely not an R guru or "in" on the latest news, but it seems that Hadley Wickham (who is probably single-handedly the reason R is still relevant) now works for the RStudio guys and is reworking his tools. plyr is now dplyr and ggplot2 is now ggvis. There's also another tool called tidyr. My understanding is that they're still in development, but they'll ultimately provide an "integrated" ecosystem for processing data.frames (with hooks into the RStudio IDE).
He talks about it at the beginning of this video
AFAIK there are no filters that cap the number of posts from a single domain on the front page, but some domains have automatic penalties.
HN no longer functions entirely off of the "reddit algorithm." I know that it's now curated to some extent.
"Merit" just seems like a big value judgement over such a noisy channel.
I think the takeaway from such investigations -- and all of human history -- is that the system will be gamed. To survive, systems must be robust in the face of corruption and manipulation.
It was before, too. We just explain it more now.
Every submission made there went to the front page. What are the chances? I mean, it's hard to believe that mainstream Redditors at this point are so interested in Reddit news that they keep sending every Reddit announcement to the top. What's more likely, to me at least, is that Reddit is either leveraging its knowledge of how things work to get to the top in a surefire way, or it's plain messing with the upvote numbers. I would love to see data on these submissions, if possible. Of course, there's good reason for them to actually do this: they need to advertise the Reddit marketplace stuff and so on, because it's how they make money.
But the normalisation algorithm https://github.com/reddit/reddit/blob/master/r2/r2/lib/norma... does prefer to include at least one post from each subreddit. This means that if /r/blog has a post, it will probably be placed on the front page immediately. That placement gets it a lot of votes just because it's very visible.
AHA! Let's just cancel out the double negative ("no" and "n't") and look what we uncover, in this former reddit admin's very own words: "the votes are manipulated"
> That's how English works. That's a double negative.
> That's a double positive.
Back when there were only 25 default subreddits and multireddits had just been launched, I tried to create a default multireddit. If I included just the top 25, the rankings would match the logged-out view perfectly. But if I added /r/blog and /r/announcements, they would appear in different places. I think they usually ended up at a slightly lower rank, but they also stuck around in the top 50 much longer, with week-old posts often being inserted into the middle of the rankings.
But if your front page is 25 links long (the default) and has 25 subreddits, I imagine adding 2 more means that you need to show 27 subreddits in 25 links, so you hit some edge case there where things need to be laid out again.
The way this probably works is that we pick 25 subreddits at random (and cache that random choice for a while), some of which may have no qualifying submissions. So you may end up with 2 /r/funny posts, even though you had 27 subreddits for 25 links.
Or at least that's how it used to work. I know this bit has changed, but I don't know what the new way is.
Could you then explain the massive score changes that happen every 2 hours, pretty much on the dot? They were even more noticeable when downvotes were shown: although the displayed counts were 'fake', the final score, which according to the FAQ is correct, still changed.
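A minimal Python sketch of the slot-filling described above, under loose assumptions: `build_front_page`, the data layout, and the "fill leftover slots by score" rule are hypothetical illustrations, not reddit's code. It picks 25 subreddits at random, takes each one's best post, and fills any slots left open by empty subreddits with the highest-scoring remaining posts.

```python
import random
from collections import Counter


def build_front_page(posts_by_sub, n_links=25, n_pick=25, rng=None):
    """Hypothetical slot-filling sketch; not reddit's actual algorithm.

    posts_by_sub maps a subreddit name to a list of (post_id, score)
    pairs sorted by score, highest first.
    """
    rng = rng or random.Random()
    # Pick a random subset of subreddits (the comment says this choice was cached).
    chosen = rng.sample(sorted(posts_by_sub), min(n_pick, len(posts_by_sub)))
    # Each chosen subreddit with at least one post contributes its best post;
    # empty subreddits contribute nothing, leaving a slot open.
    page = [(sub, posts_by_sub[sub][0]) for sub in chosen if posts_by_sub[sub]]
    used = {post for _, post in page}
    # Fill the open slots with the highest-scoring leftovers from any subreddit.
    leftovers = sorted(
        ((sub, post) for sub, posts in posts_by_sub.items()
         for post in posts if post not in used),
        key=lambda item: item[1][1],
        reverse=True,
    )
    page.extend(leftovers[: max(0, n_links - len(page))])
    # Display order is by score.
    page.sort(key=lambda item: item[1][1], reverse=True)
    return page[:n_links]


# 27 subreddits for 25 links: three are empty, /r/funny has two strong posts.
subs = {"funny": [("f1", 1000), ("f2", 900)],
        "blog": [], "announcements": [], "quiet": []}
for i in range(23):
    subs[f"sub{i}"] = [(f"p{i}", 10 + i)]

page = build_front_page(subs, rng=random.Random(0))
print(Counter(sub for sub, _ in page)["funny"])  # 2
```

With three empty subreddits among the 27, any random choice of 25 includes at least one empty pick, so at least one slot is filled from the leftovers, and /r/funny's high-scoring posts win it. That is how /r/funny ends up with two links even though there are more subreddits than slots.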
I noticed this when I wrote a simple graphing script; you can see an example graph of what I'm talking about here: https://github.com/Nikola-K/reddit-thread-graph#example
You can see it on any popular thread until the score gets "normalized" at around 3k points.
"Mainstream Redditors" don't decide what is on the homepage because they don't vote. Only about 20% of redditors vote, and only about 20% of voters comment.
If you still find it hard to believe, just read the 900+ comments written in less than a day on their most recent post, which announced nothing and was just about how it's nice to be nice to people.
Are you familiar with the 90/9/1 rule of thumb? That for any site with user-generated content (message boards, social networks, etc.), only 10% of people contribute to the discussion and only 10% of those people start new conversations?
At first, I read your comment about "20% of 20%" and thought "wow, that would be 80/16/4, that's incredible".
But this adds voting into the middle of the equation. So perhaps the 80/20 rule continues, and we can estimate that 20% of voters comment, and 20% of commenters post original content. That would make it more like 80 (reading only)/16 (voting)/3.2 (commenting)/0.8 (posting original content), which might be considered (with rounding) 80/19/1. The original-content contribution is roughly the same, but the voting system seems to increase the middle ground of engagement.
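The 80/16/3.2/0.8 split above is just the same 20% rate applied at each step. A small Python sketch of that arithmetic; the fixed rate is the comment's guess, not measured reddit data, and `engagement_funnel` is a hypothetical helper.

```python
def engagement_funnel(rate=0.2, tiers=("reading only", "voting", "commenting", "posting")):
    """Split 100% of users into tiers, assuming a fixed fraction `rate`
    of each tier also takes part in the next, more active tier."""
    shares = {}
    remaining = 100.0  # percent of users who reach the current tier
    for tier in tiers[:-1]:
        stops_here = remaining * (1 - rate)
        shares[tier] = stops_here
        remaining -= stops_here
    shares[tiers[-1]] = remaining  # whoever is left reaches the top tier
    return shares


shares = engagement_funnel()
print({tier: round(pct, 1) for tier, pct in shares.items()})
# {'reading only': 80.0, 'voting': 16.0, 'commenting': 3.2, 'posting': 0.8}
print(round(shares["voting"] + shares["commenting"], 1))  # 19.2, i.e. the "19" in 80/19/1
```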
Double sign-ups by adding a voting feature?
That doesn't really ring true. If you make a post with a question in the title, you'll often get 5+ answers before a single person votes. Even downvotes.
Maybe it's more accurate to say "Of those people who don't hang out in the new submissions queue, only about 20% of the voters comment."
But, really, Reddit could do anything it wants internally, and we'd have no idea. Only they have access to the stats. They already compress the votes so that it only looks like there are ~10,000 votes maximum, when in fact hundreds of thousands of people have probably voted on certain posts.
Again, I'm out of the loop by several years here. But to my knowledge, no, that's not true. There are just fewer voters than you think there are.
https://www.reddit.com/comments/z1c9z Obama's AMA. According to the sidebar, (upvotes - downvotes) = 14,700, with upvotes/(upvotes+downvotes) = 94%. Reddit recently changed their algorithms so that the 94% figure is very accurate. Since upvotes are 0.94 of the total and downvotes 0.06, the net score is 0.88 of the total, so 14,700/0.88 ~= 16,700 people voted, according to the sidebar.
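The sidebar gives only the net score and the upvote ratio, so the vote total has to be backed out. A short Python sketch of that arithmetic (the helper name is hypothetical): with T total votes and ratio r, upvotes are rT and downvotes (1 - r)T, so the net score is (2r - 1)T. Because the sidebar rounds the ratio to a whole percent, the result is only approximate.

```python
def vote_breakdown(net_score, upvote_ratio):
    """Back out (total, upvotes, downvotes) from net score and upvote ratio.

    With T total votes and ratio r: upvotes = r*T, downvotes = (1 - r)*T,
    so net = (2r - 1)*T and T = net / (2r - 1).
    """
    if upvote_ratio == 0.5:
        raise ValueError("a ratio of exactly 0.5 leaves the total undetermined")
    total = net_score / (2 * upvote_ratio - 1)
    upvotes = upvote_ratio * total
    return total, upvotes, total - upvotes


total, up, down = vote_breakdown(14_700, 0.94)
print(round(total), round(up), round(down))  # 16705 15702 1002
```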
Here are Reddit's traffic stats for the Obama AMA: https://www.reddit.com/r/IAmA/comments/z3msa/traffic_stats_f...
The AMA brought in an extra two million uniques over two days. Also, it was one of the most legendary and historic events thus far on the internet, because it was the first time a sitting US president directly engaged with the public on a social media website.
Reddit has millions of subscribers. They can't really obfuscate the subscriber counts, because the subscriber count for each subreddit is visible every day. If it suddenly slowed down, people would notice. And the info needs to be available to moderators in order to manage their subreddit. Therefore, when /r/funny says it has 7 million subscribers, we can be reasonably certain there are at least 7 million Reddit accounts, many of which are active.
So, assuming there are only about a million active Redditors (there are probably more), and that a large number of those Redditors visit http://www.reddit.com/r/all on a regular basis, and considering that the Obama AMA was one of the most significant events in Reddit history (and indeed all internet history), and considering that over three million people viewed the AMA, it seems hard to believe that only around 16,000 people voted on it.
It's possible. Statistics is one of the most counterintuitive fields. Behaviors emerge at scale that aren't seen in the initial stages. Maybe people saw that 5,000 people had already upvoted the AMA, and so were less likely to upvote it themselves. But given the nature of politics and the historical significance of the event, could it be true that the fate of the AMA was influenced by just a few thousand people?
Another observation: Reddit has had steady upward growth, but I remember that as of a few years ago the upvote counts were regularly reaching 3k. That number hasn't gone up too much in the meantime: https://www.reddit.com/r/all/
... but Reddit's traffic seems to have doubled since the start of 2013: http://www.google.com/trends/explore#q=reddit
If the number of voters could be calculated as a simple percentage of total active Reddit accounts, and Reddit's traffic has doubled, then why haven't the vote counts in /r/all also doubled?
That said, I'm not entirely convinced of my position. Reality is weird, and I'm often wrong. All I'm saying is that if it's true that only around 16,000 people voted on Obama's AMA, and that a submission on /r/all regularly receives only ~5k votes out of a million active redditors who see it, then I'm surprised.
EDIT: The more I think about it, the more I think I'm mistaken. If it were true that Reddit compresses the vote counters for submissions, then they'd also have to do it for comments, because otherwise the top comment would regularly display a count that's way higher than the submission. They'd also have to compress the added comment karma, etc. This is where it moves from "plausible" to "the simplest explanation is that very few people vote."
The people who are joining reddit now are more lurkers than participants, so as the site grows, the percent of people who participate (vote and comment) gets smaller.
Personally, it makes me sad.
As mentioned in the article, the algorithm for selecting which posts appear on the homepage seems to allocate slots for each subreddit. If one subreddit has more people voting than another subreddit, they can both appear alongside each other, rather than the larger one always filling the homepage.
Since posts to /r/blog are infrequent, they will always get that subreddit's slot and will likely appear on the first or second page even with few votes. Then, because they are visible to many users, they can attract more votes.
Is this really a surprise? If you went by upvotes alone, a sub with 1M subscribers would always dominate a sub with 500k. You need to factor in that context.
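One way per-subreddit normalization could produce that effect, sketched in Python as an assumption rather than a reading of reddit's normalization code: score each post relative to the best post in its own subreddit. Every subreddit's top post then scores 1.0, so a 200-point /r/blog post outranks a 4,800-point second post from /r/funny. `normalized_ranking` and the sample data are hypothetical.

```python
def normalized_ranking(posts):
    """Rank (subreddit, post_id, score) tuples by score relative to the
    best post in the same subreddit; raw score breaks ties.

    Illustrative assumption only, not reddit's implementation.
    """
    best = {}
    for sub, _, score in posts:
        best[sub] = max(best.get(sub, 0), score)

    def sort_key(post):
        sub, _, score = post
        relative = score / best[sub] if best[sub] > 0 else 0.0
        return (-relative, -score)

    return sorted(posts, key=sort_key)


posts = [("funny", "f1", 5000), ("funny", "f2", 4800),
         ("pics", "p1", 3000), ("blog", "b1", 200)]
print([post_id for _, post_id, _ in normalized_ranking(posts)])
# ['f1', 'p1', 'b1', 'f2']
```

The design choice here is that relative score dominates and raw score only breaks ties, which is exactly what lets a small, infrequent subreddit claim a visible slot.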
Front-page weighting: https://github.com/reddit/reddit/blob/master/r2/r2/lib/norma...
For example, assume there is a rule that a given subreddit can have no more than N posts in the top 50 at a given time. It seems like this alone would explain the clustering shown in the article. Super-popular subreddits like /r/funny would rarely have posts on page 2, simply because they usually already have N posts on page 1. Thus they drop off sharply in likelihood to appear in the 40s, then shoot back up after #50 when the limiting stops.
Meanwhile clusters 2 and 3 appear to be the subreddits which rarely and often (respectively) reach the top 50, but only due to the limiting rule. Cluster 2 is the least popular in the unlimited spots past #50, so it makes sense that it usually reaches the lowest of the limited spots, while cluster 3 (apparently medium in overall popularity) takes the middle region.
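The per-subreddit limit hypothesized above is easy to simulate. A minimal Python sketch, assuming a cap of N posts per subreddit within the first `limit` slots (both values are hypothetical, since the comment doesn't know N): posts that hit the cap are pushed just past the limited region, where plain score order resumes.

```python
def capped_ranking(posts, limit=50, cap=3):
    """Order (subreddit, post_id, score) tuples by score, but allow at most
    `cap` posts per subreddit in the first `limit` positions."""
    ranked = sorted(posts, key=lambda post: post[2], reverse=True)
    top, overflow, counts = [], [], {}
    for post in ranked:
        sub = post[0]
        if len(top) < limit and counts.get(sub, 0) < cap:
            top.append(post)
            counts[sub] = counts.get(sub, 0) + 1
        else:
            overflow.append(post)  # stays in score order after the limited region
    return top + overflow


posts = [("funny", "f1", 100), ("funny", "f2", 90), ("funny", "f3", 80),
         ("pics", "p1", 50), ("news", "n1", 40), ("blog", "b1", 5), ("aww", "a1", 3)]
print([post_id for _, post_id, _ in capped_ranking(posts, limit=4, cap=1)])
# ['f1', 'p1', 'n1', 'b1', 'f2', 'f3', 'a1']
```

In this toy run /r/funny is absent from the lower limited slots (2 to 4) and reappears immediately after them, the same dip-then-rebound shape described above, while a low-scoring subreddit like /r/blog lands at the bottom of the limited region.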
Naturally I'm just squinting at it, but it looks like the article's findings could easily occur without Reddit treating some subreddits differently from others (as I take the author to imply it might, given the title). Am I missing something?
* I am not a quant :P
a quantitative analyst.
ORIGIN 1970s: abbreviation.
It seems like you wouldn't get much value out of reddit if you just view the front page without logging in?
See also: buzzfeed, 9gag, imgur, funnyjunk, etc.
no cat pictures, 2 gifs of dogs being cute
I check r/all because it's the breadth-focused zeitgeist of the internet. Whether the comments are right or wrong doesn't matter. Once you can fix a point on popular sentiment, you can interpret how it will ripple through more "prestigious" forums.
A lot has changed.
Also, there's /r/all, which tends to closely mirror the front page, since non-default subreddits rarely climb in /r/all.
"In other words, assuming you’re not logged in. I don’t have any supporting info, but I’d imagine that a large chunk of reddit’s traffic comes from logged out users."
In fact, I'd rather see a variety of posts from subreddits I don't usually go to or follow. I want to see things I don't already follow. Most posts I end up liking are ones I find in the subreddits I visit. I'd like my front page to give me posts from unsubscribed subreddits so that I might expand the set of subreddits I'm subscribed to.
There is a little line of text at the top of the reddit front page called "trending subreddits"; I've found a bunch of cool stuff that way: http://imgur.com/okDu5AN
But of course, the best way to find new subreddits is to read the first five or so comments on a weird gif or picture. Someone will link either to the subreddit it came from or to a subreddit where it could have been posted.
EDIT: I misread the article's argument; there's a lot of luck on HN, but I cannot confirm there's a systemic bias toward people/topic.
Reddit exposes you to a huge audience, who then in turn comment on, link to, and debate your post. Reddit's memory is short, however, and within hours your post will be gone. Still, the reverberating effects benefit you in many ways and almost guarantee traffic for years if the topic is "evergreen".
Also, love the use of R. R is beautiful.
That is, while popularity is correlated with quality, the two are rarely considered identical, and one is often a poor heuristic for the other.
The balance being sought can most simply be described as known good content vs. discovery. I wouldn't call it uneven; it's more like "this is interesting" vs. "we think you might find this interesting, but we're taking a gamble because it has low visibility." I'm betting subreddits also move from cluster to cluster fairly frequently over time. Maybe an interesting thing to track over the next month or two?
At the end of the day, I'm just there to browse, not to jump through a bunch of hoops just to filter out junk that only appeals to the under-18 crowd.
(meritocracy was in quotation marks). Clearly votes are not what people think of when they speak about merit, but let it be.
Whether it stays there is another story.
The default front page is just the landing page for newcomers to get a first impression and a starting point for personalization.
An analysis of and comparison with /r/all would have been way more interesting.
There are many great subreddits, of course, but it's not a place for political discussions (in fact, I am still looking for a good place to have political discussions).
Edit: Why does my personal experience get downvoted?
I guess I would characterize it more as libertarianism strongly centered on the perspectives of middle class, white men.
Edit: I guess "I don't really buy that" was a poor choice of words. I don't think you're lying to me. But your experience was very different from my experience. I stopped using reddit because I felt like all the big subreddits were too socially conservative!
Pro-military? How? I haven't visited it in a while, but a common sentiment on AskReddit was a total disillusionment with the chauvinistic "support the troops" mentality and a belief that military service does not make one righteous in and of itself.
As for the military, I guess that one is more debatable. I agree that the Bush-era "support the troops" jingoism isn't around much, but I used to see a lot of posts where it's implicit: things like hugely popular photos of special ops soldiers followed by adoring comments about how "badass" they are.
Enough kids who grew up playing Call of Duty have shifted reddit toward being more pro-military and pro-weapons. There is also a steady stream of "coming home" photos and videos with kids and pets.
It's getting to the point where the only way to defend against such tactics is to play just as dirty.
I sometimes wonder if such a place can exist. Outside of friendships, I kind of don't think it can.
Unfortunately most people these days are more interested in their initial opinion being right than in having real discourse that everyone can learn and grow from.
I think if your theory involves mass-exclusion/banning you could possibly end up with a place like this. But the conversation would actually lose an important voice at that point. The angry people are (often) angry for a reason, they're just usually pretty bad at making their point civilly.
If you look at a political debate from a game-theoretical standpoint, a debate among friends is a multi-stage game. You don't want to go nuclear right off the bat, because when the dust clears, you'll still want to be friends with your opponent. There's a continuity to the relationship. There will be a second, third, fourth,...,500th "round" to the "game."
I'm not sure if this situation is any more likely to yield productive discussions. But on average, it yields more civil, less abrasive discussions. On the downside, it can often result in echo-chamber conversations that never really get interesting.
Conversely, the worst and least civil debates I've seen have occurred on long-standing message boards or communities, wherein the members are anonymous, but they're known for their handles. These members have "known" each other for years in some cases, but they don't really know each other, and they have few qualms about unleashing the flames. Especially when they feel their reputations or credibility within the community are at stake. These users have created identities for themselves, and paradoxically, they'll often defend those proxy identities more ferociously than they'll defend their real identities.
Tl;dr: when people assume group or tribal identities, you get worse flame wars and less substance. When people assume individual identities, you get fewer flame wars, and possibly more substance.
My favorite part of this article is the graph showing how "Do you think Twelve Years a Slave should win an Oscar" breaks down by party lines.
I used to hang around Reddit back then, but I don't clearly remember. My guess is that reddit found all the personal-freedom stuff appealing, while mostly ignoring the facts about closing the Fed, eliminating minimum wages, reducing intervention... But remember that during that time (2009-2011) most of the econ talk surrounded the bank bailouts. Ron Paul's message was actually aligned with that of Occupy WS, since most libertarians opposed the bailouts and oppose big corporations, given that they usually lead to crony capitalism. I also have the feeling that there was less nonsense on reddit back then, but this might just be my perception.