And a quotation for those not wanting to click:
*ProfDrMorph* 2 points 1 year ago
So that means all posts in all subreddits (when browsing 'hot') are sorted this way:
1. all posts with more upvotes than downvotes, ordered by age (newer posts are preferred), then
2. all posts with the same number of up- and downvotes, in whatever order the database returns them, then
3. all posts with fewer upvotes than downvotes, ordered by age (older posts are preferred) and popularity (posts with far more downvotes are preferred).
Because that's what the _hot() function implies if the sorting algorithm uses it as a 'key'.
*ketralnis* 2 points 1 year ago
Yes that's accurate
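The three-tier ordering described in the quoted comment can be reproduced with a short sketch (my paraphrase of the open-sourced `_hot()` formula; the post names, scores, and timestamps are invented):

```python
from math import log10

def _hot(score, seconds):
    """Paraphrase of reddit's _hot(): `seconds` is the submission time
    measured from reddit's epoch (Dec 2005), so it is large and positive."""
    order = log10(max(abs(score), 1))
    sign = 1 if score > 0 else -1 if score < 0 else 0
    return order + sign * seconds / 45000

# (name, net score, submission time): invented data covering all three tiers
posts = [
    ("pos_old",  5, 200_000_000),
    ("pos_new",  5, 200_100_000),
    ("zero",     0, 200_050_000),
    ("neg_old", -5, 200_000_000),
    ("neg_new", -5, 200_100_000),
]
ranked = [name for name, s, t in
          sorted(posts, key=lambda p: _hot(p[1], p[2]), reverse=True)]
# positives newest-first, then zero-score posts, then negatives oldest-first
```

Because the time term dominates the (at most single-digit) `order` term, positive posts land thousands of points above zero-score posts, and negative posts thousands of points below them.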
It feels a little weird quoting myself, but I also said:
> The thing is, the two most important pages are the front page (or a subreddit's own hot page) and the new page. The new page is sorted by date ignoring hotness, and if something has a negative score it's not going to show up on the front/hot page anyway. The two other main opportunities to get popular (rising and the organic box) don't really use hotness either.
> So when it comes down to it, what happens below 0 is pretty moot. Smoothness around the real life dates and scores on the site is more important than smoothness around 0, where we don't really have listings that will display it anyway.
In summary, there don't exist listings in which the discontinuities at 0 really matter
What you can say is that you want posts to disappear from the hot page as soon as they go to -1, in which case I'll say that it's more than a little weird for the first voter to hold so much power.
But for that particular problem it would be a solution, provided Reddit had no mechanics in place to prevent exactly that. And it would be foolish to assume you don't.
I have no interest in doing anything like that; I browse Reddit a lot.
PS: Totally unrelated to this, but please look into the API returning a ton of HTTP 503/504 gateway timeouts. It's been happening to me across several servers in different regions of the world.
Ironically, that's probably the rate limiting blocking you. Are you hitting the API more often than once every 30 seconds?
I...what? This is just wrong. If it were not for the bug, then many posts with a negative score would show up on hot pages. Due to the bug, many posts which would otherwise show up do not. The bug is changing how things are working, and it is doing so in a way which has clear impacts on hot pages.
> In summary, there don't exist listings in which the discontinuities at 0 really matter
To the extent this is true, it is true only because there is a bug in the code that hides posts with negative points from the hot pages. What you are saying is that it doesn't matter if posts with negative points are shown on hot, because posts with negative points are not shown on hot.
...I hesitate to even ask this, but, well: Do you actually understand the bug, and the impact it has on how posts are sorted? Because your attempts at explaining it make no sense. The reason negative posts don't show up on the front page is because of this bug. That's why the bug has some (slight) importance; it is not the reason why the bug has no impact. It does have an impact.
I think you misunderstood. What I take from this is that articles under 0 MUST NOT show up on the Hot page.
Now, that could probably have been implemented as a filter, but I don't think doing it in this weirder way makes it a bug.
> The new page is sorted by date ignoring hotness, and if something has a negative score it's not going to show up on the front/hot page anyway.
The key word there is "anyway". We're discussing the very code that keeps them off the hot pages, so the word "anyway" makes no sense. If your theory were true, I would expect him to say: "This code is important because if something has a negative score we don't want it to show up on the hot page". But he doesn't; rather he treats negative articles not showing up as a law of nature. Paraphrasing, he says: "This code is unimportant because it doesn't do anything; the articles wouldn't have shown up anyway." But it does do something, and they would have shown up.
My theory is that the confusion comes from him saying "front/hot" page. Everything he said is true about the front page, and the front and hot pages use the same algorithm. But the same algorithm applied to different data yields very different results, and everything he said is blatantly false when discussing the hot page of small, low-traffic subreddits.
In short, I think he's trying to say "hey, negative articles won't show up on the default front page no matter what, so what are you talking about?". And the response is "yes, but it has a huge impact everywhere else": notably the hot pages of small subreddits, plus some customized front pages and multireddits. You can't conflate front/hot the way he does, because the bug only manifests when _hot() is called on a low-traffic source.
And the counterexample is given above: a not-very-active subreddit, one new post in the last day, with a score of -1 after one vote. Are you absolutely sure that this post MUST NOT show up on the Hot page?
This is not true.
I am active in a small (local-area) subreddit, and sometimes a post there disappears from the "hot" listing entirely: not just from page 1, but nowhere on pages 2 or 3 either.
When a recent post with 4 upvotes sits at the top of the hot list, wouldn't a recent post at -1 still deserve to rank higher than a week-old post with a small positive score?
I had genuinely wondered whether the mods were removing posts that quickly picked up 2 downvotes. This bug explains my observations.
A post at -1 from yesterday, in a subreddit with only 1 post a day, is completely deserving of being on the front page of that sub (top 30) posts. If all negative ones are banished from the top... that's very bad.
That said, maybe they aren't particularly concerned about subreddits with minimal activity, because by definition not many people use them.
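Under the published formula, the scenario above checks out: one downvote is enough to sink a fresh post thousands of points below stale positive ones. A sketch with invented timestamps (`_hot` is my paraphrase of the open-sourced function):

```python
from math import log10

def _hot(score, seconds):
    # paraphrase of the open-sourced formula; `seconds` counts from reddit's 2005 epoch
    order = log10(max(abs(score), 1))
    sign = 1 if score > 0 else -1 if score < 0 else 0
    return order + sign * seconds / 45000

NOW = 250_000_000  # hypothetical "now", in seconds after reddit's epoch
DAY = 86_400

week_old_plus2   = _hot(2, NOW - 7 * DAY)   # week-old post at +2
yesterday_minus1 = _hot(-1, NOW - 1 * DAY)  # yesterday's post at -1

# the stale +2 post outranks the fresh -1 post by thousands of points
assert week_old_plus2 - yesterday_minus1 > 10_000
```

The negative `sign` flips the entire time term, so the -1 post isn't slightly penalized; it is pushed below every positive post on the site.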
All it takes is one downvote to kick those things off the front. These low-traffic subs don't have enough users to have their own Knights of New constantly patrolling the /new view, so the Hot page is pretty much it.
Basically, it becomes a moderator's job to constantly check /new and rescue anything that suffered downvote infanticide.
And it has never occurred to the reddit developers to document the behaviour in the code base? Regardless of whether there’s a bug in the implementation, that is one poorly-written piece of code.
My guess is they want to bury spam fast. Spam in old posts will already have been deleted, so old posts are more trustworthy. Voting brigades are a smaller risk than the constant flood of spam.
Are they? This behaviour allows one vote to bury a post. Is the "downvote brigade" risk really so small that you can hand it an opportunity like this?
> Maybe there is no moral. Reddit screwed up.
...or maybe, they know what they're doing.
Maybe not. ...but when you supply a bugfix, the onus is on the submitter to demonstrate 1) that the fix fixes the problem and 2) that it doesn't break anything else.
It would appear that no effort has been made at (2): demonstrating that the proposed change would not have an adverse effect on other high-vote rankings.
To be fair, it would have been nice to see the pull request response (https://github.com/reddit/reddit/pull/583) mention that an alternative algorithm choice would have to be demonstrably better in a large scale analysis before they would even dream of changing their core ranking algorithm, but it's not unfair for them to take that stance.
It's like asking Google to change their page rank algorithm because you don't like it.
No. The onus is on Reddit's test suite which, ostensibly, would cover voting (one of the core features/functionality of the site!) to demonstrate this. Or are you suggesting that he didn't run the full build?
Oh wait, the test suite doesn't actually guarantee bug-free software? Worse, test suites don't protect you from bugs in the specs?
Well I can tell we are going to have a rough night ahead...
Pull request -> should include tests if relevant ones don't already exist.
Well then, that's running with scissors as far as any modern legit open source project is concerned. That should be the VERY FIRST thing rectified.
And one of the reasons might be that sorting doesn't work quite properly on smaller subreddits?
"I found a recent post in a fairly inactive subreddit and downvoted it, bringing its total vote score negative. Sure enough, that post not only dropped off the first page (a first page which contained month-old submissions), but it was effectively banished from the “Hot” ranking entirely. I felt bad and removed my downvote, but that post never really recovered...
While testing, I noticed a number of odd phenomena surrounding Reddit’s vote scores. Scores would often fluctuate each time I refreshed the page, even on old posts in low-activity subreddits. I suspect they have something more going on, perhaps at the infrastructure level – a load balancer, say, or caching issues."
This is partially due to vote fuzzing. More to the point, votes go into a queue and the removal of the downvote might not cancel out the previous action for some time.
As a result, this supposed flaw would let somebody snipe puffins from the new page of a small birdwatching subreddit before they ever get a fair shake. I think anyone attempting this sort of manipulation at scale would find it an ineffective strategy; there have been (and probably constantly are) attempts to game Reddit, and this seems like an excellent honeypot.
Outside that narrow set of circumstances, and a very small time window, the flaw disappears; and if you try to abuse it you'll stick out like a sore thumb.
The true horror expressed in the OP is that the ordering of posts in the purgatory is not strictly logical - the post ranked 10042 should really be ranked 10041. Gasp. Twitch.
This is a very lovable brand of OCD to my eyes. :)
I'm actually involved in moderating a fairly large subreddit, and we have periodic waves of neo-Nazi posters gaming the subreddit, and they are surprisingly effective at altering the general mood. You can also see some genuinely shocking opinions as top posts on r/worldnews. These are subreddits with hundreds of thousands of daily visitors. If reddit is operating a system which can easily be gamed, it matters a lot.
In this case, with enough proxy accounts and a modicum of programming experience, you could anonymously suppress stories you don't like with some ease. Do you not think that matters?
Any system that mimics democracy, even with active moderators, will succumb to a large enough minority of troublemakers. If they really are a marginalized group that does not represent a significant percentage of the community, then even with all the tricks and puppet accounts and real-world parallels, they will remain marginalized. If things turn dark that easily, one sadly suspects it has more to do with a flaw in the algorithm of the people than in the system.
As for programmatically doing what you claim, that hasn't been demonstrated. I'm pretty sure spammers have even more incentive and resources, and yet the volume of spam is still manageable.
> It's perfectly plausible that something that ends up on the front page gets only a handful of upvotes in the first twenty minutes, or half an hour.
For example: this post.
The bigger danger is that it makes the whole community subconsciously downvote happy, because sometimes it's more effective to tune the site to what you want by downvoting things you don't like than by upvoting things you like.
If people are downvoting everything that doesn't fit their expectations, it creates a lot of cultural inertia.
(Eh, that's the worst impact I can come up with, and it's probably still not too big a deal.)
SORT ABS(ups - downs) ASCENDING
A much better algorithm for controversy would be:
SORT MIN(ups, downs) DESCENDING
SORT (ups + downs) / max(1, ABS(ups - downs))
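The difference between these keys is easy to see on a few invented posts: the current `ABS` key crowns a post with one upvote and one downvote, while the `MIN` and ratio keys both surface the genuinely contested post (the data and helper name are mine):

```python
# invented posts: name -> (ups, downs)
posts = {"two_votes": (1, 1), "lopsided": (300, 2), "contested": (250, 248)}

def rank(keyfn, reverse=False):
    """Return post names ordered by the given controversy key."""
    return sorted(posts, key=lambda name: keyfn(*posts[name]), reverse=reverse)

current  = rank(lambda u, d: abs(u - d))                         # ABS(ups - downs) ASC
by_min   = rank(lambda u, d: min(u, d), reverse=True)            # MIN(ups, downs) DESC
by_ratio = rank(lambda u, d: (u + d) / max(1, abs(u - d)), reverse=True)

# the current key calls the 1-up/1-down post maximally controversial;
# both alternatives put the 250/248 post first
```

The `ABS` key can't distinguish "nobody voted" from "hundreds voted and split evenly", which is exactly the complaint.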
Take a look at the current page to see how awful the current algorithm is:
I figure all the different options to slice and dice are fun for information junkies but most users won't bother or even know about them.
Consider a set of points in the X/Y plane. If you want to find the one closest to the origin, you don't have to find min(sqrt(x^2+y^2)), only min(x^2+y^2), which is much cheaper.
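For instance, with a few illustrative points: since `sqrt` is strictly increasing, dropping it can never change which element is smallest.

```python
import math

points = [(3.0, 4.0), (1.0, 1.0), (-2.0, 0.5)]

# full Euclidean distance vs. the cheaper squared distance
nearest_full    = min(points, key=lambda p: math.sqrt(p[0]**2 + p[1]**2))
nearest_squared = min(points, key=lambda p: p[0]**2 + p[1]**2)

# a monotonic transform of the key never changes the argmin
assert nearest_full == nearest_squared == (1.0, 1.0)
```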
This would really help the learning process, but I appreciate how time-intensive it is.
Which goes to show you that things are the way they are not because that's the way things should be, but just because that's the way things are. Which is a very stupid way to run things, but that is the way our 'society' works.
1. If newer material has already attracted the same number of negative votes in a shorter period than another item did in a longer period, the first is worse. Push it down.
2. If people suddenly started hating something very much, that might mean the content is hot and attracts a lot of attention. So pull it up.
"thinking out of the... emm, where is my box???"
Imagine two submissions, submitted 5 seconds apart. Each receives two downvotes. `seconds` is larger for the newer submission, but because `sign` is negative, the newer submission is actually rated lower than the older one.
Imagine two more submissions, submitted at exactly the same time. One receives 10 downvotes, the other 5. `seconds` is the same for both and `sign` is -1 for both, but `order` is higher for the -10 submission, so it actually ranks higher than the -5 submission, even though people hate it twice as much.
Its success is based on attracting an engaged audience, who participated heavily, in turn attracting a larger audience, whose participation further attracted even more people... etc.
The algorithm may have mattered very early, in the beginning, when it was first attracting people who were evaluating it for the first time. But even then, I think that the content that Reddit's staff continuously posted was a bigger factor than the algorithm.
And of course, if you would like more articles written by me and an extremely high signal-to-noise ratio (because I post so rarely...), consider subscribing: http://technotes.iangreenleaf.com. RSS is not dead, dammit.
order = log(max(abs(s), 1)) * ((s) / max(abs(s), 1))
order = log(max(abs(s), 1)) * (s / max(abs(s), 1))
ns = max(abs(s), 1)
order = log(ns) * (s / ns)
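The two spellings compute the same value; factoring out `ns` is purely a readability change, as a brute-force check over integer scores confirms (the helper names are mine):

```python
from math import log10

def order_inline(s):
    return log10(max(abs(s), 1)) * (s / max(abs(s), 1))

def order_factored(s):
    ns = max(abs(s), 1)
    return log10(ns) * (s / ns)

# identical for every integer score; for s != 0 both reduce to sign(s) * log10(|s|)
assert all(order_inline(s) == order_factored(s) for s in range(-10_000, 10_001))
```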
It prioritised community engagement over the community's quality of content. This turned out to be a slightly more effective way of ranking content.
HN's system would rather quickly derank a post that is potentially inflammatory and keep good content on top rather than using comments as a heuristic for community involvement.
s = score(ups, downs)
order = log10(max(abs(s), 1))
The blog sketches out a corner case that maybe isn't handled well, but posts with net negative votes probably aren't "hot", and I'm pretty sure they have mechanisms in there to make sure that bad voters are at least eventually ignored.
This bit is actually misstated. Those posts all have a comparison value of 0 (assuming score is simply ups minus downs), and are not affected by the oldest-first ranking of negative submissions. The ordering here is likely insertion order, which just happens to be the same as oldest-first.
The point is to have your bots/friends downvote everything but your submission.
It works every time.
This is what computer scientists should take away from this.
In reality, Reddit is successful by pure chance. In its initial days it was pretty much a barren wasteland for fringe people. Most people had written it off as another me-too without much differentiation, and Digg was the place to be. Then Digg screwed up, people wanted an alternative, and suddenly Reddit was overnight the lord of link submission, evolving into discussion forums.