Shows stories which have hit the front page ever, in the order of their posting. If it's currently on the front page, the link is orange. If it's not, it's black. It's very interesting to watch how frequently highly upvoted and commented posts turn black, while their temporal peers remain.
Anecdotally, there appears to be a trend of positive/neutral news about YC companies remaining on the front page the longest, the latest shiny technology sticking around for a while longer than average, and pretty much any negative news disappearing almost instantly.
For example, as of this instant in time, there is an article about Angular2 which remains on the front page while more highly upvoted and commented articles about laptop security, AT&T discrimination, and a Nintendo Switch CVE discussion are all gone from the front page.
While I realize I'm not entitled to explanations, some transparency would be appreciated. Maybe it could even be automatic: whenever a mod forcibly removed something from the front page, they could leave a comment and it'd show up on some page?
[EDIT] - After reading the article, if a mod did indeed take down the post because it discussed reverse engineering the rank algorithm, I think that's pretty naive. Security through obscurity isn't a thing, and the better response is just to make a better algorithm, not try and suppress knowledge about it.
I say this naively myself, as I've never had to maintain a ranking algorithm with this many users who depend on it (or any at all for that matter), but surely the problem isn't intractable?
Obscuring an algorithm or making it more tedious to reverse may not make it perfectly secure, but that's not the goal. It's not like actual information security, where loss of the encryption keys means your product is broken or your database is on the Internet. You're just trying to minimize the workload on humans who act as a back-up for the few posts that slip through.
If an email spam detection algorithm was public, spammers could precisely craft their content to slip through. If the heuristics for showing a CAPTCHA were public, bots could automate their requests to avoid it. If a ranking algorithm was public, people who might financially benefit from the front-page traffic could force content there through vote rings and sock puppets.
If the algorithm is secret, far fewer will be able to do so, and this small fraction of abusers can be handled by humans.
If the algorithm can be reverse engineered then trying to suppress the knowledge that is already "out there" will only create an illusion of it being a secret, and the fewer people know about it the more damage they can potentially do (i.e. the more they can financially benefit from their knowledge).
It's the same as with information security - if you discover an exploitable bug then chances are someone else has already discovered it too (or can discover it any time) so making it public is one of the most sensible things you can do.
Mods (as far as I know, I'm not one) never forcibly remove things from the front page. These are almost always the result of user flagging.
This article however is about stories that disappear without receiving either.
Here's an example of a submission that does not include the [flagged] label, but which has a moderator saying it got flagged, and that the flags are the reason it's not on the front page. https://news.ycombinator.com/item?id=13741276
You use the word "suppress", but we generally can't use words stronger than that in security. We deal in abstractions that are hard to reasonably quantify, so we do so by approximations of monetary or temporal costs, and frequently both. We also try to limit absolutes. Reverse engineering exists as a discipline because 1) contrary to popular opinion, security through obscurity is valid, if incomplete and 2) there's frankly not much more that you can do in many cases other than obfuscation.
There are situations where algorithm secrecy gains you nothing defensively and is actually a strategic disadvantage, such as in encryption or hashing. But in situations where you fundamentally cannot discriminate between authorized users, such as in email (spam), search results (SEO) or, here, Hacker News (front page), you cannot rely on the strength of the algorithm to properly discriminate between users, because that's not its intended purpose. In these situations, obscurity is essentially your only remaining option.
To be fair the Hacker News moderators have more control of the ranking algorithm being reversed as it's on a remote server they control, as opposed to embedded in a client deployed to inherently untrustworthy hands. And once the information is out, it's out. But I don't agree that they have a trust or transparency imperative to keep that sort of submission on the front page. Even if the information exists, there's no reason to make it even more accessible. They can remove it and also improve the ranking algorithm.
If you want to design a general purpose web application without significantly reducing usability, functionality that is not restricted through authentication or higher levels of authorization is susceptible to reverse engineering. Being that the ranking algorithm is not client-side, there are fundamental protections we cannot bypass, but much of it is still inherently obfuscation. There is rate-limiting of course, and you have to log in, but the inherent inputs and outputs can still be somewhat flexibly assessed over reasonable timespans because there are hard usability requirements in place.
The tl;dr: algorithms which cannot be gamed because their inputs have significant quantifiable and controllable time/monetary costs do not require secrecy - these are excellent for implementing authorization. Algorithms which do not have such costs are not appropriate for authorization and, unless also paired with significant authorization constraints, require some degree of obscurity.
I can understand they might want to keep the ranking algorithm and anti-spam techniques secret, but stories that are manually censored by a moderator should be indicated as such, maybe by some automatic message like "This post was removed due to [reason]".
Some websites manage to fight spam while remaining reasonably transparent (eg. StackExchange, where pretty much everything is documented - flags, closing reasons, edits, etc.).
I was really disappointed as the comment I was looking for wouldn't even show up in search.
This feels pretty transparent:
https://news.ycombinator.com/threads?id=dang
https://news.ycombinator.com/threads?id=sctb
I'd be all for tagging a generic "Downvoted by moderation" onto any article this happened to. The poster will probably still be annoyed, but I imagine most would agree with moderation most of the time. And if they like, they can contact the mods.
What they're less likely to agree with is this happening invisibly. And if it's invisible, then how do we know whether we (as a community) agree with it or not?
PS: Side note, strongly against "toggle a non-default option" as a solution. Filter bubbles are incompatible with democracy and true, diverse community.
I've learned that it's best not to jump to conclusions based on what you think is true (however sound your analysis might be). Always ask the other side(s) for an explanation. In this case, you could have sent an email to the mods asking for an explanation. If you find their response unsatisfactory, go ahead and write a post explaining why.
> There are also a few articles that are thinly veiled affiliate spam. For example, the ShelfJoy links about books that Aaron Swartz and David Bowie loved fall into this category. They get called out in the comments for being very low-effort lists of Amazon affiliate links.
If what is being said in other threads is true, and high karma accounts clicking flag can push a post off the front page without it showing the "flagged" tag, then that could be an explanation instead. Which would then mean that maybe these high karma users have too much power.
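To make that hypothesis concrete, here's a toy sketch of what karma-weighted flagging could look like. Everything in it is invented for illustration: the names flag_weight and is_buried, the 10000 karma cutoff, the weight of 3 and the threshold of 5 are all assumptions, and none of it is known to reflect how HN actually works.

```python
# Toy model only: all names and numbers below are hypothetical.

def flag_weight(karma, high_karma=10000, heavy_weight=3.0):
    """Weight of one flag: heavier if the flagger has a lot of karma."""
    return heavy_weight if karma >= high_karma else 1.0


def is_buried(flagger_karmas, threshold=5.0):
    """True if the summed flag weight reaches the (assumed) hidden threshold."""
    total = sum(flag_weight(karma) for karma in flagger_karmas)
    return total >= threshold


# Two high karma flaggers are enough under these made-up numbers...
print(is_buried([12000, 15000]))
# ...while three ordinary flaggers are not.
print(is_buried([100, 200, 300]))
```

Under a scheme like this, a couple of heavy flags would bury a story well before any visible "[flagged]" label would need to appear, which is the concern raised above.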
- This is a bug / happens randomly; you just noticed it because you were looking (i.e. as you analyse this data); all the posts it's happened to before and since went unnoticed. That's supported by the evidence of your analysis; most of the results don't look any different to other posts.
- It's not the link, but the related activity. Presumably if you're running analysis on HN data, there are a lot of HN requests coming from your machine. Maybe any posts made by your IP are therefore treated as suspect (i.e. the sort of protection you'd expect to avoid automated posting or upvoting... just without that extra sophistication). Perhaps the other posters had something similar... Would be good to see if any of those posts were by the same author; as that may add weight to this theory.
- Other variables... Maybe the algorithm has rules which cause this behaviour under some conditions; e.g. posts made the previous day (not 24 hours ago; but rather before midnight UTC / something like that) lose weight when midnight hits; so posts made moments before suddenly lose enough score to knock them off the top spot; whilst those which had more score before midnight, or were posted just after survive... Many other possibilities such as this may exist; and we'd only know by looking at those variables in the data... What else is common about the posts which are in your post's club vs those which aren't?
You're definitely right about there being miscellaneous rules in there. Something that I mentioned in passing in the article is that many stories exhibit a significant drop in position once they're 15 hours old. If you look closely at the typical story trajectories you can also see various other jumps of about 10-30 positions which I would guess are triggered by these various rules.
The stories listed in the article exhibit very different behavior where they jump hundreds of positions instantaneously. It's absolutely possible that this is triggered by some automatic mechanism but if that's the case then there's an enormous amount of significance being assigned to the corresponding rules. If there's some random component to the ranking then I highly doubt that it would be responsible for jumps of this magnitude.
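For anyone who wants to look for this kind of jump in their own data, here is a minimal sketch of the idea, not the code used for the article. It assumes you already have a story's trajectory as a list of (timestamp, position) samples; the function name find_rank_jumps and the default cutoff of 100 positions are just illustrative choices.

```python
def find_rank_jumps(samples, min_jump=100):
    """Find abrupt drops in a story's ranking.

    samples: list of (timestamp, position) pairs in time order, where a
    larger position means further from the top of the front page.
    Returns a (timestamp, position_before, position_after) tuple for each
    pair of consecutive samples where the story fell by at least min_jump.
    """
    jumps = []
    for (_, before), (when, after) in zip(samples, samples[1:]):
        if after - before >= min_jump:
            jumps.append((when, before, after))
    return jumps


# A made-up trajectory: steady near the top, then a sudden fall.
trajectory = [(0, 5), (60, 4), (120, 6), (180, 310), (240, 315)]
print(find_rank_jumps(trajectory))  # [(180, 6, 310)]
```

The small 10-30 position moves from the ordinary rules fall below the cutoff, so only the instantaneous falls of hundreds of positions are reported.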
I try to emphasize in the article that I do think it's possible that there's a hidden flagging threshold that's responsible and that the data can't tell us with certainty whether or not that's the case. I just personally find it unlikely that that's what happened for all of these stories. If you ran a site like Hacker News then would you put an admin link next to each post that pushes it off of the front page? I know that I would.
[1] - https://github.com/HackerNews/API
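A rough sketch of how one might collect ranking snapshots from that API, using only the standard library. The topstories endpoint is described in the linked repo; treating the order of the returned ids as the ranking is an assumption here, and the helper names (fetch_top_story_ids, positions, dropped_out) are made up for this example.

```python
import json
from urllib.request import urlopen

# Endpoint from the official API docs; it returns a JSON list of story ids.
TOP_STORIES_URL = "https://hacker-news.firebaseio.com/v0/topstories.json"


def fetch_top_story_ids(url=TOP_STORIES_URL, timeout=10):
    """Download the current list of top story ids (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def positions(story_ids):
    """Map each story id to its 1-based position in the list."""
    return {story_id: rank for rank, story_id in enumerate(story_ids, start=1)}


def dropped_out(previous_ids, current_ids):
    """Ids that were in the previous snapshot but are missing from this one."""
    current = set(current_ids)
    return [story_id for story_id in previous_ids if story_id not in current]
```

Calling fetch_top_story_ids() every minute or so and passing consecutive snapshots to dropped_out would give the list of stories that vanished between polls. It can't tell you why they vanished, only that they did.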
I thought that the point of HN was auto-moderation? Perhaps now that HN has seen great increases in popularity, the quality of content has to be more carefully controlled, lest the quality of posts on the HN front page slowly enter a death spiral towards that of reddit.
“Think of how stupid the average person is, and realize half of them are stupider than that.” ― George Carlin
By "remove from the front page", the author means a story abruptly disappearing from the front page and still not showing up in the subsequent 6 or more pages.
If you take a look at the comments, it's theorized there that the story got pulled not because of moderator action, but because people abused the flagging mechanism. Given the content, and given the principal person under discussion, this seems pretty likely to me.
The tug of war between upvotes and flags, as commented by dang, seems a strong sign that the article is very much political in nature. The question is if it also gratifies one's intellectual curiosity, but personally, it did not do that for me.
> It is the usual tug of war between upvotes and flags
Indeed, HN recently allowed a post that advocated gaming the system because it encouraged debate: https://news.ycombinator.com/item?id=13676362
A conspiracy theory, even backed by data, is not the best application of Occam's Razor.
So some articles might simply disappear because the OP asked too many friends for upvotes or because of false positives.
See dang's comment in this thread: https://news.ycombinator.com/item?id=13741276
I was hoping that HN's flag system would be sufficient for the community to self censor. Perhaps we need a flag on comments too, something you can't see. I would also prefer that posts don't transition through the lighter grays as they get down voted but disappear completely once the dead threshold is met. That would prevent some piling on that does happen.
Oh yes. https://news.ycombinator.com/item?id=13749685
I think there's a small karma threshold for flagging - something like 30 or 50 karma.
I've collected some of these anomalies. Peruse them and analyze them in this album:
https://imgur.com/a/6OvnE
Maybe OP can find a pattern in these.
is : http://sangaline.com/post/reverse-engineering-the-hacker-new...
Edit: to be clear, with common pattern I mean the topics of the submission (obviously they have one common pattern, which is dropping out of the front page quickly). They do not reveal some secret agenda moderators might follow or something like that.
I would say that 1/50 front page stories being buried is particularly common.
If something popular (and surely by definition of being on the front page it is) is suddenly removed, people are bound to be interested in the reason why? Was the source discredited? Was it just a copyright issue? A simple filter for spiked stories would be good, just with a note on the reason why.
Of course HN don't have to implement this, but it would be of benefit to the community.
HN doesn't have to be transparent, it's just a site with its own agenda (by that I don't mean evil agenda, but it is there for a reason) but if you want to grow the community, I think clearly identifying why things were removed is a reasonable thing to ask. If every one is marked "flagged by users" I'd worry that there is no manual intervention.
This was during the time of the election so I was thinking along the lines of political astroturfing, but also to guard against companies unfairly promoting their products or suppressing posts related to a rival company. For instance, if someone really wanted to keep a discussion off HN, all it would take is to tangentially start a flame war over some sensitive issue and watch the ranking algorithm punish the ensuing vitriol.
I don't think being "bored" by something is generally considered a good reason to flag content. I don't see how it could possibly be considered off-topic either.
I originally posted with the title "For a moment, I thought bing was down" or something (I don't remember the original title). The title was later changed to:
Title: "Bing doesn't support SSL"
https://news.ycombinator.com/item?id=5576041
Later, the story was removed entirely after I wrote the following comment:
"
Actually, it's been like this a really long time. I just noticed, that HN stories which have nondescript titles fare better, so I decided to conduct a little experiment. 1st spot on the front page seems to confirm my hypothesis.
"
https://news.ycombinator.com/item?id=5576342
I certainly understand why the mods removed the "story", but at the same time, I felt that the discussion of the "non-descript title bias" would have been an interesting one to have.
I asked this before and a mod said I should ask again via mail, but never got a response from hn@ycombinator.com.
Probably the mods don't want to disclose the complete criteria, because it may change constantly without warning. Try sending an email again, but I guess the reply will give you only a general idea of the system.
If you see something horribly misclassified, try sending an email to the mods.
It would seem to me that if you're looking to grind your political axe, this is not the best place to do so.
I disagree. Just a few flags can cause a story to drop off the front page.
There are way too many toxic users, trolls, shills, astro-turfers, voting rings, paid advertising, political organisations, disinformation campaigns, and other 'special interest' parties on the Net to be able to do without strong moderation.
If there was a middle ground it probably would be a section where you can specifically view threads that were removed from view.
Silent curation and other practices like shadow-banning are unethical and symptomatic of a mentality that seeks to avoid confrontation. If things go well we'll see more transparency over time. A good start for a site like HN would be to create another page that shows just the titles of the submissions rejected (no links). People can google for those titles if they are interested.
In any case, I don't necessarily disagree, but I've yet to see good evidence of the shadowy cabal, rather than user-directed flagging. I mean, just look at the list in the article: does it point to any kind of "beating down stuff that doesn't fit the narrative"?
The stories that are buried are not appropriate for the front page. The reason you come to Hacker News is because it has a better front page, with better comments under it, than other places. You experience the benefit of this editorial intervention each and every day.
I've had a story buried as it was gaining a lot of traction very quickly: this one. https://news.ycombinator.com/item?id=11920431
The quality of the comments was inordinately low and it didn't look like it would be improving, which is the reason it was buried.
No complaints from me around this. You can email the moderators if you want to know their reasoning. (I'm not one.)
People here need to understand and be thankful for the extraordinary and ongoing work that the moderators do every single day to keep this place an appropriate place for interesting, deep discussion along the editorial lines chosen. It is not a democracy (see: reddit) but I find the moderators generally extremely fair.
As far as I understand the moderators bury tons of stories (often political, link-bait, etc), which do however get traction quickly until they do so. It is easy to get traction through click-bait.
Generating serious discussion is harder. For example, this title promises "the stories that Hacker News removes" -- but is not really about the stories that Hacker News removes. For example the author does not analyze the comments under them or see why it derails or is not a good contribution to HN.
It is more of a click-bait title, a bait-and-switch, and is designed to generate easy outrage.
There's nothing remarkable here despite the traction this story is getting. It is part of the hidden workings that keep HN great. Dan and Scott (the moderators) do an extremely good and thankless job keeping the principles of this place alive.
You have no idea how hard they work and I've seen them make difficult and intricate decisions. (Sometimes as simple as detaching a thread that was derailing an important discussion.) In my opinion this story does not belong on the front page.
I would have flagged that submission if I saw it since it's flamebait.
But you are right, it was inappropriate. My point is there were 43 upvotes in a matter of a few minutes (and more coming) but it was not generating good discussion. The top comment:
>>hyperbovine 269 days ago [-]
>>Loads instantly, looks fine on mobile, the thing(s) you are probably interested in are linked right from the front page. As usual, Buffet is onto something here.
>>> walrus01 269 days ago [-]
>>> Looks fine in Lynx, too!
>>> http://imgur.com/yAEimmZ
Which is why I submitted it. I simply thought it was interesting.
However, although all the comments agreed with it (there was no flaming) and it was getting traction, the comments were simply not very high quality or generating any good discussion. It simply wasn't worthy of the front page despite getting voted there organically. I have no problem with it being buried.
I certainly don't share the sentiment. There are lots of places I or other people can visit and post to; what makes this particular site command value is precisely the community and user-base it has fostered over the years. Thus, I believe that this kind of meta discussion does provide considerable value to the community.
Also, check out Tim Berners-Lee's article about the internet being hijacked by the likes of Facebook and Google in order to understand why your 'I am grateful for the privilege' attitude is wrong...
Edit: argh, I attached this to the wrong post. I was responding to paulpauper of course.
So whilst it's a privilege to have such a resource, it's also acceptable to complain about issues with that resource; especially when things are being manipulated for unclear reasons with no transparency.
That said, the author wasn't complaining; just doing analysis and pointing out an oddity which was of interest given the context of their research.
It is appropriate that we readers/"consumers" have an interest in how our sources of information are controlled.
