The whole thing is automated and built around a Ruby script that pulls down the front-page titles at frequent intervals. Any title that changes for a given story ID gets tracked and rendered out to a static HTML page hosted on Netlify. So far it has run by itself without incident.
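The core of that approach (remember the titles seen for each story ID and record when one changes) can be sketched roughly as follows. This is a minimal Python illustration, not the actual Ruby script; the function name, the shape of a snapshot, the in-memory dict, and the story ID are all assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def record_titles(history: Dict[int, List[str]],
                  snapshot: Iterable[Tuple[int, str]]) -> List[int]:
    """Merge one front-page snapshot into history; return IDs whose title changed."""
    changed = []
    for story_id, title in snapshot:
        titles = history.setdefault(story_id, [])
        if not titles:
            titles.append(title)      # first sighting: remember the original title
        elif titles[-1] != title:
            titles.append(title)      # differs from the last title seen: record the edit
            changed.append(story_id)
    return changed


# Hypothetical story ID 1; the titles are ones quoted in this thread.
history: Dict[int, List[str]] = {}
record_titles(history, [(1, "Trends in the San Francisco poop crisis")])
changed = record_titles(history, [(1, "Trends in the San Francisco (dog) poop crisis")])
```

Keeping the full list of titles per ID, rather than only the latest one, is what lets a page show every step of an edit chain.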
This has been on Show HN before but I was kindly invited by dang to repost it.
Edit: there it is. Title reverted now.
(I had written a first version of this some time ago as a bookmarklet, but a browser extension is simpler to use.)
As said elsewhere, the extension helps to avoid clicking on the same story twice when the title has changed. (I will add URL monitoring too.)
Such as entertainment and financial articles...
Recall that during the banking crisis of 2008 there was an article stating that the EU was in a debt crisis of about 16 TRILLION, but that was too alarmist and they didn't want that getting out, so they edited the title. They forgot to change the URL, though, which still had the original title in it.
And this is a well-regarded national newspaper in Canada.
The actual turning of the JSON into HTML takes place on Netlify. If you wish to render the JSON for your own benefit, the current version is always at http://s3.amazonaws.com/jeanfromeastenders/current.json
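For anyone who wants to try rendering such a file themselves, here is a rough Python sketch of the JSON-to-HTML step. The structure of current.json is not described in this thread, so the schema used here (a list of objects with an "id" and a list of "titles") is purely an assumption, as is the function name.

```python
import html
import json


def render_edits(raw_json: str) -> str:
    """Render an ASSUMED schema, [{"id": int, "titles": [str, ...]}, ...], as an HTML list."""
    stories = json.loads(raw_json)
    parts = ["<ul>"]
    for story in stories:
        link = "https://news.ycombinator.com/item?id=%d" % story["id"]
        # Escape each title so markup inside a title cannot break the page.
        steps = " &rarr; ".join(html.escape(t) for t in story["titles"])
        parts.append('<li><a href="%s">%s</a></li>' % (link, steps))
    parts.append("</ul>")
    return "\n".join(parts)


# Made-up sample data in the assumed schema.
sample = json.dumps([{"id": 1, "titles": ["A <b> title", "A better title"]}])
page = render_edits(sample)
```

The real file would need its actual field names substituted in before this could work against it.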
Can you talk more about the data infrastructure?
Not all of those edits are by mods, of course. Some are made by submitters (edit: as the site points out!). Also, some are because we switched the URL and thus to the title of the new article (example: https://news.ycombinator.com/item?id=21616157). Those look weird if you assume they're moderation edits.
It looks like the list is sorted by reverse ID, which means articles that were submitted earlier are lower down on the page. But sometimes we re-up those (https://news.ycombinator.com/item?id=11662380), so from a front-page perspective some 'newer' stories are below older ones.
Yes, I just increase a LIMIT in an SQL query. I'll try that now. I didn't want the page to look too cumbersome.
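As an illustration of what "increasing a LIMIT" amounts to, here is a small Python sketch using an in-memory SQLite database. The table name, columns, and sample rows are invented for the example; the thread does not say which database or schema the site actually uses.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE title_edits (story_id INTEGER, old_title TEXT, new_title TEXT)")
conn.executemany(
    "INSERT INTO title_edits VALUES (?, ?, ?)",
    [(n, "old title %d" % n, "new title %d" % n) for n in range(1, 201)])


def latest_edits(conn: sqlite3.Connection, limit: int):
    """Return the newest edits first (reverse story ID), capped at `limit` rows."""
    cur = conn.execute(
        "SELECT story_id, old_title, new_title FROM title_edits "
        "ORDER BY story_id DESC LIMIT ?", (limit,))
    return cur.fetchall()


rows = latest_edits(conn, 50)   # raising 50 to, say, 150 lengthens the page
```

Passing the limit as a bound parameter keeps the page length a one-value change.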
Isn't there a small window of time to edit a submission before it gets committed permanently to HN? I imagine the list would be bigger if the grace period lasted much longer.
Take this one, for example:
nvidia Drops Support for CUDA on macOS
--> CUDA 10.2 is the last release to support macOS
--> CUDA Toolkit Release Notes
The first two titles are quite interesting to me, as a macOS user and a general follower of the tech space (and I'd note neither are sensationalised, or click-bait, from what I can see). The last...? I'm not going to click on that in a million years, as I don't work with CUDA.
I'm not totally clear what the moderators' motivations always are, but might it be true that in some cases, maybe they're prioritising strict accuracy over interest, or discoverability? And as a result, their actions are actually diminishing the value of HN as a discussion forum?
"[...] The big no-no is extracting some interesting fact from inside the article and using it as the title"
I wonder whether a subtitle giving an editorialised “so what” would help... but that would of course mean twice as much to moderate, and twice as much for people to complain about...
There are hybrid cases too, where we'll leave an editorialized title up because of what you call discoverability until the post is well-established on the front page, then change it at that point. Also, we're responsive to feedback, so when a user makes a good case for us having made the wrong edit, we'll change it. When users disagree with each other, though, as happened here, that gets harder.
Btw I wrote about that CUDA title here: https://news.ycombinator.com/item?id=21618991. We'd normally have left the second-last edit up, but I changed it in response to a user complaint. Had it not been midnight on Saturday I'd have just asked the readers whether the macOS detail was really the single important thing in the article or not, and decided based on that.
Your question about moderator motivation doesn't quite land with me though. We're not excluding any of those values, we're just trying to balance them in a good way. What that means is that you're going to get a different answer, often a surprisingly long answer, about specific cases.
Whether a thread will get changed or not seems very inconsistent.
I think most readers would be surprised by how much thought goes into these title edits and how they are based on principles, i.e. the site guidelines and lots of heuristics that derive from them. That's why we're always happy to answer questions about specific cases.
Trends in the San Francisco poop crisis
Trends in the San Francisco (dog) poop crisis
Trends in the San Francisco (mostly dog) poop crisis
Pew Research: 2.2% of Americans produce 97% of political tweets
Small share of U.S. adults produce majority of tweets on national politics
Why remove the exact figures?
Former Apple chip executives found company to take on Intel, AMD
Three of Apple and Google’s former star chip designers launch NUVIA
Isn't "star designers" more subjective than "executives"?
1.2B people exposed in data leak includes personal info, LinkedIn, Facebook
Personal and social information of 1B people discovered in data leak
Why make the headline less informative? Data leaks happen regularly. Data leaks from Facebook and LinkedIn has different implications than a leak from LexisNexis or a random blog.
Cloudflare open-sources Flan Scan, a network vulnerability scanner
Flan Scan: Lightweight Network Vulnerability Scanner
Again, why remove info? The fact that CloudFlare is behind this is more interesting than yet another random tool.
Mozilla: “Dear Facebook: Stop cross site tracking by default”
Dear Facebook: Stop cross site tracking by default
Same complaint. This distinguishes a random person making a random gripe from freakin' Mozilla who has the control to make Facebook's tracking more difficult.
Every single one of these headlines is actually less informative or less interesting (in general, of lower quality) than its original submission. They actually served to make HN less informative. WTF?
That gripe aside, most of the edits are useful (typo fixing, adding dates, and such). These just leave me scratching my head.
That submitter broke the site guidelines by changing the article title when it was neither misleading nor clickbait—so we changed it back. Also, we've found that when a title is gratuitously numerical, it makes for worse discussion. Why? I don't know. It just does. Therefore, if anything, we take numbers out rather than add them in. For the same reason, we wrote software to abbreviate "1,000,000" to "1M", "1,000,000,000" to "1B" and so on. Numbers in titles are baity and long numbers baitier.
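The number abbreviation described above could look something like this Python sketch. It is not HN's actual software: the regular expression, the function name, and the choice to handle only round millions and billions are assumptions made for illustration.

```python
import re

# Number of ",000" groups after the leading digits -> suffix.
_SUFFIXES = {2: "M", 3: "B"}

# Leading 1-3 digits followed by two or three ",000" groups, not embedded
# in a longer number on either side.
_ROUND_NUMBER = re.compile(r"(?<![\d,])(\d{1,3})((?:,000){2,3})(?!,?\d)")


def abbreviate_numbers(title: str) -> str:
    """Rewrite round millions and billions, e.g. "1,000,000" becomes "1M"."""
    def shorten(match):
        groups = match.group(2).count(",")
        return match.group(1) + _SUFFIXES[groups]
    return _ROUND_NUMBER.sub(shorten, title)


short = abbreviate_numbers("Serving 1,000,000 requests")
```

Non-round figures such as "1,200,000" are left untouched by this sketch, since shortening them would need a rounding policy that the thread does not describe.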
> Isn't "star designers" more subjective than "executives"?
That title changed because we switched to a different URL and updated the title to match the new article. See https://news.ycombinator.com/item?id=21616157.
> Why make the headline less informative? [...] Data leaks from Facebook and LinkedIn has different implications than a leak from LexisNexis or a random blog.
That submitter broke the site guideline against editorializing. It's editorializing to cherry-pick the details that you consider important and put them in the title. That amounts to the power to determine the story for everyone else, and on HN, submitters don't get such power. We prioritize authors; submitters have no special rights over a story. If a submitter wants to say what they think is important about an article, they're welcome to do that in the comment thread, on a level playing field with everyone else.
In fact there was a lot of data leaked in that leak, not just LinkedIn's and Facebook's. That's another important point. Putting famous names in a title makes it baitier and evokes lower-quality discussion, because it activates everyone's pre-cached responses about the famous names. If anything, we are inclined to take famous names out of a title, and certainly not to add them in.
> Again, why remove info? The fact that CloudFlare is behind this is more interesting
Because cloudflare.com is right next to the title: https://news.ycombinator.com/item?id=21605719. From the guidelines: If the title includes the name of the site, please take it out, because the site name will be displayed after the link.
> Same complaint. This distinguishes a random person making a random gripe from freakin' Mozilla
Same answer: mozilla.org is right next to the title: https://news.ycombinator.com/item?id=21599496. Avoiding repetition is part of HN being organized around curiosity.
> mozilla.org is right next to the title
Hmm... hey, petercooper, perhaps you should consider including the submission website in the edit tracker somehow? Without that, anyone who reads the edit tracker is missing an important piece of information that actual HN readers see.
This is interesting and an example of this happened recently in a post that ended up on the front page with 50+ comments. It was titled "100k+ page views a month for $5 with a self-hosted static site".
I chose that title because it kind of sets the stage of what to expect (a small / medium tier site being hosted cheaply) but it did bring in a number of comments where some people dropped in with "but that's only 0.04 posts per second, anything could host that!" which kind of detracts from the content of the submission which had nothing to do with saying those numbers are impressive in any way.
It's definitely a tricky balance and is so context specific. I think that post without the numbers wouldn't have gotten much engagement because "How I build, deploy and host my static site" isn't that interesting at a glance and I wonder if you came to the same conclusion because the title wasn't edited other than capitalization.
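For reference, the "0.04 per second" figure those commenters quoted follows directly from the title's number. A quick check in Python, assuming a 30-day month and traffic spread perfectly evenly (both simplifications):

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumes a 30-day month


def per_second(views_per_month: float) -> float:
    """Average rate if the monthly views were spread evenly over the month."""
    return views_per_month / SECONDS_PER_MONTH


rate = per_second(100_000)   # a little under 0.04 page views per second
```

Real traffic is bursty, so the average says little about peak load.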
"Please don't do things to make titles stand out, like using uppercase or exclamation points, or saying how great an article is. It's implicit in submitting something that you think it's important.
Please submit the original source. If a post reports on something found on another site, submit the latter.
If you submit a link to a video or pdf, please warn us by appending [video] or [pdf] to the title.
If the title includes the name of the site, please take it out, because the site name will be displayed after the link.
If the title begins with a number or number + gratuitous adjective, we'd appreciate it if you'd crop it. E.g. translate "10 Ways To Do X" to "How To Do X," and "14 Amazing Ys" to "Ys." Exception: when the number is meaningful, e.g. "The 5 Platonic Solids."
Otherwise please use the original title, unless it is misleading or linkbait; don't editorialize."
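Some of the mechanical parts of those guidelines can be approximated with a rough pre-submission check. The Python sketch below is hypothetical: the function name, the heuristics, and the warning wording are invented, and the 80-character limit comes from a moderator's remark elsewhere in this thread. It cannot judge whether a number is meaningful or whether a title is editorialized.

```python
import re
from typing import List

MAX_TITLE_LENGTH = 80   # limit mentioned by a moderator in this thread


def check_title(title: str, site_name: str = "") -> List[str]:
    """Return heuristic warnings; an empty list means nothing obvious was found."""
    warnings = []
    if re.match(r"\d+\s", title):
        warnings.append("starts with a number")
    if "!" in title:
        warnings.append("contains an exclamation point")
    if title.isupper():
        warnings.append("is all uppercase")
    if site_name and site_name.lower() in title.lower():
        warnings.append("includes the site name")
    if len(title) > MAX_TITLE_LENGTH:
        warnings.append("exceeds %d characters" % MAX_TITLE_LENGTH)
    return warnings


hints = check_title("10 Ways To Do X")
```

A warning is only a hint to look again; the exception for meaningful numbers still needs a human.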
Isn't that exactly the opposite of how it's supposed to be done? "Editorializing" is a meaningless term if it means adding neutral, factual clarifying statements (example: "small share" -> 2.2%).
PIA (PrivateInternetAccess) VPN bought by company known for distributing malware
PIA: Our Merger with Kape Technologies – Addressing Your Concerns
Useful -> Non-useful. Who is PIA and why should I care that they're merging? Expanding the acronym was useful in this case. Not doing so is a form of clickbait all on its own.
The difference between an expert’s brain and a novice’s
Differences between expert and novice brains in mice: study
Note that the original headline is the one used by the source. Okay, so clarifying statements and general rewording can be added to non-clickbait, non-editorialized headlines in some cases, but not others?
Seriously, I'd like to not have to make work for the mods, but I see such a lack of consistency, combined with unintuitive, seemingly against-the-rules changes, that I actually do not know what is expected of me when submitting an article, given that a plain reading of the guidelines doesn't match their application in the real world.
When you have a rule that's actively making things worse, and is this confusing even for the people who enforce it (I'm seeing a lot of headlines that have multiple changes, hours apart) it's not a bad idea to consider revising the rule.
Taking that PIA headline as an example, even if the chosen title is factual ("bought by company known for distributing malware"), the submitter is still making a judgement about what part of the story is most important. That is editorializing.
Let's look at your further examples.
1. Re "PIA (PrivateInternetAccess) VPN bought by company known for distributing malware", that one was outrageously editorialized. This is a complicated case though, because we often allow changed titles when the article is a corporate press release. Companies tend to use bland nothing-to-see-here language that can be misleading in its own right. But "bought by company known for distributing malware" is a massive claim and we have no idea whether it's true. We're in no position to adjudicate the truth or falsehood of everything people put in titles. So in this case, both choices were bad: the editorialized title as well as the bland original. If we could have come up with a better (i.e. accurate and neutral) title, we would have used it. But it wasn't obvious how to do that, so we went with the lesser evil of the original title. (We also got an email complaining about the editorialized title, but the complaint didn't cause us to change it, it just caused us to take a closer look.)
As for the acronym, it would exceed the 80 char limit to expand it in the title, so I didn't. Also, "Private Internet Access" is so generic a phrase that I'm not sure it's all that clarifying. Anyone who would know what that phrase means in this context would already know what PIA stands for. For those who don't, it would only take a tiny bit of work to find out, and it's not a bad thing for readers to have to work a little. https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu.... One more thing, too: when an acronym is used in a title unexpanded (e.g. "CS", "AWS", "AMA"), it often contains the subtle additional information that the acronym is a widely-used one in the community, which PIA is: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu.... So in this case I'd say the acronym was a wash. By the way, we really do consider all these details when weighing how to edit a title, and since I was the moderator in this case I can tell you I did that, and more, but this answer is already too long.
2. We've learned from long experience that science articles lead to shallow, angry arguments when their titles make excessive claims. Instead of talking about the actual finding, the threads fill up with objections to how overstated the title is, and (worse) generic ventings about title inaccuracy and the decline of journalism and western civilization. In other words, such titles are baity.
We've learned that the best way to de-bait them is simply to narrow their scope, i.e. shrink the title down to the size of the real story. In this case we applied two scope-narrowers: "in mice", which is a great de-baiter when the story is about a mouse study, and ": study", which makes it clear that the topic is just a study, not the god's-lips-to-your-ears revelation that headline writers love to imply. Basically, it's their job to sex up the title and our job to knock it down to size. Those two devices work well for preventing science-article threads from going off track in predictable ways.
I know all of this can appear inconsistent, but that's because there's so much more going on, so many more concerns and details than one would ever imagine in the title-editing domain. For the first few years I did this job I used to find that irritating—how can an issue as trivial as internet titles be so important? Over time I learned how much more there is to it. I wrote about this here if anyone is interested: https://news.ycombinator.com/item?id=20429573.
The one thing I feel most comfortable defending about all this is how consistent we are—not completely consistent, but for sure relatively. You may not agree with the principles and that's fine, but we're not applying them arbitrarily. A nice aspect of the job is that we don't have to apply them arbitrarily: we have a good set of principles to rely on and they cover most of what comes up. We could talk for the rest of the year about specific cases and why exactly we edited them the way we did. But it all comes down to the site guidelines and the fact that intellectual curiosity is the organizing value of this site.
HN's hive-mind at work. social platforms need to rotate mods imo.
Keep up the good work!
Popular topics often get multiple submissions and trigger related topics. When the title changes it is hard to know whether it is a new link with new/different information or whether it is the same one that I've already read.
In my opinion once something hits the front page it is too late to edit it (other than minor things such as adding year or [pdf/video]).
To clean up my feed, I hide topics that I am "done with." That is, I gave it the amount of attention I'm willing to give it. This keeps my feed updating quicker as new items enter the stack at the bottom to replace the ones I've hidden.
I started doing this because it mimics the behavior of one of reddit's settings, which is to hide all topics I've voted on. I found the feature so convenient that I started using the hiding behavior here in the same way.
So for each title you can see the original text.
Trends in the San Francisco poop crisis
Trends in the San Francisco (dog) poop crisis
Trends in the San Francisco (mostly dog) poop crisis
One thing that worries me about HN is that I've had a _link_ changed by a moderator in the past. And I don't mean removing query junk or changing between mobile/non-mobile sites -- I mean changing the link to an entirely different site. Even worse, by the time I noticed, I couldn't edit or delete my post (which I wanted to do because I disagreed with the new link). I was essentially forced to post content I didn't want to post! I think this crosses the line from curation to impersonation.
We change URLs every day, for lots of different reasons—for example if one article is mostly just copying from another source, or if users suggest a better URL. When we do that we nearly always post something saying what we did, and including the previous link:
I remember commenting on a different article but the one that it points to now is a different article: https://news.ycombinator.com/item?id=21384151
Normally when we do that we explain so in the thread, like this: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que.... But this one happened overnight when no one who is public as a moderator was awake. That's a flaw in the current moderation system.
HN's mods (dang and sctb) are amazingly responsive and tolerant. I try to make their job easier in requests by keeping those short and clear. Others may find this useful or have further suggestions.
- Action in the subject: title, clickbait, spam, link disintermediation (pointing to a primary rather than secondary source), self-promotion (I've landed on "one-note flute" to describe this), and behavioural issues. Also occasionally vouches (for flagged posts/comments), or best-of nominations (there's a curated list put out by HN monthly).
- Followed by the title-as-presently standing. Should make identifying the post easier.
- Link to post (or comment) as first line.
- Often a link to the submitted article.
- A suggested change or revert. Often these are clear, sometimes not. My view is that submitting both a "this needs changing" and "here's my suggestion", along with a possible rationale (usually a subhead, lede line, occasionally a good overview line from the article) makes the editors' job easier.
These are often accepted, sometimes with a slight change, sometimes as is. I generally don't follow up with a thanks or further acknowledgement, but usually do note when the request isn't accepted that I'm OK with it.
There are times I've differed with views (usually tech-politics intersections). I really wish HN could discuss such topics better than it does, though that also seems to be ... somewhat ... improving with time. Discussion on HN is almost always superior to other venues I frequent online.
Response times are generally a few minutes to hours, longer during off-hours. Rarely, less-critical issues may take a few days to generate a response. But there's very nearly always one, which I appreciate.
Does the script stop checking them at some point?
I have no idea how often / how quickly an article could oscillate between rank 30 and 31 for example.
It might add some context to some of the edits.
I'm using it to track common items here and on Lobste.rs (and Proggit):
Here's the endpoint for the latest 500 submissions:
here's the one for the current top stories:
It's actually quite nice to work with. I don't know how to keep track of comments moving from thread to thread, because that's not a metric I'm interested in, but it should be possible to track somehow.
I never said it was a good idea, just interesting. :)
In the past I got the IP of my server banned from accessing HN for sending too many requests in too short of a time span. I found the unban interface that you provide, lowered the request rate of my crawler and tried again but was still sending too many requests in a limited amount of time and got the IP of my server banned again. If I recall correctly, I got it unbanned a third time and lowered the request rate even more but then got banned again and then I think it would not allow automatic unbanning.
I don't remember whether I just gave up at that point, sent an email about it, or waited for some amount of time (if there was a statement about how long I would have to wait before being able to use the unban interface for my server's IP again).
Anyway, an official answer about the acceptable request rate would be nice. Perhaps put it in the FAQ?
Also, if people doing automated GET requests were to create a unique UA string for their scrapers that includes a way for HN staff to get in touch, like for example (but with actual names of bot and site)
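A crawler along those lines might combine an identifying User-Agent with a minimum delay between requests. The Python sketch below is only an illustration: the bot name, contact details, and the 30-second interval are placeholders, and no officially acceptable request rate is stated in this part of the thread.

```python
import time
import urllib.request

# Placeholder identity; a real bot would use its own name, site, and contact.
USER_AGENT = "example-title-bot/0.1 (+https://example.com/bot; bot-admin@example.com)"
MIN_INTERVAL_SECONDS = 30.0   # assumption, not an official limit


class Throttle:
    """Tracks the last request time and reports how long to wait before the next."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = None

    def delay_needed(self, now: float) -> float:
        if self._last is None:
            return 0.0
        return max(0.0, self.min_interval - (now - self._last))

    def mark(self, now: float) -> None:
        self._last = now


def build_request(url: str) -> urllib.request.Request:
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def polite_get(url: str, throttle: Throttle) -> bytes:
    """Sleep if the previous request was too recent, then fetch the URL."""
    delay = throttle.delay_needed(time.monotonic())
    if delay > 0:
        time.sleep(delay)
    throttle.mark(time.monotonic())
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return resp.read()
```

Sharing one `Throttle` across all requests to a host keeps the overall rate bounded no matter how many pages the crawler wants.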
If you need more than that, you should use the Firebase-based API (https://github.com/HackerNews/API). The public dataset is also available as a Google BigQuery table: https://bigquery.cloud.google.com/dataset/bigquery-public-da....
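As a starting point for the Firebase-based API, here is a small Python sketch that builds the item and top-stories URLs documented in that repository and reads a story's current title. The helper names are made up for this example, and error handling is left out.

```python
import json
import urllib.request

API_BASE = "https://hacker-news.firebaseio.com/v0"


def item_url(story_id: int) -> str:
    """URL of the JSON document for a single item (story, comment, ...)."""
    return "%s/item/%d.json" % (API_BASE, story_id)


def top_stories_url() -> str:
    """URL of the JSON list of current top story IDs."""
    return "%s/topstories.json" % API_BASE


def fetch_json(url: str, timeout: float = 10.0):
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def current_title(story_id: int) -> str:
    """Return the item's current title, or "" if the item is missing or untitled."""
    item = fetch_json(item_url(story_id)) or {}
    return item.get("title", "")
```

Polling `current_title` for the IDs returned by the top-stories list would give the raw material for tracking title edits without scraping the HTML pages.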
Edit: since this subthread is not really on topic I detached it from https://news.ycombinator.com/item?id=21617478.
Instead of further optimizing the already light design, surely it would be easier to do a GoFundMe and get a server upgrade? I certainly wouldn't mind chipping in a buck or two if it relaxes limits.
I think it's awesome that there's a site this popular out there that _doesn't_ run on resume-fashionable Google/Facebook scale infrastructure that it does not require.
"Yeah boss, we're going to have to pull out memcached and replace it with a Kubernetes cluster of MongoDB shards. For 'reasons'..."