
Show HN: Hacker News Title Edit Tracker - petercooper
https://hackernewstitles.netlify.com/
======
petercooper
I built this several months ago as I was interested to see how the titles of
posts evolve/change on HN. It turns out lots of titles are edited every day
(in both subjectively and objectively good and bad ways!) and I've found it
interesting to see how titles have evolved.

The whole thing is automated and is built around a Ruby script that pulls down
the titles on the front page at a frequent interval. Any titles for specific
story IDs that change get tracked and rendered out to a static HTML page
hosted on Netlify. It has run by itself without incident so far.
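
For anyone curious about the tracking step, here is a minimal sketch in Python. It is not the actual Ruby script: the function names and the shape of the `history` dict are assumptions made up for illustration, and the fetching and HTML rendering steps are left out.

```python
from typing import Dict, List


def record_title_changes(
    history: Dict[int, List[str]], front_page: Dict[int, str]
) -> List[int]:
    """Update the per-story title history in place; return IDs whose title changed.

    history    maps story ID -> every distinct title seen so far, oldest first
               (hypothetical structure, not taken from the real script).
    front_page maps story ID -> title seen in the current poll.
    """
    changed = []
    for story_id, title in front_page.items():
        seen = history.setdefault(story_id, [])
        if not seen:
            seen.append(title)   # first sighting: nothing to compare against yet
        elif seen[-1] != title:
            seen.append(title)   # differs from the last poll: record the edit
            changed.append(story_id)
    return changed


def edited_stories(history: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Keep only stories with more than one recorded title (the ones worth rendering)."""
    return {sid: titles for sid, titles in history.items() if len(titles) > 1}
```

Calling `record_title_changes` once per poll and rendering `edited_stories(history)` afterwards would roughly reproduce the behavior described. Edits that happen while a story is off the front page would be missed by this approach.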

This has been on Show HN before but I was kindly invited by dang to repost it.

~~~
samstave
This would be really good to apply to watching spin on various verticals.

Such as entertainment and financial articles...

Recall during the banking crisis of 2008 there was the article stating that
the EU was about 16 TRILLION in debt crisis, but then that was too alarmist
and they didn't want that getting out, so they edited the title, but they
forgot to change the URL, which had the original title in it.

~~~
dang
News outlets do that all the time. NYT in particular. Often they don't change
the URLs, and I've always assumed that's for technical reasons—either there's
a system that treats URLs as immutable, or it's copied through multiple
systems that the people who change the title can't edit, or some such thing.

~~~
CPLX
Perhaps you’ll appreciate this:

[https://mobile.twitter.com/nyt_diff](https://mobile.twitter.com/nyt_diff)

------
dang
That is such a great UI. I feel like the list should be longer. Can you do
that without it costing more?

Not all of those edits are by mods, of course. Some are made by submitters
(edit: as the site points out!). Also, some are because we switched the URL
and thus to the title of the new article (example:
[https://news.ycombinator.com/item?id=21616157](https://news.ycombinator.com/item?id=21616157)).
Those look weird if you assume they're moderation edits.

It looks like the list is sorted by reverse ID, which means articles that were
submitted earlier are lower down on the page. But sometimes we re-up those
([https://news.ycombinator.com/item?id=11662380](https://news.ycombinator.com/item?id=11662380)),
so from a front-page perspective some 'newer' stories are below older ones.

~~~
octosphere
> I feel like the list should be longer

Isn't there a small window of time to edit a submission before it gets
committed permanently to HN? I imagine the list would be bigger if the grace
period lasted much longer.

~~~
dang
Yes, but moderators sometimes edit titles and do other moderatory things after
that limit has expired.

------
mft_
Thanks - really interesting.

Take this one, for example:

 _nvidia Drops Support for CUDA on macOS

\--> CUDA 10.2 is the last release to support macOS

\--> CUDA Toolkit Release Notes_

The first two titles are quite interesting to me, as a macOS user and a
general follower of the tech space (and I'd note neither are sensationalised,
or click-bait, from what I can see). The last...? I'm not going to click on
that in a million years, as I don't work with CUDA.

I'm not totally clear what the moderators' motivations always are, but might
it be true that in _some_ cases, maybe they're prioritising strict accuracy
over interest, or discoverability? And as a result, their actions are actually
diminishing the value of HN as a discussion forum?

~~~
kemyd
You have the answer here:

[https://news.ycombinator.com/item?id=21617317](https://news.ycombinator.com/item?id=21617317)

"[...] The big no-no is extracting some interesting fact from inside the
article and using it as the title"

~~~
Tenoke
Ironically, that's exactly what dang often does, including today.

~~~
dang
Often? That doesn't seem true to me. Can you give examples?

------
dorkwood
This sequence provided some comedy:

> 21:00 Trends in the San Francisco poop crisis

> 21:15 Trends in the San Francisco (dog) poop crisis

> 22:25 Trends in the San Francisco (mostly dog) poop crisis

~~~
dang
[https://news.ycombinator.com/item?id=21610152](https://news.ycombinator.com/item?id=21610152)

~~~
quickthrower2
Title was updated because of some primary research on the data set. That's
kinda cool.

------
Karunamon
Some of these edits seem really questionable when laid out like this.

 _Pew Research: 2.2% of Americans produce 97% of political tweets_

↓

 _Small share of U.S. adults produce majority of tweets on national politics_

Why remove the exact figures?

 _Former Apple chip executives found company to take on Intel, AMD_

↓

 _Three of Apple and Google’s former star chip designers launch NUVIA_

Isn't "star designers" more subjective than "executives"?

 _1.2B people exposed in data leak includes personal info, LinkedIn, Facebook_

↓

 _Personal and social information of 1B people discovered in data leak_

Why make the headline _less_ informative? Data leaks happen regularly. Data
leaks from Facebook and LinkedIn have different implications than a leak from
LexisNexis or a random blog.

 _Cloudflare open-sources Flan Scan, a network vulnerability scanner_

↓

 _Flan Scan: Lightweight Network Vulnerability Scanner_

Again, why _remove_ info? The fact that CloudFlare is behind this is more
interesting than yet another random tool.

 _Mozilla: “Dear Facebook: Stop cross site tracking by default”_

↓

 _Dear Facebook: Stop cross site tracking by default_

Same complaint. This distinguishes a random person making a random gripe from
freakin' Mozilla who has the control to make Facebook's tracking more
difficult.

\----

Every single one of these headlines is actually less informative or less
interesting (in general, of lower quality) than its original submission.
They actually served to make HN less informative. WTF?

That gripe aside, _most_ of the edits are useful (typo fixing, adding dates,
and such). These just leave me scratching my head.

~~~
dang
> _Why remove the exact figures?_

That submitter broke the site guidelines by changing the article title when it
was neither misleading nor clickbait—so we changed it back. Also, we've found
that when a title is gratuitously numerical, it makes for worse discussion.
Why? I don't know. It just does. Therefore, if anything, we take numbers out
rather than add them in. For the same reason, we wrote software to abbreviate
"1,000,000" to "1M", "1,000,000,000" to "1B" and so on. Numbers in titles are
baity and long numbers baitier.
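
To illustrate the kind of abbreviation described, here is a rough Python sketch. It is not HN's actual code: the function name, the regex, and the decision to touch only round numbers are assumptions, and only the two suffixes mentioned above ("M" and "B") are handled.

```python
import re

# Number of trailing ",000" groups -> suffix. Only the two cases mentioned
# in the comment are covered; anything else is left untouched (assumption).
_SUFFIXES = {2: "M", 3: "B"}

# One to three leading digits followed by one or more ",000" groups, not
# embedded in a longer digit/comma run (so "1,200,000,000" is not matched).
_ROUND_NUMBER = re.compile(r"(?<![\d,])(\d{1,3})((?:,000)+)(?!\d|,\d)")


def abbreviate_round_numbers(title: str) -> str:
    """Rewrite e.g. "1,000,000" as "1M" and "1,000,000,000" as "1B"."""

    def shorten(match: "re.Match[str]") -> str:
        groups = len(match.group(2)) // 4  # each ",000" is four characters
        suffix = _SUFFIXES.get(groups)
        if suffix is None:
            return match.group(0)  # e.g. "250,000": no suffix defined, keep as is
        return match.group(1) + suffix

    return _ROUND_NUMBER.sub(shorten, title)
```

Non-round figures such as "1,200,000" pass through unchanged in this sketch; how the real software treats them isn't stated here.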

> _Isn't "star designers" more subjective than "executives"?_

That title changed because we switched to a different URL and updated the
title to match the new article. See
[https://news.ycombinator.com/item?id=21616157](https://news.ycombinator.com/item?id=21616157).

> _Why make the headline less informative? [...] Data leaks from Facebook and
> LinkedIn have different implications than a leak from LexisNexis or a random
> blog._

That submitter broke the site guideline against editorializing. It's
editorializing to cherry-pick the details that you consider important and put
them in the title. That amounts to the power to determine the story for
everyone else, and on HN, submitters don't get such power. We prioritize
authors; submitters have no special rights over a story. If a submitter wants
to say what they think is important about an article, they're welcome to do
that in the comment thread, on a level playing field with everyone else.

In fact there was a lot of data leaked in that leak, not just LinkedIn's and
Facebook's. That's another important point. Putting famous names in a title makes it
baitier and evokes lower-quality discussion, because it activates everyone's
pre-cached responses about the famous names. If anything, we are inclined to
take famous names out of a title, and certainly not to add them in.

> _Again, why remove info? The fact that CloudFlare is behind this is more
> interesting_

Because cloudflare.com is right next to the title:
[https://news.ycombinator.com/item?id=21605719](https://news.ycombinator.com/item?id=21605719).
From the guidelines: _If the title includes the name of the site, please take
it out, because the site name will be displayed after the link._

> _Same complaint. This distinguishes a random person making a random gripe
> from freakin' Mozilla_

Same answer: mozilla.org is right next to the title:
[https://news.ycombinator.com/item?id=21599496](https://news.ycombinator.com/item?id=21599496).
Avoiding repetition is part of HN being organized around curiosity.

~~~
nickjj
> Also, we've found that when a title is gratuitously numerical, it makes for
> worse discussion. Why? I don't know.

This is interesting and an example of this happened recently in a post that
ended up on the front page with 50+ comments. It was titled "100k+ page views
a month for $5 with a self-hosted static site".

I chose that title because it kind of sets the stage of what to expect (a
small / medium tier site being hosted cheaply) but it did bring in a number of
comments where some people dropped in with "but that's only 0.04 posts per
second, anything could host that!" which kind of detracts from the content of
the submission which had nothing to do with saying those numbers are
impressive in any way.

It's definitely a tricky balance and is so context specific. I think that post
without the numbers wouldn't have gotten much engagement because "How I build,
deploy and host my static site" isn't that interesting at a glance and I
wonder if you came to the same conclusion because the title wasn't edited
other than capitalization.

~~~
dang
Yes, that one is a borderline case. Actually I included it in my GP comment as
an example, and said we decided to leave it up except for downcasing it! But
then I removed that bit, because the comment is so long.

------
bonoboTP
I am very thankful for this policy. The HN titles are essentially always
better and more honest representations of the article contents.

Keep up the good work!

------
tjoff
One thing I dislike about title edits, even when it is for the better, is that
it makes it hard to know if you already have read the article.

Popular topics often get multiple submissions and trigger related topics. When
the title changes it is hard to know whether it is a new link with
new/different information or whether it is the same one that I've already
read.

In my opinion once something hits the front page it is too late to edit it
(other than minor things such as adding year or [pdf/video]).

~~~
joncrane
My browser (FF) displays the link in a slightly lighter grey if I've visited
it.

To clean up my feed, I hide topics that I am "done with." That is, I gave it
the amount of attention I'm willing to give it. This keeps my feed updating
quicker as new items enter the stack at the bottom to replace the ones I've
hidden.

I started doing this because it mimics the behavior of one of reddit's
settings, which is to hide all topics I've voted on. I found the feature so
convenient that I started using the hiding behavior here in the same way.

------
kylebenzle
20:40 Is the San Francisco Poop Crisis Out of Control?

↓ 21:00 Trends in the San Francisco poop crisis

↓ 21:15 Trends in the San Francisco (dog) poop crisis

↓ 22:25 Trends in the San Francisco (mostly dog) poop crisis

~~~
dang
It's missing one now though.

------
kevlar1818
This is great! I have a feature request:

One thing that worries me about HN is that I've had a _link_ changed by a
moderator in the past. And I don't mean removing query junk or changing
between mobile/non-mobile sites -- I mean changing the link to an entirely
different site. Even worse, by the time I noticed, I couldn't edit or delete
my post (which I wanted to do because I disagreed with the new link). I was
essentially forced to post content I didn't want to post! I think this
crosses the line from curation to impersonation.

~~~
dang
What was the link?

We change URLs every day, for lots of different reasons—for example if one
article is mostly just copying from another source, or if users suggest a
better URL. When we do that we nearly always post something saying what we
did, and including the previous link:

[https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...](https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=by%3Adang%20changed%20url&sort=byDate&type=comment)

[https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...](https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=by%3Adang%20switched%20htttp&sort=byDate&type=comment)

~~~
kevlar1818
It was over a year ago I think. I can do some digging, but not right now.

~~~
dang
If you find it, please email hn@ycombinator.com so we can take a look. If we
got something wrong, we're definitely interested in learning from that.

~~~
ignoramous
One such instance:

[https://news.ycombinator.com/item?id=21378471](https://news.ycombinator.com/item?id=21378471)

I remember commenting on a different article but the one that it points to now
is a different article:
[https://news.ycombinator.com/item?id=21384151](https://news.ycombinator.com/item?id=21384151)

~~~
dang
I checked the logs and you're right. The URL was originally
[https://techcrunch.com/2019/10/28/google-reportedly-in-talks...](https://techcrunch.com/2019/10/28/google-reportedly-in-talks-to-acquire-fitbit/).
That article is simply pointing to the Reuters source, so a
moderator changed it in accordance with the site guidelines: " _Please submit
the original source. If a post reports on something found on another site,
submit the latter._ "

Normally when we do that we explain so in the thread, like this:
[https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...](https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=by%3Adang%20%22points%20to%22&sort=byDate&type=comment).
But this one happened overnight when no one who is public as a moderator was
awake. That's a flaw in the current moderation system.

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

------
dredmorbius
As someone who frequently submits title change requests to the HN mod address
(hn@ycombinator.com), this is an interesting and insightful tool. No, none of
my recent requests (the most recent is a few days ago) are shown. Which
suggests I'm not too much of an annoyance, yet.

HN's mods (dang and sctb) are amazingly responsive and tolerant. I try to make
their job easier in requests by keeping those short and clear. Others may find
this useful or have further suggestions.

Generally:

\- Action in the subject: title, clickbait, spam, link disintermediation
(pointing to a primary rather than secondary source), self-promotion (I've
landed on "one-note flute" to describe this), and behavioural issues. Also
occasionally vouches (for flagged posts/comments), or best-of nominations
(there's a curated list put out by HN monthly).

\- Followed by the title-as-presently standing. Should make identifying the
post easier.

\- Link to post (or comment) as first line.

\- Often a link to the submitted article.

\- A suggested change or revert. Often these are clear, sometimes not. My view
is that submitting _both_ a "this needs changing" and "here's my suggestion",
along with a possible rationale (usually a subhead, lede line, occasionally a
good overview line from the article) makes the editors' job easier.

These are often accepted, sometimes with a slight change, sometimes as is. I
generally don't follow up with a thanks or further acknowledgement, but
usually do note when the request _isn't_ accepted that I'm OK with it.

There are times I've differed with views (usually tech-politics
intersections). I really wish HN could discuss such topics better than it
does, though that also seems to be ... _somewhat_ ... improving with time.
Discussion on HN is almost always superior to other venues I frequent online.

Response times are generally a few minutes to hours, longer during off-hours.
Rarely, less-critical issues may take a few days to generate a response. But
there's very nearly always one, which I appreciate.

------
dang
Arguably a bug: some of the titles listed have subsequent edits that don't
show up on the page. Examples:

[https://news.ycombinator.com/item?id=21619671](https://news.ycombinator.com/item?id=21619671)

[https://news.ycombinator.com/item?id=21609572](https://news.ycombinator.com/item?id=21609572)

Does the script stop checking them at some point?

~~~
usr1106
I understood the script only checks the front page. Could the change have
happened when the article was not on the front page? Or are all articles that
have ever been on the front page supposed to be checked "forever"?

I have no idea how often / how quickly an article could oscillate between rank
30 and 31 for example.

------
Kinnard
This is really cool! It'd be great to have a tool that tracks all
editing/curation/censorship on hacker news.

~~~
Kinnard
Or you know . . . we could just throw it onto the blockchain . . . ;)

------
ocdtrekkie
I'd love to see the domain added to the end of these, like they are on HN
itself. For example, one might wonder why NVIDIA CUDA Toolkit is shortened to
CUDA Toolkit... unless you see that (nvidia.com) is after it anyways.

It might add some context to some of the edits.

------
OrgNet
you should also track the comments moving from thread to thread (when an admin
moves comments from one thread to another), comment shadow banning, post
popularity based on upvote counts (lots of times, posts get to the front page
with only a few votes), or maybe even track how fast posts get deranked

~~~
gerikson
HN has an API:
[https://github.com/HackerNews/API](https://github.com/HackerNews/API)

I'm using it to track common items here and on Lobste.rs (and Proggit):

[http://gerikson.com/hnlo/](http://gerikson.com/hnlo/)

Here's the endpoint for the latest 500 submissions:

[https://hacker-news.firebaseio.com/v0/newstories.json](https://hacker-news.firebaseio.com/v0/newstories.json)

here's the one for the current top stories:

[https://hacker-news.firebaseio.com/v0/topstories.json](https://hacker-news.firebaseio.com/v0/topstories.json)

It's actually quite nice to work with. I don't know how to keep track of
comments moving from thread to thread, because that's not a metric I'm
interested in, but it should be possible to track somehow.
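
As a rough illustration of working with it, here is a Python sketch that maps the current top story IDs to their titles. The `topstories.json` endpoint is the one linked above; the per-item endpoint (`/v0/item/<id>.json`) comes from the API README rather than from this thread, and the function names and the injectable `fetch` parameter are my own assumptions.

```python
import json
from typing import Any, Callable, Dict
from urllib.request import urlopen

API_BASE = "https://hacker-news.firebaseio.com/v0"


def get_json(url: str) -> Any:
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def top_titles(limit: int = 30, fetch: Callable[[str], Any] = get_json) -> Dict[int, str]:
    """Return {story_id: title} for the first `limit` entries of topstories.

    `fetch` is injectable so the function can be exercised without network access.
    """
    story_ids = fetch(f"{API_BASE}/topstories.json")[:limit]
    titles = {}
    for story_id in story_ids:
        item = fetch(f"{API_BASE}/item/{story_id}.json")
        # The API may return null for some items; skip anything without a title.
        if item and "title" in item:
            titles[story_id] = item["title"]
    return titles
```

Polling `top_titles()` on a timer and comparing successive results would be one way to notice title edits.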

------
yellow_lead
This shows how much work goes into moderation here. Almost all titles seem
more accurate and less clickbaity.

------
iamwil
What is the rule for titles being edited? Is it just no editorialization in the
titles?

~~~
tptacek
See the guidelines. The titles should be the original article title, unless
that title is clickbaity, in which case either a less baity subhed can be used
or a carefully written neutral headline. The big no-no is extracting some
interesting fact from inside the article and using it as the title;
submissions are community property, and the person who submits them isn't
entitled to decide what the most important angle in the article is.

------
Fnoord
I remember there being an extension for Firefox which showed editorialized
(news) site titles and content. I forgot the name though. Does anyone have any
recommendations for this feature?

------
phailhaus
Have you considered adding a "sort by amount of change" option? It would be
very interesting to see why larger changes happen compared to smaller ones.
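
One possible way to score "amount of change", sketched in Python with the standard library's `difflib`. The function names, the `history` structure, and the choice to compare only the first and last title are all assumptions of mine, not anything the site does today.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def change_amount(titles: List[str]) -> float:
    """Score how far the latest title drifted from the first one.

    0.0 means the first and last titles are identical; values near 1.0
    mean they share almost nothing.
    """
    if len(titles) < 2:
        return 0.0
    return 1.0 - SequenceMatcher(None, titles[0], titles[-1]).ratio()


def sort_by_change(history: Dict[int, List[str]]) -> List[int]:
    """Story IDs ordered from most-changed to least-changed."""
    return sorted(history, key=lambda sid: change_amount(history[sid]), reverse=True)
```

With this scoring, a rewrite like "nvidia Drops Support for CUDA on macOS" to "CUDA Toolkit Release Notes" would rank above an edit that only drops a "Mozilla:" prefix.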

------
carrozo
This would be interesting for a number of news sites too.

~~~
vageli
This is one for the New York Times [0].

[0]:
[https://mobile.twitter.com/nyt_diff](https://mobile.twitter.com/nyt_diff)

------
julienreszka
This is great! Transparency, accountability, predictability etc should be more
prevalent in the web.

------
edoceo
Could you also match to the source article/post title?

------
jcims
It would be interesting to let submitters provide multiple titles and a/b test
them from the outset.

~~~
bonoboTP
The entire point is to make titles less clickbaity. A/B testing them to get
more clicks defeats the very purpose for which mods edit titles.

~~~
jcims
Yes but the vast majority of folks on the Internet, including HN, don't get to
directly experience the effect that headlines have on engagement.

I never said it was a good idea, just interesting. :)

------
codetrotter
What frequency of sending GET requests to the servers of HN is an acceptable
rate for a bot? I tried to look for an answer on this but didn't find any.

In the past I got the IP of my server banned from accessing HN for sending too
many requests in too short of a time span. I found the unban interface that
you provide, lowered the request rate of my crawler and tried again but was
still sending too many requests in a limited amount of time and got the IP of
my server banned again. If I recall correctly, I got it unbanned a third time
and lowered the request rate even more but then got banned again and then I
think it would not allow automatic unbanning.

I don't remember whether I just gave up at that point, sent an email about it,
or simply waited; there may have been a statement about how long I would have
to wait before being able to use the unban interface for the IP of my server
again.

Anyway, an official answer about the acceptable request rate would be nice.
Perhaps put it in the FAQ?

Also, if people doing automated GET requests were to create a unique UA string
for their scrapers that include a way for HN staff to get in touch, like for
example (but with actual names of bot and site)

    Examplebot/0.95 (+http://www.example.com/bot.html)

where the page on that URL would list an e-mail address for getting in touch,
as well as having a statement about how to verify that a given crawler belongs
to that service, would that help in not getting server IP banned
automatically?

~~~
dang
Once per 30 seconds. That's in our robots.txt:
[https://news.ycombinator.com/robots.txt](https://news.ycombinator.com/robots.txt).
We've been working for a long time on (edit: what we expect to be) some
serious performance improvements that might allow us to relax that limit. For
now though, HN's process still runs on a single core and we don't have much
performance to spare.

If you need more than that, you should use the Firebase-based API
([https://github.com/HackerNews/API](https://github.com/HackerNews/API)). The
public dataset is also available as a Google BigQuery table:
[https://bigquery.cloud.google.com/dataset/bigquery-public-da...](https://bigquery.cloud.google.com/dataset/bigquery-public-data:hacker_news).

Edit: since this subthread is not really on topic I detached it from
[https://news.ycombinator.com/item?id=21617478](https://news.ycombinator.com/item?id=21617478).
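
For crawler authors, a minimal Python sketch of honoring a limit like the one-request-per-30-seconds mentioned above. The `Throttle` class and `crawl_delay_from` helper are hypothetical names, the user agent is the placeholder from the parent comment, and the robots.txt text in the usage note is an assumed sample, not a copy of the real file.

```python
import time
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser

# Placeholder user agent from the parent comment, not a real bot.
USER_AGENT = "Examplebot/0.95 (+http://www.example.com/bot.html)"


def crawl_delay_from(robots_txt: str, user_agent: str = USER_AGENT,
                     default: float = 30.0) -> float:
    """Read Crawl-delay for `user_agent` from robots.txt text, else use `default`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


class Throttle:
    """Block so that consecutive calls to wait() are at least `min_interval` apart."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)  # too soon: pause for the rest of the interval
                now = self._clock()
        self._last = now
```

Usage would look like `throttle = Throttle(crawl_delay_from("User-agent: *\nCrawl-delay: 30\n"))`, followed by `throttle.wait()` before each request. Whether a descriptive user agent helps avoid automatic bans isn't answered in this thread.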

~~~
Havoc
>single core

Instead of further optimizing the already light design, surely it would be
easier to do a GoFundMe and get a server upgrade? I certainly wouldn't mind
chipping in a buck or two if it relaxes limits.

~~~
jdsully
I think the problem is more that the software is single-threaded. I’m sure YC can
afford more than a single core machine.

~~~
dang
The software is multi-threaded but it runs on a platform (Racket) which
implements that as green threads, meaning it runs them all on a single core.

~~~
Havoc
ah. That makes a lot more sense than hn being short $100. Thanks for modding.

~~~
soegaard
Note that Racket besides threads also has places, which allows programs to use
more than one core.

