Show HN: Hacker News Classics (jsomers.net)
864 points by jsomers on Feb 23, 2018 | 107 comments

Occasionally you see articles on HN with a date in the title, like “(1998)” — and over the years I’ve noticed that these tend to be some of the best posts.

It makes sense: on a site devoted to news, an article posted so long after it was published has to be especially good.

So I hacked together this page, which links to every HN post with a date in its title earning more than 40 votes. It’s sorted in chronological order to encourage wandering.
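For anyone curious how a filter like this might look, here is a minimal sketch. It is not the author's actual code: the `title` and `points` field names are assumptions, the sample points are made up, and it assumes the date is a four-digit year in parentheses at the end of the title.

```javascript
// Sketch only: `title` and `points` are assumed field names, and the
// "(YYYY)" suffix rule is a guess at how dated titles are detected.
function extractYear(title) {
  const match = /\((\d{4})\)\s*$/.exec(title);
  return match ? parseInt(match[1], 10) : null;
}

// Keep dated posts with more than `minPoints` votes, oldest first.
function selectClassics(stories, minPoints = 40) {
  return stories
    .map((story) => ({ ...story, year: extractYear(story.title) }))
    .filter((story) => story.year !== null && story.points > minPoints)
    .sort((a, b) => a.year - b.year);
}

// Made-up sample data for illustration.
const sample = [
  { title: "The Machine Stops (1909)", points: 120 },
  { title: "Show HN: My new app", points: 300 },
  { title: "How to Live on 24 Hours a Day (1908)", points: 75 },
  { title: "An obscure memo (1975)", points: 12 },
];

console.log(selectClassics(sample).map((s) => s.title));
```

Only the two dated titles above the vote threshold survive, with the 1908 one listed first.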

Really cool!

Out of the 2k+ posts you list, some people are good at finding good quality classics!

The top 10 here combined account for more than 10% of all the posts:

    USER         POSTS COUNT
    Tomte        66
    luu          42
    tosh         33
    ColinWright  27
    adamnemecek  26
    vezzy-fnord  26
    brudgers     25
    pmoriarty    24
    networked    23
    shawndumas   22

Unfortunately, I have so far failed to get a (1538) submission to stick.

Well, let's try again: https://news.ycombinator.com/item?id=16444460

I think it doesn't stick because I can't understand anything from the title.

Well at least you know that it's by Andreas.


Looking at the source code, it only considers titles from 1900 to 2010 as classics.

Tomte is Swedish for "Santa", which seems fitting.

Tomte is not really Santa. It's more of a gnome-like creature. He normally lives near farms and protects the family and animals. The connection with Christmas is quite new and seems more connected with its other name Nisse, or Julenisse in particular. "The Tomten" by Astrid Lindgren seems to give a good overview of the folklore.

Why? Does he know when you are naughty? Do you also feel like watching out and not crying when he comes to town?

If I recall correctly, the story of Tomte is one where he is alone on Christmas night, pondering what life and death are, and decides that the art of giving is the best virtue. He then knocks on everyone's door with a pig beside him and hands out presents to everyone who opens it for him.

we need more Tomtes in this imperfect world of ours.

For links that are now defunct/gone (such as #2 on the chronological list), can you pull the web archive at the closest date to the submission?

(dead) http://yorktownhistory.org/homepages/1900_predictions.htm ===> https://web.archive.org/web/20100108205037/http://yorktownhi... (archive from 6 days later)

I don't think this can be done programmatically though... thanks for putting this together. I enjoy the old posts a lot, too.

The url can be found via the archive's API, and you can specify a timestamp.


It returns JSON containing, notably, the closest archived page for the given timestamp.
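A rough sketch of that lookup, assuming the Wayback Machine availability endpoint at `https://archive.org/wayback/available` and its `archived_snapshots.closest` response shape. The helper names and the sample response below are made up for illustration, and no network call is made here.

```javascript
// Build a request URL for the availability endpoint.
// The timestamp is sent in the archive's YYYYMMDDhhmmss form (UTC).
function availabilityUrl(pageUrl, date) {
  const ts = date.toISOString().replace(/[-:T]/g, "").slice(0, 14);
  return "https://archive.org/wayback/available?url=" +
    encodeURIComponent(pageUrl) + "&timestamp=" + ts;
}

// Pull the closest snapshot URL out of a parsed response, or null if none.
function closestSnapshot(response) {
  const snapshots = response && response.archived_snapshots;
  const closest = snapshots && snapshots.closest;
  return closest && closest.available ? closest.url : null;
}

// Hypothetical response, shaped like the endpoint's JSON.
const sampleResponse = {
  archived_snapshots: {
    closest: {
      available: true,
      url: "http://web.archive.org/web/20100108000000/http://example.com/page",
      timestamp: "20100108000000",
      status: "200",
    },
  },
};

console.log(availabilityUrl("http://example.com/page", new Date(Date.UTC(2010, 0, 2))));
console.log(closestSnapshot(sampleResponse));
```

With a runtime that provides `fetch`, the live call would be roughly `fetch(availabilityUrl(url, submittedAt)).then((r) => r.json()).then(closestSnapshot)`, with `submittedAt` being the HN submission date.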


That's way more fantastic than I thought their API would be! Thanks. Hope OP sees it.

FWIW, #4 "Two 4000 ft plumb bobs hung down a mine shaft, with baffling results (1901)" can be found at http://www.lockhaven.edu/~dsimanek/hollow/tamarack.htm

I stumbled upon this too. The article was archived, so I wonder why it's not redirecting. Now I'm finally enjoying this post too.


Very cool! Reminds me of the Ezra Pound quote, "Literature is news that stays news."

Ezra was a... complicated man, but he had keen insights.

He was also a fascist, to say the least. I would refrain from quoting him.

can "bad people" not have good ideas?

You are absolutely right, some articles are indeed timeless yet it's quite easy to miss them. This is a very good idea. A personal favorite of mine that gets posted here often is the Golden Rules for Making Money By P.T. Barnum (1880).

This is a clever and simple way to dredge up some great posts.

I wonder if you could ensure fewer false negatives (i.e. find even more great stuff) by doing the opposite: attempting to filter out every post whose link is to a page that came into existence within a month of the post's submission date.

This would likely require scraping the source links (unless you can get that from the https://cloud.google.com/bigquery/public-data/hacker-news dataset or somesuch), but it might be worth it anyway. It'd literally be "Hacker News, minus anything that looks like News."

I wonder how you would determine when a page came into existence.

Someone mentioned the archive.org/wayback API. You could check whether the oldest archive.org/wayback snapshot is over a certain age.
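A small sketch of that age check, assuming you already have the oldest snapshot's timestamp in the archive's 14-digit `YYYYMMDDhhmmss` form. The function names and the 30-day default are illustrative, not anything the site does.

```javascript
// Parse a 14-digit Wayback timestamp (YYYYMMDDhhmmss, UTC) into a Date.
function parseWaybackTimestamp(ts) {
  const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(ts);
  if (!m) return null;
  const [y, mo, d, h, mi, s] = m.slice(1).map(Number);
  return new Date(Date.UTC(y, mo - 1, d, h, mi, s));
}

const DAY_MS = 24 * 60 * 60 * 1000;

// True when the oldest known snapshot predates the submission by more
// than `minDays`. An unparseable timestamp counts as "cannot tell".
function looksLikeOldContent(oldestSnapshotTs, submittedAt, minDays = 30) {
  const first = parseWaybackTimestamp(oldestSnapshotTs);
  if (first === null) return false;
  return submittedAt.getTime() - first.getTime() > minDays * DAY_MS;
}

const submitted = new Date(Date.UTC(2018, 1, 23));
console.log(looksLikeOldContent("20100108205037", submitted)); // true
console.log(looksLikeOldContent("20180220000000", submitted)); // false
```

The first crawl date is only an upper bound on when a page appeared, so an old page that was crawled late would be misclassified as new.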

google certainly has some approximation of that, for example

It helps that we readers have had a lot of time since those articles were published to evaluate them.

I probably wouldn't understand most of them nearly as well if they weren't proven out by recent history ;)

hi james, i miss you

merci, mr. j.

So in case I’m not spending enough time on the front page of this HN, good to know there’s a nearly infinite depth to plumb.

And now I’m reading Arnold Bennett’s ‘How to Live on 24 Hours a Day’ originally published in 1908. Good to know tongue-in-cheek self-help books will never go out of style!

[1] - http://www.gutenberg.org/files/2274/2274-h/2274-h.htm

I actually enjoyed reading this today. One highlight was the below on "reading the news every morning"

" The idea of devoting to them thirty or forty consecutive minutes of wonderful solitude (for nowhere can one more perfectly immerse one's self in one's self than in a compartment full of silent, withdrawn, smoking males) is to me repugnant "

You can also go to https://news.ycombinator.com/front to bounce around how HN's front page looked on past days, like https://news.ycombinator.com/front?day=2016-06-20.

Or click on the date in an account's profile to see what HN looked like on its birthday.

Note that https://news.ycombinator.com/front?day=2007-02-18 doesn't let you navigate backwards, but https://news.ycombinator.com/front?day=2006-10-09 is the first day.

Out of curiosity, for the date rendering, did you xdef racket's date libraries or implement your own date algorithms in arc?

I guess you could've also shelled out to `date -r <timestamp>` which would get the feature done in about ten seconds.

> 2007-02-18 doesn't let you navigate backwards

Yes, and we specifically left it as an exercise for the pokey reader to figure out why.

Arc does its own date stuff. We extended it a bit to be able to print things like "x months ago". Edit: and "Feb 31, 2017".

Just curious: do you (as in Y combinator) use Arc for anything else or is it a "hobby language" for HN?

All of YC's original investment software was written in Arc, though it has been gradually factored out into other systems.

We use it for a browser extension that helps a lot with moderation work. And for general experiments. If I do something more ambitiously new it will probably be in Arc.

> We use it for a browser extension that helps a lot with moderation work.

Oho, so Arc does compile to JS now.

You gonna release the compiler or what? (Sometime this decade.)

Probably started logging front page on that day.

No, we only started that in Nov 2014. Before then, the /front pages sort things by points because that's all the data we have.

There seems to be a bug, I can see pages of Feb 30, 31 - https://news.ycombinator.com/front?day=2017-02-31

Yikes! Thanks for letting us know.

Edit: should be fixed now.

The secret is out now.

You can't deny it anymore. Everyone now knows that 31st of February exists.

Nice thing is, the go back a day link links to Mar 02
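HN itself is written in Arc, so the following JavaScript is not its implementation. It is only a sketch of one way a result like this can arise: date arithmetic that silently normalizes overflowing days will accept "Feb 31" and treat it as Mar 3, which would make the previous day Mar 2. The function names are made up.

```javascript
// Returns true only if "YYYY-MM-DD" names a real calendar day.
function isRealDay(day) {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(day);
  if (!m) return false;
  const [y, mo, d] = m.slice(1).map(Number);
  const date = new Date(Date.UTC(y, mo - 1, d));
  // Date.UTC normalizes overflow, so compare the parts after the round trip.
  return date.getUTCFullYear() === y &&
    date.getUTCMonth() === mo - 1 &&
    date.getUTCDate() === d;
}

// Previous day as "YYYY-MM-DD", relying on the same overflow normalization.
function previousDay(day) {
  const [y, mo, d] = day.split("-").map(Number);
  return new Date(Date.UTC(y, mo - 1, d - 1)).toISOString().slice(0, 10);
}

console.log(isRealDay("2017-02-31")); // false
console.log(isRealDay("2016-06-20")); // true
console.log(previousDay("2017-02-31")); // "2017-03-02"
```

Validating with a round trip like `isRealDay` before rendering the page is one way to reject such dates.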

Cool, would it be possible to put two dynamic links 'yesterday' and 'two days ago' in the footer? I would really use those.

That's a good idea. Do you mean in the text where it says "Guidelines | FAQ | ..."? or somewhere else?

Yep, that would be a great spot, out of the way of the main UI and yet easy to reach.

With regard to Classics, the "You're probably using the wrong dictionary" blog post at http://jsomers.net/blog/dictionary was probably one of the most interesting pieces I've read on HN.

Very nice indeed. Immediately ran downstairs to verify a Websters encyclopedia dictionary I bought in the UK around 1997 only to find it wasn't the version with all the prose. It's still a very nice illustrated (and heavy!) dictionary, but not quite like what jsomers describes.

So downloaded the version he offers for download and am now installing the dictionary using the steps he provides.

Thank You, really looking forward to using it.

edit: and his install steps still work, using it in the dictionary app right now. Super!

I wish I had a good word to describe my gratitude for this post! I've thought a reverse dictionary could work, but it doesn't. Another dictionary I would love to have access to is one where you can find words in other languages that don't exist in yours.

Wow that was a great read, thanks!

This is very cool! Based on the title “HN Classics,” I thought it might be an implementation of another idea I’ve had. I had this idea to search for stand-out articles that get reposted every so often and always get a lot of good attention and discussion: so-called classics of HN.

I think that data could be very interesting, and would also serve as a sort of “hall of fame” of articles the HN community loves (moreso than just a list of the most upvoted articles of all time).

Of course, you could also figure out the optimal duration between successive posts, and figure out when to repost yourself, if anyone wants a karma-grab. ;)

I thought reposts (and therefore scattered discussions) were discouraged on HN.

From what I’ve seen, it’s perfectly acceptable for older articles that are reposted after a considerable amount of time.

The best example I can find on short notice is this hexagonal grids article:


You could likely find more by searching HN comments for the phrase “previous discussion,” because people tend to post links to the previous HN submissions.

It doesn’t bother me, since there are likely always new HN readers who haven’t seen great “classic” articles.

The HN FAQ says:

> If a story has had significant attention in the last year or so, we kill reposts as duplicates. If not, a small number of reposts is ok.


I'd prefer to sort them by points. By age is interesting, by the way.

Do you do some deduplication? Some classics are resubmitted a few times successfully. Perhaps add a third sort criterion, sorting by the number of big reposts.

Yeah I'd prefer it too, so I've converted the stories.json into a spreadsheet here[1] using this simple PHP script[2]. Feel free to edit, sort them as you like!

I created this for my own use but sharing it here because others may find it useful (hope this isn't breaking any HN etiquette and if you're the original author and want me to take it down please let me know).

[1] https://docs.google.com/spreadsheets/d/16MNPM9fhpglC1s1OV-tp...

[2] https://gist.github.com/san-kumar/0b7d218a08d4d8b4a7d9c07db5...

Problem with by points is that over time as more people come to HN the number of points are worth less, unless HN does like Reddit does and weighs them but even then such a system tends to change over time and so it’s still not comparable.

For example say that several years ago a post might hit front page and 500 people would vote on it. If 300 people voted up and 200 down then it gets 100 points.

Fast forward and now there are more people on the site. That means greater confidence in the percentage but because only points and not percentage of up/down is revealed, a post with similar percentage up- vs downvotes appears to be better. 600 up and 400 down is still 60% up vs 40% down but it results in 200 points total.

Like I said though maybe HN does account for this and 100 points today means the same percentage of people upvoted it, I don’t know.

You can't downvote posts on HN, you can flag them but you can't downvote them. On HN, downvoting is only possible on comments.

You are right I forgot that, heh. Anyway, the problem remains that over time there will be more people to upvote so a post is likely to get more points if it was posted in the future than in the past.

Not really. Just tried to upvote an old post from 10 years ago:


As a result, its score did not increase, although upvoting "worked".

No I mean that a front page post five years ago had fewer people to vote on it so it would get fewer points simply for that reason compared to a front page post today.

Yes it looks like they do deduplication, based on the HTML source:

  for (var i in data[yr]) {
    var story = data[yr][i];
    // Skip any title already seen, so each classic is listed once.
    if (seen_titles[story.title]) continue;
    seen_titles[story.title] = 1;
    // ...add story to list of stories
  }
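The "sort by number of reposts" idea from upthread could be a variation on that loop that counts duplicate titles instead of skipping them. This is only a sketch: it assumes `data` maps a year to an array of stories with a `title` field, as the snippet suggests, and the sample data is made up.

```javascript
// Count how many times each title appears across all years.
// Returns [title, count] pairs, most-resubmitted first.
function countReposts(data) {
  const counts = new Map();
  for (const yr of Object.keys(data)) {
    for (const story of data[yr]) {
      counts.set(story.title, (counts.get(story.title) || 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up sample data in the assumed shape.
const sampleData = {
  "1909": [
    { title: "The Machine Stops (1909)" },
    { title: "The Machine Stops (1909)" },
  ],
  "1908": [{ title: "How to Live on 24 Hours a Day (1908)" }],
};

console.log(countReposts(sampleData));
```

Exact title matching misses reposts submitted under a different title, so comparing URLs might catch more of them.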

Since the first post is 1900 and it appears to be sorted by date, I wanted to verify: there were no posts from the 1800s worth showing?

Any way this could be done without requiring javascript?

For security, ad-blocking, and privacy reasons I have javascript blocked in my browser, and really don't like to turn it on except for sites like my bank.

"Any way this could be done without requiring Javascript?"

Could be done many ways.

Let's imagine you like text, you don't mind reading something somewhat structured and regular like JSON, and you don't need all the HTML, CSS, etc. tags and window dressing.

5-minute quick and dirty solution

1. curl -4o 1.htm http://jsomers.net/hn/stories.json

    tr , '\12' < 1.htm \
    |exec sed '

    /\"url\":[^{]/{s/\"url\":/&<\/pre><a href=/;s/$/>[FETCH]<\/a><pre>/;};
    /\"objectID\":/{s//&<\/pre><a href=https:\/\/news.ycombinator.com\/item?id=/;s/$/>[FETCH COMMENTS]<\/a><pre>/;s/\"//3;s/\"//3;};

   ' > 2.htm
2. Navigate to file:///2.htm
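The same idea can be sketched as a function that turns the parsed stories.json into static HTML, so the page needs no JavaScript in the browser. The `title`, `url`, and `objectID` fields are the ones visible in this thread. The function names and sample data are made up, and this is not how the site itself is built.

```javascript
// Escape text for safe inclusion in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render { year: [story, ...] } as one heading and one link list per year.
function renderStatic(data) {
  const years = Object.keys(data).sort((a, b) => Number(a) - Number(b));
  const sections = years.map((yr) => {
    const items = data[yr].map((story) =>
      `<li><a href="${escapeHtml(story.url)}">${escapeHtml(story.title)}</a> ` +
      `(<a href="https://news.ycombinator.com/item?id=` +
      `${encodeURIComponent(story.objectID)}">comments</a>)</li>`
    );
    return `<h2>${escapeHtml(yr)}</h2>\n<ul>\n${items.join("\n")}\n</ul>`;
  });
  return sections.join("\n");
}

// Made-up sample data in the assumed shape.
const sampleStories = {
  "1909": [{ title: "Later essay (1909)", url: "http://example.com/b", objectID: "2" }],
  "1901": [{ title: "Plumb bobs & mines (1901)", url: "http://example.com/a?x=1&y=2", objectID: "1" }],
};

console.log(renderStatic(sampleStories));
```

Run once under Node, the output could be saved to a file and served as a plain page.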

This is a totally reasonable request and I was looking to say much the same thing. Anything resembling HN, or in the spirit of HN, shouldn't require JS to view, I think.

You could render the http://jsomers.net/hn/stories.json some other way, or examine the javascript yourself, as it is not obfuscated or anything

For security, noise blocking, and public unreasonableness I have only HN allowed in my browser, that is, if a site doesn't work without js, there's a high chance I don't want to read it. My bank page luckily doesn't need js either, to function.

Wow, this is great.

Were there no pre-1900 references or is that just where you chose to start?

I started it somewhat arbitrarily in 1900.

ahh was about to publish something in BC just to see what happens ;)

The Wheel: A More Efficient Method for the Production of Round Ceramic Wares (3500 B.C.)

But more seriously, I remembered the complaint tablet to Ea-nasir[1], which was on HN a few months ago[2]. Written in 1750 BCE.

[1] https://en.wikipedia.org/wiki/Complaint_tablet_to_Ea-nasir

[2] https://news.ycombinator.com/item?id=15669759

ah i was thinking of making something like this but never did anything. i would have called it Hacker Olds :)

well done!

It may need some broken link tools. I clicked on the 4th link and the site no longer exists, Wayback Machine has it though - https://web.archive.org/web/20170606131659/http://www.lhup.e...

Perhaps Hacker Classics could automatically ensure all remaining live links get archived?

> Perhaps Hacker Classics could automatically ensure all remaining live links get archived?

Done. I'm going to run through every URL ever cited on HN at some point when time permits.

"it's easy to forget that the web is the greatest library in the history of the world"

Until someone stops paying the host server or the domain expires and then those books just disappear.

There's something odd about the file checked in to Github; it's cut off on line 50:


Suggested title: Hacker Olds

Interesting how both now and 'then' Tesla is still in the top stories.

This is pretty nice. I also think a browse by year feature would be neat. By that I mean in the same fashion as OP, so I could for example look at posts that had (2011) in the title by going to 2011, etc.

wow, this is really interesting..

and I just realized one of the reasons classic writings are special, especially as source material for other writing, is that they have passed the test of time..

Thanks for this. Can't upvote this enough. Still going through lots of gems that I would never have read without this. P.S. thanks for ruining a lot of my work hours. But thanks!

Anyone fancy putting all of these into a Kindle/PDF? :)

Weekend project challenge! I do lots of scrape-to-ebook projects, so this is fun for me.

Check back a few hours later.

Progress so far: https://git.captnemo.in/nemo/hn-classics

(Scraped content to markdown, simple Jekyll website in progress)

Let me know...

I'm not sure how I feel about having three items on this list (including the current #1). I might need to take my buggy out for a long trot to contemplate.

Hmmm... lovely idea, but in practice what I'm getting is a bunch of 404s. The internet is a library with a bunch of books missing...

Interesting how as we approach from 1903 to 2010 we ultimately go from more general and deep to more specialized and disconnected.

There are some great comment threads there that had me itching to chime in, 5 years too late. What an excellent project.

Why the cutoff at 2012? (line 71 of index.html)

> if (data.hasOwnProperty(yr) && parseInt(yr) < 2012)

this is fantastic. Great snapshot of the past. Thank you jsomers for sharing this.

It's fascinating to dig into HN literature from 1900!

this is really brilliant. But is there a reason you sort the list from oldest to newest? I think it would make sense to have it the other way around.



You could say they are… evergreen (title bar color)

Second link is dead. So is the fourth.

Very cool! how will you maintain it?

This is glorious. Thank you.

Paywall on article from 1930. That's a sad state of affairs.

hacker news is awesome


It's working.

Really slow for me on chrome 64 for android.

I found this page of undocumented HN tips [1], which led me to Hacker News Classic [2], and the top post there was Hacker News Classics [3].

[1] https://github.com/minimaxir/hacker-news-undocumented

[2] https://news.ycombinator.com/classic

[3] http://jsomers.net/hn/

Quite a cool coincidence!

Nice to see E.M. Forster’s The Machine Stops there near the top of the list when clicking the link. Thanks for sharing.
