Don't put the low-point stories in your list, because they make the load time for your page incredibly high.
Additionally, there's a flaw with your definition of evergreen: a large portion of the (YYYY) submissions are made in order to provide ironic juxtaposition with current events, which makes the submission meaningless outside of that context.
Agreed. But I got a laugh out of the fact that the least valuable "evergreen" is called "Malcolm Gladwell on spaghetti sauce." So that's what that guy is on...
His grandmother spent 10,000 hours making it every Sunday.
In all seriousness, this list, however flawed, is nice to take a look at as I see I've missed a few of these. But a cutoff would probably make more sense, at least on the main page.
Tellingly, Malcolm Gladwell talking about spaghetti sauce is just him talking about someone else's research. It should really be called "Malcolm Gladwell on Howard Moskowitz on spaghetti sauce".
I think you're being a bit too harsh by repeating that the author's analysis is flawed. Like Wikipedia, it's good enough. It gives an interesting overview of past submissions without trying to be perfect.
I’m also wondering what fraction of evergreen articles are actually marked with a date. For example, this submission [1] was much more popular than the same submission with a date [2].
Here is the section that addresses the randomness you speak of:
"For a given month, consider the collection of scores of evergreen stories (or non-evergreen stories). It is reasonable to assume that the observed collection of scores is only one of many possible ways the collections of scores might have occurred, i.e. the scores could have occurred with different values from what we observe. Rephrased as a thought experiment, if we had the ability to repeat the story submissions for a given month many times, we would expect the scores to vary from one attempt to the next. Let’s formalize this concept.
In the analysis that follows, we treat each measurement as a realization of a random variable. For example, in a given month, let’s say there are n evergreen stories, each with a score. We view the collection of scores in a given month as being generated by a sequence, X_1, …, X_n, of independent random variables."
The "randomness" I'm referring to is the answer to why one submission got more points than another. It depends on multiple factors, such as the time it was submitted, the density of other submissions, current events, etc.
This is the same reason your theory is flawed: assuming "independent random variables" only works if events have an equal probability of occurring.
Good point. Ideally, the data should be normalized wrt. the total number of HN accounts. But it’s likely just meant as a treasure trove of submissions, which it probably still is to some extent.
To me, an "evergreen" story would be one that has been posted multiple times throughout the years, always getting significant points. That's more of a squishy definition, though, which means it will be a bit harder to write a script to find. (Need to pick a number of submissions and a number of points while taking inflation into account, and do some URL dupe-checking.)
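A rough sketch of what such a script could look like, in Python. Everything here is an assumption for illustration: the (url, year, points) tuple layout, the function names, and the thresholds are made up, the URL normalization is a deliberately crude dupe-check, and the flat points cutoff does not adjust for score inflation.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_url(url):
    """Crude dupe-check key: drop scheme, 'www.', query, fragment, trailing slash.

    Lowercasing the whole URL is a simplification; some paths are case-sensitive.
    """
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def find_evergreens(submissions, min_hits=2, min_points=50, min_years=2):
    """Return normalized URLs that scored well in several distinct years.

    submissions: iterable of (url, year, points) tuples (hypothetical layout).
    Thresholds are arbitrary placeholders, not values from the original study.
    """
    years_by_url = defaultdict(set)
    hits_by_url = defaultdict(int)
    for url, year, points in submissions:
        if points >= min_points:  # only count submissions that got traction
            key = normalize_url(url)
            years_by_url[key].add(year)
            hits_by_url[key] += 1
    return sorted(
        key
        for key in hits_by_url
        if hits_by_url[key] >= min_hits and len(years_by_url[key]) >= min_years
    )
```

To account for inflation, one option would be to replace the flat min_points with a per-year cutoff, for example a percentile of that year's scores.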
An evergreen story is any story where the difference
between the submission date of the story and the
publication date of the story is two years or more.
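For comparison, the quoted definition is easy to express in code. This is only a sketch: the function name and arguments are hypothetical, and reading "two years" as calendar years (rather than 730 days) is an assumption.

```python
from datetime import date


def is_evergreen(published, submitted, min_years=2):
    """True if `submitted` is at least `min_years` calendar years after `published`."""
    try:
        cutoff = published.replace(year=published.year + min_years)
    except ValueError:
        # `published` is Feb 29 and the target year is not a leap year
        cutoff = published.replace(year=published.year + min_years, day=28)
    return submitted >= cutoff
```

In practice the publication date is often known only to the year (the "(YYYY)" in the title), so a year-level comparison may be all that is available.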
These stories need not necessarily be any more interesting than your average high-ranked ones on the front page.
Plenty of important stories go unnoticed if they aren't picked up and submitted, right after they appear, to various news outlets with even bigger subscriber bases than HN.
Once a story is no longer fresh, it is very rare for it to retain its relevance, reader interest or uniqueness.
If a story, despite not attracting much attention in its 'first release', is periodically submitted by various people at various times, that indicates timelessness.
However, the fact that some very worthy stories get lost and never really gain traction or go viral is in itself cause for constant fine-tuning of the way stories get weighted, ranked and thrust onto the first page(s) of not just HN, but just about every other similar news board out there.
Personally, I'd like to see a 'second chance' ticker that continuously scrolls the most feverishly upvoted AND downvoted non-front-page stories of the hour, occupying a sliver of real estate at the top of the page.
Whether on HN or Twitter or on the most trafficked general-interest blog out there, recycling older content should not be frowned upon.
A content recycling and re-purposing program should be part of a comprehensive publishing plan for any digital outlet that rapidly generates content.
It is the responsible thing to do.
Stellar content is missed by readers for a whole host of reasons. This happens even when that content is widely shared by friends or coworkers.
This brings us to the deconstruction of what stellar content really is.
Information (and thus content) is judged not just by the truth value or the interestingness of the insights therein, but also by
* the timeliness of those insights and, more importantly,
* the perspective a fresh (and keen) pair of eyes brings to those very same insights
It is interesting to observe how in an age of countless distribution channels and dissemination models, a thoroughly flat world for information access of all kinds with very few old-world gatekeepers and in an age of roaring democratization of most content[1], we cannot escape the tyranny of the hive mind and groupthink. If anything, it seems to have gained fresh legs.
I don't know if this recycling, second-chancing and re-purposing of content is the perfect antidote to hive mind and groupthink, but it certainly is a step in the right direction.
[1] I say most content because if you are not an English speaker, or if your content is exclusively in Saami (of the Uralic language family) or in any of the hundreds of languages with few bilingual speakers, your content and the profound insights contained therein, gleaned from the tradition of oral histories passed down from generation to generation, are mostly lost, at least for now. In that sense, the digital divide is still very much here.
Ryan Singel here from Contextly (my co-founder wrote the blog post).
I think you are absolutely right on with the insight that stellar content is missed by readers for many reasons.
We are part of a solution for publishers that want to have a re-purposing program. Some of that can and should be very editorial, but it can also be complemented or informed by a service like ours that works on a publisher's own domain.
Our definition of "evergreen" for the purposes of the study of the HN archive differs from the one we use for our publishing clients.
That said, I do think it would be interesting to see what stories continually get re-submitted, as that may well show off the most unchanging evergreen.
(Defined in that case as a story that continually has a fairly high value for a substantial number of people over a steady amount of time. Compare that to, say, David Sedaris's SantaLand Diaries, which is also an "evergreen" but, I would strongly suspect, one with a highly seasonal period of interest.)
I am hoping to update this resource every month or so. If you have suggestions that you think might make the resource better I would love to hear them. I will try to include them in the next pass. Thanks!
I just skimmed thru the list and found dozens of very interesting stories that I missed, mainly because I haven't been with HN for that long. Looks like I have some catching up to do tonight :) thanks a lot!