Looking back at 9 years of Hacker News (debarghyadas.com)
207 points by dd367 504 days ago | 79 comments

The investment fidelity of this information is likely pretty high - not necessarily with this analysis ... but investment picks from topics popular on hn (ex: tesla, bitcoin, apple, amazon[ec2]) were ahead of the market.

Products, services, or companies repeatedly lauded in the comment section, in my experience, are remarkably indicative of future broader trends.

For instance, this user, in 2010, lamented that bitcoin discussions were overrunning hn like some irritating internet meme: https://news.ycombinator.com/item?id=1998630 ... at the time of posting, bitcoins were selling for $0.06 each. Would it have been a smart idea to buy 10,000 after reading that? Probably.

I can imagine an arb-style subscription to the right sql queries could be packaged and resold for extremely good profit to the right people.

The same signal would have also fired, much more strongly, from August 2013 through December 2013. The LPs of the VC firms who share your view of its predictive power are presently not very happy.

That's a super interesting thought. You should consider that the sum total of popularity of topics on HN up till today can't be used in hindsight as a predictor. It would be interesting to see if we merely looked for past spikes in keywords and used that to govern investment decisions. Even then, I fear that for every "bitcoin" and "apple", there may be other technologies and companies (especially smaller startups) that didn't work out so well, although I hypothesize a net positive.
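A minimal sketch of the "look for past spikes" idea, assuming you already have monthly mention counts for a keyword. The function name, the window and factor parameters, and the counts are all made up for illustration; the point is only that each month is compared against the months before it, so nothing from hindsight leaks in.

```python
from statistics import mean

def spike_months(counts, window=3, factor=2.0):
    """Return indices of months whose count exceeds `factor` times the
    mean of the preceding `window` months (no future data is used)."""
    spikes = []
    for i in range(window, len(counts)):
        baseline = mean(counts[i - window:i])
        if baseline > 0 and counts[i] > factor * baseline:
            spikes.append(i)
    return spikes

# Hypothetical monthly mention counts for one keyword, not real HN data.
toy_counts = [4, 5, 3, 4, 12, 30, 28, 9]
print(spike_months(toy_counts))
```

Whether a spike found this way says anything about later returns is exactly the open question; the sketch only finds the spikes.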

Despite it being public data, because the information circulated on HN is at the core of technology, it could prove valuable to investors with limited knowledge of it (and might well be worth packaging and selling, haha).

I'd like to postulate that the average disposable income of an active hn user is probably, with respect to forums of the same class as hn (metafilter, reddit, digg, etc), one of the highest. (There have been self-reported polls, e.g. https://news.ycombinator.com/item?id=6464725 - 44% of respondents are in the top 10% income, ~25% are in the top 5% and ~4% are in the top 1%)

I'd also like to postulate that if you were to segment the market into "early adopters", hn would have a larger share of this segment than other forums in the same class, of an equivalent or greater volume of traffic.

If this postulation is correct, then effectively hn is "trendsetters with money" ... a good group to listen to.

I don't have data to back these claims up, but intuitively I feel they are pretty safe.

This of course doesn't give any indication of market velocity. I've done a number of investments based on HN at the wrong velocity - I presumed the stock had been undervalued because of hn content, when in fact, the market had YET to undervalue it. I forecasted a distant chance of success given an undervalued stock (in this case blackberry) - knowing that they were going to do an android with a physical keyboard <eventually>, and I invested upon this speculation --- well before the market doubted the future of the company.

As a result, I bought it way early and it fell precipitously and is only rebounding slightly now. So no, this isn't a magic sauce to time the events or how they will affect the market price, just perhaps one to forecast their eventuality.

See also my personal HN analyses; they are at least a year old, but the overall trends are still unchanged.

Analyzing submissions: http://minimaxir.com/2014/02/hacking-hacker-news/

Analyzing comments: http://minimaxir.com/2014/10/hn-comments-about-comments/

More recently I made a few charts about upvote probability by time slot: https://news.ycombinator.com/item?id=9864254

The Pokemon story you show in an example (and you wrote and submitted) looked interesting so I looked it up. I recall now that I had started reading it but never got through more than the beginning, because I got totally sidetracked by Twitch plays Pokemon which you linked to in the very beginning of your article. I guess I get to revisit and read that, so thanks. ;)

Granted, the primary reason I had written that Pokemon article was because the response of tech media outlets was essentially "lol weirdos" when the mechanics are pretty interesting.

Link for the interested - I know I missed it the first time around: http://minimaxir.com/2014/02/glory-to-the-helix/

I've always been interested in seeing a statistic that shows how often the top comment is a negative comment that attempts to controvert the original story.

Didn't someone try to run stats on this recently? Maybe it was the post that announced the same HN dataset was on BigQuery? I recall that some people weren't convinced by the accuracy of the sentiment analysis of the top comments though. I'll see if I can find it.

Edit: I think this[1] is it: "Hacker News as a case study to test the wisdom of the crowd theory". Not quite what you were asking, but you might find it interesting.

1: https://news.ycombinator.com/item?id=10412465

Oh damn, super cool stuff. I wish I'd seen this before. Looks like I replicated a lot of your work, but yeah trends seem to have stayed the same.


My old Posterous blog is one of the top domains ranked by average upvotes. That says something about the time when I was a better and/or more prolific essayist... And something about walled gardens.

I still haven't found anything that made it so easy for a regular person to become a better essayist, in volume or quality, than Posterous and its auto email feature. Boy, how I miss it.

I'm a lazy programmer who would love to blog again, but needs something as easy as Posterous. Any suggestions, anyone?

FWIW raganwald, you're one of those who people should continue to pay attention to, even 140 characters at a time. I know I do. Please keep 'em coming!

Posthaven[1] (by the founders of Posterous), Svbtle[2] and Ghost[3] come to my mind.

[1] https://posthaven.com/

[2] https://svbtle.com/

[3] https://ghost.org/

I wrote that feature as part of an unrelated product in Rails over the course of a few days with the help of Mailgun's email-to-POST-request feature.

It's free for up to 200 emails per day. The most difficult parts were formatting issues between different HTML e-mail programs that you can probably ignore for your use case.

Edit: Oops, you said regular person. Didn't read carefully enough.

> As of 13th October, 2015, out of nearly 2 million Hacker News (1,959,809) submissions, merely 217 have managed to rake up over 1000 upvotes. That's about one out of every 2000 posts.

Math is hard. One out of every 9031 posts.
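For anyone who wants to check, a quick sketch using only the two figures from the quoted sentence (the variable names are mine):

```python
total_submissions = 1_959_809   # from the quoted sentence
over_1000_upvotes = 217         # from the quoted sentence

posts_per_hit = total_submissions / over_1000_upvotes
print(round(posts_per_hit))     # posts per 1000+ upvote submission
```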

Math is hard.

On the graph of total posts over the days of the week, do you know what time and timezone the peaks are in? It seems very regular, as if only one or a few timezones were concerned. Do we have so little posting power in Europe ... ?

As someone from Europe: This looks exactly like the 15h UTC peak in most US websites and chats when all the Americans come on.

The peak starts at 12h UTC, is largest at 18h UTC, and goes down at midnight UTC, which is exactly what I'm used to from US people in the chats I'm in,

and exactly 4am PST, 10am PST, and 4pm PST,

or 7am, 1pm, and 7pm Eastern Standard Time.

Which is Silicon Valley Morning/Workday, East Coast Workday, and European Evening.

Same as reddit.
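A quick way to check conversions like these, assuming standard (winter) time with fixed offsets of UTC-8 for PST and UTC-5 for EST; daylight saving would shift every result by an hour. The helper name and the calendar date are arbitrary choices for the sketch.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, so no timezone database is needed (assumes no DST).
PST = timezone(timedelta(hours=-8), "PST")
EST = timezone(timedelta(hours=-5), "EST")

def to_local(hour_utc, tz):
    """Format the given UTC hour as a local clock time in `tz`."""
    moment = datetime(2015, 11, 9, hour_utc, tzinfo=timezone.utc)  # arbitrary date
    return moment.astimezone(tz).strftime("%H:%M %Z")

for hour in (12, 18, 0):
    print(f"{hour:02d}:00 UTC -> {to_local(hour, PST)}, {to_local(hour, EST)}")
```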

I charted the peak times here:


(Python notebook - renders well on desktop, but GitHub might not show a nice rendering if you try it on mobile)

I should've mentioned that all the times are in UTC. I'll work on normalizing them to PST - it's pretty confusing right now. Thanks for letting me know!

Please leave them in UTC if possible. I do realise from your data that the bulk of readership of HN is US based.

Interesting to see who some top usernames are. Also interesting how little I care who anyone who posts here actually is in real life. All about that post quality, gents.

Pretty interesting that the daily post volume has plateaued.

Personally, I'm glad the growth has been curbed. Too bad we can't go back to the good ol' days.

It seems the trend with everything, people want to close the door behind them. They should have only allowed 16-bits of user ids on slashdot.

In a way reddit is the ultimate model. When the main room gets too big you can go and make a new room (subreddits), but still in the same house where everyone else is. Brilliant model in my opinion and one that I believe will be followed by successful future discussion board systems in years to come.

An in-between option is the http://lobste.rs model of having tags on stories, and letting users filter on tags. Allows me to ignore a few topics I just don't care about, without really splitting the community (some of the community opts-out of a few topics here and there, but it's by and large one community).

I've been meaning to do a content analysis for most popular animal among HN users, based on subject in headlines. My guess is something along this order:

1. Cats

2. Honeybees

3. Dolphins

4. Pythons

cat, fish, and man

I wonder where unicorn(s) would rank if included

grellas isn't mentioned on here? He writes some of the highest-quality posts on HN.

By "contributors", the linked post means article submissions rather than comments, and grellas doesn't submit a lot of articles.

I wrote an overview of the 20 users with most total karma points (submissions+comments) about two years ago, which he is on when you count that way. Maybe still interesting: http://www.kmjn.org/notes/hacker_news_posters.html

> With a runaway total of over 7000 posts on Hacker News, Clement Wan averages 2.24 posts a day since Hacker News took off (It's been 3,158 days since Feb 19, 2007). Two very mysterious users appear on this list.

Is this submissions and comments, or just subs, or just comments?

Not only that, he stopped more than a year ago, so the average was higher while his account was active.

It's just submissions - I think I should go over and make the wording less ambiguous.

OK, cool. EDIT: I forgot to say this is a cool submission!

There are people who submit about 5 items per day, so I'd be mildly interested to see how many people submit eg more than one article per day.

Running this query:

    SELECT author, COUNT(1) AS c
    FROM [fh-bigquery:hackernews.stories]
    WHERE author IS NOT NULL
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 1000

and armed with the knowledge that HN has been in existence for 3158 days, there are 11 people who post strictly more than once a day. They are:

    1 cwan 7077
    2 shawndumas 6602
    3 evo_9 5659
    4 nickb 4322
    5 iProject 4266
    6 bootload 4212
    7 edw519 3844
    8 ColinWright 3766
    9 nreece 3724
    10 tokenadult 3659
    11 Garbage 3538

Just under 1 a day: robg 3121
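The cutoff can be reproduced with a short sketch. The story counts are the ones listed in this comment; the two dates are my assumption about where the 3158 comes from (Feb 19, 2007, as in the article quote, up to the article's "as of" date of Oct 13, 2015).

```python
from datetime import date

# Assumed endpoints for the "3158 days" figure.
days = (date(2015, 10, 13) - date(2007, 2, 19)).days

# Story counts per author, copied from the query results in this comment.
story_counts = {
    "cwan": 7077, "shawndumas": 6602, "evo_9": 5659, "nickb": 4322,
    "iProject": 4266, "bootload": 4212, "edw519": 3844,
    "ColinWright": 3766, "nreece": 3724, "tokenadult": 3659,
    "Garbage": 3538, "robg": 3121,
}

# Keep only authors averaging strictly more than one submission per day.
daily_posters = {
    user: round(count / days, 2)
    for user, count in story_counts.items()
    if count > days
}

print(days)
print(daily_posters)
```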

I don't have a problem... really; I don't need an intervention.

Have you checked your profile settings page recently? There's a cute option called "noprocrast" there. Much cheaper than a visit to a specialist. ;).

(seriously though: keep on submitting!)

"6 bootload 4212 28759 Peter Renshaw, British creative learning consultant and researcher"

A quick inspection of user id would have confirmed this. Should read:

6 bootload 4212 28759 PR Programmer, Melbourne, Australia

My bad, fixed.

thx @dd367, not bad by the way, this is an amazing post. Did this post by @minimaxir inspire this work? ~ http://minimaxir.com/2014/10/hn-comments-about-comments/

Not really :P I wish I'd seen it before. I only learnt about it when minimaxir commented on this thread.

I'd like to see 'Erlang' on the WordTrends graph, though the plateau of story volumes may mean we can void that eternal September failsafe.

The one time pg got super mad at me was when I triggered the second Erlang stampede. It was the evening of Demo Day by the time he saw the front page full of nothing but Erlang stories and he had to go through them on his phone and kill them all manually. He then searched to figure out who had started it and... mea culpa.

> To me, the most surprising entry was Kalzumeus, which I've never heard of.

'dd367, as you probably are aware by now, Kalzumeus is the company/blog of 'patio11.

Anyway, thanks for the great analysis! One thing that surprised me was the word "lisp" not appearing in "Most Commonly Upvoted Words" table.

In case anyone is interested, I broke down the posts on HN by TLD in a blog post a couple months ago: http://blog.park.io/articles/hacker-news-posts-by-domain-tld...

I have to disagree with the most upvoted contributors in the article. The #1 on here has over 200,000 karma points. https://news.ycombinator.com/leaders

I've worked with this dataset.

Since the dataset is derived from the official HN API, there is no tabulation for Comment Karma, which will result in misleading rankings if attempting to reverse-engineer overall karma.

He's ranking submitters and commenters, right? I almost never submit.

Two main themes of the top 100: Death of a respected person or shutdown of a popular company.

[edit: spelling]

Does anyone know who nickb is?

I don't know who he is, but he's not Paul Graham. The story behind that is that pg emailed me on April Fool's asking me to help him with a hoax to make it look like he was really nickb, who was the most prolific contributor on the site at the time. pg just manually changed the account name on a reply to make it look like he was accidentally replying under the wrong username, and my job was to submit a story looking like I had discovered this.

(This was also already publicly discussed somewhere on HN previously, albeit several years ago.)

It's now part of "ancHNt history" (nickb hasn't posted for 6+ years). I did recall some discussion about nickb = pg, but don't think I had seen the 'smoking gun'. Noting how Reddit was started, I'm neither surprised nor concerned if there was such an account in the early days (either by pg or by the yc partners).

[1] https://news.ycombinator.com/threads?id=nickb

[2] Smoking gun? https://news.ycombinator.com/item?id=151461

[Edit] Dammit. Now I've read Alex3917's response, I wish I'd done the accurate calculations on that "Smoking Gun" link to note it was posted on 1 April, 2008.

If Paul Graham is involved with that account, he's not the only person; there are 'nickb posts that don't sound at all like things PG would write.

Colin Percival




9 years ago, using tables for layout was already considered a bad practice. Yet here we are...

9 years ago, separation of presentation and content was already considered a good practice. Yet here we are with application frameworks and component-based designs that throw it all out the window...

To expect any consistent design principles on a development medium as ad-hoc and devoid of principles as the web, is wishful thinking.

Things evolve. Something once considered a good practice is not necessarily a good practice once we learn more.

HN isn't about using good practices. It's about getting to the heart of the matter. Content is content, who cares how it's displayed, for better or worse. But people keep showing up. So it must be working just fine. If it ain't broke, don't fix it.

Can you imagine how hard it was to write a browser addon that allows collapsing a specific subthread of comments?

I have to calculate the indent of a comment based on the width of an image in the table layout!

Instead of nested tags for comments like

these people have all comments on main level, just indented with image width!

    <tr><td><img src="" width="40"></td><td>Text</td></tr>
    <tr><td><img src="" width="60"></td><td>Text</td></tr>
    <tr><td><img src="" width="60"></td><td>Text</td></tr>
    <tr><td><img src="" width="20"></td><td>Text</td></tr>
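A rough sketch of the logic an addon ends up needing, written in Python rather than addon JavaScript and assuming the spacer widths have already been read out of the rows. The function name is hypothetical and the widths are the ones from the example rows above, not HN's real markup.

```python
def subthread(widths, start):
    """Indices of the rows nested under row `start`: every following row
    whose indent is wider, up to the first row that is not."""
    children = []
    for i in range(start + 1, len(widths)):
        if widths[i] <= widths[start]:
            break  # same level or shallower: the subthread has ended
        children.append(i)
    return children

# Spacer widths from the four example rows.
widths = [40, 60, 60, 20]
print(subthread(widths, 0))  # rows to hide when collapsing the first comment
```

With nested markup the browser would give you this grouping for free; with flat rows it has to be recomputed from the indents on every collapse.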

In other words, it's very easy? I've made the equivalent of a browser addon for HN before, the only source of difficulty was when the markup changed.

In other words, it’s quite annoying. Especially when you want to do nested styling.

But it is broke, and needs fixed. The site is nigh unusable on a mobile browser, for instance.

Agreed. I've been using this on my phone: http://hackerwebapp.com/

Works pretty well.

Thanks for this.

I'm using the web version and it is so much better than the mobile changes that have just been rolled out.

As it stood my reading of HN was about to decrease exponentially with this new flat comment style. https://news.ycombinator.com/item?id=10531710

But I'll be using http://cheeaun.github.io/hackerweb/#/ now.

If it would just be possible to comment...

What is broken?

I use mobile devices for 99% of my Hacker News activity.

Here, that’s what I see: http://i.imgur.com/cwx6Wyk.png

And sometimes it just doesn’t limit the size at all, and I can scroll 5 screen widths for every single sentence back and forth.

Without that broken upvote button I couldn't even tell who's replying to who. http://i.imgur.com/yHki0zO.png

Yeah, the table layout definitely sucks. Doesn’t work in many browsers, horrible for writing addons...

It's hard to maintain or add new features if good design/development practices aren't followed.

You sound like my managers.

I'm not saying it's the right thing. I'm just saying this is the reasoning I hear for this kind of thing most often.

One could say that about pretty much any website.

