
Hacker News Data Analysis - robertjmoore
http://blog.rjmetrics.com/surprising-hacker-news-data-analysis/
======
edw519
#1 Lesson from all of this: Instead of talking about your product to your
prospect, talk about something your prospect cares deeply about to your
prospect.

I had no idea what you did and didn't really care until you used it in context
of something I did care about: Hacker News. Now I know what you do, understand
how it applies to me, and best of all, I'm starting to visualize how else I
could use it.

We should all approach our prospects like you just did here. Nice job!

~~~
cs702
I would be curious to see how 'meta-submissions' (i.e., posts about Hacker
News itself, about Y-Combinator, and about YC-backed startups) rank in the
charts versus the five categories shown in the rjmetrics blog post. (The blog
post, which made it to the top of the HN front page, is itself a meta-
submission!)

~~~
nealabq
The HN Community is fascinated by the outside view of HN and eagerly gazes at
the held-up mirror. One of our shared interests is emerging online social
structure, of which we're an example.

We're also attracted to self referential expression, such as HN comments about
HN self-organization, HN self-policing, and HN self-referential comments.

~~~
lbearl
Which really could just indicate that we all have an interest in HN being a
good resource for us to read.

~~~
pattern
I enjoy finding out what I like, and exploring _why_ I like it. I posit that
this is an inherent property of appreciation which allows one to get more out
of something.

The "meta-HN" stuff is an incarnation of this, and shows that HN is
appreciated by a critical mass of people. This is a huge boon to the community
as it leads to things such as people using Markdown syntax for clarity, proper
spelling/punctuation, well thought-out comments, and often (but not always) a
positive and constructive viewpoint aimed at advancing the discussion.

------
pg
Actually the reason his posts stopped making it to the frontpage is that the
last 3 before this all set off the voting ring detector.

I don't know how accurate his other conclusions are, but it seems unlikely
that new signups are down, considering the trend in traffic:
<http://www.archub.org/hntraffic-17oct12.png>

~~~
jcr
I've never before gotten to see data like this from a popular website, so my
questions might be a bit ignorant or naive.

Do popular sites normally have the big (daily?/weekly?) swings seen in the HN
traffic data?

Have you done any research correlating the high/low days versus the
submissions present on those days? (i.e. are the swings content driven?)

~~~
msellout
Most websites are busier during the week, while we are all procrastinating at
work.

------
fusiongyro
Another possibility: people have tired of your formula. Andrey Karpov used to
submit blog posts with the results of running his fancy commercial static
analyzer on various open source code to Reddit. The first several got a lot of
upvotes; a while later it became clear that it was mostly hawking a product.
The more your blog comes to resemble an infomercial the less you can expect to
be on the front page.

------
Alex3917
"If anyone out there suspected that the 'old guard' had given up on HN, this
chart proves them wrong."

Of the people here since the first year, probably only 25% still participate
regularly. Occasionally I'll stumble across some discussion from the early
years in Google, and it's crazy how different the site was back then. There
are still good comments now, but back then there were entire conversations
that were good. I don't even bother to write the kind of comments that I used
to, because they wouldn't work at all on the site as it is today.

~~~
tokenadult
Alex, based on your kind reply to DanBC, you are tired of people calling you
out for relying on Google University for your knowledge on controversial
subjects. That may be tiresome, but it may also be good for the overall level
of factual discussion here.

My general observation of what excites people on Hacker News is that negative
metathreads get more upvotes, by two orders of magnitude, than positive
metathreads. I would love to see more replies to the old thread "Ask HN: What
do you like about the Hacker News community?"

<http://news.ycombinator.com/item?id=4399678>

from 60 days ago, if people are so inclined. I posted that soon after a
metathread that complained about comments that were insufficiently kind and
affirming, from someone who has asked advice about external website designs in
days past. I figure if I ask for advice about a website, people are very well
going to give me advice, and I might as well man up and take the advice. But,
yeah, one thing I like about HN is that people look things up in good-quality
offline sources in many instances, and ask other participants here to check
their facts. And there are other good features of the community here that help
me learn and develop in my work, in my community citizenship, and in my family
life.

~~~
Alex3917
"you are tired of people calling you out for relying on Google University for
your knowledge on controversial subjects"

I think it's funny how the more books I read, the more 'controversial' my
knowledge becomes. I realize this is about as self-serving an argument as
possible, but honestly I don't think I'm inherently interested in
controversial topics, I think it's just that the more you know about
something, the more wrong you're going to sound to the average person.

To give an example that most people on HN will agree with, why do you think
the general public thinks eVoting is completely secure, while CS folks are
generally horrified by the products that are actually used in elections? The
same phenomenon applies to virtually every other area of life. Doesn't matter
whether you're talking about medicine, agriculture, religion, climate change,
education, etc. The more you know, the more wrong you're going to sound. (Cf.
the presidential debates.)

Also, I'm usually fairly anti-Wikipedia these days. That's why I generally
avoid linking there in the first place, though also to encourage people to
actually read quality books or academic research. Wikipedia is often useful
for finding primary sources, but

A) it's rarely comprehensive. Usually they just link to one or two primary
sources at random, rather than the best ones or all of them.

B) it generally does a poor job at properly characterizing the arguments for
or against something. Usually you just get 'some people believe this, other
people believe that' without any indication that the case for one side might
be much stronger than the case for another side.

C) It also tends to be out of date at any given time. Often not seriously so,
but if you look up any of the statistics that the CDC or BLS publish on an
annual basis then more often than not you're getting last year's data.

D) I agree with Jaron Lanier's point of view in his Digital Maoism essay.
Specifically I think knowledge is inherently tied together with authorship,
and that an article of 'facts' without a voice is just a "faux-authoritative,
anti-contextual brew."

Granted Wikipedia has a lot of advantages, but I think it's a poor substitute
for reading actual books. And I especially think that it's generally a dick
move to pretend that you're an expert on something after having only read the
Wikipedia article, except for in certain niche areas where there is no
authoritative source.

And again, I'm not trying to claim that I know everything by any stretch of
the imagination, just trying to explain why some of my comments may seem
'controversial'.

------
jgrahamc
If you look at my submission history of my blog then I think it's clear that
HN likes things that are original and/or well thought out. My weaker blog
posts go nowhere, but ones that are detailed make it. So, if there's a formula
for appearing on HN, it's write something original and/or deep.

~~~
udp
And then hope some people see it before it gets pushed from the "new" page,
after which it doesn't matter how original or deep it is.

~~~
sageikosa
Which can happen _very_ quickly. Sometimes I am not certain whether a posting
just isn't interesting, or the wind is so strong the voice gets lost in the
roar.

------
duck
Very useful analysis. After running Hacker Newsletter for the past 2+ years I
have seen basically this. However, the analysis seems to miss looking at
things on a smaller scale like the day and time you post it which has proven
to be a big factor [1]. I know even on a weekly basis (which is what I do for
the newsletter), it seems some weeks have an abundance of high quality
articles compared to others.

[1]: <http://news.ycombinator.com/item?id=3251877>

------
fecak
I do think that the day/time an article was posted and also who posted are
fairly large contributors to being on the front page. I've written a few
articles that have made the front page this year.

In at least two instances, I posted the article myself with no upvotes. Then
another HN user reposted my articles a few days later (my blog is republished
by a couple tech sites), and the same exact content makes the front page. Same
article content, same title, just posted by someone else and linking to the
mirrored site.

Good post Robert. If you're looking for help growing the RJM team, look me up.

------
willvarfar
I once worked out there were 100:1 visitors to voters for a link.

Most of the people I know who peruse HN regularly are _not_ registered users.
They are happy to let others do the commenting (which they read).

<http://williamedwardscoder.tumblr.com/post/18839832580/reddit-vs-hacker-news-vs-twitter>

It was super-surprising to see my own blog getting an average of 55pts on HN;
I hadn't wondered about that before.

~~~
ClifReeder
Certainly fits in line with the 1% rule
<http://en.wikipedia.org/wiki/1%25_rule_(Internet_culture)>

------
waterlesscloud
I suspect the NYT/WSJ gap is more a result of WSJ's much more restrictive
paywall.

~~~
hollerith
Yeah. I don't understand how the OP came to believe that the average quality
of WSJ articles is drastically lower than that of NYT articles unless he never
reads the WSJ or the NYT.

------
tokenadult
"Interestingly, if you look at the number of upvotes cast each day, the trend
is similar. For the past two years, the same number of stories have been
competing for about the same number of votes each day." This statement, backed
up by the analysis in the submitted blog post, is interesting. I visit the new
page

<http://news.ycombinator.com/newest>

as many times per day as I visit the front page, looking for good new
submissions to upvote. The limit on the number of users who cast upvotes on
new stories appears now to set a limit on the number of new stories that have
been submitted in the last two years. As the blog author points out, if HN
largely stays on topic, there are only so many new stories each day that fit
HN's topic.

~~~
sesqu
I found it very interesting that the contributing userbase has gone up but the
vote count has not. I've certainly noticed the userbase expanding, but would
have guessed that voting followed.

------
asdf333
Fascinating. However, one must be careful about jumping to conclusions from
analysis like this. I see a few items where the author might have come to
the wrong conclusion.

\- New user growth. I don't think it's b/c a 'saturation point' has been hit
for the HN community as the article hypothesizes. There was a period in the
last few years where there was a conscious choice by HN to restrict user
growth in order to maintain a higher signal to noise ratio. Newbies are now
marked with green and there is no register link on the homepage. For a while
there wasn't a way for new users to sign up.

\- The NYT more favored compared to the WSJ? most likely not due to the
quality of the writing but b/c WSJ articles are not available to non-
subscribers by default.

------
nanijoe
Granted, it is natural to want people to hear what you have to say, but I did
not think the reason for posting on HN was so you could try to make it to the
front page. The blog post could have been titled "How I'm trying to get my
submissions to the front page of HN".

------
larsberg
My takeaway --- from the fact that Matt Might's domain is second only to pg's
--- is that you should write up easy to understand lecture notes on deep PL-
related topics.

~~~
Thrymr
The fact that this story went straight to the top suggests that a meta-post
about HN is the way to go.

~~~
pygy_
Data-based posts about how to get to the front page, sure.

But otherwise, I'm not sure it really pays off.

------
dmansen
My interpretation of how this one shot right to the top: Hacker News loves
posts about itself. :)

Nice analysis - the user engagement stats were very different from what I was
expecting (I think I would have agreed with Jake before I saw the data).

------
mjn
The retention rate actually seems relatively low as an absolute percentage,
though the way it plateaus is interesting. I did an analysis of the retention
of the oldest Slashdot users
(<http://www.kmjn.org/notes/early_slashdot_users.html>), and it was much
higher: about 70% after 2 years, rather than 30%. Took about _10 years_ to
drop to 30%. Granted, that's for the earliest users, so retention rates are
probably (much?) lower among later signups.
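
For anyone who wants to compare cohorts the same way, here is a rough sketch of one way to compute such a retention curve. It assumes "retained after n years" means the user's last recorded activity falls at least n years after signup, which may not match the definition used in either analysis. The function name and the sample cohort are made up for illustration.

```python
def retention_curve(users, max_years):
    """Fraction of a cohort still active at least n years after signup.

    `users` is a list of (signup_year, last_active_year) pairs. This is an
    assumed definition of retention, not necessarily the one the blog used.
    """
    if not users:
        return {}
    curve = {}
    for n in range(1, max_years + 1):
        retained = sum(
            1 for signup_year, last_active_year in users
            if last_active_year - signup_year >= n
        )
        curve[n] = retained / len(users)
    return curve


# Made-up cohort of ten users who all signed up in 2007.
cohort = [
    (2007, 2012), (2007, 2008), (2007, 2007), (2007, 2010), (2007, 2007),
    (2007, 2007), (2007, 2009), (2007, 2007), (2007, 2008), (2007, 2007),
]
curve = retention_curve(cohort, max_years=5)
```

A plateau shows up as consecutive years with the same fraction, which makes it easy to compare where two sites level off.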

------
sputknick
You say the two possible reasons you are not making the front page are: your
content is weak, or people's tastes have changed. The fact that the number of
submissions has not changed suggests to me a third and more plausible option:
The quality of submissions, and therefore the competition for the "front page"
has increased.

~~~
user24
Doesn't "My content is weak" contain the implicit qualifier "compared to the
average within the domain we're discussing"?

~~~
chris_wot
No. You can have a domain of discourse where ALL content is weak. In fact, you
can have content in this domain that is stronger than most of the content, but
it might still be weak.

------
narag
Maybe I'm understanding it wrong. But the data seems to be saying that HN has
succeeded in defeating the Eternal September effect. That'd be big news!

------
kunle
> Also interesting is the enormous gap between the New York Times, whose
> content tops this list, and the Wall Street Journal, whose content performs
> among the worst.

I think this might actually be more related to the WSJ paywall. If you don't
have a subscription, you can't view many WSJ articles, whereas the reverse is
true for the NYT.

On an unrelated note - I wonder how the category of HN-related posts does
relative to the others (basically the same analysis as for the "Pinterest"
category). Judging by the success of this post, I suspect HN + Data are a good
mix. Are posts about "Data" just as successful?

------
Adrock
I wish that he had included the stats for titles containing the words "Hacker
News".

------
dfc
_"I chose to categorize content by the mention of things like big companies
(i.e., Amazon, Google), Hot Startups (i.e. Pinterest, Instagram),
Sensationalism (i.e. Best, Worst, First), Programming Languages (everything I
could think of), and Profanity (which was fun)."_

What happens to stories that use sensationalism and profanity? Or
sensationalism and a new startup?

~~~
timmclean
I assume those stories would be included in both averages?

~~~
001sky
The non-exclusive sets may overlap with interesting results, if there is a
multiplicative effect. That's a hypothesis worth testing, perhaps through a
quick stratification of the data. There is also an issue of
multicollinearity. For example, if one were to consider modifiers vs nouns: is
it the specific noun that is of interest? Or the subset of that specific noun,
delineated by the modifier? What about the modifier generally, applied to a
general noun? Are there specific combinations that are significantly
different from the average?
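
A quick stratification along these lines could look roughly like the sketch below. The keyword lists, function names, and sample stories are all hypothetical; the blog post's actual category definitions are not given in this thread. Each story counts toward every category it matches, and also toward each pair of categories it matches, so the overlap averages can be compared against the single-category ones.

```python
from collections import defaultdict

# Hypothetical keyword lists, for illustration only.
CATEGORIES = {
    "sensationalism": {"best", "worst", "first"},
    "profanity": {"damn", "hell"},
    "hot_startups": {"pinterest", "instagram"},
}


def categories_for(title):
    """Return every category whose keywords appear in the title."""
    words = set(title.lower().split())
    return {name for name, keywords in CATEGORIES.items() if words & keywords}


def average_scores(stories):
    """Average score per category and per pair of co-occurring categories."""
    totals = defaultdict(lambda: [0, 0])  # key -> [score sum, story count]
    for title, score in stories:
        cats = sorted(categories_for(title))
        # A story counts toward each category it matches (non-exclusive sets).
        keys = [(c,) for c in cats]
        # It also counts toward each pair, so overlaps can be examined alone.
        keys += [(a, b) for i, a in enumerate(cats) for b in cats[i + 1:]]
        for key in keys:
            totals[key][0] += score
            totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up (title, score) pairs.
stories = [
    ("The best damn editor", 120),
    ("Worst launch ever", 40),
    ("Pinterest is first", 10),
    ("Damn slow builds", 20),
]
averages = average_scores(stories)
```

If a pair's average is far above both of its single-category averages, that hints at a multiplicative effect, though with real data one would want enough stories in each stratum before reading much into it.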

------
ewest
This is an interesting analysis yet the information can be derived from using
your site's analytics and your observational skills to come to the author's
conclusion.

It's like a painting - the subject matter is important, yet the stuff around
the main subject is what makes it stand out.

Analyze what your stats _don't_ have, or seem to have 'less of', as compared
to other content.

I think the data analysis could have been more interesting to a broader
audience by making it more 'newsworthy' rather than a raw analysis targeted at
a relatively small community (compared to a more general audience).

By 'newsworthy' I mean something along the lines of 'NYTimes, WSJ used by
technical users too' - or something like that - or something like - 'Hackers
in controversy - observers and participants'.

------
deltaqueue
I think the basis for evaluating the quality of a community lies in the
discourse and communication. Submissions are a part of that, but the
discussion that follows (i.e. comments) is the most important indicator of
change. Personally, there seems to be an influx of reddit-style comments
(little substance, meme-oriented) this year, but that could be a general
evolution of the English language given the heavy influence of the internet.

That said, evaluating change in the number of comments along with comment
upvotes vs. sentiment analysis seems like the only logical way to demonstrate
any sort of quality meta analysis. I'm not really versed in qualitative
research, so here's my ASK HN: is this even possible?

------
rickdale
I think what you are doing is challenging in the sense that you have made your
goal to write a post that will go viral on HN. Remember, every story here,
pretty much, is content from somewhere else. You are right that you aren't
hitting your audience, but your audience isn't HN, it's those reading your
blog. If someone in your audience is also on HN then maybe they will find it
relevant to post.

Writing to be a big story on HN is like betting a number in roulette. You had
beginner's luck at first, now it's time to find a new game...

------
drpgq
Is it really surprising that Hacker News doesn't care about Pinterest?

~~~
michaelty
It's not a YC startup and it's wildly successful.

------
rickyconnolly
I've noticed that some submissions drop off the news feed like a rock, while
other submissions of the same story posted just a few hours later can gather
considerable discussion, with submission time being the only apparent
variable.

This leads me to speculate that there may be an optimal submission time or
times throughout the day. I'd like to see analytics that look at the variation
in the average number of comments/upvotes for submissions (or some other
metric) to see if this theory holds any weight.
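
Something like the following would be a first pass at that, assuming you already have a list of submission times and scores (the sample data here is made up, and the function name is just for illustration). It buckets submissions by UTC hour and averages the score in each bucket.

```python
from collections import defaultdict
from datetime import datetime, timezone


def average_score_by_hour(submissions):
    """Map each UTC hour (0-23) to the mean score of stories submitted in it.

    `submissions` is a list of (timezone-aware datetime, score) pairs.
    """
    buckets = defaultdict(list)
    for submitted_at, score in submissions:
        hour = submitted_at.astimezone(timezone.utc).hour
        buckets[hour].append(score)
    return {hour: sum(scores) / len(scores)
            for hour, scores in sorted(buckets.items())}


# Made-up submissions for illustration.
utc = timezone.utc
submissions = [
    (datetime(2012, 10, 12, 14, 5, tzinfo=utc), 3),
    (datetime(2012, 10, 12, 14, 50, tzinfo=utc), 1),
    (datetime(2012, 10, 12, 17, 20, tzinfo=utc), 150),
    (datetime(2012, 10, 13, 17, 45, tzinfo=utc), 50),
]
by_hour = average_score_by_hour(submissions)
```

Averages are easily skewed by a single runaway story, so a median or the share of submissions reaching some score threshold might be a fairer metric per hour.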

~~~
henrik_w
There is a tool that attempts to predict when it is a good time to submit
something to HN, <http://hnpickup.appspot.com/>

It's been discussed several times on HN, for example here:
<http://news.ycombinator.com/item?id=4058492> and (original submission I
think): <https://news.ycombinator.com/item?id=3251877>

------
Camillo
Just a heads-up: your site works really poorly on mobile. The text column is
too narrow, while the charts are too big, and their interactive features make
it hard to scroll the page. They also don't work right (touching a chart seems
to mess up the y axis labels), but the impediment to scrolling is more
annoying. I only ever read HN on my iPhone, so this is an upvote you're not
getting simply because of technical problems with your website.

------
capkutay
These are excellent visualizations. I'm glad they put this together and showed
it to the HN community while also demonstrating one of RJMetrics' use cases.

A note about the product. How do they differentiate themselves from other DW
analytics companies like datameer? (<http://www.datameer.com/>) I can tell
they specialize in e-commerce, but couldn't any DW analytics service give you
that AND more?

~~~
Smcavinney
I work at RJMetrics, thanks for the feedback. While we specialize in
e-commerce, we have many clients that are outside of that industry. As
robertjmoore demoed in the post, we can take almost any data you can throw at
us, and help you find actionable insights in easy to understand charts.

------
javajosh
I have to shake my head in admiration. What a powerful story - to start with
not one failure, but three failures, and then to use the _same tool you were
trying to hawk_ in those failures to figure out why you failed...and then,
remarkably (at least for me) succeed wildly.

At least in this case, your tool provided some very valuable insight.

------
andrewkkirk
Thanks to these metrics, I've cracked the HN code:

We should always publish our content on paulgraham.com

That's the takeaway of these metrics, right?

------
pi18n
This looks cool and now I want to mess around with it. I wish there was a
torrent for that dataset.

------
sgdesign
Wow, I'm number 10! I don't know if I should be happy that people like my
stuff, or scared that I've spent so much time submitting and commenting on
Hacker News this year...

------
nwienert
As someone who's mildly colorblind, a few of your charts were near impossible
to read. Especially the bottom three lines in Average Score by Category. Just
a heads up.

------
DanBC
Does this article correct for increased thresholds to perform some actions?
The down-vote used to be easier to get, for example.

------
apeace
Just a suggestion: he should compare MongoDB and Riak on Hacker News. For
laughs.

