
Is HN changing ? Part 2 - jacquesm
http://jacquesmattheij.com/hn-long-term-change-2
======
niyazpk
It is so sad that the _Startups_ category is getting less and less prominence
over time.

Is HN becoming reddit? Considering the quality of articles, I would not say
so. The high quality of the discussions may have stayed the same too. At the
same time, the graphs suggest that the topics discussed here have changed a
bit.

~~~
nailer
"Is HN becoming reddit? "

A post with a badly rhyming version of 'The Real Slim Shady' about single
founders (OK, startup-related, but not really insightful) was sent to the front
page a few days ago.

Right now there's an article from Boingboing about putting a gun in your
carry-on luggage so the TSA secures it (which is very much Reddit), plus posts
on the Haiti earthquake and the genocide of natives in America (general
interest, yes; 'Hacker News'? No).

I've gotten into the habit of flagging non-tech-related posts in the hope that
it will fix things.

~~~
pxlpshr
I do think there has been a noticeable change, but I don't think it's entirely
bad. For example, a discussion about Haiti among a bunch of technologists on
HN (as opposed to on Reddit) could surface a new challenge and opportunity
for another hacker to have a meaningful impact on the world, as opposed to
building yet-another-photo-sharing-site.

I think it's pretty inevitable that as a site grows, its core focus will become
diluted. I'm not entirely happy to see less startup/tech news either, and I
believe there are only two solutions for that: 1. grow with it and deal, or
2. convert to a private, invite-only (or application-based) community.

~~~
jacquesm
It's not that there is less of it; it's just that the attention is divided
over many other subjects.

The more people frequent HN, the more submissions there will be on _any_
subject, startups included.

But those articles have to compete for attention with all the other ones.

------
tokenadult
The large "unspecified" category leaves me in the dark about whether this is
various stuff possibly well related to the HN guidelines

<http://ycombinator.com/newsguidelines.html>

or various stuff not well related to those guidelines at all. A trend within
that category probably does more to change the general flavor of HN than the
observed relative size changes of some of the specified categories.

~~~
jacquesm
The fourth installment will deal with that; the next one will be about the
effects of the increase in size over time.

------
swombat
Why is "blogs" a separate category? Aren't blogs about technology, startups,
etc?

~~~
jacquesm
That's the catchall for blog articles that were not identified as technology,
startups etc.

Probably a more in-depth analysis of the content of the linked pages would
give more insight into that.

------
tdoggette
_In case it wasn't clear yet, the 'units' for the axis are simply the rank
number of the postings, 1000 postings vertically, blocks of 1000 postings
horizontally. That makes the time-axis non-linear._

No, it wasn't. Why aren't these graphs labeled? They make no sense without
knowing what they are already.

~~~
jacquesm
The text is meant to explain what the graphs are all about; how much
clearer than that sentence could it be?

~~~
tdoggette
It's not that the necessary information isn't communicated; it is. The
problem is that there's a complicated graph that's the main content of the
page, and it lacks a title and axis labels.

~~~
jacquesm
I'm sorry, I really did my best to make it look as good as I could; I will
have to study OpenOffice a bit more.
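As a minimal sketch of the axis scheme quoted above (postings taken in rank order, grouped into blocks of 1000, each category's share of a block expressed in per mille), assuming the postings are already available as a chronologically ordered list of category labels. The function name `per_mille_by_block` and the input format are hypothetical, not taken from the article:

```python
from collections import Counter


def per_mille_by_block(categories, block_size=1000):
    """Split a chronologically ordered list of category labels into
    consecutive blocks of `block_size` postings and return, for each block,
    a dict mapping category -> share of that block in per mille."""
    blocks = []
    for start in range(0, len(categories), block_size):
        block = categories[start:start + block_size]
        counts = Counter(block)
        # Divide by the real block length so a short final block
        # still sums to 1000 per mille.
        blocks.append({cat: 1000 * n / len(block) for cat, n in counts.items()})
    return blocks


# Hypothetical data: 2000 postings in submission order.
posts = ["startups"] * 600 + ["blogs"] * 400 + ["startups"] * 300 + ["politics"] * 700
print(per_mille_by_block(posts))
# [{'startups': 600.0, 'blogs': 400.0}, {'startups': 300.0, 'politics': 700.0}]
```

Because every block holds the same number of postings rather than covering the same span of days, the x-axis measures volume, not calendar time, which is why the time axis is non-linear.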

------
chris123
If you don't change, you die. But I hear ya. It's like when your favorite
neighborhood or neighborhood establishment gets "discovered" and its
character changes forever. That's the nature of things: the good, the bad, and
the ugly.

------
marknutter
This highlights the fundamental problem of social bookmarking/voting sites:
the more people that join the site, the lower the common denominator of the
content gets. I've seen it happen to Digg and Reddit, and it is bound to
happen to Hacker News at some point. It makes me wish that there could be
groups formed within these websites that are kept small, and you would only
see stories that are voted on by those members. This would be different from
the subreddit solution because it would apply to all content, not just a
specific category.

------
yannis
A little less mainstream content would be nice! I don't mean things like what
is happening with Google and China right now, but in general there is too much
mainstream stuff.

~~~
jacquesm
Right now the original Google & China posting is at 1087 points; the runner-up
is the posting by dfranke about hacking HN:

<http://news.ycombinator.com/item?id=639976>

If anything, that is a really good indication of how big the karma inflation
really is.

Dfranke spent a lot of time and effort to come up with a brilliant hack, used
every trick in the book and was rightfully the top HN posting of all time.

A virtually unknown HN user posts an entry from Google's official blog
and aces the previous #1 (and will probably go up a lot further still).

So much for the theory that submissions get voted up to 'what they're worth'.

~~~
ErrantX
_A virtually unknown HN user posts an entry from Google's official blog
and aces the previous #1 (and will probably go up a lot further still). So
much for the theory that submissions get voted up to 'what they're worth'._

I'd say that is proof that who the poster is doesn't factor into how people
vote on a thread's worth. I'd call dfranke's thread an exception to that
rule.

(It depends on your sense of worth: the HN hack was extremely impressive and
clearly, locally, worth lots of votes and kudos. On the other hand, the Google
announcement is huge news for much of the tech world :) )

~~~
jacquesm
Yes, but knowing about it was pretty much unavoidable.

It is about as mainstream as it gets. Not that I didn't get a few valuable
insights from the discussion here but still, HN is interesting because it has
the other stuff, not the mainstream stuff. (or maybe both?)

~~~
ErrantX
Both I think. Most of the other commentary elsewhere about Google.cn has been
mundane (or "OMG ROXXORS") :)

------
elblanco
I think perhaps a more interesting way of looking at this is to simply take a
category, plot all of the posts over the pseudo-time x-axis, then calculate a
least squares trend line for that category.

Categories that slope down are shrinking, and categories that slope up are
growing.

In the end, a graph isn't even necessary; just a list of categories ordered by
slope would do.
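elblanco's suggestion can be sketched as an ordinary least-squares fit per category, assuming each category's per-mille share has already been computed for every block of postings, with the block index serving as pseudo-time. The input format and the function names `ols_slope` and `rank_categories_by_trend` are hypothetical. The output is the list he describes: categories ordered by slope, fastest-shrinking first.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx  # needs at least two distinct x values


def rank_categories_by_trend(shares_by_block):
    """shares_by_block: one dict per block of postings, in order, mapping
    category -> per-mille share. A category absent from a block counts as 0.
    Returns (category, slope) pairs, fastest-shrinking first."""
    if len(shares_by_block) < 2:
        raise ValueError("need at least two blocks to fit a trend line")
    categories = {cat for block in shares_by_block for cat in block}
    xs = list(range(len(shares_by_block)))  # block index = pseudo-time
    slopes = {}
    for cat in categories:
        ys = [block.get(cat, 0.0) for block in shares_by_block]
        slopes[cat] = ols_slope(xs, ys)
    # Sort by slope, breaking ties by name so the output is deterministic.
    return sorted(slopes.items(), key=lambda item: (item[1], item[0]))


# Hypothetical per-mille shares for three consecutive blocks.
blocks = [
    {"startups": 600.0, "blogs": 400.0},
    {"startups": 500.0, "blogs": 300.0, "politics": 200.0},
    {"startups": 400.0, "blogs": 300.0, "politics": 300.0},
]
print([cat for cat, _ in rank_categories_by_trend(blocks)])
# ['startups', 'blogs', 'politics']
```

Each slope is in per-mille points per block of postings, so it inherits the same non-linear time axis as the graphs.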

------
paraschopra
Hi James,

If you would like to automatically classify HN posts into relevant categories,
please feel free to use our ContextSense API -
<http://www.wingify.com/contextsense/>

PS: I'm not writing this to promote the product (because it is not actively
marketed in the first place)

~~~
jacquesm
It's Jacques, not James :)

Thanks, that's nice of you!

~~~
paraschopra
Sorry for misspelling your name!

~~~
jacquesm
No problem.

I've been working for a long time on a hobby project called autotagger that
may one day become a little more serious.

Its main function is to detect the subject of a text automatically, given the
text, but the titles are too short for it to work reliably.

Do you have some kind of formula in your software that gives you a
confidence level based on the length of the sample text?

~~~
paraschopra
Actually, I use the text at the URL to generate semantic classifications. A
single title by itself may be too short to produce a meaningful classification.

------
gopher
I left HN some time ago. When I came back here today, I saw three PHP
stories on the front page. I'd say: yes, it's changing. But to be fair, one of
the PHP stories was on Pharen, "A lispy language that compiles to PHP".

------
gojomo
Perhaps comparing the relative prevalence of words in headlines (or comments)
over time would give more insight.

~~~
jacquesm
There are two more steps in this series: the first concentrates on the
increase in volume over time, in terms of users, postings and karma inflation;
the step after that is vocabulary analysis of the titles.

I'll put in a cut-off on noise words and one on obscure terms; the part in the
middle should give some idea about trends.

I'm surprised how much work it is to do all this; I figured I'd just slap it
together in a day or two, but it has been a full week now.

So much for those 'weekend' projects...
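A rough sketch of the title-vocabulary analysis described above, assuming the titles have already been split into an earlier and a later period. Noise words are dropped via a stop list, terms seen fewer than `min_count` times are dropped as obscure, and the remaining words are ranked by the change in their share of all title words. The stop list, the threshold and the names `title_words` and `vocabulary_shift` are illustrative assumptions, not jacquesm's actual method.

```python
import re
from collections import Counter

# Hypothetical, deliberately short noise-word list; a real run would use a
# fuller stop list.
NOISE_WORDS = {"a", "an", "and", "for", "how", "in", "is", "of", "on",
               "the", "to", "why", "with", "your"}


def title_words(titles):
    """Count the non-noise words across a list of titles (case-insensitive)."""
    words = Counter()
    for title in titles:
        for word in re.findall(r"[a-z0-9+#']+", title.lower()):
            if word not in NOISE_WORDS:
                words[word] += 1
    return words


def vocabulary_shift(early_titles, late_titles, min_count=2):
    """Return (word, change in share of all title words) pairs, biggest gain
    first. Words seen fewer than min_count times in both periods combined are
    treated as obscure and skipped."""
    early = title_words(early_titles)
    late = title_words(late_titles)
    early_total = sum(early.values()) or 1  # avoid dividing by zero
    late_total = sum(late.values()) or 1
    shifts = []
    for word in set(early) | set(late):
        if early[word] + late[word] < min_count:
            continue  # obscure term: too rare to say anything about a trend
        shifts.append((word, late[word] / late_total - early[word] / early_total))
    # Biggest gain first; ties broken alphabetically for stable output.
    return sorted(shifts, key=lambda item: (-item[1], item[0]))


# Hypothetical titles from two periods.
early = ["Startup funding tips", "Startup hiring"]
late = ["Google and China", "Google startup", "China news"]
print([word for word, _ in vocabulary_shift(early, late)])
# ['china', 'google', 'startup']
```

Comparing shares rather than raw counts keeps the result meaningful even though the later period has more postings overall.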

------
joubert
Put labels on the axes. Use a logarithmic scale.

~~~
jacquesm
Logarithmic? Why that?

~~~
joubert
Because it is useful in comparing rates.

Here's a nice write up:
<http://www.health.state.pa.us/hpa/stats/techassist/arithlog.htm>

~~~
jacquesm
I know what a log chart is and how to make one, it's just that I fail to see
how making either a time axis or a per-mille axis logarithmic will clarify
things.

~~~
joubert
Can I download the dataset?

