
News.YC statistics - community cohesion - matstc
http://clipboarded.blogspot.com/2008/02/newsyc-statistics.html
======
Alex3917
It's not entirely clear to me that news.yc would be a better place without all
the submissions and comments that get only one or two upvotes. For example,
I'm still convinced that idiagram's wicked problem solving visualization was
one of the most intellectually interesting (and beautiful) news.yc submissions
ever, even though I'm apparently the only one who voted for it:

<http://www.idiagram.com/CP/cpprocess.html>

Looking through my own submissions I see that the more intellectual stuff I've
posted, while it's done well, has never gotten as many votes as the pithy one-
liners.

~~~
whacked_new
Just looked (again) at that link. I recall this page, and I recall not
getting far through it. On my revisit I still couldn't read it through
entirely. It looks like one of those handouts that presents an unspecific
problem, to which it throws a bunch of unspecific waypoints that lead to an
unspecific solution.

I suspect you see something that I didn't; how do you approach reading this?

Also, between reading and coming back, you have been downvoted once. I have no
idea what is downvote-worthy in your post... sometimes I just don't know what
I am missing -- or if there is anything to miss in the first place.

~~~
ratsbane
I've wondered about that too - what motivates people to vote something up or
down? Is voting up more a measure of interest, agreement, surprise,
appreciation...? Would you vote down because you don't agree with a comment or
because you think it's irrelevant, self-serving, unoriginal, or poorly-stated?
(I didn't see anything I thought worthy of downvote on the parent comment
either; I vote it up as thought-provoking even though I didn't quite get the
point of the thing he linked to.)

~~~
BrandonM
Correct. :)

No, seriously, all the reasons you present are perfectly legitimate reasons for
up/downvoting and are reasons that I have considered at various times.

~~~
jdueck
Is it even possible to vote something down? I don't see any down arrows. Maybe
I don't have enough karma?

~~~
BrandonM
You can't downvote until you have enough karma, and a comment cannot be
downvoted after it's been around for more than 24 hours. This is to prevent
someone from vindictively going through a person's "threads" and downvoting
all of their comments in an effort to kill their karma.

You can never downvote article submissions; they can only be upvoted.
Downvoting submissions has merit in some cases (removing spam), but in
practice it has been used on other sites to kill everyone's submissions except
your own, rapidly leading to a downmodding war where the winner is whoever is
more committed to their karma score, instead of the one who is submitting the
best articles.
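
The rules above can be written down as a small check. This is only a sketch of the behaviour described in this comment, not the site's actual code: the function name and `KARMA_THRESHOLD` are made up for illustration, and the real karma cutoff is not stated anywhere in this thread.

```python
from datetime import datetime, timedelta

# Assumed placeholder: the thread gives the 24-hour window but not the karma cutoff.
KARMA_THRESHOLD = 100
DOWNVOTE_WINDOW = timedelta(hours=24)


def can_downvote(voter_karma, item_kind, posted_at, now):
    """Return True if the rules described above would allow a downvote."""
    if item_kind == "submission":
        # Article submissions can only be upvoted.
        return False
    if voter_karma < KARMA_THRESHOLD:
        # Not enough karma yet, so no down arrow is shown.
        return False
    # Comments older than 24 hours can no longer be downvoted.
    return now - posted_at <= DOWNVOTE_WINDOW
```

The age check is what blocks the "go through someone's threads" attack: older comments are simply out of reach.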

~~~
jdueck
Interesting, thanks!

------
ivankirigin
Interesting. I post lots of things that I find obscure and don't expect others
to upmod.

I think part of the reason that news.yc is interesting is that there are a
number of non-overlapping interest areas within a tight niche. Think about
that. Diversity within a narrow field. I like that.

------
Spyckie
Thanks for posting this interesting data set, but after thinking about it for
some time, I don't think this data offers any insights into news.yc. The
leader to user ratio is too high, which skews all the numbers so that nothing
really meaningful can be extracted. It's also hard to spot any trends from the
point system alone - user identity, time of post relative to other news, and
the controversy of the post content can play important factors in the post's
point value, and all of these don't necessarily have any correlation with the
user who posted it. I think in order to figure out something about the
community, someone will have to get their hands dirty and actually look at
post content.

~~~
cawel
Those are nice stats, and I do think we can get some interesting
characteristics about our community. I thought it is quite powerful to point
to paul, palish, sharpshoot, or pg, and say that they _are_ the community,
since they have the best points per post measure. They are nothing less than
the best representatives of our community (with regards to posts, not
comments).

I'd like to think that the factors you mentioned (traffic, time of the day,
controversy of the post) iron out statistically (since all posts are affected
by those, there is an argument to ignore their respective influence).

This initiative makes me wish it is only the beginning: I want some more
stats! For example:

\- It would be interesting to see what's the points per post (PPP) of the
leaders, compared to the PPP for most of the others (I agree with you that in
the dataset, the leader to user ratio is too high). Maybe one could have a
plotted distribution of PPP's across all users.

\- I know HN has 8000 uniques/day, what is the proportion of those that are
posting? e.g. how many of those users have fewer than 5 posts?

\- also interesting is the proportion of points between posts and comments,
per user. Different profiles: some people discuss a lot, some people post a
lot, as it was often repeated in discussions. But what's the proportion of
those 2 groups?
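
To make the first two wishes concrete, here is a rough sketch of how PPP per user and the share of low-volume posters could be computed. It assumes the scraped data is a list of `(username, points)` pairs, one per submission; that layout and both function names are my assumption, not how the original analysis was done.

```python
from collections import defaultdict


def points_per_post(posts):
    """posts: iterable of (username, points) pairs, one per submission."""
    totals = defaultdict(lambda: [0, 0])  # username -> [post_count, point_sum]
    for user, points in posts:
        totals[user][0] += 1
        totals[user][1] += points
    return {user: point_sum / count for user, (count, point_sum) in totals.items()}


def share_with_fewer_posts(posts, limit=5):
    """Fraction of posting users who have fewer than `limit` submissions."""
    counts = defaultdict(int)
    for user, _ in posts:
        counts[user] += 1
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c < limit) / len(counts)
```

Note that the second function only counts users who posted at least once, so it says nothing about the 8000 daily uniques who never post. The PPP values from the first function are what one would feed into a histogram to get the distribution across all users.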

------
aston
Also, pg could probably run this on the complete data set, which might better
illustrate what's happening with the more prolific members of the community.

~~~
ratsbane
Nice analysis. Does the complete data include timestamps on up/down votes? The
ratio of up to down votes might show how controversial a posting is, while the
rate of rise/fall might show... whether the posting was submitted at an
opportune time?

Or what about looking at voting habits? What's the average/min/max/etc up/down
voting ratio?

And to what degree would showing more statistics, particularly about new
postings or first-page postings, change behavior?

------
edw519
Fantastic! A digitally generated picture is worth 2^10 words!

You've certainly embodied the philosophy of Edward Tufte:

[http://www.amazon.com/s/ref=nb_ss_gw/102-8193666-1218533?url...](http://www.amazon.com/s/ref=nb_ss_gw/102-8193666-1218533?url=search-alias%3Daps&field-keywords=%22edward+tufte%22&x=0&y=0)

There must be many cofactors that are hard to capture in an analysis like
this. There's the "binaryness" of making it to the front page. There's also
"time of day". A very interesting submission posted at a time when those who
would be interested are not online may never gain enough traction to make it
to the first page. A time-of-day analysis would certainly be interesting.
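
A first pass at that time-of-day analysis might look like the sketch below. It assumes each submission is available as a `(submitted_at, points)` pair with a real timestamp, which the scraped pages may not provide; the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def mean_points_by_hour(posts):
    """posts: iterable of (submitted_at: datetime, points: int) pairs."""
    buckets = defaultdict(list)
    for submitted_at, points in posts:
        # Group submissions by the hour of day they were posted.
        buckets[submitted_at.hour].append(points)
    # Average points per hour bucket, ordered by hour.
    return {hour: sum(p) / len(p) for hour, p in sorted(buckets.items())}
```

Averages alone would hide the front-page "binaryness", so a median or the share of posts above some point threshold per hour would probably be worth adding.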

~~~
matstc
I'm not sure if you're mocking me or giving me too much credit here. Really I
was hoping to be more rigorous and go more in depth but at some point it just
felt silly to hammer a website, scrape the page and never have access to the
whole data. Doing this got me thinking about issues like "time of day" though
as there must be a way to take into account the rate of new posts and make old
posts more 'sticky' when the rate accelerates, in the morning for instance.

I feel like starting my own social news website just to have a play with
different algorithms!
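
Something like this is the kind of thing I'd want to play with. It is purely a thought experiment, not how news.yc ranks anything: the formula, `baseline_rate`, and `gravity` are all invented for illustration. The idea is to let a post age more slowly when submissions arrive faster than usual.

```python
def sticky_score(points, age_hours, posts_per_hour, baseline_rate=5.0, gravity=1.5):
    """Time-decayed score where decay slows down during busy periods.

    baseline_rate and gravity are made-up tuning knobs.
    """
    # When the submission rate exceeds the baseline, stretch time so that
    # existing posts lose rank more slowly. Never speed decay up.
    slowdown = max(1.0, posts_per_hour / baseline_rate)
    effective_age = age_hours / slowdown
    return points / (effective_age + 2.0) ** gravity
```

With these numbers a 10-point post that is 6 hours old scores higher when 20 posts per hour are coming in than when 5 are, which is the "stickiness" I had in mind.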

~~~
edw519
"I'm not sure if you're mocking me or giving me too much credit here."

Neither, I'm sure.

(Funny. Lately when I get sarcastic here, people think I'm serious, and when
I'm serious, people think I'm joking. I guess I need a little more practice
writing to strangers.)

Scatter diagrams are one of the greatest underrated ways of conveying
information in a heartbeat. You've done a great job of it.

You've also opened up a can of worms. There are so many ways to look at this
data, you can keep yourself busy for quite a while. I suspect that won't be a
problem, considering how much you already care about this. Keep up the good
work and keep us posted.

~~~
matstc
Sarcasm is hard to convey in writing. We used to have smileys to help us but
they are going out of fashion. We're gonna have to stop being sarcastic :)

------
ldambra
This kind of topic worries me because from my experience when a community
starts worrying about "cohesion", "trolls" and the like, this is usually the
first sign of its downfall.

The real problem is that what most of us would call a downfall today will be
called (and interpreted as) a rise tomorrow by other people (hence the
extreme difficulty of creating efficient moderating systems based on
quantitative data). You know your community is irreversibly "corrupted" when
this kind of people becomes the majority, and this usually happens surprisingly
quickly as visitor growth accelerates.

For some reason I have some hope for HN because of PG being behind it, he
kinda knows what he wants and certainly knows what he doesn't want. Still,
this might be one of the toughest challenges he'll be faced with if HN keeps
growing. I've known no community that escaped this syndrome yet.

------
manvsmachine
As a relatively newer member here, it would be interesting to know how these
numbers came about over time; i.e., did the more prolific members start out
with a bang, or did they gradually build up steam to reach where they are now?

------
aston
Cool.

I'd be interested in seeing similar analysis on a per-comment basis.

------
shafqat
Very cool. Was fun trying to find my username on your charts (although I wish
it could have been a bit higher up the points per post axis!).

------
DarrenStuart
nice little bit of hacking, one thing I would have done is leave pg out, I think
he is messing your data up. Fans being fans, his karma is off the chart.

~~~
matstc
Yeah his approval rating is ridiculous, above 13 points per post on his last
180 posts. But that's just one of many things messing up the statistics. It's
only good if you want to have a general idea.

------
npk
These plots are begging to be placed on a log-log axis!
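
For anyone who wants to try it, the transform is tiny. This sketch assumes the chart data is a list of `(posts, total_points)` pairs per user, which is my guess at the layout rather than anything stated in the post; the function name is made up.

```python
import math


def to_log_log(pairs):
    """Map (posts, total_points) pairs to (log10 posts, log10 points).

    Pairs with a zero or negative value are dropped, since log10 is
    undefined there.
    """
    return [(math.log10(x), math.log10(y)) for x, y in pairs if x > 0 and y > 0]
```

With a plotting library such as matplotlib, `plt.loglog(x, y, "o")` applies the same scaling to the axes directly. Either way, heavy-tailed data like this tends to spread out instead of piling up in one corner.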

