
Analyzing Hacker News Users’ Join Dates, Karma, and Profiles - kradic
http://breckyunits.com/?p=58
======
byrneseyeview
Linear fits? Why? _Why_?

Karma is a function of time since joining, participation, and quality of
contributions. And it starts at 1. 'Participation' can be determined by
looking at contributions per length of time. Quality is average score of each
submission -- separating it from participation is a useful way to extend this
to a more complicated model taking into account the fact that people stop
using HN. So the line of best fit should be something closer to 1 + t * q * p,
or the sum 1 + (t_0 * q * p_0) + ... + (t_n * q * p_n) to describe folks
who are off-and-on contributors.
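
A minimal Python sketch of that model, for concreteness. The function name
`expected_karma` and the representation of activity as (t, p) pairs are
assumptions made for illustration, not anything from the original analysis;
q is held constant across periods, as in the formula above.

```python
def expected_karma(periods, quality):
    """Return 1 + sum(t_i * q * p_i) over a user's active periods.

    periods: list of (t, p) pairs, where t is the length of a period and
             p is the contribution rate (contributions per unit time).
    quality: q, the average score per contribution (assumed constant).
    """
    # Every account starts at 1 karma, hence the leading 1.
    return 1 + sum(t * quality * p for t, p in periods)
```

A steady contributor is the single-period case, e.g.
`expected_karma([(10, 2)], 1.5)`; an off-and-on contributor just has more
pairs in the list.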

~~~
timr
If nothing else, it's legitimate to filter non-participants before doing a
regression analysis on the rest. It's _technically_ true that you can't
predict karma over time, but that's not really an interesting statement until
you eliminate the large number of people who sign up, then never post or
comment.

My instinct is that once you filter these people, you'll see a much stronger
linear relationship between time and karma, since karma isn't normalized by
the number of contributions, and number of contributions is probably a Poisson
process.
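
A rough sketch of what that filter-then-fit step might look like in Python.
The `min_karma` threshold, the function names, and the (days, karma) input
format are all assumptions for illustration; the fit is plain ordinary least
squares, written out by hand so it needs nothing beyond the standard library.

```python
def drop_non_participants(rows, min_karma=2):
    """Keep only (days_since_join, karma) rows with karma >= min_karma.

    Karma starts at 1, so karma == 1 is treated here as "never contributed".
    That is an assumption: an account could post and still sit at 1.
    """
    return [(d, k) for d, k in rows if k >= min_karma]


def fit_line(rows):
    """Ordinary least squares fit of karma = slope * days + intercept."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_d = sum(d for d, _ in rows) / n
    mean_k = sum(k for _, k in rows) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in rows)
    if sxx == 0:
        raise ValueError("all points share the same x value")
    sxy = sum((d - mean_d) * (k - mean_k) for d, k in rows)
    slope = sxy / sxx
    return slope, mean_k - slope * mean_d
```

Comparing `fit_line(rows)` against `fit_line(drop_non_participants(rows))`
would show how much the sign-up-and-leave accounts flatten the slope.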

~~~
breck
Removing all 1's and 2's improves the relationship somewhat (more so with
log(k)) but still not a whole lot.

~~~
timr
Sounds like there's a vast gulf of people with little (but not zero)
contribution, then. Can you plot # of contributions versus membership time?

~~~
breck
Don't have the contributions data, just the karma score.

------
DougBTX
Could you post "General Composition of the Dataset" with a logarithmic
vertical scale please? Should help compensate for the outliers so we can see
more detail at the bottom of the graph.

~~~
breck
Done.

------
edw519
"I didn’t really expect to find a whole lot of interesting things, and found
what I expected."

Which is a great way to conduct research! Nice work.

This reminded me of my senior project in number theory, when I manipulated a
large data set, wondering what I'd find. Eventually, I found quite a bit.

Also reminded me of this quote by Wernher von Braun:

"Basic research is what I am doing when I don't know what I'm doing."

------
xirium
You may want to take into account differences in file timestamps because the
data was collated over many days.

~~~
breck
Ahh, I didn't notice that. That would affect things. The timestamps and counts
are: ('04/15', 630), ('04/16', 1993), ('04/17', 1994), ('04/18', 491),
('04/20', 270), ('04/21', 1049), ('04/22', 59), ('04/24', 33), ('04/25', 342),
('04/26', 29), ('05/03', 165), ('05/06', 86), ('05/07', 23). So most of the
members have older join dates than I figured.
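
A small Python sketch of how those counts could be tallied, and how account
age could be measured against each profile's own collection day instead of one
fixed date. The names `SNAPSHOTS`, `share_collected_through`, and
`account_age_days` are made up for illustration, and the string comparison
assumes every label falls in the same calendar year.

```python
from datetime import date

# (collection day as MM/DD, number of profiles fetched), as listed above.
SNAPSHOTS = [
    ("04/15", 630), ("04/16", 1993), ("04/17", 1994), ("04/18", 491),
    ("04/20", 270), ("04/21", 1049), ("04/22", 59), ("04/24", 33),
    ("04/25", 342), ("04/26", 29), ("05/03", 165), ("05/06", 86),
    ("05/07", 23),
]


def total_profiles(snapshots=SNAPSHOTS):
    """Total number of profiles across all collection days."""
    return sum(count for _, count in snapshots)


def share_collected_through(day, snapshots=SNAPSHOTS):
    """Fraction of profiles fetched on or before `day` (MM/DD).

    Comparing the labels as strings only works because they are zero-padded
    and assumed to belong to a single calendar year.
    """
    collected = sum(count for d, count in snapshots if d <= day)
    return collected / total_profiles(snapshots)


def account_age_days(join_date: date, snapshot_date: date) -> int:
    """Account age in days, measured at the day the profile was fetched."""
    return (snapshot_date - join_date).days
```

With the counts above, `total_profiles()` comes to 7164, and roughly 71% of
the profiles were fetched by 04/18.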

------
wallflower
The interesting thing about karma... I find that I get tired of posting
insightful comments day after day, so I take breaks and lurk. edw519, I don't
know how you do it. I probably won't make the "leaders" (but I don't think it's
important)

~~~
thaumaturgy
| (but I don't think it's important)

I think that's the crux of it. Somebody could monitor all the various news
sites and spend an hour a day here posting comments and stories and so forth,
but I suspect most folks would rather spend that time doing something else.

That said, edw519 is a pretty cool guy.

~~~
nostrademons
I tend to post in bursts, so most of my karma comes over periods of a week or
two when I'm posting several times a day. It's actually a bad sign, because it
means I'm not working on my startup. ;-) Then there are periods where I'll
post like twice a week and my RescueTime log'll show that I'm spending like
80-90% of my computer time coding. So yeah, it's a tradeoff, and ultimately
the code is more important, but I find that I burn out if I spend too much
time coding.

------
pierrefar
Nice work. I love manipulating large data sets :)

Which program did you use to produce the plots?

~~~
breck
JMP

------
mooneater
Scatterplots are too dense!

------
omfut
Just curious, how do karma points work?

