

Predicting HN upvotes from headlines with Python - vikp
http://blog.dataquest.io/blog/predicting-upvotes/

======
yconst
"We could also take other data into account, like the user who submitted the
article, and generate features indicating things like the karma of the
user..."

I'm wondering how helpful that could be for prediction, though. Would it
actually help if I want to predict how many upvotes my headline will get and I
add my karma as a feature? I think such features would in fact degrade
generalisation performance, since during training they just act as proxies,
i.e. high-karma users are correlated with a higher probability of a "hit
story".

~~~
vikp
It depends on how much data you can include in the prediction phase. If you
train with karma earned right at the time of the article submission, and then
predict with the exact same methodology, it will work fine. If you put in a
"fake" karma value when predicting, then it will of course not work well.
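
A rough sketch of what I mean, under loud assumptions: the rows are made-up toy
data, the model is a 1-nearest-neighbour stand-in rather than anything from the
post, and `make_features`, `fit`, and `predict` are names invented for this
example. The only point is that one feature function serves both phases.

```python
import math

def make_features(headline, karma_at_submission):
    """Build the feature vector; used unchanged for training AND prediction."""
    word_count = len(headline.split())
    # Log-scale karma so a 10,000-karma user does not swamp the word count.
    return (word_count, math.log10(karma_at_submission + 1))

# Made-up toy rows: (headline, submitter karma at submission time, upvotes).
TOY_ROWS = [
    ("Show HN: a tiny regex engine", 120, 15),
    ("Why we moved off microservices", 9500, 240),
    ("Ask HN: best books on statistics?", 40, 3),
]

def fit(rows):
    """'Train' a 1-nearest-neighbour model by storing (features, upvotes)."""
    return [(make_features(h, k), upvotes) for h, k, upvotes in rows]

def predict(model, headline, karma_at_submission):
    """Return the upvotes of the closest stored example in feature space."""
    x = make_features(headline, karma_at_submission)
    _, upvotes = min(model, key=lambda item: math.dist(item[0], x))
    return upvotes

model = fit(TOY_ROWS)

# Same methodology as training: the submitter's actual karma at submission.
real = predict(model, "Why we moved off monoliths", 9000)

# A placeholder ("fake") karma value lands the query near the wrong examples.
fake = predict(model, "Why we moved off monoliths", 0)
```

With these toy rows the two calls pick different neighbours, which is the
train/predict mismatch in miniature.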

