
Predicting the Popularity of GitHub Repositories - nextjj
https://arxiv.org/abs/1607.04342
======
itschekkers
Umm, they use the first 6 months of data in the model. How is it surprising
that the trajectory of stars in the first 6 months predicts the rest of the
trajectory?

------
BinaryIdiot
> when newcomers are not considered, there is a very strong correlation
> between predicted and real rankings

I'm having trouble parsing the meaning of this. It seems like their data can
predict popularity, but only for existing repositories where they can look at
the first 6 months, calculate the popularity, and see if it matches up. Am I
understanding that correctly? Seems... weird. Can this be used in any way for
new projects?

For instance, my msngr.js[1] library initially had maybe 3 stars from friends.
Then it was posted to HN, and within 2 weeks it had about 150. After that it
slowly climbed to where it is today over a year or so, and lately it's kind of
stalled. I'm not sure their data would predict anything correctly about my
repository, and I would imagine a large majority of repositories are more like
mine: a handful of users, a pop when the project lands on the front page of
something like HN, and then traffic dies back down.

[1]
[https://github.com/KrisSiegel/msngr.js](https://github.com/KrisSiegel/msngr.js)
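
One way to read the quoted claim (an interpretation, not taken from the
paper): rank repositories by predicted and by actual star counts, drop
"newcomers" (repos with no early history for the model to use), and measure
agreement with Spearman's rank correlation. A minimal, dependency-free sketch;
`spearman_rho`, `rank_agreement`, and the toy star counts are hypothetical:

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1)).

    The closed form is exact only when there are no ties; tied values
    here simply get distinct ranks in input order.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")

    def ranks(values):
        # Rank 1 = largest value (most stars), like a popularity ranking.
        order = sorted(range(n), key=lambda i: values[i], reverse=True)
        r = [0] * n
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def rank_agreement(predicted, actual, has_history):
    """Spearman rho between predicted and actual star counts, skipping
    newcomers (hypothetical: repos with no early history to feed the model)."""
    repos = [r for r in actual if r in predicted and has_history.get(r, False)]
    return spearman_rho([predicted[r] for r in repos],
                        [actual[r] for r in repos])


# Toy data (made up): "new" has no history, so it never enters the comparison.
predicted = {"a": 900, "b": 500, "c": 300, "d": 100}
actual = {"a": 1000, "b": 450, "c": 350, "d": 90, "new": 5000}
has_history = {"a": True, "b": True, "c": True, "d": True, "new": False}
print(rank_agreement(predicted, actual, has_history))  # 1.0
```

With identical orderings rho is 1.0; swapping two adjacent repos in a four-repo
ranking gives 1 - 6*2/(4*15) = 0.8. The filtering step is also why a metric like
this says little about brand-new projects: a repo with no history is never scored.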

------
xapata
So... they found that the number of stars is autoregressive? What's the
punchline?

~~~
wyldfire
IIUC, the utility is that GitHub or someone else could use the arrival of stars
to predict which repos are likely to attract more future stars. This could be
used to rank up-and-coming new repos worth paying attention to.
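
A minimal sketch of that kind of "up-and-coming" ranking, assuming you have a
timestamp for every star a repo received. The function name `trending`, the
14-day window, and the repo names are hypothetical; this scores repos by raw
recent star arrivals, not by any model from the paper:

```python
from datetime import date, timedelta


def trending(star_dates, today, window_days=14, top=3):
    """Rank repos by how many stars arrived in the last `window_days` days.

    star_dates maps repo name -> list of dates on which a star was received.
    """
    cutoff = today - timedelta(days=window_days)
    recent = {
        repo: sum(1 for d in dates if cutoff < d <= today)
        for repo, dates in star_dates.items()
    }
    # Most recent stars first; ties broken by name so the output is stable.
    return sorted(recent, key=lambda r: (-recent[r], r))[:top]


# Toy data (made up): a big old repo, a small fast riser, and a stalled one.
today = date(2016, 7, 20)
stars = {
    "old-giant": [date(2015, 1, 1)] * 500 + [date(2016, 7, 15)],
    "rising": [date(2016, 7, 10) + timedelta(days=i) for i in range(8)],
    "stalled": [date(2016, 5, 1)] * 50,
}
print(trending(stars, today))  # ['rising', 'old-giant', 'stalled']
```

Scoring on recent arrivals rather than totals is what lets a small, fast-growing
repo outrank a much larger one; the window length controls how strongly a single
front-page spike dominates the score.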

~~~
xapata
From what I remember of reading the paper yesterday, they simply estimated the
coefficients of various lag terms. That might help if you wanted to estimate
how many stars a project will have next week (because of some prediction
market?). For the task of predicting which projects will attract more stars,
the fitted model is largely unnecessary: more means more.
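
If the model is, as described, a regression on lagged star counts, fitting it
amounts to ordinary least squares on an autoregressive design. A minimal sketch
assuming a series of weekly new-star counts; `fit_ar` and `forecast_next` are
hypothetical names, and this is a generic AR(p) fit, not necessarily the paper's
exact model:

```python
def fit_ar(series, p):
    """Fit y_t = c + a_1*y_{t-1} + ... + a_p*y_{t-p} by ordinary least squares.

    Returns [c, a_1, ..., a_p]. Solves the normal equations (X^T X) b = X^T y
    with Gauss-Jordan elimination and partial pivoting, so no dependencies.
    """
    if len(series) <= 2 * p:
        raise ValueError("series too short for the requested number of lags")
    # Design matrix rows: [1, y_{t-1}, ..., y_{t-p}] for t = p .. n-1.
    rows = [[1.0] + [series[t - k] for k in range(1, p + 1)]
            for t in range(p, len(series))]
    targets = [series[t] for t in range(p, len(series))]
    m = p + 1
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(row[i] * row[j] for row in rows) for j in range(m)]
         + [sum(row[i] * y for row, y in zip(rows, targets))]
         for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (e.g. a constant series)")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        # Eliminate this column from every other row.
        for r in range(m):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][m] for i in range(m)]


def forecast_next(series, coeffs):
    """One-step-ahead forecast from fitted [c, a_1, ..., a_p]."""
    c, lags = coeffs[0], coeffs[1:]
    return c + sum(a * series[-k] for k, a in enumerate(lags, start=1))


# Toy series (made up) generated exactly by y_t = 2 + 0.5 * y_{t-1}.
weekly = [100.0]
for _ in range(15):
    weekly.append(2 + 0.5 * weekly[-1])
coeffs = fit_ar(weekly, 1)
print([round(x, 3) for x in coeffs])  # [2.0, 0.5]
```

This also illustrates the "more means more" point: if every repo shares one
AR(1) fit with a positive lag coefficient, ranking repos by their one-step
forecast is the same as ranking them by last week's stars.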

