

An algorithmic approach to GitHub exploration - doppenhe
http://blog.algorithmia.com/private/90051922739/tumblr_n7rox0sDY81tqavzc

======
ptwobrussell
This post highlights that there are indeed some significant untapped
opportunities in mining GitHub user and repository data. As I was working on
the 2nd Edition of Mining the Social Web last year, I observed the very same
thing and introduced an entire chapter that models GitHub as an interest graph.
(Think: users are interested in projects and programming languages by
extension.) The IPython Notebook with all of the sample code is available with
all of the other source [1] but really just begins to scratch the surface with
some rudimentary centrality techniques. Like any other interest graph, the
possibilities are fairly endless.

[1] [http://nbviewer.ipython.org/github/ptwobrussell/Mining-
the-S...](http://nbviewer.ipython.org/github/ptwobrussell/Mining-the-Social-
Web-2nd-Edition/blob/master/ipynb/Chapter%207%20-%20Mining%20GitHub.ipynb)
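
As a rough illustration of the interest-graph idea above (this is not the notebook's code), here is a minimal sketch of degree centrality using only the Python standard library. The node names and edges are made-up sample data, and the `user:`/`repo:`/`lang:` prefixes are just an assumed labeling convention.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return degree centrality for each node of an undirected graph.

    A node's score is (number of distinct neighbors) / (n - 1),
    where n is the total number of nodes in the graph.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical interest graph: users star repos, repos are written in languages.
edges = [
    ("user:alice", "repo:acme/widgets"),
    ("user:bob", "repo:acme/widgets"),
    ("user:bob", "repo:acme/solver"),
    ("repo:acme/widgets", "lang:Python"),
    ("repo:acme/solver", "lang:Julia"),
]

scores = degree_centrality(edges)
# Rank nodes from most to least connected.
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the most-starred repo comes out on top of `ranking`; on real GitHub data the same idea would surface the users, projects, and languages that tie the most interests together.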

~~~
doppenhe
Absolutely. Everything is up and running on Algorithmia, including some
graphing algorithms - would love to collaborate on such exploration if you are
interested. diego (at) algorithmia.com

~~~
ptwobrussell
I'll definitely reach out. It would be fun to tackle a problem like this
together.

------
idunning
Tried with /JuliaLang/julia and got garbage results - my guess is that the
build instructions in the README dominate. Trying something like
/JuliaOpt/Optim.jl, which has a very on-topic README, fared slightly better
but still had some bizarre things like /sergiotapia/go-style-guide.

~~~
doppenhe
Yeah, some of the results are not great. A lot of READMEs are purely build info
or license text, and we tried to a degree to filter those out. Hopefully you at
least discovered one relevant repo that you hadn't seen before - so much hidden
treasure in GitHub.

------
sitkack
I like the idea of your project, but it seems like the algorithmic database
version of Wikipedia that you plan to profiteer off of?

Words like marketplace, crowdsourced, and open platform played well in 2005
but now they kinda smell like a scam.

------
andars
Another related site that attempts to do a similar thing:
[http://kare.progger.io](http://kare.progger.io)

~~~
doppenhe
Thanks for sharing. I tried it out but I keep getting errors. Do you know the
owner of the project?

~~~
arshsab
I am the owner of the project. What errors are you getting?

~~~
doppenhe
404 Not Found

The resource could not be found.

/search/bootstrap

~~~
arshsab
We can't determine which repo to give results for just by the name of the repo
because multiple owners could have the same repo name (i.e. bob/bootstrap and
jack/bootstrap). So we need both the owner and repo. This works:

[http://kare.progger.io/search/twbs/bootstrap](http://kare.progger.io/search/twbs/bootstrap)
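
To illustrate why a bare repo name is not enough, here is a small sketch (not the site's actual code) that builds the `/search/<owner>/<repo>` path shown in the working link above and rejects input with no owner. The function name `search_path` is hypothetical.

```python
def search_path(full_name):
    """Build a /search/<owner>/<repo> path from an 'owner/repo' string.

    Raises ValueError when the owner or repo part is missing, since a
    repo name alone (e.g. 'bootstrap') could belong to many owners.
    """
    owner, sep, repo = full_name.strip("/").partition("/")
    if not sep or not owner or not repo or "/" in repo:
        raise ValueError(f"expected 'owner/repo', got {full_name!r}")
    return f"/search/{owner}/{repo}"


path = search_path("twbs/bootstrap")
```

Calling `search_path("bootstrap")` raises `ValueError`, which mirrors the 404 reported for `/search/bootstrap`.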

------
hitlin37
Strangely, they didn't mention what kind of topic algorithm they are using. Is
it LDA-based?

~~~
doppenhe
Yes, this one is LDA-based.

~~~
hitlin37
I had a similar feeling.

