

Counting Stars on GitHub - Adrock
http://adereth.github.io/blog/2013/12/23/counting-stars-on-github/

======
prezjordan
I have a strange relationship with GitHub stars. On one hand, I love them -
they give my projects exposure (I always mention cleaver [0] during my
interviews) and let me know how many people are "using" my project.

On the other hand, I feel it sort of pushes me in the wrong direction. Stars
aren't a good indicator of use, or even importance. For instance, cleaver is
not technically challenging, yet it shares the same number of stars as numpy
[1].

So, not sure how I feel about them - not sure what they measure.

[0]: [https://github.com/jdan/cleaver](https://github.com/jdan/cleaver)

[1]: [https://github.com/numpy/numpy](https://github.com/numpy/numpy)

~~~
sheetjs
To an extent, projects that lean more on github tend to receive more github
stars.

I don't know when numpy switched over to github, but their website still uses
SourceForge (SF) to distribute the official binaries. On the other hand, many projects
initially started as github repos and naturally draw people that way. As an
example, consider d3, where the main page rests on d3js.org but most of the
links point to the github project wiki.

------
misterdai
Interesting post; it's worth playing around with the GitHub API to see what
you can uncover. Be careful when basing conclusions on their search API,
though: it sometimes seems to miss users or repositories during indexing,
which skews your results. That caused me a few headaches recently, as I have a
little site built on daily statistics collected from GitHub. For example,
mojombo vanished from their user results, making Linus Torvalds the most
followed user. It's fixed on GitHub now but still evident on my site (boo).

Here's my site for anyone curious:
[http://hubreports.yougeezer.co.uk](http://hubreports.yougeezer.co.uk)

------
Cthulhu_
Well, his interpretation of the numbers is rather off, I'm afraid, mostly due
to the language categorization: Bootstrap, Node, Zurb Foundation and Adobe
Brackets aren't JavaScript projects, for example.

~~~
Adrock
OP here. I agree that the language categorization is questionable. It's all
based on what the GitHub Search API returns as the language.

Looking at Bootstrap, the classification doesn't seem to be strictly based on
percentage:

- CSS 59.8%
- JavaScript 39.7%
- Python 0.5%

...but GitHub Search says JavaScript. I do see that if I do a search query
with a language filter, I get different results.
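A small sketch of the two kinds of query being compared here, built against GitHub's repository search endpoint (`/search/repositories`) with its `stars:` and `language:` qualifiers and `sort`/`order` parameters. The helper name `star_search_url` and the star threshold are illustrative assumptions; the post doesn't show the exact queries it ran, and this only builds the URLs rather than calling the API.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/repositories"


def star_search_url(min_stars, language=None, per_page=100):
    """Build a search URL for repositories sorted by star count.

    Hypothetical helper: adding a `language:` qualifier restricts results
    to repositories GitHub has tagged with that language.
    """
    qualifiers = [f"stars:>{min_stars}"]
    if language is not None:
        qualifiers.append(f"language:{language}")
    params = {
        "q": " ".join(qualifiers),  # qualifiers are space-separated
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


print(star_search_url(1000))
print(star_search_url(1000, "javascript"))
```

Comparing what the unfiltered and filtered queries return is one way to look at the discrepancy described above, since the filtered query may include or rank repositories differently from the language reported in unfiltered results.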

~~~
misterdai
As far as I know, GitHub uses its Linguist project to detect and index
languages. I've used the list of languages from this file:
[https://github.com/github/linguist/blob/master/lib/linguist/...](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml)

I picked an older version, as they don't always run the latest release. It's
handy because you can filter out items flagged as "markup". You also have to
watch out for languages like Mirah that are oddly indexed (same repo count as
Ruby).
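A sketch of the "markup" filtering described above. Entries in Linguist's languages.yml carry a `type` field (for example `programming` or `markup`); the dictionary below is a small hand-written stand-in for a parsed copy of that file, so its contents are illustrative and should be checked against the Linguist version you actually use. Parsing the real file would need a YAML library, which is left out to keep this self-contained.

```python
# Illustrative stand-in for a few entries parsed from Linguist's languages.yml.
# Check the `type` values against the Linguist version you are matching.
SAMPLE_LANGUAGES = {
    "CSS": {"type": "markup"},
    "HTML": {"type": "markup"},
    "Go": {"type": "programming"},
    "JavaScript": {"type": "programming"},
    "Ruby": {"type": "programming"},
}


def non_markup_languages(languages):
    """Return the sorted names of languages not flagged as markup.

    Entries with no `type` field are kept, since nothing flags them as markup.
    """
    return sorted(
        name
        for name, attrs in languages.items()
        if attrs.get("type") != "markup"
    )


print(non_markup_languages(SAMPLE_LANGUAGES))  # ['Go', 'JavaScript', 'Ruby']
```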

------
hashtree
As to the language rankings, some are at a large disadvantage when their
standard libs are outstanding. A prime example being Julia
([http://docs.julialang.org/en/latest/stdlib/](http://docs.julialang.org/en/latest/stdlib/)).
The same goes for smaller languages like Roy
([https://github.com/puffnfresh/roy](https://github.com/puffnfresh/roy)) since
GitHub doesn't detect them.

------
kevinbowman
I like that the visualisations are done with D3, the 5th-highest project on
the list.

~~~
Adrock
D3 is awesome. It was really nice when working on the post to just be able to
re-run the queries to generate new CSV or JSON and not have to go through a
whole separate visualization generation pipeline.

------
pedalpete
As a few have mentioned, the numbers are flawed, but I see the flaw
differently than others have described it.

The problem with 'language popularity' is that probably 90% of what we're
looking at is web projects hosted on GitHub, and within those projects CSS and
JavaScript are pretty much a given, as they are the default client-side
technologies. You can't build a website without CSS, and you rarely do without
JavaScript (on the front end).

------
chimeracoder
This is an interesting idea, but it suffers from one major problem: GitHub's
language detection is rather inflexible, and sometimes misleading.

For example: I have a number of small Go projects that are incorrectly
categorized as CSS or Javascript. One such example is a project explicitly
designed to be a Go webserver that you can just "go get" (or git clone) and
run immediately, with no extra setup, sort of like Twitter Bootstrap, but for
the backend: [https://github.com/ChimeraCoder/go-server-bootstrap](https://github.com/ChimeraCoder/go-server-bootstrap)

Because it self-hosts the Bootstrap files, and because the lines of Go code
required are relatively small, less than 5% of this project is detected as
being written in Go, even though I didn't write a single line of Javascript or
CSS myself for the entire project.
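To make the arithmetic concrete: Linguist's language breakdown is weighted by file size in bytes, so a small amount of Go can be swamped by a self-hosted framework. The sketch below uses made-up file sizes (not measured from go-server-bootstrap), and the `excluded` set is a simplified stand-in for Linguist's real vendored-file rules, which are pattern-based and more involved.

```python
from collections import defaultdict


def language_breakdown(files, excluded=frozenset()):
    """Percentage of bytes per language, skipping any excluded paths.

    `files` maps path -> (language, size_in_bytes).
    """
    totals = defaultdict(int)
    for path, (language, size) in files.items():
        if path in excluded:
            continue  # e.g. a vendored copy of a third-party library
        totals[language] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {lang: 100 * size / grand_total for lang, size in totals.items()}


# Hypothetical sizes for a small Go server that ships Bootstrap itself.
FILES = {
    "main.go": ("Go", 6_000),
    "public/static/bootstrap/css/bootstrap.css": ("CSS", 120_000),
    "public/static/bootstrap/js/bootstrap.js": ("JavaScript", 60_000),
}

print(language_breakdown(FILES))
```

With these hypothetical sizes Go comes out at about 3% of the bytes; excluding the two Bootstrap files leaves Go at 100%, which is what you'd expect if the vendored files were recognized.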

(Yes, I know I could pull in the Bootstrap, etc. from a CDN, but this is just
to illustrate the limitations of an automatic language detection model that
the developer has zero control over).

~~~
thedaniel
> an automatic language detection model that the developer has zero control
> over

Pull requests welcome :)

[https://github.com/github/linguist](https://github.com/github/linguist)

~~~
chimeracoder
I've seen that, and this is the corresponding issue:
[https://help.github.com/articles/my-repository-is-marked-as-...](https://help.github.com/articles/my-repository-is-marked-as-the-wrong-language).

Looking at vendor.yml, I'm actually not sure why this is happening. The
repository I linked to above only has two files that end in *.js: bootstrap.js
and bootstrap.min.js [0], and only one that ends in *.css (bootstrap.css).
Unless I'm misreading the regex in vendor.yml, it seems like they should be
excluded[1] but they aren't, and this isn't the only project of mine for which
this is happening.

I'm fairly certain it's picking up those files and not miscategorizing the Go
files for some reason, because when I delete those files and instead serve the
CSS/Javascript from a public cache, it categorizes the repository correctly,
as is the case with this project (based off of go-server-bootstrap):
[https://github.com/ChimeraCoder/pluto](https://github.com/ChimeraCoder/pluto)

[0] [https://github.com/ChimeraCoder/go-server-bootstrap/tree/mas...](https://github.com/ChimeraCoder/go-server-bootstrap/tree/master/public/static/bootstrap/js)

[1] [https://github.com/github/linguist/blob/master/lib/linguist/...](https://github.com/github/linguist/blob/master/lib/linguist/vendor.yml#L34)

~~~
coyotebush
So the jQuery regex, for comparison, is

    (^|/)jquery([^.]*)(\.min)?\.js$

while the Bootstrap regex in question is

    (^|/)bootstrap([^.]*)(\.min)\.(js|css)$

with the ".min" not optional. (Looks like this was the intent [1]).

The 68.9%/26.5% (CSS/JavaScript) split matches the 130k/50k size ratio of
bootstrap.css to bootstrap.js, so it looks like only bootstrap.min.js is being
excluded?

[1]
[https://github.com/github/linguist/commit/d5002ef06a9f346391...](https://github.com/github/linguist/commit/d5002ef06a9f3463919a3fe8919844c3aea46fc0)
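The difference between the two vendor.yml patterns quoted above can be checked directly with Python's `re` module. The file paths are illustrative (modelled on the repository layout linked earlier), and `BOOTSTRAP_OPTIONAL_MIN` is a hypothetical variant with `.min` made optional, included only for comparison; it is not Linguist's rule.

```python
import re

# The two patterns as quoted from Linguist's vendor.yml in this thread.
JQUERY = re.compile(r"(^|/)jquery([^.]*)(\.min)?\.js$")
BOOTSTRAP = re.compile(r"(^|/)bootstrap([^.]*)(\.min)\.(js|css)$")
# Hypothetical variant with ".min" optional, for comparison only.
BOOTSTRAP_OPTIONAL_MIN = re.compile(r"(^|/)bootstrap([^.]*)(\.min)?\.(js|css)$")

# Illustrative paths modelled on the linked repository.
BOOTSTRAP_PATHS = [
    "public/static/bootstrap/js/bootstrap.js",
    "public/static/bootstrap/js/bootstrap.min.js",
    "public/static/bootstrap/css/bootstrap.css",
]


def is_vendored(path, pattern):
    """True if the pattern matches anywhere in the path (re.search semantics)."""
    return pattern.search(path) is not None


for path in BOOTSTRAP_PATHS:
    print(
        path,
        is_vendored(path, BOOTSTRAP),
        is_vendored(path, BOOTSTRAP_OPTIONAL_MIN),
    )
```

Only bootstrap.min.js matches the pattern as written, so bootstrap.js and bootstrap.css would still be counted, which fits coyotebush's reading. The percentages agree: 68.9 / 26.5 = 2.6, the same as 130 / 50.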

~~~
chimeracoder
Thanks for pointing that out. You're right - it does seem like it was
intentional. I filed an issue for this though just in case, and perhaps
they'll explain the reason:
[https://github.com/github/linguist/pull/856](https://github.com/github/linguist/pull/856)

------
EGreg
Said no more counting dollars, we'll be counting stars....

~~~
Adrock
OP here. I was waiting for that :P

~~~
EGreg
Glad to oblige :)

EDIT: I just realized that the entire song can be about switching to open
source development on github.

[http://youtu.be/hT_nvWreIhg](http://youtu.be/hT_nvWreIhg)

A young, but not that bold, startup founder is losing sleep, dreaming about
the things that they can be. He doesn't think the world is sold, he's just
doing what they're told by VCs. All his risky career decisions make him feel
alive. And sure enough like many other startups his hits the deadpool. Take
that money watch it burn, sink in the river, the lessons are learned. So he
moves to release the tech as open source software and starts counting the
stars on github instead of dollars...

LOL

------
danso
I'm really amazed at how GitHub is such a vital service to developers
everywhere, yet there haven't been many alternative ways to dig for hidden and
popular repos across different metrics. Even the metric the OP uses, pure
number of stars, has a lot of interesting things in it, as the OP shows. Maybe
it's a reflection of how well designed GitHub's Explore page is that people
haven't yet had a reason to augment it.

When I say that even the pure star count has a lot of interesting things in
it...I mean even in the simplest sense. I know or have used all of those repos
with the exception of jQuery-File-Upload and adobe/brackets, but I can be a
real dilettante. But in the top 100, there's plenty that I've never heard of.
Even within plain Ruby-land, I'm always running into gems that have been
around for a few years or are highly starred (Spring and Middleman among them,
in the last couple of weeks).

What I'd like to see is even further categorization of the gems, something
similar to the Ruby Toolbox
([https://www.ruby-toolbox.com/categories/Active_Record_DB_Ada...](https://www.ruby-toolbox.com/categories/Active_Record_DB_Adapters)),
where you can find repos based on use case, not just language. I do like
Toolbox's emphasis on release dates and latest updates, as well as raw total
download numbers. I've found that the number of contributors, the rate at
which pull requests and issues are dealt with, and the rate of recent updates
are each as useful a metric as star count.

