
Popular languages on GitHub ranked by number of bytes stored - sant0sk1
http://gist.github.com/118810
======
sant0sk1
Love stuff like this. Wonder if C is _that_ popular or if its verbosity is
skewing the results?

~~~
evdawg
Yes, its verbosity _is_ skewing the results. Same with PHP, C++, and Java.
<http://github.com/languages> shows that Ruby is the most popular language on
GitHub, with 28% of all projects being in Ruby. C, PHP, Java, and C++ rank 6%,
5%, 4%, and 4% respectively.

~~~
WilliamLP
How do you know it's verbosity skewing results, and not a preference for
certain languages depending on the size of the project?

~~~
silentbicycle
Good point.

Also, I wonder how those would be weighted if you counted projects that
consist of >90% identical code as the same project (for example, how many of
those Ruby projects are thinly veiled forks of Rails?), filtered out projects
that are less than (say) 10k of source, etc. Maybe more Python projects are on
Bitbucket because of Mercurial.

The 90% and 10k there are pretty arbitrary; the point is just that those
language stats need a lot more clarification before their meaning is clear.
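
To make the filtering idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: each project is represented as a hypothetical mapping from a file-content hash to that file's size in bytes, and `collapse_forks` and `shared_fraction` are made-up names, not anything GitHub provides. The 10k and 90% cutoffs are the same arbitrary ones as above.

```python
def shared_fraction(a, b):
    """Fraction of project a's bytes whose file hash also appears in project b."""
    total = sum(a.values())
    shared = sum(size for file_hash, size in a.items() if file_hash in b)
    return shared / total if total else 0.0


def collapse_forks(projects, min_bytes=10_000, threshold=0.9):
    """Return project names kept after dropping tiny projects and near-copies.

    projects: hypothetical mapping of name -> {file_hash: size_in_bytes}.
    Larger projects are visited first, so a near-copy of an already kept
    project is the one that gets dropped.
    """
    by_size = sorted(projects.items(), key=lambda item: -sum(item[1].values()))
    kept = []
    for name, files in by_size:
        if sum(files.values()) < min_bytes:
            continue  # below the (arbitrary) size cutoff
        if any(shared_fraction(files, projects[k]) > threshold for k in kept):
            continue  # more than 90% of its bytes already seen in a kept project
        kept.append(name)
    return kept
```

Note this only catches byte-identical files; a fork that reformats everything would still count as a separate project.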

~~~
wizard_2
I wonder if you could pull all the projects into one repo, repack and figure
out its unique objects using git. I always wondered if a global object store
was possible or even a good idea for a site like GitHub. I guess that's a
different project though.
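
For what it's worth, git names a file's contents (a blob) by the SHA-1 of a short header plus the bytes, so identical files get the same id no matter which repo they live in. A minimal Python sketch of counting unique bytes that way, assuming the repos are handed over as a hypothetical mapping of repo name to a list of file contents (`unique_blob_bytes` is a made-up helper, not a git command):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object id git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def unique_blob_bytes(repos):
    """Total bytes across all repos, counting each distinct blob only once.

    repos: hypothetical mapping of repo name -> list of file contents (bytes).
    """
    seen = {}
    for files in repos.values():
        for content in files:
            seen.setdefault(git_blob_id(content), len(content))
    return sum(seen.values())
```

This deduplicates whole files only; two files differing by one byte count twice.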

~~~
silentbicycle
FWIW, finding duplicated code (particularly in a language-independent manner!)
is a surprisingly difficult problem. You need to scan not just for large
copy-and-pasted blobs, but for code chunks that have just had constants
changed, variable names adjusted, indentation style altered, etc. I've been
working on
a tool for this in my spare time, and I hope to have it out sometime this
summer.
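
To show the flavor of the problem, here is a toy Python sketch (not the tool mentioned above, just one common approach): replace every identifier, number, and string literal with a placeholder, drop whitespace, and compare overlapping runs of k tokens. The token regex, the function names, and k=5 are all assumptions for illustration; a real tool would need to treat keywords, comments, and per-language syntax far more carefully.

```python
import re

# Very rough, language-agnostic tokenizer: strings, numbers, identifiers,
# and any other single non-space character (operators, punctuation).
TOKEN = re.compile(r"""
    (?P<str>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<num>\d+(?:\.\d+)?)
  | (?P<id>[A-Za-z_]\w*)
  | (?P<op>\S)
""", re.VERBOSE)


def normalize(source):
    """Turn source text into tokens with names and constants abstracted away."""
    tokens = []
    for match in TOKEN.finditer(source):
        kind = match.lastgroup
        # Keep punctuation as-is; collapse everything else to STR / NUM / ID.
        tokens.append(match.group() if kind == "op" else kind.upper())
    return tokens


def shingles(tokens, k=5):
    """Set of all runs of k consecutive tokens."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a, b, k=5):
    """Jaccard similarity of the two sources' normalized k-token runs."""
    sa, sb = shingles(normalize(a), k), shingles(normalize(b), k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

With this, `total = count * 10;` and `sum   =   n * 42 ;` normalize to the same token sequence. Since keywords also collapse to `ID`, it will report false matches too, which is part of why the real problem is hard.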

------
ii
Interesting. A good example that illustrates why C is the top language is
_why's shoes -- it's written mostly in C, but still belongs to Ruby-land.

~~~
mpk
So is potion, _why's project du jour (though it doesn't have any relevance to
Ruby).

On a side note - is anyone here following potion's development?

~~~
barrybe
371 people are, according to GitHub :)

------
jmtulloss
I know that I personally have a Clojure project that has many more lines of
JavaScript than Clojure just because I have a large JavaScript library checked
in with my other code.

I wonder if there's a way to figure out unique lines of code, or unique files.
jQuery really shouldn't be counted 5000 times, I would think.

~~~
defunkt
We exclude certain paths, especially popular JavaScript libraries.

------
mprovost
My project shows up on GitHub as Shell even though it's in C, just because of
all the autoconf/automake/libtool files. I imagine that's why Shell places so
high on the list.

For example, my 267 line configure.in file produces a 12952 line configure
script. And a 6 line Makefile.am gets you a 683 line Makefile.

------
geirfreysson
Also, check out the list of the most watched projects. Ruby and Rails seem to
dominate there: <http://github.com/popular/watched>

------
aston
cloc does a rough estimation of the terseness/verbosity of the almost 80
languages it supports. It'd be pretty easy to incorporate those scaling
factors for this analysis.
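
As a sketch of how that could look in Python: multiply each language's byte count by a per-language factor and recompute the shares. The function name and any numbers you feed it are assumptions here; the actual factors would have to come from cloc itself, and applying line-based factors to byte counts is its own approximation.

```python
def normalize_by_verbosity(byte_counts, scale):
    """Rescale per-language byte counts and return each language's share.

    byte_counts: mapping of language -> bytes stored.
    scale: mapping of language -> multiplier (hypothetical values; languages
           missing from the mapping are left unscaled with a factor of 1.0).
    """
    adjusted = {lang: n * scale.get(lang, 1.0) for lang, n in byte_counts.items()}
    total = sum(adjusted.values())
    if not total:
        return {}
    return {lang: value / total for lang, value in adjusted.items()}
```

For example, two languages with equal byte counts and factors of 1.0 and 3.0 would end up with 25% and 75% of the adjusted total.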

------
ossenabled
I guess it's not clear to me how you rank popularity by number of bytes (or
LOC). I really do like this information though! Thanks!

------
grandalf
Don't forget that many projects include jQuery, Prototype, etc., which add
quite a few points to JS.

------
cubicle67
33: SuperCollider

er, what?

~~~
donaq
It's a language for real-time audio synthesis and algorithmic composition.
<http://en.wikipedia.org/wiki/SuperCollider>

