
GitHub Language Trends - spatten
http://redmonk.com/dberkholz/2014/05/02/github-language-trends-and-the-fragmenting-landscape
======
sheetjs
FWIW many projects written in web languages like PHP seem to be treated as
javascript by Github because they also have javascript code (which happens to
be more significant / larger than the underlying code). It's unfortunate that
there is no way to specify the primary language of a project.

The Github language system is also somewhat unpredictable:
[https://github.com/SheetJS/test_files](https://github.com/SheetJS/test_files)
seems to alternate between AppleScript and Shell with each commit (even if no
.scpt or .sh file was changed or added)

~~~
zongitsrinzler
If a project mainly consists of JavaScript it should be a JavaScript project,
no?

~~~
ubernostrum
Well, to take an example: I work on the Mozilla Developer Network (MDN). MDN
is mostly a wiki, and the underlying codebase for that is all in Python.
There's also the Demo Studio, which lets people upload web technology demos,
and that's Python too.

But the wiki also has an embedded script-ish language (letting people create
macros/templates for specific purposes and re-use them across multiple
articles). That's JavaScript-based, and so is implemented in node.js. And
there's a WYSIWYG editor for the wiki pages, which is a couple of MDN-specific
plugins we've developed, plus off-the-shelf components (the editor + jQuery,
which we also use for a few other things on-site).

So MDN is a Python project with a couple JS utilities attached. And if you set
up a local copy, you can pretty easily see that Python is by far the heaviest-
used language. But due to the way GitHub counts and reports statistics, it
shows up as a JavaScript project.

To pick on an easy example: the copy of jQuery we have in our repository
weighs in at just over 9,000 lines. So just having jQuery means you need to
write over 10k lines of code in order to get the "real" language of your
project recognized.
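
To make the arithmetic concrete, here is a rough sketch of a count-by-lines heuristic. It is not GitHub's actual implementation; the file names and every line count except the roughly 9,000-line jQuery copy are made up for illustration, and `lines_by_language` / `primary_language` are hypothetical helpers.

```python
from collections import Counter

# Hypothetical per-file line counts; only the jQuery figure (~9,000 lines)
# comes from the comment above, the rest are invented for illustration.
FILES = {
    "apps/wiki/views.py": 3000,
    "apps/wiki/models.py": 2500,
    "apps/demos/views.py": 2000,
    "media/js/libs/jquery.js": 9000,
    "media/js/wiki.js": 500,
}

EXTENSIONS = {".py": "Python", ".js": "JavaScript"}


def lines_by_language(files, skip=()):
    """Sum line counts per language, ignoring any path listed in `skip`."""
    totals = Counter()
    for path, lines in files.items():
        if path in skip:
            continue
        ext = path[path.rfind("."):]
        totals[EXTENSIONS.get(ext, "Other")] += lines
    return totals


def primary_language(files, skip=()):
    """Return the language with the most lines."""
    return lines_by_language(files, skip).most_common(1)[0][0]


print(primary_language(FILES))                                    # JavaScript
print(primary_language(FILES, skip={"media/js/libs/jquery.js"}))  # Python
```

With these made-up numbers, 7,500 lines of Python lose to 9,500 lines of JavaScript until the single vendored file is excluded.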

~~~
unfunco
One could argue that jQuery and other libraries shouldn't be included in the
repository proper; instead, they could be pulled in as a submodule or listed
in a package manager manifest, such as bower.

~~~
Hovertruck
From the bower docs:

"N.B. If you aren't authoring a package that is intended to be consumed by
others (e.g., you're building a web app), you should always check installed
packages into source control."

~~~
munro
The Node core dev actually went back and forth on this, and came to the
conclusion to not commit dependencies to the repo, and introduced "shrinkwrap"
[1].

It felt wrong committing dependencies to the repo, which I agree with. I hate
my diffs being drowned in external changes, I'd rather see that someone simply
upgraded a dependency. Plus it does skew the project, both for what the
primary language is, as well as how much a committer is contributing.

I would love to see this eventually in Bower! They have an issue for it [2].

[1] "Why not just check node_modules into git?"
[http://blog.nodejs.org/2012/02/27/managing-node-js-dependenc...](http://blog.nodejs.org/2012/02/27/managing-node-js-dependencies-with-shrinkwrap/)

[2]
[https://github.com/bower/bower/issues/505](https://github.com/bower/bower/issues/505)

~~~
unfunco
I agree. Hovertruck quoted a piece from the Bower documentation, which I can
see the reasoning behind, if you're working on a web application then
committing the library removes the possibility that the dependencies cannot be
resolved because a library has been removed from the package manager. There
are benefits to both options, but my personal preference is to keep third
party libraries out of the VCS.

~~~
munro
:D Just to take this a little further, at my old employment, our Ops wanted
the dependencies to be committed to the repo, to make rolling back faster,
since `npm install` can take a while. Also, there was a time the NPM registry
went down, wreaking havoc.

But that shouldn't put weight in either direction. Instead, there's a much
deeper issue here: Ops should be building packages for deployment. Whatever
that means, RPMs, VMs, Docker images, ... Just separate the concerns. Seal it
up, and store it forever and ever, perfectly.

Because even if dependencies are there, safe and happy, most applications have
a build process. Which takes time, and could be different on the machine it's
building on. Only you can prevent production fires.

------
m_ram
> Violating all expectations and trends, new Java users on GitHub even grew as
> a percentage of overall new users, while everything else went downhill. This
> further supports the assertion that GitHub is reaching the enterprise.

This is more likely to be due to the rise of Android since 2009.

~~~
gurkendoktor
I would also expect a few CS students to push their homework to github
nowadays, and most homework is Java.

------
hcarvalhoalves
Javascript is really hard to measure I believe, the numbers are always skewed
because of 3 things:

1\. A lot of repositories include 3rd party libraries.

2\. A lot of software includes a web interface, even if the backend language
is something else, but the LOC for Javascript can be equal or even higher
because of 1.

3\. JSON is counted as Javascript sometimes

------
kyberias
It would be much more useful to show the absolute numbers. Or provide the raw
data for others to do more meaningful graphs.

~~~
arscan
It depends on what story you are trying to tell... Absolute numbers might not
be all that meaningful when you are talking about relative popularity. Or it
might be. Data visualization is hard because you typically have to trade
information density for comprehensibility.

I assume the data came from
[http://www.githubarchive.org/](http://www.githubarchive.org/)

------
gexla
I don't know that this is a good indication of trends. As the article
(somewhat shallowly) shows, there are stories behind these graphs.

Ruby has probably just settled to a normal position post early adopter. Shows
that Ruby is still strong.

Javascript is probably building the sorts of libraries that other languages
already have. These guys have had a lot of work to do.

~~~
davidw
Yes, Ruby was very much "overrepresented" in the early days of github, so the
decline in its growth relative to other languages is fairly normal sounding,
rather than indicative of its actual decline.

Also, Javascript is something that is going to get checked in to lots of web
projects one way or another. I wonder if they weed out duplicate copies of,
say, jQuery.

~~~
coyotebush
> I wonder if they weed out duplicate copies of, say, jQuery.

Linguist tries to filter out things like that:

[https://github.com/github/linguist/blob/master/lib/linguist/...](https://github.com/github/linguist/blob/master/lib/linguist/vendor.yml)
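
As a rough sketch of the general idea (path patterns that mark files as third-party so they are left out of the stats), something like the following would work. The patterns are illustrative guesses, not copied from Linguist's vendor.yml, and `is_vendored` / `first_party` are hypothetical names.

```python
import re

# Illustrative patterns only; NOT taken from Linguist's vendor.yml.
VENDORED_PATTERNS = [
    r"(^|/)node_modules/",
    r"(^|/)bower_components/",
    r"(^|/)vendor/",
    r"(^|/)jquery([^/]*)\.js$",
]
VENDORED_RE = re.compile("|".join(VENDORED_PATTERNS))


def is_vendored(path):
    """True if the path matches any of the vendored-code patterns."""
    return VENDORED_RE.search(path) is not None


def first_party(paths):
    """Keep only the paths that do not look like vendored code."""
    return [p for p in paths if not is_vendored(p)]


paths = [
    "app/models.py",
    "static/js/jquery-1.11.0.min.js",
    "node_modules/express/index.js",
    "static/js/libs/underscore.js",
]
print(first_party(paths))
```

Note that `static/js/libs/underscore.js` slips through here: a pattern list only catches the libraries and directory layouts someone thought to add.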

~~~
nox_
"Tries" is the operative word here. There is a documented history of utter
incompetence in Linguist.

~~~
jhcnt
Likewise for most of GitHub's other libraries. Sundown is some of the worst
code I've ever seen.

------
rumcajz
This looks like commercial ecosystem (Java, JavaScript) migrating to GitHub
rather than hobbyist ecosystems (Ruby, Python, Perl) being on decline.

~~~
camus2
Say what? Ruby isn't for hobbyists, and neither is Python (I don't know about
Perl). But you just wanted to hear yourself say that, didn't you?

~~~
threeseed
It isn't entirely untrue. Ruby/Python are more popular in the hobbyist space
and less so in the enterprise. Java is the opposite.

It shouldn't be seen as a reflection of the platform.

~~~
destron
You have set up a false dichotomy here between enterprise and hobbyist.

------
pecanpieyw
Github should really just let developers specify the language of their repo
themselves. Bitbucket got this right in the first place. Auto detection of
the language sounds cool, but it doesn't work for most web projects.

------
dham
Just need to have a dropdown of the primary language like Bitbucket does.
The results would probably be different.

------
pron
GitHub is slowly starting to reflect the software world at large, although the
true picture is Java and C leading by a huge, huge margin. I don't expect
GitHub to ever fully reflect that, as most Java shops, and nearly all C shops,
would never host their code on GitHub.

~~~
gurkendoktor
I think it's safe to say that most projects on github are libraries and
snippets, not finished products. Not even Ruby shops push all of their actual
work to github. But at least there is a culture of sharing libraries, which
drives the Ruby % up, and I don't see this happening a lot with C.

------
skrebbel
The last 2 graphs look like they were drawn by someone who _hates_ colour
blind people.

------
err4nt
Very interesting graphs, I was surprised that CSS has had a recent uptick but
as somebody who has specialized in responsive layout in the past two years I
guess that uptick represents my life too.

I can't get enough language statistics on Github! I run 'gitinspector' on my
web server to compute language stats in individual git repositories, but one
thing I haven't been able to figure out is how to chart the language stats for
one git repository over time in a branch.

Does anybody know how you can chart language use over time in one repository?
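
One possible approach, as a rough sketch: walk the branch history with `git rev-list` and total the blob sizes per file extension at each sampled commit using `git ls-tree -r -l`. This assumes bytes per extension are an acceptable stand-in for lines per language; `bytes_by_extension` and `history`, the default branch name, and the sampling `step` are all hypothetical choices.

```python
import subprocess
from collections import Counter
from pathlib import PurePosixPath


def bytes_by_extension(ls_tree_output):
    """Parse `git ls-tree -r -l <rev>` output into bytes per file extension."""
    totals = Counter()
    for line in ls_tree_output.splitlines():
        meta, _, path = line.partition("\t")
        fields = meta.split()
        # Expected fields: mode, type, object id, size ("-" for non-blobs).
        if len(fields) != 4 or fields[1] != "blob":
            continue
        ext = PurePosixPath(path).suffix or "(none)"
        totals[ext] += int(fields[3])
    return totals


def history(repo, branch="master", step=50):
    """Yield (commit, bytes per extension) for every `step`-th commit."""
    revs = subprocess.run(
        ["git", "-C", repo, "rev-list", "--reverse", "--first-parent", branch],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    for rev in revs[::step]:
        tree = subprocess.run(
            ["git", "-C", repo, "ls-tree", "-r", "-l", rev],
            check=True, capture_output=True, text=True,
        ).stdout
        yield rev, bytes_by_extension(tree)
```

Iterating over `history("/path/to/repo")` gives one counter per sampled commit, which can be written out as CSV and fed to any plotting tool. Vendored files are not filtered out here, so a bundled library would still dominate the chart.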

~~~
AYBABTME
I have projects (markdown repos) marked as CSS projects for which I haven't
written more than 10 lines of CSS. Bootstrap makes it appear so, I guess.

------
lkrubner
To my mind, the big trend is toward polyglot programming, which perhaps reveals
what a transitional and perhaps revolutionary time this is in the world of
computer programming. This paragraph struck me as the most important:

"Almost every language shows a long-term downhill trend... My initial guess is
that users of languages below the top 12 are growing in share to
counterbalance the decreases here. It’s also possible that GitHub may leave
some users unclassified, which would tend to lower everything else’s
proportion over time."

------
largehotcoffee
>Language detection is based on lines of code

~~~
Karunamon
This is important to note. As a huge ruby fanboy, one of my favorite things
about the language is the shortcuts and syntactic sugar that are provided to
make things easier.

I saw the decline and got a bit worried, then came here and had those fears
assuaged. I do wonder if the LOC difference has anything to do with people
getting more familiar with the language and doing more things in less code.

------
PeterisP
It would be interesting to see which languages are rising. In those stats,
everything except Javascript seems to be declining, and the total relative
decline is much larger than JS growth, so _something_ must be growing - but
what is it?

------
jcbrand
One of their reasons for the growth of Javascript:

> the JavaScript development philosophy that encourages bundling of
> dependencies in the same repo as the primary codebase

I wonder how much this is still an issue with the rise of npm, bower and other
package managers.

~~~
mateuszf
node_modules directory is usually added to .gitignore. I don't think it is
usually committed by accident as it results in a huge file list in diffs and
git status command.

~~~
iLoch
That's what he's saying. While the author's statement may not necessarily be
incorrect, the trend these days is to manage your JS (both server and client
side code) with package managers like NPM and Bower.

------
platz
2010-2011 appears to be the "year of the great inflection point"

------
alexchamberlain
What is the cause for the correlation between Java and JavaScript?

~~~
pcmonk
Probably random noise. Or else because it kind of tracks the arc of github
going mainstream, and Java and Javascript are very mainstream.

------
louthy
Not sure if it's reasonable to make this call, but it seems the dynamically
typed languages have a significantly higher percentage of issues overall.

~~~
kijin
It could be that dynamic languages have a lower barrier to entry, which
encourages many more people to fork the code, find bugs, and make pull
requests. It's easy for a PHP guy to find and report bugs in a JavaScript
project. It's not so easy for a JavaScript guy to find bugs in a C++ project.

------
chx
I find the complete lack of Go surprising.

~~~
threeseed
Go is a new language so not that surprising. It is also overrepresented here
on HN. You don't see Go jobs in the wild for example.

~~~
chx
I do not think of Go as something that would stand on its own as a job. Instead if
you already have a site (like Drupal which is my primary skill) and you want
to add a very performant REST interface then Go provides a pleasant way to
write one.

~~~
camus2
I would really question the choice of Drupal in the first place, or any use of
the PHP platform.

~~~
chx
There's still nothing, by far, that would match the versatility of how much
you can do from the Drupal UI.

------
peteretep
Presumably the big Perl spike is the migration of CPAN modules over to Github
en masse.

------
jhcnt
In my experience, GitHub's so-called "detection" gets it wrong far more often
than right. It's worse than useless -- it's misleading.

