

Programming Language Popularity on GitHub and Stack Overflow - stesch
http://langpop.corger.nl/

======
lispm
Programming Language Popularity on Github and Stackoverflow.

Okay.

There are a lot of funny bits. For example the 'language' 'IO' is tagged quite
a lot on stackoverflow. Or was it 'Input / Output'?

Common Lisp has fewer tags on stackoverflow - maybe because many questions are
just tagged 'Lisp'.

'J' has very few lines changed. That could be a good sign, since it is
based on APL, which is famous for its one line programs.

Perl6 is at the bottom. Okay.

CSS is now a programming language.

XML is now a programming language.

Self has some tags on Stackoverflow. But mostly in the context of other
programming languages, not the programming language Self.

Etc.

This is flawed in so many details.

~~~
toppy
"CSS is now a programming language. XML is now a programming language."

Seconded. Could we please stop calling CSS a language? If not, add PDF to the
list.

~~~
xymostech
Interestingly, PDFs actually do contain an embedded programming language in
them, a subset of PostScript. [1] PostScript, meanwhile, is actually Turing
complete with conditionals and loop structures inside of it. [2] I'd argue
that makes PDFs a whole lot more programming language-y than CSS or XML.

[1]:
[http://en.wikipedia.org/wiki/PDF#Technical_foundations](http://en.wikipedia.org/wiki/PDF#Technical_foundations)

[2]:
[http://en.wikipedia.org/wiki/PostScript#The_language](http://en.wikipedia.org/wiki/PostScript#The_language)

~~~
userbinator
"PDFscript" (as I like to call it) is not Turing complete. It's more similar
to SVG in terms of capability. From the PDF spec:

"To simplify the processing of content streams, PDF does not include common
programming language features such as procedures, variables, and control
constructs."

------
cessor
I feel that these popularity indices are pretty flawed and most often used as
arguments under a confirmation bias. "Popularity" defined this way is not a
good measure, in my opinion. This graph just shows that with many lines changed
many questions arise; it does not say whether people enjoy doing so. I would be
much more interested in spots like "Logos" in the bottom right: 160 million
lines changed, only 38 questions on StackOverflow. So what does that mean? Is
everything about the language crystal clear? Beat that, JavaScript.

Github is a reliable source but I am not sure whether it is representative. At
least it appears to be very popular with web folk. Other sources are missing
such as bitbucket, especially since it allows for free of charge private
repositories. As it stands, the only thing this says is that github use
correlates with the lines of JavaScript. Not sure whether this is a good
measure for popularity. Much like a kid in kindergarten, who shows up every
day and talks to everyone. Yes, his presence correlates with the presence of
others, but what if he is just a jerk and nobody really likes him?

~~~
mjw
Fair points.

And of course basing the github measure on lines of code gives quite an unfair
advantage to verbose languages like Java.

Perhaps they could look at compression ratios when (say) gzipping a decent
sample of each language, and use these to weight the LOC metric?
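
A rough sketch of what that weighting could look like, assuming one representative source sample per language is available. The function names and the idea of multiplying LOC by the ratio are illustrative only, not something the site does:

```python
import gzip


def compression_ratio(sample: str) -> float:
    """Gzipped size divided by raw size; lower means more redundant text."""
    raw = sample.encode("utf-8")
    if not raw:
        raise ValueError("sample must not be empty")
    return len(gzip.compress(raw, compresslevel=9)) / len(raw)


def weighted_loc(loc: int, sample: str) -> float:
    """Scale a raw line count by how compressible the language sample is.

    Assumption: a highly compressible (verbose, repetitive) sample should
    count for less, so the line count is multiplied by the ratio.
    """
    return loc * compression_ratio(sample)
```

The sample would have to be large and typical for the ratio to mean much, since gzip adds a fixed header that dominates on tiny inputs.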

~~~
cessor
You are absolutely right, I didn't think about this when I wrote my comment.
LOC is in itself a very poor metric, therefore "Changed LOC" doesn't improve
any interpretation. Including the number and sizes of repos would be better,
since LOC pushes verbose languages. Java and C# code usually features a lot of
empty or hollow lines (in C# for example, one opens the curly brace for a
function in a new line).

I mean, yeah, most of the comments indicate that we agree that this kind of
graph or "false statistic" is flawed.

But can we find any value in it? How do we interpret this graph, despite its
basic problems? What good stuff can we do with it?

~~~
mjw
A scale for cross-language comparisons seems a hard ask because everyone is
implicitly interested in answering different questions. Which languages do
people enjoy using the most? Which have the most code "out there" in some
setting or other (open source, commerce, in deployment, scripting, ...)? Which
have the most engaged and vocal communities? Which have the biggest pool of
skilled workers? Which will I get the most career benefit from learning?

Perhaps the useful information is more in the correlations between different
related metrics, than in drawing up ranked lists (surprise surprise, Java >
Haskell!). This graph helps visualise the correlation between these two, which
seems significant but far from perfect. Outliers like SQL can then be
identified, which point to problems with the metrics (e.g. with SQL,
presumably github fails to spot lines of SQL embedded in other languages).
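
One way to make that concrete, as a sketch only: fit a straight line on the log-log plot and flag languages that sit far from it. The input format (a dict of language name to a pair of lines changed and question count) and the one-decade threshold are assumptions for illustration:

```python
import math


def log_fit(points):
    """Least-squares line through (log10 lines, log10 questions).

    Returns (slope, intercept, r), where r is the correlation coefficient.
    """
    xs = [math.log10(lines) for lines, _ in points.values()]
    ys = [math.log10(questions) for _, questions in points.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)


def outliers(points, threshold=1.0):
    """Languages whose question count is more than `threshold` decades
    away from the fitted line, mapped to their signed residual."""
    slope, intercept, _ = log_fit(points)
    flagged = {}
    for name, (lines, questions) in points.items():
        predicted = slope * math.log10(lines) + intercept
        residual = math.log10(questions) - predicted
        if abs(residual) > threshold:
            flagged[name] = residual
    return flagged
```

A large positive residual means many more questions than the lines changed would suggest, which is the pattern described for SQL above.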

------
nhebb
I think language popularity falls into two different categories: (1) languages
other people will pay you to work in, and (2) languages that you would work in
of your own choice.

For #1, I would measure the frequency that a language appears in job ads.

For #2, I think the trending repositories in Github would be a better metric
than commit lines.

~~~
d23
Good point. If anything, the number of commit lines has as much to do with the
verbosity of the language as it does its popularity.

------
toolslive
Hold on a second, so a programming language that is very verbose, and needs
lots of assistance to be workable is considered popular?

~~~
scotty79
Sure. For some definitions of popular.

------
bochi
It would be interesting to see the same data broken down by year. Lua,
Clojure, and Go are at about the same position, but I wonder which one is
gaining traction. Probably Go.

------
gajomi
So I have seen many of these kinds of graphs and associated complaints that
come with them. Much of the criticism revolves around the usefulness of
"popularity" as a proxy for other things that we actually care about in
programming languages having to do with their immediate usefulness to us.
These are all good to discuss, but I think there is a more pressing issue,
having to do with the granularity of the sample. I work in an area of
scientific computing, and have 1-2 languages at any given time that I work
with often, and have familiarity with 3-4 more. However, I have friends that
work in slightly different areas of scientific computing that have a different
set of languages. Integrating information from these people, I have a rough
set of the languages used in my local neighborhood. And I will say there is
actually quite a bit of diversity in the neighborhood, even though it's all
within scientific computing. So who is to say what lies in the realm of friend
of a friend of a friend? I don't have a feel for this distribution.

This is where I think integrating these kinds of graphs with a filtration over
distances on some network would be highly useful. It is not clear which
network(s) would be the best, but there are some clear advantages to going
with links induced by github collaborations. What I am suggesting might be
more helpful for niche domains than it would be for the "median" programmer
(if such a thing exists), but who knows, maybe there is some usefulness there
too?

------
bhouston
Dart, CoffeeScript and TypeScript all have +50M lines of code.

It is weird that Dart has 50M LOC when Go only has 100M LOC. My analysis of
Dart's popularity suggests that it is actually far less than half as popular
as Go. I thought Go was actually 10x more popular than Dart at least.

I wonder how much double counting and so forth (as a result of forks) is
present in this analysis. Maybe there is a bias as to what is released
publicly to Github versus used in production.

------
na85
I was surprised to see how highly-ranked some of the languages were (like
Fortran and R) that I had previously discounted as mostly abandoned in the
"github crowd".

Obviously Fortran is used a lot in HPC and numerical routines, but I always
figured those were closed-source and rarely updated, with development focusing
on their wrappers.

~~~
pjmlp
Github gets a lot of undeserved credit. Most of the code in the world doesn't
even know what Github is.

~~~
dtech
Then again, not that much code is aware of version control systems (mostly the
code of VCS'es themselves and related tools)

~~~
coldtea
He meant "coders". As in: "most coders don't know or use GitHub".

~~~
pjmlp
thanks, typo

------
johnsoft
This is only tangentially related, but I remember seeing a site a few years
ago that ranked languages by crowdsourcing about 100 subjective comparisons
between different pairs of languages. It made statements like:

    
    
        - Programs written in this language tend to work
          correctly the first time they are accepted by the
          compiler
        - The type system in this language helps me to be more
          productive
        - Code written in this language is easy to read and
          understand at a glance
        - I would choose this language to write a desktop GUI
          application in
    

and asked people to choose which of a pair of languages each statement applied
more to. The data could be ranked by language or question, and I found it
really useful and discovered a few languages I wouldn't have heard of
otherwise. But for the life of me, I can't find the site anymore. Does this
ring a bell to anyone?

~~~
jonathansizz
Yes.
[http://hammerprinciple.com/therighttool](http://hammerprinciple.com/therighttool)

~~~
johnsoft
That's it! Thanks a million :)

------
nullc
Github mis-identifies Qt translation files as "TypeScript", assigning all
commits on some very active Qt projects (like Bitcoin) to that language, which
massively inflates the TypeScript usage on metrics like this.

~~~
draegtun
Github uses _linguist_ to identify the code's programming language and
unfortunately it's far from perfect -
[https://github.com/github/linguist](https://github.com/github/linguist)

There are known issues with distinguishing code between certain languages. For
example, there are issues between Perl & Prolog (and sometimes even Puppet).
And up to very recently [1] most Rebol code was identified as R :)

[1] That is until this patch was merged in (although the repo's language stats
only get updated after a new push happens) -
[https://github.com/github/linguist/pull/1005](https://github.com/github/linguist/pull/1005)

------
yaur
Not my data or article but for those who want to poke around with it
[http://langpop.corger.nl/formatted_data](http://langpop.corger.nl/formatted_data)
is the relevant URL.

------
olmo
Nice work! GitHub should overrate the languages of the cool boys, but
underrate expressive languages because they need fewer LOC. Stack Overflow, on
the other hand, should overrate day-to-day work languages but also overrate
complicated languages that generate more questions.

At the end of the day there's a clear correlation so I think this graph is a
reliable source of information. Is there a way to remove the logarithmic scale?

------
tmalsburg2
Lines of code changed is confounded by syntactic properties of the languages:
For instance, languages that make a lot of use of indentation, like Lisp, will
have more changed lines of code but that doesn't necessarily reflect more
activity. In Lisp, you often have huge diffs even for minor changes because
these minor changes can lead to a change in indentation of many otherwise
unaffected lines of code.
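
To illustrate the effect, a small sketch that counts changed lines with and without leading whitespace taken into account. It uses Python's `difflib`; the function name and the `ignore_indent` flag are made up for this example, and it is not how the site computes its numbers:

```python
import difflib


def changed_lines(old: str, new: str, ignore_indent: bool = False) -> int:
    """Count added plus removed lines between two versions of a file."""
    a = old.splitlines()
    b = new.splitlines()
    if ignore_indent:
        # Drop leading whitespace so pure re-indentation is not a change.
        a = [line.lstrip() for line in a]
        b = [line.lstrip() for line in b]
    return sum(
        1
        for line in difflib.ndiff(a, b)
        if line.startswith(("+ ", "- "))
    )


old = "(defun f (x)\n  (foo x)\n  (bar x))"
new = "(defun f (x)\n  (when x\n    (foo x)\n    (bar x)))"
print(changed_lines(old, new), changed_lines(old, new, ignore_indent=True))
```

Wrapping the body in `when` touches every line in a plain diff, while the indentation-insensitive count only sees the new `when` line and the extra closing paren.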

------
_random_
Intuitively it feels quite accurate. C#, Java, C++, Python, JavaScript at the
very top. Scala above F#. New entrants not too far down at the bottom.

------
exceedhl
I did an analysis of programming languages using github data before:
[http://exceedhl.github.io/blog/2013/05/07/programming-langua...](http://exceedhl.github.io/blog/2013/05/07/programming-language-evolution-in-last-year-github-data-challenge/)

------
georgeecollins
I was shocked to see Pascal so high on the list. Pascal was the first compiled
language I ever used. But I don't even know what development environment you
would use for it now.

------
vorg
Flag: The data's from Feb 2013, i.e. over a year old, which should be
reflected in the title.

------
jayvanguard
This is the first accurate programming language popularity chart I've seen.

------
wheaties
I see Puppet but not Chef. That's something that should be fixed. Then again,
this was done in 2013. I bet a fair amount has changed since then.

Also, SLOC for ranking on Github? We don't use that to assess "productivity"
for a reason.

~~~
coldtea
It might not measure productivity, but it sure gives a hint about how much a
language is used.

Sure, a language can be 1/5 as terse as another. But not 1/100 as terse.

------
klrr
Obviously Java is most popular because it is the _best_ programming language.
It got OOP and is on the super fast JVM, nothing can beat it.

