
GitHut – Programming Languages on GitHub - jrslv
http://githut.info
======
jvilk
While I dig the idea, it's important to note a few issues with the dataset.
Take the presented data with a huge grain of salt.

First, many repositories are not a single language. For example, this PHP
framework is reported as a CSS project [0]. While it has more lines of CSS
than PHP, it only has a single CSS file [1].
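To illustrate how "more lines of CSS" and "a single CSS file" can both be true, here is a minimal Python sketch that tallies languages by bytes and by file count. The extension map is hypothetical and deliberately tiny, and the file sizes are made up for illustration; they are not Laravel's real numbers.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical, deliberately tiny extension map; real detectors use far more rules.
EXTENSION_LANGUAGE = {".php": "PHP", ".css": "CSS", ".js": "JavaScript"}


def language_totals(files):
    """Tally languages two ways for an iterable of (path, size_in_bytes) pairs.

    Returns (bytes_per_language, files_per_language); files with an
    unmapped extension are skipped.
    """
    by_bytes, by_files = Counter(), Counter()
    for path, size in files:
        language = EXTENSION_LANGUAGE.get(PurePosixPath(path).suffix.lower())
        if language is None:
            continue
        by_bytes[language] += size
        by_files[language] += 1
    return by_bytes, by_files


def top_language(counter):
    """Return the language with the largest tally, or None if empty."""
    return counter.most_common(1)[0][0] if counter else None


# Made-up sizes: one big compiled stylesheet versus many small PHP files.
repo = [("public/css/app.css", 120_000)] + [
    (f"app/file{i}.php", 1_500) for i in range(30)
]
by_bytes, by_files = language_totals(repo)
print(top_language(by_bytes), top_language(by_files))  # CSS PHP
```

Ranking by bytes crowns CSS here, while ranking by file count picks PHP, which is the kind of disagreement described above.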

Second, GitHub has a problem with correctly identifying programming languages.
For example, PrimeCoin [2] is identified as one of the most popular TypeScript
repositories, but it has 0 lines of TypeScript code. Instead, it has... large
localization files with the extension *.ts [3]. Bitcoin used to have the same
problem, but it looks like GitHub hack-fixed it for that particular repository,
as less popular forks of Bitcoin still have this issue.
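The extension alone can't tell the two kinds of *.ts apart, but the file contents can. A rough Python sketch of such a content check, relying on the fact that Qt Linguist translation files are XML with a `<TS>` root element; this is an illustrative heuristic, not GitHub's actual detection code.

```python
import re

# Qt Linguist translation files are XML whose root element is <TS>,
# optionally preceded by an XML declaration and a <!DOCTYPE TS> line.
QT_LINGUIST_HEADER = re.compile(
    r"\s*(?:<\?xml[^>]*>\s*)?(?:<!DOCTYPE\s+TS>\s*)?<TS\b"
)


def classify_ts_file(text):
    """Guess what a *.ts file really is from its leading content."""
    if QT_LINGUIST_HEADER.match(text):  # re.match anchors at the start
        return "Qt Linguist translation"
    return "TypeScript"


locale_file = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<!DOCTYPE TS>\n"
    '<TS version="2.0" language="de">'
)
source_file = "export function add(a: number, b: number): number { return a + b; }"
print(classify_ts_file(locale_file))  # Qt Linguist translation
print(classify_ts_file(source_file))  # TypeScript
```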

It took me a few minutes to find these examples, just by examining trending
repositories [4]. I'm sure there are many more. So do not be rash in drawing
conclusions from this data! :)

[0] [https://github.com/laravel/laravel](https://github.com/laravel/laravel)

[1] [https://github.com/laravel/laravel/blob/master/public/css/app.css](https://github.com/laravel/laravel/blob/master/public/css/app.css)

[2] [https://github.com/primecoin/primecoin](https://github.com/primecoin/primecoin)

[3] [https://github.com/primecoin/primecoin/tree/master/src/qt/locale](https://github.com/primecoin/primecoin/tree/master/src/qt/locale)

[4] [https://github.com/trending](https://github.com/trending)

~~~
abdias
I have discovered this as well. Although 100% of my own GitHub projects[1] are
JavaScript, more than half appear as CSS due to the included documentation.

Ideally, the project manager should be able to define the language composition
of their own projects. That's something GitHub should consider, IMO.

[1] [https://github.com/epistemex?tab=repositories](https://github.com/epistemex?tab=repositories)

~~~
jrochkind1
I doubt many project managers would bother doing that -- what do they get out
of it?

~~~
tokenizerrr
If I noticed a repository was classified incorrectly, I'd open an issue asking
the maintainer to rectify it, if that was possible. They would likely do this
simply because someone asked. If not, then simply so that it appears
correctly in searches.

Why open source at all, what do they get out of it?

------
V-2
It's what I call an "AHA" piece of statistics.

Lots of data right there, and nicely visualised at that, but what it actually
means is unfathomable without knowing any broader context.

For instance: C++ has the greatest number of open issues per repository,
then comes Rust, then Scala. All right.

Does it indicate that they're trickier than others, hence more bug
reports?

Or perhaps that projects written in these languages are under more intense
scrutiny?

Or that people watching these repositories are just more eager to step up and
file an issue instead of sulking in silence (a trait of programming culture
surrounding these languages)?

And so on, and so on.

Or it could be one reason in the case of C++ and another in the case of Rust,
since they differ in so many respects.

A wide field for the wildest speculation, but no meaningful correlations
identified.

~~~
mVChr
For a scientific paper, you are correct.

For practical or business purposes, this is a nice bit of incomplete
information to help make a decision. I want to take a serious, time-invested
dive into a new statically compiled language, but which one should I pick? An
old die-hard or the new-hotness? I could make a guess from reading the docs
and such, but I'd also want to know community activity and support. This is a
handy chart for getting a sense of that.

Or, I'm a business owner who just hired my first engineer. He's saying the
backend should definitely be built in Groovy, or maybe he'd be willing to do
Scala or something else, but definitely Groovy, yeah, Groovy man. I might be
able to get a better idea of which would be beneficial for my long-term
business prospects (hiring more engineers, etc.) by checking out a chart like
this, as I might not have time to do real in-depth research.

As a scientist, you require complete, sound and accurate statistical data. In
business (this site is about start-ups, no?), you need to be comfortable
making serious and important decisions based on incomplete and possibly
inaccurate information, because making fast decisions is often paramount. You
can and will always make other fast decisions later, and decide whether it's
worth the effort to course-correct when new and more accurate information
comes in.

This is maybe too deep an analysis of a fun little infographic, but as a
former professional poker player who made a living off of incomplete
information, you got my hackles up.

~~~
V-2
_I'd also want to know community activity and support. This is a handy chart
for getting a sense of that._

The entire chart? Wouldn't the first column be sufficient? Number of
repositories gives you some idea about language popularity.

Well, kind of: there's a hype bias here. Obviously the choices behind
open-sourced projects on GitHub aren't representative of the industry. They're
the avant-garde of the software world, if anything.

And even so, that's just one parameter out of five, and it can very well be
considered in isolation from all the rest.

I wouldn't make business decisions based, for instance, on the average number
of open issues, because it's an outcome of many different variables. So how
would you know what it means? Is high good? Is high bad?

Interrelations between data - shown by this clever chart - are even more
mysterious.

TeX has a very high number of pushes per repository (second best), while
there are fairly few repositories, and they are rarely forked.

At the same time, R has a low number of pushes (second lowest of all), whereas
it wins in the "new forks per repository" category (#1).

What do you make of that, businesswise?

~~~
mVChr
I'm talking about a piece of the puzzle, not the entire pie.

------
arcticfox
Am I the only one that doesn't like the visualization? It seems like it would
be fundamentally better if each bar was simply labeled instead of connected
via line. Mouseover could highlight the same language in the other categories
to get the cross-category information.

The question "What is ranked above Ruby for New Watchers Per Repository?"
seems to be a question this dataset should be answering, but it is enormously
difficult to parse here.

~~~
ryanmarsh
Slopegraphs are more useful when you can sort by any of the columns. I guess
the author thought the most important metric is active repositories.

~~~
achr2
Thanks for giving me a term to google. Also your point about sorting is vital,
and would make this type of chart very useful for understanding relationships.

------
bhouston
Languages with near-flat growth or decreases in 2014:

- Ruby (that was a bit surprising)
- Dart (I guess the lack of native browser support is the killer here)
- TypeScript (I'm surprised this didn't take off)
- Puppet (Interesting.)
- ActionScript (obvious now that Flash is dead)
- Scheme
- Common Lisp
- D
- Fortran
- Logos (huh?)

(I know "near flat" is subjective, but these are still the languages that are
not seeing much growth in 2014, and what isn't growing strongly in 2014
is likely to continue that trend in 2015.)

~~~
Touche
> is likely to continue that trend in 2015.

Going to state the obvious and say JavaScript. For all of the obvious reasons
but also because ES6 is going to make it more palpable for those who formerly
found it distasteful.

~~~
vivin
Did you mean palatable? :)

------
ignoramous
OCaml desperately needs some wind in its sails. It fares worse than PowerShell
in terms of number of repos, and that says it all really. Compare that to
Haskell and Clojure, which are soaring, to put it mildly.

~~~
amirmc
I think OCaml is doing pretty well for itself. For example see the graph of
package-growth at [1] and the recent news from people at Facebook about Hack
and Flow (both in OCaml). Not all repos are on GitHub and it's not really fair
to expect them to be — especially for the sake of vanity metrics, such as this
visualisation.

In addition, some of the repos that _have_ OCaml code may not be recorded as
such. Repos where the 'brains' are in an expressive language might be
overshadowed by boilerplate in other languages.

[1] [http://amirchaudhry.com/towards-governance-framework-for-ocamlorg/](http://amirchaudhry.com/towards-governance-framework-for-ocamlorg/)

~~~
ignoramous
Thanks, I was starting to get a bit depressed seeing the stats on the
website. In fact, I was shocked to find OCaml so far behind Clojure.

I think the adoption problem for OCaml is compounded because it suffers from
a lack of Stack Overflow hits for any error you might encounter or any
question you might have. Searching for something as mundane as "how to read
large files in OCaml" leads to just a single hit (Streams at OCaml.org) [0].

Also, OCaml needs a "recipe/patterns" book on how to get things done the
right way in OCaml.

[0] BTW, a big fan of your work.

------
RA_Fisher
R's bump in Q1 2014 is probably when CRAN, R's largest "official" repo archive,
pushed all of its packages to GitHub. Pretty neat to see the volume right
there.

~~~
bhouston
Sort of disconcerting that R is somewhat flat in terms of its growth though.

~~~
hadley
I think there's steady growth in active R repos that's masked by the huge
impact of the Johns Hopkins Coursera course on reproducible research. Tens of
thousands of people fork a repo for a homework assignment
([https://github.com/search?q=forks%3A%3E30000&type=Repositori...](https://github.com/search?q=forks%3A%3E30000&type=Repositories)),
and then never touch it again.

------
andrewstuart2
Beautiful use of D3. Looks like something Bostock himself would've made for
the New York Times.

~~~
couchand
Agreed. I like the sort of informal feel to it, calling back to a sketch.

I'm a big fan of the parallel lines chart, and this one is well executed. The
data labels are unobtrusive, appearing on hover to let you dive in. The data
set is coordinated with navigation on the timeline above using the principle
of object constancy [0]. I really like that you can click a language to pin
it; you can focus on a few languages and watch their evolution over time by
scrubbing the line chart. (I don't like that if a pinned language falls off
the chart at one point in time it isn't restored when you go back to a time
that it's on the chart.)

I like the idea with the small multiples below, but I wish there was less
wasted space; it's hard to see very many at one time. There's not really a
need for full-blown small multiples here - vertically-aligned sparklines would
be more effective. If they were in the same table as the parallel lines it
would allow a deeper exploration of the data.

[0]:
[http://bost.ocks.org/mike/constancy/](http://bost.ocks.org/mike/constancy/)

------
Fiahil
Rust is showing some interesting traits. Despite its slow start, it's catching
up quickly enough.

~~~
jonalmeida
Intentional pun?

I've been trying to learn Rust myself by spending ~30 mins every day on it
for the past 2 months. It's strange how simple it is to make something, but
it's hard if you have no experience in functional programming.

~~~
Fiahil
I can relate to your pain. I came back to hardcore Scala 2 years after a very
quick introduction to functional programming. But, in the end, every second
of head-banging was absolutely worth it. Once you step into The Immutable
Functional World, it will change the way you design software and there is no
coming back :)

------
galfarragem
Erlang and Clojure don't seem to take off on GitHub, despite being very
visible on HN.

It seems that everybody talks about these languages but then doesn't use
them.

~~~
slowmovintarget
Or they use them for getting work done and not for open source projects on
GitHub. Clojure seems to be one of those languages where you can be just as
productive creating the software yourself from whole cloth as trying to grok
someone else's DSL.

If that's true then large bodies of application specific code would exist off
the GitHut radar, I suspect.

Not sure about Erlang.

------
snhkicker
I think these statistics are a bit under-rated and a bit misleading.

- Under-rated: CSS has 80% more pushes than C++. Wow :O JavaScript remains
super for small projects, but man, it sure brings a tear to your eye when you
see 10.69 pushes per repo; I think I may have misunderstood JS a lot. Safe
languages are probably not as safe as we think.

- Misleading: this isn't talking in any way about the industry itself, but
about the LOVE given to each programming language, for the following reasons:

a) Developers in general contribute to open source projects with the same
attitude the GCC devs had when they said, as I understood it, "We are
compiling GCC as C++; if you want it as C, do it yourself."

b) Interest, time, and location on GitHub diverge from reality. Interest:
developers are interested in doing new things when it comes to open source, so
this may affect the numbers a lot. Time: time changes everything. Location: I
think GitHub is the number one place for front-end programmers. Everyone likes
it, but for JavaScript, GitHub is Superman.

~~~
synunlimited
The CSS numbers could be explained by a lot of the pushes being small fixes
and tweaks.

------
CmonDev
If only there was a way of analyzing the quality of the repositories. Are
those 1000s of snippet-size JS libraries and Ruby gems meaningful?

It's interesting that strongly typed static languages have more open issues
(top 5). Is it easier to spot them?

~~~
davedx
I think module size is a big factor. I would predict that C++, Java and C#
tend to be larger, more monolithic projects, whereas JavaScript and Ruby have
more broken up module ecosystems. JavaScript especially, with its "modularity
shaming"...

~~~
CmonDev
Very well-factored C#/F# projects such as EF7 and F# compiler may look
"monolithic" due to type safety and packaging.

------
ryanmarsh
This is a really good example of a useful slopegraph. I find so few of these
in the wild, so I often fail to articulate the value of the approach such that
a client will buy into the idea before I build it.

for reference: [http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0003nk](http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0003nk)

edit: the use of "small multiples" is superb as well

------
hharnisch
It would be incredibly enlightening to see what languages people are moving
to/from (like this, but for programming languages:
[https://www.facebook.com/notes/facebook-data-science/coordinated-migration/10151930946453859](https://www.facebook.com/notes/facebook-data-science/coordinated-migration/10151930946453859)).
I'd like to know what people are switching to from Ruby.

~~~
reledi
My guess is the Rubyists aren't switching, but picking up additional languages
such as JS or Go or Rust.

------
vinceyuan
Surprised to see the oldest programming language is Makefile (1970) in this
chart. It appeared earlier than C (1972). Is this correct?

~~~
calibwam
According to wikipedia [1], make appeared in 1976, so it should be younger
than C.

[1] [http://en.wikipedia.org/wiki/Make_(software)](http://en.wikipedia.org/wiki/Make_\(software\))

------
themoonbus
Languages from the early to mid 90's are doing quite well. 95 alone saw Java,
JavaScript, Ruby and PHP.

------
visarga
The top five languages were all initially created between 1991 and 1996. Is
that a coincidence? Probably languages have a lifecycle and age matters a
lot. The current top crop are about 20 years old, just becoming adults. Would
that mean that Swift and Rust will get into the top 5 after 2030?

------
davedx
Open issues per repository is an interesting one.

I'd love to see open issues / LoC / repo for each language.

~~~
msl09
I don't think that open issues/LoC is very relevant, since some languages,
despite doing a lot in few lines of code (or even characters), still require
an amount of mental effort similar to more verbose languages.

I very much like how GitHut used issues/commits. In my interpretation:

(1) If your project has a lot of commits and few issues, it is of very high
quality.

(2) If your project has few commits and many issues, it is either of very
low quality or not being developed.

(3) Having a lot of commits and a lot of issues, or few of both, is kinda
expected, since new features (commits) often introduce new bugs, and small
projects often have few of both.

When you cross that with popularity(new forks, new watchers) over the years
you can narrow (2) with some confidence.

Using that approach is trickier when it comes to comparing languages, but the
data GitHut gives seems to be in line with common knowledge, at least when it
comes down to open source software and when you compare the most popular
languages.
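The three cases above amount to a small decision rule. A Python sketch of that reading; the `many_commits` and `many_issues` thresholds are hypothetical placeholders, since the comment gives no numbers.

```python
def interpret(commits, issues, many_commits=100, many_issues=20):
    """Classify a project by commit and issue counts, per cases (1)-(3).

    The default thresholds are hypothetical placeholders, not measured values.
    """
    lots_of_commits = commits >= many_commits
    lots_of_issues = issues >= many_issues
    if lots_of_commits and not lots_of_issues:
        return "(1) likely high quality"
    if lots_of_issues and not lots_of_commits:
        return "(2) low quality or not being developed"
    return "(3) expected: busy and buggy, or small and quiet"


print(interpret(commits=500, issues=3))
print(interpret(commits=12, issues=40))
print(interpret(commits=8, issues=2))
```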

~~~
ufmace
Not so sure about that. Lots of commits and few issues could mean that it's
cool or interesting somehow, but isn't actually being used for much. Vice
versa for few commits and many issues.

Hard to say much for sure without breaking down the details, who's discovering
the issues, how many are real, how many are serious/blockers versus minor
annoyances with workarounds or feature requests, are the commits new features,
bug fixes, refactoring, etc.

------
krat0sprakhar
Looks like 1995 was a great year for programming languages - PHP, Java,
JavaScript and Ruby were released!

~~~
Argorak
Ruby's development started in 1993 (February 24, to be exact). First full
release was 1995 (December 21, that one I had to look up :)).

This is very interesting in my opinion: whenever someone asks why one of those
languages doesn't do $featureA "like Java", you can just reply: "because Java
wasn't a thing back then".

~~~
vorg
Groovy's creation year is wrong, it should be 2003 not 2004. It was first
announced by creator James Strachan on 29 August 2003, and its very first
release (Groovy 1.0 beta 1) was on 11 December of that same year.

Unfortunately, someone who became a "despot" of the project at its repository
(Codehaus) on 4 May 2004 started referring to himself as Groovy's creator in
publicity articles about a year ago. A few months ago, someone even tried
deleting the Wikipedia link to James Strachan's webpage announcing the Groovy
Language.

------
carsonreinke
It would be really neat to see how many repos have been abandoned per language
(e.g. no activity in X)
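One way to make "no activity in X" concrete, as a Python sketch: the input is a hypothetical list of (language, last push date) pairs, and X is a parameter that defaults here to an arbitrary one year.

```python
from collections import Counter
from datetime import date, timedelta


def abandoned_per_language(repos, today, idle=timedelta(days=365)):
    """Count, per language, repos whose last push is more than `idle` before `today`.

    `repos` is an iterable of (language, last_push_date) pairs.
    """
    counts = Counter()
    for language, last_push in repos:
        if today - last_push > idle:
            counts[language] += 1
    return counts


# Hypothetical sample data.
sample = [
    ("Ruby", date(2013, 1, 5)),
    ("Ruby", date(2014, 11, 20)),
    ("Go", date(2014, 12, 1)),
    ("Perl", date(2012, 6, 30)),
]
print(abandoned_per_language(sample, today=date(2015, 1, 15)))
```

Dividing each count by the language's total number of repos would turn this into an abandonment rate, which is easier to compare across languages of very different sizes.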

------
robbyking
It's interesting to me that so many Objective C repositories have so few
pushes yet so many forks. I wonder if it's because companies like Facebook and
Square "release" open source projects on Github then move on to something
else.

------
gamesbrainiac
Any new change that is of particular interest? I believe this page has been on
HN before.

~~~
Retr0spectrum
I only had a brief look, but I find the recent decline of Ruby quite
interesting.

~~~
kcorbitt
Like V-2's top-level comment explains, there are lots of ways to interpret
these statistics, and it's easy to jump to false conclusions if you don't take
the time to look for context.

In Ruby's case, the total number of Ruby repositories on GitHub has continually
increased -- it's just that since it was such a huge part of GitHub's early
user base (the Rails community was probably the first big adopter of GitHub,
which makes sense since GitHub itself is written in Rails), percentage-wise it
has dropped significantly as more communities adopt GitHub.

~~~
Touche
I think he's referring to the fact that the number of repositories (which is
not a percentage of the total) has been pretty much flat over the last year.

------
megalodon
Just a heads up; the page header (and footer?) does not render correctly on
mobile. The top graph is centered and its data is impossible to read because
of label overlapping. Interesting analysis nevertheless.

------
pama
Swift wins the popularity contest: most watches per repository, third most
forks per repository (R has most forks per repository). Anyone up for an
iOS/Mac App with a statistics backend?

------
droob
This is just _public_ repos, right? That might skew numbers a bit.

~~~
Shank
Well of course, but it's not like the dataset has any private repos for any
languages -- so the data is only OSS.

------
tdicola
Is the data behind this visualization available anywhere or do people need to
run the BigQuery queries themselves?

------
vinceyuan
Thought GitHub published it officially. Just found it's GitHu(t) instead of
GitHu(b). LOL Anyway, good job!

------
nXqd
Really nice statistics. I would love to know the name of that kind of
statistics graph. Thanks!

~~~
couchand
I've heard it called a parallel coordinates or parallel lines chart.

~~~
kaneplusplus
According to the link below, a hammock plot is a generalization of a parallel
coordinate plot where the lines are replaced by rectangles that are
proportional to the number of observations they represent. This plot is
different from both the traditional parallel coordinate plot and the hammock
plot, since each category's width is proportional to its activity, not the
line width. Maybe its type is still unnamed.

[http://www.schonlau.net/publication/03jsm_hammockplot.pdf](http://www.schonlau.net/publication/03jsm_hammockplot.pdf)

------
sagivo
A look here
([https://github.com/stars?direction=desc&sort=stars](https://github.com/stars?direction=desc&sort=stars))
shows that of the top 10 all-time most-starred repos, 6 are JavaScript-related.

------
okcwarrior
My C# MVC Web App is reported as a Javascript app because of the templates I
use :0

------
bmoresbest55
Java developer learning Django and playing with Node, React and Angular. 3/3.

~~~
tom-lord
I wonder what the graph would look like if "Javascript" was broken down by
framework, though...

------
duderific
Sad to see, my dear old friend ColdFusion did not even make the list.

------
montogeek
This was already submitted months ago :s

------
pla3rhat3r
Beautifully designed page!

------
ayr-ton
It is just beautiful.

------
VOYD
I like it.

------
biomimic
What about Tcl?

~~~
jrapdx3
Tcl is still actively developed and has a sizable community interested in it.
(see [http://wiki.tcl.tk](http://wiki.tcl.tk))

It doesn't show up much on GitHub since its main repositories are hosted on
SourceForge and Fossil. That includes the core language and most of the major
extensions. Check the wiki for details.

------
Eleutheria
When too much interactiveness is too much.

------
zzzcpan
This confirmed some of my suspicions. Ruby seems to be in decline, just like
Perl, but not declared dead yet, maybe in a few years. Python looks like it's
getting to that point as well, hard to tell though, will be clear in a year.
Go is growing fast and already ahead of Perl.

