
Analyzing GitHub, how developers change programming languages over time - nicolrx
https://blog.sourced.tech/post/language_migrations/
======
joekrill
Unfortunate they had to exclude JavaScript. I understand why they chose to do
that, but that's a HUGE chunk of data that's been pretty much randomly
ignored. So can this really be considered a fair analysis given that?

~~~
haburka
No one has a choice with JavaScript. Theoretically you could go native or
transpile but it's very rare.

Therefore, if you included it, the data wouldn't lead to meaningful
conclusions. I feel like this was pretty obvious from the article and that the
explanation was enough. For example, substituting node for js seemed to work
well enough.

~~~
remline
No one really has a choice with C, but its tally was very interesting.

I think there is a lot more vertical movement and the horizontal stuff might
be a sideshow without considering overlap or experience with C or JavaScript
as significant in how one transitions between purely competing languages.

But JavaScript's real problem for the analysis might be that its competition
is largely excluded since the walled garden apps rarely have related code on
GitHub, even if their language is present.

~~~
freehunter
There's a huge choice for C. For a lot of uses, you can use C++, Go, Java,
Rust, or a few other systems languages. The only thing that _needs_ C is some
very niche embedded stuff, or the Linux kernel (also niche).

~~~
bluejekyll
All the major Unix kernels are in C. If it continues this trend, it seems
clear that new kernels in new languages will be needed, or existing kernels
will need to be ported to other languages. A memory safe one would be cool...

~~~
muricula
Or maybe C is uniquely suited to the task of writing a kernel, after all
that's what it was designed for.

~~~
pjmlp
Not really, kernels were being written in HLLs 10 years before C existed, and
at strange places like IBM research.

The only good thing about it is that the language is easier to write a
compiler for, as it is basically a portable macro assembler, especially the K&R
C variant.

------
usgroup
This looks dubious to me. Not least due to the healthy flow of people moving
from other languages to Visual Basic.

Also, byte flow makes little sense for programming languages. Lower-level
languages are going to be more verbose than high-level languages. They'll be
used for different things. Some will have tonnes of boilerplate that travels
with the project (e.g. Java). Forked projects? Etc.

Further, it seems to me that this ought to be a description of how something
happened in the past, not something subject to probability, unless the claim
is a prediction about next year's conversions between languages or that
conversions are stationary over time.

I.e., a set of metrics that proxy transitions and an ordered list of from and
to would be just the ticket IMO.
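
To make the "ordered list of from and to" idea concrete, here is a minimal sketch. The flow numbers and the names `rank_transitions`, `row_normalize`, and `example_flows` are made up for illustration and are not taken from the article. The row-normalized shares only read as probabilities if you are willing to assume transitions are stationary over time.

```python
def rank_transitions(flows):
    """Return (source, target, amount) tuples, largest amount first.

    Self-transitions (staying with the same language) are skipped.
    """
    pairs = [
        (src, dst, amount)
        for src, targets in flows.items()
        for dst, amount in targets.items()
        if src != dst and amount > 0
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def row_normalize(flows):
    """Convert raw flows into per-source shares that sum to 1 per row.

    Reading these as transition probabilities is an extra assumption
    (stationarity), not something the raw counts give you.
    """
    shares = {}
    for src, targets in flows.items():
        total = sum(targets.values())
        shares[src] = {
            dst: (amount / total if total else 0.0)
            for dst, amount in targets.items()
        }
    return shares


# Hypothetical flows between three placeholder languages A, B, C.
example_flows = {
    "A": {"A": 60, "B": 30, "C": 10},
    "B": {"A": 5, "B": 90, "C": 5},
    "C": {"A": 20, "B": 0, "C": 80},
}

ranked = rank_transitions(example_flows)
shares = row_normalize(example_flows)
```

The ranked list is purely descriptive; the shares are where the probabilistic interpretation sneaks in.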

------
cloop_floop
Some possible confounding variables: what if certain language users are more
likely to squash commits? what if certain language users are more likely to
have private repos?

~~~
bojo
I don't think squashed commits matter since they used project byte size as a
filter to get rid of "Hello World" sized noise.

Hard to tell how much private repos would sway the results. Maybe a large
number of COBOL programmers are stuck behind their organization's private
repos and all we see are the languages they play with on the side for fun?

~~~
ryanbrunner
I think it could make a pretty large difference, actually. Languages like
JavaScript (I know it's not included here, but still), Ruby and Python have
large open source communities creating tons of libraries in common usage.
Compare that with a language like C#, which has an extensive standard library
and comparatively fewer open source third party libraries. You'd expect that
even if there was an equivalent number of private repos in both cases, the
languages with a larger open source ecosystem would appear more popular in
this analysis.

------
myhf
Java -> Kotlin is still pretty low. It would be interesting to revisit this in
a year.

------
xoroshiro
I used to think I understood Linear Programming, and the transportation
problem. Is there a relationship between this and the Markov formulation? I'm
totally confused now. Posts like this make me feel guilty about not reviewing
them once in a while. And I guess while I'm at it with the questions:

>We have to add an artificial source and sink on both sides of our bipartite
graph to ensure flow conservation

Wasn't there a hack with the slack/surplus variable in the LP constraints to
deal with this or was it a dummy variable? Pretty sure that was able to handle
the case where supply was not equal to the demand.

Also, how were cases handled where a user stopped using GitHub altogether or a
new user started programming?
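
For what it's worth, the textbook way to handle supply not equal to demand in a transportation problem is to add a zero-cost dummy node that absorbs the difference, which plays the same role as a slack variable in the inequality form of the LP. A minimal sketch of that balancing step follows. The function name `balance`, the `__dummy__` label, and the numbers are assumptions for illustration, and this is not necessarily what the article's authors did.

```python
def balance(supply, demand, cost, dummy="__dummy__"):
    """Return (supply, demand, cost) with total supply == total demand.

    supply: dict mapping source -> amount available
    demand: dict mapping sink -> amount required
    cost:   dict mapping (source, sink) -> unit cost
    Inputs are copied, not modified.
    """
    supply, demand, cost = dict(supply), dict(demand), dict(cost)
    gap = sum(supply.values()) - sum(demand.values())
    if gap > 0:
        # Excess supply: a dummy sink absorbs it at zero cost.
        demand[dummy] = gap
        for s in supply:
            cost[(s, dummy)] = 0
    elif gap < 0:
        # Excess demand: a dummy source covers it at zero cost.
        supply[dummy] = -gap
        for d in demand:
            cost[(dummy, d)] = 0
    return supply, demand, cost


# Hypothetical numbers: 80 units leave two sources, only 60 arrive.
b_supply, b_demand, b_cost = balance(
    {"Java": 50, "C": 30},
    {"Python": 60},
    {("Java", "Python"): 1, ("C", "Python"): 2},
)
```

One possible reading is that flow into the dummy sink stands for activity that disappeared and flow out of a dummy source stands for activity that appeared from nowhere, but that is an interpretation, not something the post states.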

------
stared
I would love to see this language transition thing as a graph - matrices are
rarely the most insightful tool for visualizing such data.

------
anon335dtzbvc
For reference
[https://madnight.github.io/githut/](https://madnight.github.io/githut/)
without excluding JavaScript

------
LeonB
I'd like to see this data expressed in a sankey diagram. (I mistyped it as
snakey diagram at first... that's a good way to remember it!)
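
A rough sketch of how a transition matrix could be reshaped for that: most Sankey tools want a node list plus parallel source/target/value arrays. Everything here (the function name `sankey_links`, the labels, the numbers) is hypothetical and not from the article. The "from" and "to" sides get separate nodes so the diagram stays bipartite and has no cycles.

```python
def sankey_links(labels, matrix, min_value=0):
    """Flatten a square from/to matrix into Sankey-style link arrays.

    Returns (nodes, source, target, value), where source and target hold
    indices into nodes. Diagonal entries (no language change) and entries
    not above min_value are dropped.
    """
    n = len(labels)
    nodes = [f"from {name}" for name in labels] + [f"to {name}" for name in labels]
    source, target, value = [], [], []
    for i in range(n):
        for j in range(n):
            v = matrix[i][j]
            if i != j and v > min_value:
                source.append(i)       # left-hand ("from") node
                target.append(n + j)   # right-hand ("to") node
                value.append(v)
    return nodes, source, target, value


# Made-up two-language example.
nodes, source, target, value = sankey_links(["A", "B"], [[0, 3], [1, 0]])
```

Raising `min_value` is a cheap way to keep the diagram readable when there are dozens of languages.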

------
freshhawk
People do so much advanced analysis with these outrageously biased datasets
(this says nothing about "developers", this has analysis of "developers who
put repos on github, skewed towards prolific repo creators")

Yes, it's the only dataset you have. You still sound dumb when you inflate the
importance of the population you have data for to make the analysis sound more
useful.

~~~
antod
Relax, it's just a blog post from a "machine learning intern", and sounds like
just the kind of project you'd give an intern for experience.

Anyway the first paragraph also says: _Thus, it has become engaging to deepen
this idea and see how the popularity of languages changes among GitHub users._

I don't get the sense anyone is trying to inflate the importance of anything.

------
thehardsphere
Very interesting analysis. It seems to correspond with reality quite a bit
more than the Bernhardsson analysis.

------
pmarreck
In 2-3 years, I guarantee you that Elixir will appear strangely absent from
this blog post

Source: Consistently steep slope over time of Indeed job interest in Elixir
plus the fact that ElixirConf doubles in size every year

~~~
sheeshkebab
you're joking, right?

[https://www.indeed.com/jobtrends/q-elixir-q-java-q-go.html](https://www.indeed.com/jobtrends/q-elixir-q-java-q-go.html)

~~~
pmarreck
YOU'RE joking, right? Because you're looking at the data entirely wrong.
Comparing the CURRENT share of interest in a relatively new language with that
of a more established one is an idiotic comparison to make. All that matters
is the rate of change, and I challenge you to find another language with the
slope of the "Jobseeker Interest" line on this chart:

[https://www.indeed.com/jobtrends/q-elixir.html](https://www.indeed.com/jobtrends/q-elixir.html)

For comparison, Go, stagnant:
[https://www.indeed.com/jobtrends/q-Go.html](https://www.indeed.com/jobtrends/q-Go.html)

Java, slight increase:
[https://www.indeed.com/jobtrends/q-Java.html](https://www.indeed.com/jobtrends/q-Java.html)

Scala is the closest competitor I've found (and it's still not close):
[https://www.indeed.com/jobtrends/q-Scala.html](https://www.indeed.com/jobtrends/q-Scala.html)

F#, stagnant:
[https://www.indeed.com/jobtrends/q-F%23.html](https://www.indeed.com/jobtrends/q-F%23.html)

Haskell, stagnant:
[https://www.indeed.com/jobtrends/q-Haskell.html](https://www.indeed.com/jobtrends/q-Haskell.html)

Clojure, stagnant:
[https://www.indeed.com/jobtrends/q-Clojure.html](https://www.indeed.com/jobtrends/q-Clojure.html)

Erlang, stagnant:
[https://www.indeed.com/jobtrends/q-Erlang.html](https://www.indeed.com/jobtrends/q-Erlang.html)

Rust, almost stagnant, extremely gentle slope:
[https://www.indeed.com/jobtrends/q-Rust.html](https://www.indeed.com/jobtrends/q-Rust.html)

Elm is also rising fairly fast, but not as much as Elixir (almost, though):
[https://www.indeed.com/jobtrends/q-Elm.html](https://www.indeed.com/jobtrends/q-Elm.html)

So, my point, WITH data: _One of these things is not like the other_

Here's the kicker: [https://www.indeed.com/jobtrends/q-Clojure-q-Haskell-q-Elixi...](https://www.indeed.com/jobtrends/q-Clojure-q-Haskell-q-Elixir-q-Rust.html)

Elixir is about to pass _Clojure AND Haskell AND Rust_ in developer interest,
and shows no signs of abating.

