Hacker News

Author here. You are absolutely right. As I mentioned in the notes, I think this matters a bit less than it might seem like (the stationary distribution does not change if you add a diagonal matrix) but clearly some languages will have a higher propensity for people to stay.

I think this flaw is even smaller than the issue of using Google statistics to infer transition probabilities. It's just a shitty proxy, at best.

At the end of the day, there are a lot of assumptions going into this analysis. I hope I didn't make it seem more serious than I meant it to be – it's really just a fun project and kind of a joke, not meant to be taken seriously.

That being said, I think the conclusions are at least "directionally" correct. They might be off by a factor of 2x or 5x or even 10x, but the stationary distribution exhibits an even bigger spread (multiple orders of magnitude), so I suspect the final ranking is still "roughly" correct (with a very liberal definition of "rough").




The thing I like most about this post is that it's falsifiable. We will know in ten years whether C and Java are still popular, and whether Go succeeds in the sense this data suggests. So thank you for being concrete and clear, even if it's all in fun and other people don't like it :)


> whether Go succeeds in the sense this data suggests

An interesting thing about this methodology is that it is extremely sensitive to the age of a language. It's possible to switch from an old language to a new language, but not the other way around -- so if you happen to do your measurements after a language has had some uptake but before it's been around for long enough that people have built significant projects on it and subsequently gotten sick of it, the future distribution by this method can only be 100% New Language. (Because sometimes people switch to New Language, but no one ever switches away.)
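To make that concrete, here's a toy sketch (my own numbers, assuming numpy) of a three-language chain where "New" is an absorbing state, i.e. people switch to it but never away:

```python
import numpy as np

# Toy 3-language chain: Old1, Old2, New. People occasionally move from
# the old languages to New, but (by hypothesis) never away from it,
# so New is an absorbing state.
P = np.array([
    [0.8, 0.1, 0.1],   # Old1 -> Old1, Old2, New
    [0.1, 0.8, 0.1],   # Old2 -> Old1, Old2, New
    [0.0, 0.0, 1.0],   # New is absorbing
])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi /= pi.sum()
print(np.round(pi, 3))  # ≈ [0, 0, 1]: the method predicts 100% "New Language"
```

No matter how small the inflow to New is, the stationary distribution puts all mass on it.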

Actually, to predict the future distribution of language use, you also need to know the rate of people moving from nothing ("I just had a brilliant idea!") to each language. If everyone eventually transitions to Go, but everyone starts in Ruby, then the division of market share between Go and Ruby depends in part on how frequently people start new projects.


The sorted stochastic matrix shows that C contradicts your assumption that it's not possible to switch from a new language to an old one. Or, at least, it shows that portions of new language code are occasionally rewritten in C.


What's missing from the matrix is a "no language" or null language. That is, a column and row that represent people who start projects from scratch in a given language.

I agree that the analysis makes it abundantly clear people move to older languages, but the question is what new projects are started in, and how many projects represent new versus transitioned projects.

This analysis is interesting, and gives a rough idea of what people are moving from and to when they decide to do that, but not necessarily popularity.

What the author is indexing in the end isn't really predicted overall language use, it's predicted transition target frequency.


I have defined "new language" in my comment as one so new that no significant projects exist in that language, not as one which is newer than C but still arbitrarily old.

> if you happen to do your measurements after a language has had some uptake but before it's been around for long enough that people have built significant projects on it and subsequently gotten sick of it

By this definition, it is not possible to switch from a new language to anything.

It's stated as a binary, but really this defines a continuum of newness, and the metric of the OP is very sensitive to it.


I didn't interpret the results in this post as "predicting the future distribution of language use." Rather, I interpreted the ranking as an indicator of a qualitative trend in that distribution. I think the author also made this very clear.

And of course, all things trend toward newness, so your objection there seems more about time or human psychology than the methodology of the post.


From the post:

> I took the stochastic matrix sorted by the future popularity of the language (as predicted by the first eigenvector).

Emphasis in original.


Nevertheless, it seems quite obvious the author did not literally interpret the eigenvector as "x% of future projects will be written in Go." Rather, the conclusions he drew were along the lines of "Oh wow look Go is on top, C and Java are still relevant."


I reserve the right to respond to what people say. Commenting on the accuracy of a label is worthwhile regardless of whether the label was meant to be precise or loose.


This is a great point. It's a model that predicts something about the future. I could have backtested it on historical Google stats to figure out if it's a good model :)


On an unrelated note: interested in doing a guest post over at Math ∩ Programming? I see you've got lots of cool stuff with high-dimensional NN :)


That bit about the stationary distribution not changing if you add a diagonal matrix sounds completely wrong to me. Let me see if I understand what you mean. Given a matrix M with non-negative entries (and no row of just zeros), let S(M) denote the stochastic matrix you get by normalizing each row of M. You are saying that if M is any matrix and D is a diagonal matrix with non-negative entries then S(M) and S(M+D) have the same stationary distribution?


Moreover, the data is collected over the entire history. A transition matrix is a linear operator from time step T_i to T_i+1; by conflating all historical observations into one matrix, it is definitely not an ordinary transition matrix.

That's apart from the fact that it's questionable whether this can be represented by a finite, linear operator at all.

It's more likely a stochastic process (an infinite matrix) with births and deaths.

I would be surprised if it became true. :-)


That was my immediate thought upon reading. A little more accuracy in the description would be helpful. This should be presented as: "If this aggregated data reflected a constant across time, then we can see where language usage would end up in the 'long run distribution'." But the jump matrix shown here will change with time. Most likely, if search results could be binned up by time period, the time dependence of the distribution represented by the eigenvector(s) would be somewhat interesting to watch, even if not remotely predictive. Could animate that or contour plot it.... exercise for the author... :)


Right. Both the matrix S and the identity matrix will project the stationary distribution onto itself. So any linear combination of them will project the stationary distribution onto itself. Let me know if I'm saying something really stupid.


Oh, you meant "multiple of the identity matrix" when you said "diagonal matrix"?


Doesn't that mean you're assuming the number of people staying with any language X is the same regardless of X?


[[1 1] [1 1]] and [[10 1] [1 1]] will have different stationary distributions; in practice the values on the diagonal will differ from a multiple of the identity matrix.
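Checking that example numerically (assuming numpy; here the second matrix is M + D with D = diag(9, 0), a diagonal matrix that is not a multiple of the identity):

```python
import numpy as np

def stationary(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return v / v.sum()

def row_normalize(M):
    """S(M): normalize each row to sum to 1."""
    return M / M.sum(axis=1, keepdims=True)

M = np.array([[1.0, 1.0], [1.0, 1.0]])
D = np.diag([9.0, 0.0])   # diagonal, but NOT a multiple of the identity

print(stationary(row_normalize(M)))      # ≈ [0.5, 0.5]
print(stationary(row_normalize(M + D)))  # ≈ [0.846, 0.154]
```

So S(M) and S(M+D) have different stationary distributions: the diagonal-matrix claim only holds for multiples of the identity.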


Another thing which seems at least as important:

How many projects are _started_ in a language, and how many _die_?

Take Java, for example: it may seem that there is a huge flow to Go. However, if enough new projects are started in Java, then the number of Java projects might still rise faster than the number of Go projects.
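A toy simulation with completely made-up rates illustrates the point: even with a steady outflow from Java to Go, Java can keep growing if its birth rate is high enough.

```python
# Made-up numbers: each "year", 5% of Java projects are rewritten in Go,
# but 200 brand-new projects start in Java versus only 50 in Go.
java, go = 1000.0, 100.0
for year in range(10):
    moved = 0.05 * java
    java = java - moved + 200   # loses 5% to Go, gains 200 new starts
    go = go + moved + 50        # gains the switchers plus 50 new starts
print(round(java), round(go))   # 2204 1396 -- Java more than doubled
```

The transition matrix alone sees only the 5% flow and would rank Go as the winner, while the actual population of Java projects grows throughout.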


And I share my Go reservations with you, given the whole error-handling (or lack thereof) philosophy as well as information emerging that it may require 100 lines of Go to do roughly the same amount of work as 20 lines of Elixir or Haskell, according to one example at https://medium.com/unbabel-dev/a-tale-of-three-kings-e0be17a...

(Although I concluded the Haskell-Elixir equivalency myself based on functional semantics)


Go is verbose. There is a lot of thought behind that; it is an intentional design aspect of the language. Personally I would not use gin; net/http, gorilla/mux, and httprouter are all solid choices.


What is the empirically-determined advantage of verbosity, then?


A lot of it boils down to making it easier for developers to work with each other, rather than any technological benefit.

The less magic that happens and the more code that is commonly used by all developers, the easier it is for others to read your code and understand what it does. Rob Pike and others have some interesting talks and blog posts on this.



