On GitHub’s Programming Languages (arxiv.org)
77 points by jes on Mar 31, 2016 | 38 comments

For whatever reason, GitHub Archive changed their data schema in 2015. https://www.githubarchive.org

The new schema conveniently lacks a repository_language field, which is likely why the data in the paper abruptly stops at 2014. (Otherwise, I'd try to reproduce this analysis myself.)

I'm surprised Go is #10. I don't actually know a single person who uses it. It seems like more of a hipster language. The others all seem like reasonable languages that occupy their niche in life.

For example, I'd expect C# to be more widely used than Go. Maybe the C# people don't share their code on GitHub, though. Perl should also be more widely used, though maybe that's all on CPAN.

Sure, Go may still be surrounded by some hype, but I wouldn't call it a hipster language. I started reading about it and messing around with it about a year ago. It is really fun to use for certain things (usually network servers for me).

That is not to say it's a perfect language, but I have friends dedicated to Python who love using Go, and friends who use C and Java who love building personal projects with it. I really had no idea it was gaining adoption in industry and some areas of academia until recently, though.

Go's popularity in hobby projects is probably due to it being a pleasure to work with. After programming in C++ at work it's a language I turn to at home when coding for my own enjoyment.

Some large companies use it: Google of course, Canonical, CoreOS, Cloudflare, Dropbox, Docker, Sainsbury's, GOV.UK, Booking.com. Not many hipsters. There's a longer list here: https://github.com/golang/go/wiki/GoUsers

You're probably correct wrt C#. It seems most open source C# projects are on either CodeProject or CodePlex. I suspect that'll change now that Microsoft is using GitHub, though.

Interesting that Python is clustered as a "system oriented programming" language. A lot of people use it for web dev (e.g. using Django or Flask).

> Interesting that Python is clustered as a "system oriented programming" language.

Perhaps there are lots of people/repos/projects that have both python code and C-extensions to python, causing them to cluster together.

I think of Python as a programming language that lets me do anything from web apps and shell scripts to games and even server environments (Matrix is a decent example). It's not my top language choice, but I still enjoy it as a scripting language because it has many use cases. I'm sure Ruby could be seen in a similar light, and maybe Perl as well.

So, a general purpose language.

I was interested to see that too. Especially how it was tightly clustered with C. Maybe there are a lot of projects of C modules with a Python wrapper bringing them all together?

Why is Python being relegated to "systems programming" alongside C and C++?

There are probably literally millions of Python web apps out there, not to mention sites like Instagram and YouTube running on Python. Python hasn't ever really been a systems language.

It's not "relegated to", it's "best clustered into"

Maybe people write infrastructure stuff in it?


I didn't see this stated anywhere explicitly, but it looks like their GitHub data ends at Dec 2014.

Some of the correlations are interesting and could be used for marketing purposes.

Ruby: CSS is 25.76; Python: CSS is 15.75

Go is gaining fastest adoption from C, Ruby and Python developers.

TBH I can't remember the last time I saw non-normalized correlations in a study, and the paper lost some credibility. Who does that, and why? What does a correlation of 25.76 mean? Relative to what? That's why you bound it 0 to 1.

Correlation (in the usual Pearson sense) is constrained to between -1 and 1 as a side effect of the mathematics.

The definition of correlation mentioned in the paper makes zero sense whatsoever and apparently invokes Bayes's rule?

The correlation is that, out of the people who program in language A, X% also program in language B. Equivalently: given that someone programs in language A, the probability that they program in language B is X%. It looks like the table expresses the correlation as a percentage, even though they don't mention that on the table.
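To make that reading concrete, here is a minimal sketch of a "percent of A users who also use B" measure. It follows the description in this comment, not necessarily the paper's exact definition; the function name `cooccurrence_percent` and the toy user data are made up for illustration.

```python
def cooccurrence_percent(user_langs, lang_a, lang_b):
    """Percent of users of lang_a who also use lang_b (0 to 100)."""
    # Language sets of everyone who uses lang_a.
    users_a = [langs for langs in user_langs.values() if lang_a in langs]
    if not users_a:
        return 0.0  # nobody uses lang_a, so define the measure as 0
    both = sum(1 for langs in users_a if lang_b in langs)
    return 100.0 * both / len(users_a)


# Hypothetical toy data: user -> set of languages seen in their repos.
user_langs = {
    "alice": {"Ruby", "CSS", "JavaScript"},
    "bob": {"Ruby", "Go"},
    "carol": {"Python", "C"},
    "dave": {"Python", "CSS"},
    "erin": {"Ruby"},
    "frank": {"Ruby", "CSS"},
}

ruby_css = cooccurrence_percent(user_langs, "Ruby", "CSS")  # 2 of 4 Ruby users
css_ruby = cooccurrence_percent(user_langs, "CSS", "Ruby")  # 2 of 3 CSS users
print(f"Ruby -> CSS: {ruby_css:.2f}%")
print(f"CSS -> Ruby: {css_ruby:.2f}%")
```

Two things fall out of this reading: the value is a percentage, so it can never exceed 100, and it is not symmetric (Ruby -> CSS differs from CSS -> Ruby), which is unlike an ordinary correlation coefficient.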

EDIT: I'm dumb

Because they're percentages.

Or are any of them greater than 100?

Python is a systems programming language? What definition are you using where that's reasonable?

I think the consensus on the definition of _system_ languages is currently very shaky. For a long time you would expect languages with the ability to develop operating systems (in the classic sense, providing a hardware abstraction layer, etc.), but more and more I get the feeling that people just mean "not JavaScript".

Whether this is a good or bad development will be (or already is, as we have seen in discussions concerning Go) the topic of many flame wars.

I guess "avoiding bash by using python" counts as systems programming :-)

I've seen vanilla JS described as "bare metal programming" a few times.

I think there are one or more build systems written in Python, in addition to stuff like Anaconda. A lot of scientists use Python for their work, which (to my mind) falls between web dev and systems programming, but the paper is not that granular.

Python gives Perl and bash a run for their money when it comes to systems scripting, if you're willing to categorize "systems scripting" as systems programming.

So for the paper, "systems programming" means "not web development"? That's... both sad and appalling.

"GitHub is [...] distributed version control system.", hmm, well.

"Java Script" feels like spelling from 1995.

Are there any popular programming languages whose preferred written name contains a space? Object Pascal and Visual Basic are the only ones that come to mind. (Objective-C dodges it by a hair).

Common Lisp. If you want to count things like <architecture> Assembly as languages, e.g. x86 assembly.

There was also Component Pascal. And Franz Lisp, if you accept dialect names. Also, OCaml used to be called Objective Caml, if I'm not mistaken.

Lisp-Flavored Erlang, though LFE is probably more commonly written.

Sad that the data ends in 2014 :(

Odd, they completely ignore Perl.

They mention on page 4 of the paper that Perl ranks 17th by their criteria of language popularity.

Anyone else viscerally bothered by the use of "phylogenetic" in the abstract?

I was too busy being upset by how awful the site looks on mobile devices.
