Hacker News
Language Trends on GitHub (github.com)
99 points by _jomo on Aug 19, 2015 | 59 comments



Er, how is the "rank" defined? # of new repositories in that language during those years? # of total repositories in that language during those years? Or some weighted average?

I really am not fond of the trend of "arbitrary rankings" a lot of startups have been using recently as content marketing to create statistical analyses that cannot be questioned.

I could reverse-engineer the ranking chart using the GitHub Archive on BigQuery to check it, but I have no idea how to actually determine the statistic for ranking.


Rank is the number of repositories created with that language in a given year, so, e.g., the language with the most repositories created would be ranked 1, the second most 2, etc.

Source: I ran the query.


So I ran this BigQuery:

   SELECT repository_language, COUNT(*) as num_repositories
   FROM [githubarchive:year.2014]
   WHERE type="CreateEvent"
   AND IS_EXPLICITLY_DEFINED(repository_language)
   GROUP BY repository_language
   ORDER BY num_repositories DESC
   LIMIT 10
which returns the top languages in 2014 by # of repositories created, per your statistic.

The output is:

   JavaScript    1273811
   Java           930359
   Ruby           769712
   Python         630549
   PHP            601101
   C              473113
   CSS            445501
   C++            355996
   C#             183349
   Objective-C    165102

This doesn't completely match the order in the chart for 2014 (the positions of Python and PHP are reversed).
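For anyone who wants to check the ordering without BigQuery, here is a minimal Python sketch of the ranking rule described upthread (most repositories created gets rank 1). The counts are copied from the query output above; the function name `rank_languages` is just an illustrative choice, and this is not necessarily how GitHub computed the chart.

```python
# Counts copied from the query output above (public repositories, 2014).
counts = {
    "JavaScript": 1273811,
    "Java": 930359,
    "Ruby": 769712,
    "Python": 630549,
    "PHP": 601101,
    "C": 473113,
    "CSS": 445501,
    "C++": 355996,
    "C#": 183349,
    "Objective-C": 165102,
}


def rank_languages(counts):
    """Return {language: rank}; rank 1 is the language with the highest count."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {lang: position for position, lang in enumerate(ordered, start=1)}


ranks = rank_languages(counts)
print(ranks["Python"], ranks["PHP"])  # prints "4 5" for the public-only counts
```

With the public-only counts, Python lands at 4 and PHP at 5, which is the pair that is swapped relative to the chart.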


From the original post:

> The rank represents languages used in public & private repositories

The delta is due to private repository usage influencing the rank.


It looks like a simple count of the language(s) detected by the linguist project -- https://github.com/github/linguist

Agreed, it does not make much sense if it isn't a weighted average. A simple count could very well be littered with hello-world repos (I've found a lot of those, in various languages) by people who are trying GitHub out.


Good idea, maybe something like the number of lines of code. That would value bigger projects higher than small projects, which actually makes sense to me, because small projects are probably written in a specific language for a reason: batch files, CSS, HTML, examples, tests. Whereas I am interested in what language people choose for bigger projects, where several languages could be used but one still has to be chosen.

It would also make sense to weigh in the date a project started. Newer (big) projects would then weigh more than older (big) projects.

So I think taking into account the number of lines and the age together is a good way to really visualize trends.
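A rough sketch of what such a combined weighting could look like, in Python. Everything here is an assumption for illustration: the exponential decay, the two-year half-life, the function names, and the sample projects are all made up, not something GitHub publishes.

```python
def project_weight(lines_of_code, age_years, half_life_years=2.0):
    """Weight a project by size, halving the weight every half_life_years.

    The exponential decay and the default half-life are assumptions.
    """
    return lines_of_code * 0.5 ** (age_years / half_life_years)


def language_scores(projects, half_life_years=2.0):
    """Sum project weights per language.

    projects: iterable of (language, lines_of_code, age_years) tuples.
    """
    scores = {}
    for language, lines_of_code, age_years in projects:
        weight = project_weight(lines_of_code, age_years, half_life_years)
        scores[language] = scores.get(language, 0.0) + weight
    return scores


# Hypothetical projects, not real data.
projects = [("Go", 1000, 0), ("Java", 1000, 2), ("Java", 2000, 4)]
scores = language_scores(projects)
print(scores)  # {'Go': 1000.0, 'Java': 1000.0}
```

In this made-up example one brand-new 1000-line Go project scores the same as 3000 lines of older Java, which is the "newer big projects count more" effect.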


This would inflate the popularity of a language in proportion to its verbosity, but that could be approximately normalized by using line count ratios from the Great Language Shootout, which explicitly tracks how many lines various languages use to solve the same problem. Precision is pointless--the line ratios vary from problem to problem--but a rough approximation would be useful as long as you could get the statistics with and without it.
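As a rough Python sketch of that normalization: divide each language's line count by a relative verbosity ratio. The ratios and line counts below are invented placeholders, not figures from the Shootout, and `normalized_scores` is a hypothetical helper name.

```python
def normalized_scores(lines_of_code, verbosity_ratio):
    """Divide each language's line count by its relative verbosity.

    verbosity_ratio[lang] is how many lines that language needs, relative to
    some baseline language, to solve the same problem.
    """
    return {
        lang: loc / verbosity_ratio[lang]
        for lang, loc in lines_of_code.items()
    }


# Placeholder numbers for illustration only.
lines_of_code = {"Java": 3000, "Python": 1500}
verbosity_ratio = {"Java": 2.0, "Python": 1.0}

scores = normalized_scores(lines_of_code, verbosity_ratio)
print(scores)  # {'Java': 1500.0, 'Python': 1500.0}
```

Keeping both the raw and the normalized numbers around makes it easy to report the statistics with and without the correction.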


I think number of contributors would be a good metric, even though it penalizes lone projects, but I suspect that would be relatively even across languages compared to other metrics. For example, LoC is proportional to verbosity. Commit count emphasizes languages that tend to have more bugs.


The problem is that one language might require fewer lines of code to achieve something of the same size/complexity as another language, if it's terser.


IIRC the guy who wrote this article http://redmonk.com/dberkholz/2014/05/02/github-language-tren... about GitHub language trends scraped the numbers from the search page, e.g. the following search yields all repos created in 2014 and the language breakdown in the sidebar: https://github.com/search?utf8=%E2%9C%93&q=created%3A%22%3E%...


I have a Ruby on Rails project on GitHub. After I added Twitter Bootstrap and the Ace text editor to my project, GitHub started showing that my project is 85% JavaScript. I guess it is, but I didn't write all that JavaScript! So I'm a little skeptical about JavaScript being number 1 here.


I wonder if they're doing any sort of de-duplication. Surely 1000s of instances of the same jquery, bootstrap, etc shouldn't count towards JavaScript's language total.
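One simple way to do that kind of de-duplication, sketched in Python: hash each file's contents and only count the first copy. This is purely an illustration of the idea, not a claim about what GitHub or Linguist does; the function name and the sample files are made up.

```python
import hashlib
from collections import Counter


def count_unique_bytes_by_language(files):
    """Sum file sizes per language, counting each distinct content only once.

    files: iterable of (language, content_bytes) pairs.
    """
    seen = set()
    totals = Counter()
    for language, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            continue  # identical file already counted elsewhere
        seen.add(digest)
        totals[language] += len(content)
    return totals


# Made-up stand-ins for the same vendored file copied into two repos.
files = [
    ("JavaScript", b"/* jquery */"),
    ("JavaScript", b"/* jquery */"),
    ("Ruby", b"puts 'hi'"),
]
totals = count_unique_bytes_by_language(files)
```

Exact hashing only catches byte-identical copies; different versions or minified builds of the same library would still be counted separately.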


You added the entire Bootstrap project to your repo?


I put the minified CSS and JS files under vendor/assets/. Is that not kosher?


As the number of external dependencies (e.g. Bootstrap) increases, I find using something like Bower very useful. It's probably not worth it for solo projects where Bootstrap is the only external dependency.


It would be nice to have these stats just for private repos. My idea (maybe wrong) is that private repos are used mostly by startups and enterprises, so it would be interesting to know which language is really trending in the industry.


I'm just a casual unemployed student developer, but I use private repositories all of the time. It just seems easier to have a couple of private repositories, work on them until they're actually useful and then make them public and share the hell out of them.


I wish it showed more than the top 10. I'd like to see the rank among languages outside of the top 10. Like even Go and Rust.


Absolutely. This is useful but it is highly unlikely that any of the languages in this chart will disappear anytime soon. As I plan my learning-time investment budget for the next few years, it's only marginally interesting to me that blockbuster language Number 2 goes to Number 4 or whatever. I already know that any of these languages would be a good choice as long as they fit the applicable domain.

What matters much more to me is that language number 40 suddenly finds itself at 20, for example. To really plan for the future we need to peek at the nascent trends in small-language land. I'm much more interested in OCaml, Julia, Nim, Rust, or Lua, even CUDA, OpenCL, Chapel or Cilk, than I am in Python, Java/script, or (sigh) CSS.


You can also look at this programming language rankings post by Stephen O'Grady with RedMonk for a longer tail list. (Though I suspect potential biases in the methodology of such rankings become more pronounced the further out on the tail you get.)

http://redmonk.com/sogrady/2015/07/01/language-rankings-6-15...

(Donnie used to be with RedMonk as well.)


Exactly!


I'm really surprised that JavaScript doesn't show a steeper increase, and what's up with Java? Really, Java?


Mentioned in the article - Android. It is the most popular mobile phone OS in the world right now.


Also, it's probably that enterprises started migrating to GitHub. In the beginning it was probably an early-adopter platform, with the languages that early adopters use.


Early adopters also use Java.


This is of course true in the trivial sense, but he means languages disproportionately used by early adopters (Ruby, Python, Go, etc.).


Yup, a large number of apps + a significant number of custom ROMs. It seems like GitHub became a standard for XDA-developers.


Well, JavaScript has already gotten the first place. It doesn't get any 'steeper' than that.


Steeper meaning a larger upwards delta. Starting at #2 and going to #1 is less "steep" than, say, starting at #7 and going to #4.
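To put numbers on it, a tiny Python sketch, where "steepness" is taken to be the number of places gained (my reading of the comment above, and `rank_delta` is just an illustrative name):

```python
def rank_delta(start_rank, end_rank):
    """Places gained between two rankings; positive means moving up the chart."""
    return start_rank - end_rank


print(rank_delta(2, 1))  # 1
print(rank_delta(7, 4))  # 3
```

A language already at #2 can gain at most one place, so its line can never look as steep as one climbing from #7 to #4.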


I think this is due to many existing projects moving to GitHub. With Google Code stagnating (and now shutting down), and a lot of Apache work being done on GitHub, I imagine many of the new repositories are actually just moves.


I'm under the impression that with Java 8 Java is becoming much more popular again. And rightfully so IMO.


Of course Java. It's still one of the best platforms out there, especially if you weigh in performance, stability and availability of open source libraries and free or cheap hosting.


Minecraft modding.


Java is great for middleware and backend systems.


They realized that Python / Ruby isn't that good for backend projects.


> They realized that Python / Ruby isn't that good for backend projects.

That's why both GitHub and Travis CI are coded in Ruby, hey?


Like many startups, they went with Ruby as a PoC because it's fast to implement. But when you need a really solid and performant backend, you don't use Ruby / Python.

http://www.quora.com/Why-did-twitter-move-away-from-Ruby-on-...


This puts things into perspective. As a full-time Clojure dev, this graph makes me feel part of a very niche group, for better or worse.


Except you target and interop well with #1 and #2 :)


It's interesting that with such a huge and diverse, rich set of Clojure repos on Github that everyone depends on and loves, they still didn't even rank enough to be on this list. I'll admit that was a bit surprising at first, but perhaps it shouldn't have been.


Related material, number of modules and average growth/day by [http://www.modulecounts.com/]

   Language      Count     Avg/day   Source
   JavaScript    214,741   379       Bower + npm
   Java          114,860   108       Maven
   Ruby          106,195    50       RubyGems
   Go             86,512   299       GoDoc
   PHP            68,276    99       Packagist + PEAR
   Python         64,865    56       PyPI

Modulecounts offers info for more languages, I just did a TL;DR

EDIT: is there a good way to present data on HN?


I'd be leery of putting too much trust in those numbers, as Linguist detects the language for many repos incorrectly. I've had a number of Grails projects detected as type "javascript" or type "css" until I went in and added a .gitattributes file to help it get things straight. And I'm pretty sure not everybody bothers to do that.


Ruby: look at me, being cool!

Java/JavaScript: lol, no you ain't!

PHP: whatever...


PHP: I ain't going nowhere.


If I'm reading this correctly, Objective-C is bumped out of the top 10. The line just sort of stops though, which doesn't give much indication of just how far down the list it is now.


Yup, and there's a new HTML dot, so I suppose that HTML replaced Objective-C.


This would be better with absolute numbers, for comparison, as well as a longer tail, so we can, say, see how Tcl is doing against Perl.


CSS is the most intriguing, I think: why the sudden peak since 2013? I would have guessed that we're writing less and less CSS these days with tools like preprocessors and autoprefixers.

UPDATE: I'm assuming the ranking is based on the LOC number. Might be wrong since, sadly, there is no mention of how languages are ranked...


GitHub counts preprocessors as CSS. For example, I have a project that only has .scss files but it says "2% CSS". Similarly, it says Bootstrap is built with CSS even though it uses LESS.


Just a bit surprised by the steady ascent of Java to #2. Are these enterprise programmers moving to open source?


More like mobile developers. Android development is mostly Java.


"The rank represents languages used in public & private repositories, excluding forks, as detected by Linguist."

As it excludes forks, I doubt the data represents the actual number of repositories for each language, as I have seen many forks doing better than their original repos.


All Google Code Java projects are (auto)migrating to GitHub. Also Android.


Cirru, the syntax I created, is beyond 200.

I think CSS and PHP are tricky. HTML files are sometimes recognized as PHP files. And CSS is hardly a general-purpose language.


It would be nice to have the origin of the projects... I suspect Java is growing because so many projects are migrating to GitHub.


Wow, more CSS than C, C++, C#. There is no hope for this world.


I hope it’s a sign that more designers / front-end people are adopting good development practices! CSS is already a nightmare to maintain so using source control can only be a good thing.


Why is there no hope for this world?

Less people using C/C++ just means less security vulnerabilities as far as I can tell. Not entirely sure why C# is listed but not Java here.


> Why is there no hope for this world?

Because we are starting to neglect useful and important languages that shaped both our thinking and engineering. Instead, we use languages that babysit us at the expense of expressiveness and speed. Not a fan at all.

> Less people using C/C++ just means less security vulnerabilities as far as I can tell.

Yeah, better rewrite Linux in CSS to make it secure. I heard CSS is turing-complete if you click through instructions, so it's possible!

> Not entirely sure why C# is listed but not Java here.

Because there's more Java than CSS.



