
Programming languages ranked by expressiveness - dsberkholz
http://redmonk.com/dberkholz/2013/03/25/programming-languages-ranked-by-expressiveness/
======
tikhonj
Yeah, most of this is just drawing conclusions from what is essentially noise.
Especially the consistency measure: that's bound to be heavily affected by how
many people use a given language. The so-called "tier 1" are not popular
because they're consistent--they're consistent because they're popular. The
same goes for the tier 3 languages in reverse: they probably have so little
activity that a high variance is inevitable.
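
The sample-size point can be illustrated with a small simulation. This is only a sketch, not the article's method: it assumes every "language" has exactly the same commit-size distribution (a log-normal with made-up parameters) and varies only the number of commits sampled. The function names and parameters are hypothetical.

```python
import random
import statistics

def iqr_of_sample(n_commits, rng):
    """Interquartile range of simulated lines-per-commit for one 'language'."""
    # Hypothetical distribution: identical for every language by construction.
    sizes = [rng.lognormvariate(3.0, 1.0) for _ in range(n_commits)]
    q1, _, q3 = statistics.quantiles(sizes, n=4)
    return q3 - q1

def spread_of_iqr(n_commits, n_languages=200, seed=0):
    """How much the measured IQR varies across otherwise identical 'languages'."""
    rng = random.Random(seed)
    iqrs = [iqr_of_sample(n_commits, rng) for _ in range(n_languages)]
    return statistics.stdev(iqrs)

small = spread_of_iqr(n_commits=30)
large = spread_of_iqr(n_commits=3000)
print(f"spread of measured IQR with 30 commits:   {small:.2f}")
print(f"spread of measured IQR with 3000 commits: {large:.2f}")
```

Even though nothing differs between the simulated languages, the ones with few commits end up with much noisier "consistency" figures, which is enough to scatter them across tiers.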

Also, number of lines per commit is not really a good measure of expressivity.
I don't even see how it's a reasonable proxy: the number of lines I commit
changes more depending on my project (is it simple or complex, for work or for
fun?) than on my language.

Also, it makes literally no sense at all to consider very domain-specific
languages like puppet: you may as well talk about how expressive HTML or CSS
are relative to normal programming languages!

Basically, I think this article draws too many arbitrary conclusions on
limited data.

~~~
lkrubner
"the number of lines I commit changes more depending on my project (is it
simple or complex, for work or for fun?) than on my language."

This amounts to saying: Some days I eat eggs for breakfast, but other days I
eat oatmeal for breakfast, so I am totally inconsistent in what I eat from day
to day, therefore it would make no sense to include me on a survey about what
people in my country eat.

You do understand that a single data point might be useless, but when combined
with thousands or millions of other data points, it becomes useful? What you
do on any one project does not matter, but what you do, combined with
thousands of other developers, all averaged together, starts to get
interesting.

If you honestly believed your own premise, then you would expect all the
results to be the same -- there would be no variation between Fortran, Java,
Javascript, Clojure or Coffeescript, cause, you know, everybody is different
and does different stuff, and it's all so crazy, how can anybody make sense of
it?

But we can make sense of it. All that is needed is a good understanding of
probability and a sufficiently large data set.

Mind you, the article above might be total bunk. There might be lots wrong
with the dataset. But it's not bunk for the reason you give: "the number of
lines I commit changes more depending on my project (is it simple or complex,
for work or for fun?) than on my language."

~~~
paulhodge
With lots of data there's no doubt that they are measuring _something_, but
it's not clear that they are measuring what they want.

Your breakfast analogy would be more like:

- Some days Alice eats crumpets while sitting in her house and it takes her
about 20 minutes.

- Some days Bob eats a bagel while on the way to work and it takes him about
5 minutes.

- Therefore, bagels are 4 times faster to eat than crumpets.

See, the problem is we have conflated lots of correlations at once. There is 1)
a tendency for certain types of foods (languages) to be used for certain
purposes, and 2) a tendency for certain kinds of people to prefer certain foods
(languages), and neither of those correlations is necessarily caused by the
properties of the food (language) itself.
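
A toy version of that confounding, with invented numbers: within each project type both languages need exactly the same lines per commit, and only the mix of project types differs. `LangA`, `LangB`, and all figures are hypothetical.

```python
from statistics import mean

# Hypothetical numbers: within each project type, both languages
# need exactly the same lines per commit.
LINES_PER_COMMIT = {"script": 20, "application": 80}

# Assumed usage mix: number of commits per (language, project type).
COMMITS = {
    "LangA": {"script": 900, "application": 100},
    "LangB": {"script": 100, "application": 900},
}

def aggregate_mean(language):
    """Mean lines per commit when project type is ignored."""
    sizes = []
    for project_type, count in COMMITS[language].items():
        sizes.extend([LINES_PER_COMMIT[project_type]] * count)
    return mean(sizes)

for language in COMMITS:
    print(language, aggregate_mean(language))
```

The aggregate comes out at 26 lines per commit for LangA versus 74 for LangB, although by construction neither language is more expressive than the other.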

------
yxhuvud
I'm not certain that commit size is a good measure of expressiveness. Even not
counting obvious outliers like javascript (oops, 3241245 people had a commit
with jquery inside), it will be very dependent on the cultures in the
different languages.

Some languages have users that are less mature in VCS usage than others, some
languages have users that spend a lot more time writing tests (which create
larger commits) etc.

~~~
kevinnk
I agree, for all the time he spends on his results, as far as I can tell he
never really defends the idea that commit size is in any way a good proxy for
"expressiveness." It would be nice if he could explain why he chose commit
size over something like Rosetta Code size or some other metric.

~~~
dsberkholz
I quickly mentioned it in the caveats section up top. The underlying
justification/assumption is that commits are generally used to add a single
conceptual piece regardless of which language it’s programmed in.

As for why this metric and not others, it's the one I could actually get the
data for. I don't have the time or space to download literally millions of
repositories myself, so I used what I could get access to.

~~~
btilly
_The underlying justification/assumption is that commits are generally used to
add a single conceptual piece regardless of which language it’s programmed
in._

That is only true in some programming communities.

Essentially what you are selecting for is languages whose users seem to have
digested some of the same software development memes that you have. Those
users are going to be generally drawn to expressive languages, and will have
very focused commits in those languages. So there is a correlation between
expressiveness and small commits.

But it is only a correlation. For instance I've programmed in both Perl and
Ruby. Of the two, Ruby is more expressive. (Universal opinion of everyone that
I know who has programmed both.) However Perl is very "unhip", so people doing
open source software development in Perl these days tend to be people who have
been programming for some time, which means that they've absorbed a lot of
good programming ideas. (Seriously, once you get past the reputation for
"unmaintainable line noise", a lot of surprisingly good code is written in
Perl.) Thus Perl outranks Ruby in this list.

What would be much more informative is the ratio of lines of code/commit
between languages for users that have programmed in both languages. It would
take more work to do the analysis along those lines, and you'd
reveal similar trends, but the analysis would be far more informative.
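
A rough sketch of what that paired comparison might look like, assuming commit records of the form (developer, language, lines changed). The records, names, and the `paired_ratios` helper are all made up for illustration; nothing here comes from the Ohloh data.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (developer, language, lines changed in one commit).
COMMITS = [
    ("alice", "Perl", 30), ("alice", "Perl", 50), ("alice", "Ruby", 20),
    ("alice", "Ruby", 20), ("bob", "Perl", 200), ("bob", "Ruby", 100),
    ("carol", "Perl", 10),  # Perl only: excluded from the paired comparison
]

def paired_ratios(commits, lang_a, lang_b):
    """Per-developer ratio of median commit size, lang_a over lang_b."""
    sizes = defaultdict(lambda: defaultdict(list))
    for developer, language, lines in commits:
        sizes[developer][language].append(lines)
    ratios = {}
    for developer, by_language in sizes.items():
        # Only developers with commits in both languages are comparable.
        if lang_a in by_language and lang_b in by_language:
            ratios[developer] = median(by_language[lang_a]) / median(by_language[lang_b])
    return ratios

ratios = paired_ratios(COMMITS, "Perl", "Ruby")
print(ratios)
print("median ratio:", median(ratios.values()))
```

Comparing each developer against themselves holds commit habits roughly constant, so whatever ratio remains is more plausibly down to the language.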

------
rohern
There is a lot of weak-ass criticism going on in this thread when the data --
whatever about its methodology is troubling -- seems to almost perfectly back
up what is the common experience among programmers. Yes, copy-and-paste
doubtlessly affected the numbers for JavaScript, but I am not at all surprised
to see JavaScript where it is.

Does anyone here really doubt that you can get more done with a single line of
Python than a line of C/Java/C++? Same for Clojure/Common Lisp/Racket versus
Python.

We might not take individual ranking too seriously, and none of this affects
language choice when performance is a critical concern (though the spacing
between Scala, OCaml, and Go is interesting and relevant to this), but do you
guys honestly doubt the trend here? Does anyone have a strong counter-example?
It seems like the authors may have had a decent notion with using LOC as a
measure. There is no proof of this here, but I am intrigued by it.

The final conclusions in favor of CoffeeScript, Clojure, and Python are again
pretty obvious. Is anyone going to suggest JavaScript or C++ is more
expressive than any of these?

~~~
dragonwriter
> There is a lot of weak-ass criticism going on in this thread when the data
> -- whatever about its methodology is troubling -- seems to almost perfectly
> back up what is the common experience among programmers.

So?

I mean, really, I can come up with completely bogus metrics all day, and
whenever one produces results in a domain that happen to align with CW in that
domain post an infographic using it, but that doesn't make that metric
meaningful.

> The final conclusions in favor of CoffeeScript, Clojure, and Python are
> pretty obvious, I would think.

So? A metric that has no intrinsic validity doesn't become valuable just
because it produces conclusions which match what you would have assumed to be
true (whether based on valid logic or not) before encountering the metric.

~~~
rohern
Yes, thank you for that 9th-grade science lesson.

The commenters in this thread are writing off the data because...? They
decided the measure is bad? When the measure conforms to experience, it's
probably worthwhile to look into it. This doesn't mean that correlation
implies causation and yada yada 9th-grade science lesson.

~~~
dragonwriter
> The commenters in this thread are writing off the data because...? They
> decided the measure is bad?

Yes, because what the measure actually measures isn't a valid proxy for what
it purports to measure.

> When the measure conforms to experience, it's probably worthwhile to look
> into it.

No, if the adopted proxy (here, "LOC per commit") has some sound rationale for
being used as a proxy for the actual quality of interest (here
"expressiveness"), then it is worth actually getting some results with it for
which you have a firm expectation of what those results would look like if you
were able to directly measure the quantity (in this case "expressiveness") for
which you are using the proxy (in this case "LOC per commit").

If after such testing the proxy -- which you first looked to for
reasonableness, and then tested on the "simple" data for which you had a firm
expectation of what the results would be for the quality of interest -- seems
workable, it's worth investigating what kinds of results it returns for things
which you don't have a firm idea of where they would fall. (Which is the only
reason you actually use a proxy measure for in the first place.)

In this case, the proxy fails at the first test (sound rationale for using it
as a proxy for expressiveness), which makes the second test (do the results
line up with what you'd expect on a known sample set) meaningless.

~~~
rohern
Obviously I and the writer of the article disagree with you that it fails the
first case.

~~~
dragonwriter
> Obviously I and the writer of the article disagree with you that it fails
> the first case.

That's hard to tell in your case, since most of your commentary has been
explicitly skipping past the criticism of the failure of the proxy to have a
clear link to the thing it was taken as a proxy to say that doesn't matter
since the results were about what you would expect, rather than actually
addressing the criticism.

So it sounds like you were failing to understand the first test more than you
were disagreeing with the criticism based on it. And, as yet, you haven't
stated any _reason_ for disagreeing, just continued to skip to the second
test.

~~~
rohern
The supposition is that a more expressive language lets you do more with a
single line of code on average than a less expressive language. The second
supposition is that commits tend to be done to gather code expressing a single
chunk of functionality in a program, so that on the average commits have the
same utility in terms of what they contribute to the source project.

It's clear from this, I would think, why therefore length-of-commit is
supposed to be a good proxy for measuring expressiveness.

To be clear -- the reason that it is obvious that I and the author disagree
with you on the first case is because your objection was a) an elementary one
and a consideration important to all such investigations, therefore it would
be considered by anyone doing such an investigation or analyzing one and b) we
were disagreeing with you anyway.

~~~
dragonwriter
> The supposition is that a more expressive language lets you do more with a
> single line of code on average than a less expressive language. The second
> supposition is that commits tend to be done to gather code expressing a
> single chunk of functionality in a program, so that on the average commits
> have the same utility in terms of what they contribute to the source
> project.

There's no really good reason to suspect that the second of these suppositions
holds _to the same degree across different languages_ (which basically is
equivalent to the assumption that development practices are independent of
language.)

> To be clear -- the reason that it is obvious that I and the author disagree
> with you on the first case is because your objection was a) an elementary
> one and a consideration important to all such investigations, therefore it
> would be considered by anyone doing such an investigation or analyzing one
> and b) we were disagreeing with you anyway.

It would clearly be considered by anyone _competent_ doing such investigation,
but since your first post on this thread didn't acknowledge the basis and
challenge the correctness of the common criticism in the thread based on
concerns of this type but explicitly and emphatically stated a lack of
understanding of what the complaints were about, it appeared quite clearly
that you didn't get it. The assumption of basic competence may be warranted,
if only out of politeness, when someone doesn't explicitly state something
inconsistent with that assumption, but when they do, that assumption becomes
unwarranted.

------
VeejayRampay
The good thing about Coffeescript is that it actually seems to deliver what it
was promising in the first place, terseness, expressiveness, simplicity and
good Javascript output.

It is really a joy to develop with (though my being a Ruby programmer probably
makes me a somewhat biased and enthusiastic candidate).

------
btucker
Now I want to see the same thing, but based on the length of the commit
message instead.

------
chrisdevereux
Wait... JavaScript and CoffeeScript end up at _opposite_ extremes while having
near-enough identical semantics?

That's a big red flag on this as a measure of expressiveness.

------
danso
It's hard to tell if the fact that CoffeeScript is #6 (the highest ranked
major language) and Javascript is #51 (second to last) is a reflection on how
much of a shift CS is from JS, or on the quality of methodology and metrics in
the OP.

~~~
vec
I'm thinking that's mostly due to the testing methodology. In JS it's pretty
common to start a project by downloading and committing jQuery, Underscore,
and whatever framework dependencies you're using. And just speaking from
experience, coffeescript is better, but it's not _that much_ better (I'd guess
~1.5 lines of JS per line of CS, the chart shows somewhere around 8:1).
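
To show how much a few checked-in libraries can move the numbers, here is a small sketch with invented commits. The path prefixes used to spot vendored code are an assumption, as are all the line counts.

```python
from statistics import mean, median

# Hypothetical commits: (lines changed, paths touched). The figures are invented.
COMMITS = [
    (12, ["src/app.js"]),
    (25, ["src/app.js", "src/util.js"]),
    (9000, ["vendor/jquery.js"]),      # library checked in wholesale
    (18, ["src/view.js"]),
    (1500, ["lib/underscore.js"]),     # another vendored dependency
]

# Assumed convention for where third-party code lives.
VENDORED_PREFIXES = ("vendor/", "lib/")

def is_vendored(paths):
    """True if every touched path sits under a vendored directory."""
    return all(p.startswith(VENDORED_PREFIXES) for p in paths)

all_sizes = [lines for lines, _ in COMMITS]
own_sizes = [lines for lines, paths in COMMITS if not is_vendored(paths)]

print("all commits: mean", mean(all_sizes), "median", median(all_sizes))
print("own code:    mean", mean(own_sizes), "median", median(own_sizes))
```

With these numbers the mean drops from 2111 to about 18 once vendored commits are filtered out, and the median moves from 25 to 18, so even a median-based chart is not immune when such commits are common.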

~~~
randomdata
Coffeescript, I find, is more apt to force you into a more cohesive pattern
for structuring your code which could help avoid large refactoring stints.

------
igouy
> let the results speak for themselves

The difference between Fortran Free Format and Fortran Fixed Format should be
enough to tell us that _lines of code per commit_ is all about how much stuff
you put on a line!
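
A quick way to see the effect: take one long logical statement and wrap it at two different widths. The statement is made up, and 72 and 132 are used here as roughly the column limits of fixed-form and free-form Fortran.

```python
import textwrap

# One hypothetical statement, written as a single long logical line.
STATEMENT = "result = " + " + ".join(
    f"coefficient_{i} * sample_{i}" for i in range(12)
)

def physical_lines(source, width):
    """Number of physical lines once the statement is wrapped at `width` columns."""
    return len(textwrap.wrap(source, width=width))

for width in (72, 132):
    print(f"width {width}: {physical_lines(STATEMENT, width)} lines")
```

The code is identical in both cases, yet the narrower format reports more changed lines for the same edit.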

(Is `nl` significant in the language syntax? Were readable wide screen
displays available when the code was written?)

------
gjm11
Note that in fifth place -- ahead of Coffeescript, Clojure, Python, etc.,
etc., -- is eC, a language that's basically C plus a few OO features and a GUI library.
It has, for instance, no automatic memory management (besides some rather
primitive refcounting); neither dynamic typing nor type inference; no nice
literal syntax for collection types; in short, whatever its merits it is not
an outstandingly expressive language.

There is surely some correlation between short commits and expressiveness, but
they're far enough apart that I think the title is very misleading.

------
tootie
Terse isn't the same as expressive. Consider Scala vs Java. In Java,
defining a singleton involves implementing a design pattern in one of a few
ways. Maybe you have a private constructor and a static initializer. In Scala
you define your class with `object` and it's done. That's concise and
expressive.

In Java, if your generic class has an upper bound you write class Foo<T extends
Bar> while in Scala you write class Foo[T <: Bar] which is just an abbreviation.
Replacing a word with punctuation. One is good, one isn't.

------
tmsh
The real measure is features shipped.

It's pretty hard to disambiguate speed of development, fluidity and
flexibility (which potentially increases LOC per commit by bundling multiple
'conceptual pieces') with expressiveness (which decreases LOC per commit) in a
single LOC per commit metric.

The idea that a single commit corresponds to a 'single conceptual piece' is
probably not very precise. It also doesn't measure for the complexity of the
conceptual piece. It hasn't been established that the same level of
'conceptual pieces' are tackled across all programming languages per commit.

Just some thoughts. That said, I think though given all the factors involved
(more expressive concepts per commit are perhaps tackled in more expressive
languages, and that cancels out that less expressive concepts are tackled per
commit in less expressive languages -- that balances out simplicity in simpler
/ less expressive languages can lead to more actual features committed), that
actually the methodology kind of works. But, like others here, I wouldn't
presume it's quite so simple underneath.

------
sharkbot
There are a few surprises in that list. I'd have thought both Coq and Lua
would end up further to the left. I suppose the graph shows an interesting
dynamic: expressiveness is not just due to a language's semantics, nor syntax,
nor libraries. You need a healthy interplay between all three aspects of a
programming language to truly be "expressive".

------
fusiongyro
The whiskers on the Prolog and Logtalk plots would tend to confirm what PG had
said in one place or another, that a language can be both more high level and
less expressive, apparently using a definition of expressive related to line
count. In that light I'm kind of surprised they do as well as they do in this
post.

As I get more serious about Prolog I think it's kind of a shame more people
aren't exposed to it (apart from apparently dreaded university classes). It's
pretty impressive just how quickly one can write a parser, and for writing an
"internal" DSL it rivals or exceeds Lisp (depending on one's taste).

------
sliverstorm
Hah! Take that, Python aficionados!

-- Perl user

~~~
pekk
What matters more is how much is expressed to you when you have to read your
own code again in a year - or when someone else does

-- Python user

------
mikebabineau
This assumes that languages are used to solve similar classes of problems, or
that, in aggregate, those different classes result in similarly-sized
"feature" chunks.

I think it's fair to say that nobody's ever written a forum in puppet, and
that few people are working on micro-[web]frameworks in Fortran.

Interesting data, just need to be careful about what conclusions are drawn
from it.

------
jug6ernaut
Does this data take into account the total lines of code in the project being
committed to? Would this not have an effect on the results?

~~~
dsberkholz
No, it's a combined total across all ~7.5 million open-source projects in
Ohloh, so (potentially real) effects like that are averaged out.

------
hp50g
No sign of RPL, which is insanely expressive :(

~~~
bjterry
Are you referring to the programming language for calculators? Wikipedia
doesn't make it seem that expressive, and I can't find much else associated
with the acronym RPL.

~~~
hp50g
Yes. The Wikipedia article is, to be honest, crap. RPL is a functional,
stack-based language, sort of like Lisp and Forth combined, with Mathematica
thrown in for good measure.

It's both compiled and interpreted and a system language and a user language.
Couple of examples:

'X=Y^2' 'Y' SOLVE 149 'X' STO EVAL @evaluate first expr for Y=149 .. Including
optimising it by rearranging it

{ "a" "b" "c" } SORT REVLIST << "M" + >> MAP @sort then reverse then add M
string to every list item using anonymous function and map.
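
For readers who don't know RPL, a rough Python equivalent of the second example might look like this. It assumes the anonymous program `<< "M" + >>` appends "M" to each element, which is how the description above reads.

```python
items = ["a", "b", "c"]

# SORT, then REVLIST, then append "M" to each element via an anonymous function.
result = list(map(lambda s: s + "M", reversed(sorted(items))))
print(result)
```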

Storage is entirely transparent and persistent as well.

Quite my favourite language these days. I can actually do real work with it
and it runs in my pocket on a 75MHz ARM (which is more than enough for it),
has built in context sensitive help, a debugger that even puts gdb to shame
and has 2GB of persistent storage and lasts a month on 4 NiMH eneloop AAAs.
All for £79 :)

It also doesn't have any distractions like the internet.

------
billsix
>One proxy for this is how many lines of code change in each commit

This premise is complete rubbish.

------
coolsunglasses
APL and J would win this handily, but we don't use those languages for a
reason.

------
stevedekorte
Hm, how can Objective-J have such a different ranking than Objective-C?

------
papsosouid
While their chart confirms my personal bias (that FP is more expressive), I
think their methodology is flawed to the point of making the whole thing
meaningless. They are not measuring expressiveness, they are measuring lines
of code per commit. Having a community that likes lots of small commits
doesn't actually make a language more expressive than one with a community
that likes fewer, larger commits. They even acknowledge that the javascript
numbers are basically meaningless because it is so common to copy+paste entire
big external dependencies into a javascript project.

~~~
simcop2387
I'd agree, I'd expect something like APL to show up on here in terms of sheer
expressiveness. While it can be a pain to program in because of the funky
characters, it's actually very very expressive.

~~~
dsberkholz
APL is insufficiently popular to be considered here, otherwise I'm sure you
would be right.

------
sultezdukes
That's why I think people should take a look at the next generation of Rebol
<http://www.red-lang.org/>.

Fast as C, expressive as Lisp, and more readable than Ruby...well, those are
the goals at least ;)

