
A Large-Scale Study of Programming Languages and Code Quality in GitHub (2014) - pk2200
https://cacm.acm.org/magazines/2017/10/221326-a-large-scale-study-of-programming-languages-and-code-quality-in-github/fulltext
======
defined
Although this is interesting, we should perhaps not hold a high degree of
confidence in the results, because the methodology relies on the content of
commit messages and the number of commits (if I read this correctly).

- The content of commit messages varies widely in expressiveness and
meaningfulness, from 'Fix bug' to a detailed explanation. This confounds the
classification of a commit.

- The number of commits can be very misleading depending on the committer's
workflow. Some committers merge topic branches to include all their
intermediate work-in-progress commits, which could overrepresent commits
flagged as errors. Other committers rebase their topic branches into fewer, or
even single, commits before merging. Or, some commits may fix multiple
defects.

This kind of analysis is conceptually a worthy endeavor; it would be more
meaningful if the metrics it employed were more strongly correlated with the
attributes it was trying to analyze.

~~~
lqdc13
Also, the way one works differs between languages. Some people are more
likely to push barely working code to GitHub because of the language's
culture.

Perhaps timing should be considered as well - how long it takes to implement a
feature including fixing its associated bugs.

------
rixed
Figure 2 suggests another possible bias favoring functional, managed
languages: a lot of the errors for C/C++ are related to concurrency and
performance. But those are mostly non-bugs for the other languages, since
when concurrency or performance is a real requirement, most of the other
studied languages would not be considered anyway.

It seems similar to the paradox that makes the best medicine appear to have a
lower survival rate just because it's given to the most seriously ill patients.

~~~
conistonwater
Where do you see this in the paper? They say concurrency errors are mostly the
usual things like deadlocks and race conditions, but those absolutely do exist
in every language.

Also, what do you mean most of these languages wouldn't be considered when
concurrency is required? Concurrency is bog standard everywhere.

It seems like, the way they define a bug, a performance bug would be a bug
relative to expectations, per project: you can definitely have a performance
bug in Go or Haskell, for example, if something runs slower than the
developers think it should (as opposed to being slower than some external
reference code or something). So maybe it's closer to something like
"developer control over unexpected underperformance"?

~~~
c3534l
Not even every language in that study supports concurrency, as the study
itself points out. I hear a lot of praise for Go because of how much people
like doing concurrency with it. The fact that they observed a higher rate of
concurrency bugs in Go could just as easily support the interpretation that Go
is good for concurrency as the interpretation that it is bad at it.

~~~
muricula
Since Go makes concurrency easier and encourages its use, there are going to
be a lot more concurrency bugs. By contrast, languages like Python don't even
have proper parallel threads, so fewer people will write concurrent Python
programs and fewer concurrency bugs will arise. This is a confounding factor
found in one form or another throughout the survey.

It's good that they did this research, but unfortunately they couldn't account
for everything.

~~~
didibus
They talk about this, and did make some attempt to account for it. That's why
they conclude that, more so than overall defect counts, languages are
correlated with particular categories of defects.

------
RickHull
Uh, where is it shown how "software quality" or "code quality" is measured or
determined? Can anyone provide a succinct definition of _quality_ which the
paper uses?

As best I can tell, they use commit messages to identify bugfixes, and later
they jump to "defective commits". Presumably the bugfix commit is not the
defective commit. There is no explanation I can find that shows how they
arrive at a defective commit from a bugfix commit.

This specific methodology seems rife with weaknesses, all of which should be
explained clearly and acknowledged up front.

~~~
zzzcpan
They still have some useful data that you can interpret yourself.

------
hellofunk
Don't be fooled by the October 2017 date of this article. There should be a
2014 in the title, since this appears to be a re-print of a prior study:

[http://web.cs.ucdavis.edu/~filkov/papers/lang_github.pdf](http://web.cs.ucdavis.edu/~filkov/papers/lang_github.pdf)

------
foolfoolz
curl is an extremely successful tool/library; I would consider it high quality
without knowing the ratio of normal patches to bug fixes. Skyrim is well known
to crash and corrupt save games, but it's regarded as one of the greatest RPGs
ever made. The game programmers obviously did a lot of things right to produce
such a hit. I'm not saying you need to be a worldwide success to write quality
code, just that a low bug count doesn't always mean high quality, and the
other way around. Quality is measured by user experience.

~~~
keithnz
In this case, it's not really about how useful/appealing the software is, but
whether there's some kind of correlation between languages and defects. So if
Skyrim were rewritten in Haskell, then perhaps it would have fewer defects.

But looking at the data and their analysis, while there are some interesting
stats saying typed and functional languages correlate with fewer defects,
there are just too many variables at play.

~~~
katastic
Skyrim actually runs in a VM. So it's already futuristic. And still crashes
like a piece of crap written.

I think the 2nd or 3rd STALKER (or all) run in VM as well and... crash and
corrupt saves like crazy.

And for those who don't get how it's related: a VM is "supposed" to be pretty
damn crash-proof and "safer", much like functional languages. But that doesn't
stop deadlines and bad coding practices from creating broken products.

~~~
nine_k
Every Java program runs on a VM. It does not magically prevent sloppy Java
code from crashing. It does prevent a lot of memory errors typical for sloppy
C code, though.

------
kccqzy
I'm mildly surprised by how well Clojure performs here. It isn't statically
typed, yet it fares much better than Haskell/Scala! In my experience Clojure
is also a joy to write, sometimes even more so than Haskell.

~~~
hellofunk
I have read elsewhere (in many places, in fact, such as the interesting read
"Out of the Tar Pit") that the frequency of bugs in a project scales
proportionally with code size, and that this has a greater influence on bug
counts than the features of any particular language.

Clojure is an incredibly succinct language. It uses about half as many lines
as Elm, 5% as many lines as C++. I love other languages, but nothing rivals
Clojure in elegance. I believe this is the key reason why Clojure projects are
so low on bugs -- they are much simpler to maintain, refactor, or rewrite
entirely than in most other languages, so fixing problems is not the chore it
can be elsewhere.

~~~
jstimpfle
> 5% as many lines as C++

Dude, you need a reality check.

~~~
pmarreck
One only needs to look at RosettaCode for numerous examples:

[http://rosettacode.org/wiki/Horner%27s_rule_for_polynomial_e...](http://rosettacode.org/wiki/Horner%27s_rule_for_polynomial_evaluation#C.2B.2B)

~~~
jstimpfle
Oh come on. First, it's 20 vs 4 lines, which is 20%. Second, that's not
exactly the most compact C++ version; you could easily write one in 10 lines,
even in a clean C style. Third, these code-golf comparisons are beyond silly.

~~~
pmarreck
20% is still impressive, and not atypical in my experience from coding in
mutable/OO vs. functional languages.

I wonder what Rust would look like here.

------
yogthos
I think it's a good starting point to look at a large number of open source
projects in the wild. The individual differences in skill, size, etc. average
out between them. It's important to establish whether any statistically
significant trends exist before anything further can be discussed
meaningfully.

If we see empirical evidence that projects written in certain types of
languages consistently perform better in a particular area, such as reduction
in defects, we can then make a hypothesis as to why that is.

For example, if there were statistical evidence to indicate that using Haskell
reduces defects, a hypothesis could be made that the Haskell type system
plays a role here. That hypothesis could then be further tested, and that
would tell us whether it's correct or not.

However, this is pretty much the opposite of what happens in discussions about
features such as static typing. People state that static typing has benefits
and then try to fit the evidence to that claim. Even the authors of this
study fall into this trap: they read their preconceived notions into results
that the data do not support. The differences they found are so small that
it's reasonable to say the impact of the language is negligible.

~~~
runT1ME
Yes, surely it's a coincidence that two of the most powerful statically typed
languages have the fewest defects.

~~~
yogthos
Perhaps you should actually read the conclusion before getting too excited:

>One should take care not to overestimate the impact of language on defects.
While the observed relationships are statistically significant, the effects
are quite small. Analysis of deviance reveals that language accounts for less
than 1% of the total explained deviance.

Nor do these most powerful statically typed languages appear to perform any
better than dynamically typed Clojure and Erlang.

------
euske
I think the meta-conclusion that we're getting is that "this kind of study is
extremely haarrrd!"

------
hoytech
> For example, in languages like Perl, JavaScript, and CoffeeScript adding a
> string to a number is permissible (e.g., "5" + 2 yields "52"). The same
> operation yields 7 in Php. Such an operation is not permitted in languages
> such as Java and Python as they do not allow implicit conversion.

Regarding Perl, the quoted statement is wrong:

    
    
        $ perl -E 'say "5" + 2'
        7
    

Furthermore, this is not an implicit conversion. The + operator performs an
_explicit_ numeric conversion. Here's a more detailed description:

[https://codespeaks.blogspot.ca/2007/09/ruby-and-python-overload-operator-for.html](https://codespeaks.blogspot.ca/2007/09/ruby-and-python-overload-operator-for.html)

~~~
gipp
Your link seems to argue that the simple fact that it's common convention
makes it explicit, which doesn't really seem to hold water.

~~~
hoytech
Consider this JavaScript function:

    
    
        function add(a,b) { return a + b; }
    

Although no conversion is requested explicitly in the function definition, a
conversion may take place depending on the types of the arguments passed in:

    
    
        > add(1, 2)
        3
        > add("1", 2)
        "12"
    

The article in question defines implicit conversion in this way, and in my
experience it's a fairly common term.

I was pointing out that, per this definition, the article is wrong in saying
that Perl's + operator may perform an implicit conversion. In Perl, the +
operator always performs a numeric conversion of both its operands, regardless
of their types. By writing + you are explicitly requesting numeric conversion
of both arguments.

In general, Perl doesn't perform implicit conversions (of course there are
some exceptions -- it is Perl, after all). It does this by not overloading
operators like + for different operations such as addition and concatenation.

This also has the nice property that you can count on a + b == b + a, unlike
in Python, for instance. (However, in Python's PEP 465, non-commutativity was
a stated advantage of adopting @ for matrix multiplication instead of
overloading *; go figure.)
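
To make the commutativity point concrete, here's a small sketch in the same
JavaScript/TypeScript semantics as the add() example above (the values are
just illustrative):

    // When + is overloaded for both addition and concatenation, mixing types
    // makes the result depend on operand order.
    const s: string = "1";
    const n: number = 2;

    console.log(s + n); // "12" -- n is converted to a string and concatenated
    console.log(n + s); // "21" -- same conversion, opposite order

    // With Perl's +, both operands are always numified, so the result is 3
    // either way; concatenation has its own operator (.), so the two meanings
    // never collide.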

------
j2kun
They use... varying p-values? Can you do that? It almost looks like they're
choosing p after the analysis is done...

~~~
s3nnyy
That is called p-hacking I guess?

~~~
carbor1
No, not necessarily. It's standard practice to pick a single significance
cut-off in advance (say, 0.05), but then report the smallest conventional
cut-off that each particular value meets. So "p < 0.001" is reported for
values that meet that threshold. Anything over the chosen cut-off is just not
reported as significant.
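
A rough sketch of that reporting rule (my reading of the convention, not
anything from the paper; the threshold list is just the usual conventional
one):

    // Significance is decided against one pre-chosen cut-off (alpha);
    // the smaller thresholds are only used for reporting, after the fact.
    const thresholds = [0.001, 0.01, 0.05]; // conventional cut-offs, smallest first

    function report(p: number, alpha: number = 0.05): string {
      if (p >= alpha) return "not significant";
      const smallest = thresholds.find(t => p < t)!;
      return `significant (p < ${smallest})`;
    }

    console.log(report(0.0004)); // significant (p < 0.001)
    console.log(report(0.03));   // significant (p < 0.05)
    console.log(report(0.2));    // not significant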

~~~
j2kun
That seems dishonest to me. They're saying that some results are more
significant after the fact. Is there any mathematical justification for why
this is OK?

~~~
dx034
As far as I know there is not. There is no such thing as "more significant",
results either are significant or they're not.

------
davedx
Fascinating study, but I think a lot of the conclusions are self-evident. For
example:

"However when compared to the average, as a group, languages that do not allow
implicit type conversion are less error-prone while those that do are more
error-prone."

A lot of the conclusions are along these lines: languages with explicit type
conversion have fewer [type conversion] errors. Well, of course...

Still worth a read though, and makes a strong case for functional, statically-
typed languages.

~~~
rbehrends
> Still worth a read though, and makes a strong case for functional,
> statically-typed languages.

The thing is, it really doesn't. There are too many inexplicable results.
TypeScript does significantly worse than JavaScript, for example. There's also
no really good explanation for why the results for Ruby and Python are
basically diametrically opposite (the languages are more alike than
different). And Clojure has the best result of them all.

I suspect that there are simply too many confounding variables that are not
accounted for (such as the typical application domains for those languages,
average programmer skill, or complexity of the problems being targeted by
these projects).

~~~
davedx
Yes, I think after reading to the end I agree with your summary.

I still think there is value in using languages that eliminate entire classes
of bugs though, for example using a language that has automatic memory
management is a no-brainer except for certain specific domains where you
_need_ to do memory management yourself. Likewise with static typing: it
eliminates type bugs. There have definitely been times for me recently,
working with a dynamic language like JavaScript, when there's been a bug in
our code base that would not have happened had we been using TypeScript. Some
of these bugs also had significant business impacts.
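
As a minimal sketch of the kind of bug I mean (a made-up example, not our
actual code; the names are hypothetical):

    // Plain JavaScript will happily run this with a string and quietly
    // concatenate instead of adding; TypeScript rejects the call at compile
    // time with something like:
    //   error TS2345: Argument of type 'string' is not assignable to
    //   parameter of type 'number'.
    function totalSpent(previous: number, latest: number): number {
      return previous + latest;
    }

    const latestFromApi = "19.99"; // JSON payloads sometimes carry numbers as strings

    // totalSpent(100, latestFromApi);      // compile error in TypeScript
    totalSpent(100, Number(latestFromApi)); // the explicit conversion the compiler pushes you toward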

There is of course a trade-off: typed languages can be more challenging to
develop with. I've had a number of fights with the Scala compiler. Typically
it's the libraries rather than the base language, but it still costs time I
wouldn't have spent if using a dynamic language. Also, the Scala compiler
itself is very slow, to the point where the XKCD comic about "my code's
compiling" has been true. On modern MacBook Pros, this shouldn't be a thing
anymore, but it still is :)

~~~
rbehrends
> I still think there is value in using languages that eliminate entire
> classes of bugs though, for example using a language that has automatic
> memory management is a no-brainer except for certain specific domains where
> you need to do memory management yourself. Likewise with static typing: it
> eliminates type bugs.

In practice, it's not so simple. It probably holds for garbage collection
(assuming GC is suitable for your application domain), but static type systems
come with costs. There's plenty of evidence (via studies) that type
annotations are really valuable as documentation, but the argument for bug
prevention is less clear. Most bugs that are caught by static type systems are
also generally prevented by other approaches (because they're basically
clerical errors that you hit fast even during basic testing). Conversely,
there are plenty of real bugs that aren't caught by type systems (or only by
type systems that are essentially full-fledged specification languages, or by
putting lots of work into types).

While there's a huge difference between having a validation strategy for your
code and not having a validation strategy at all (or winging it), it's much
more difficult to assess the relative value of different validation
strategies, especially once you take costs into account.

------
PleaseHelpMe
Summary:

> The data indicates that functional languages are better than procedural
> languages; it suggests that disallowing implicit type conversion is better
> than allowing it; that static typing is better than dynamic; and that
> managed memory usage is better than unmanaged. Further, that the defect
> proneness of languages in general is not associated with software domains.
> Additionally, languages are more related to individual bug categories than
> bugs overall.

~~~
kps
But also:

> It is worth noting that these modest effects arising from language design
> are overwhelmingly dominated by the process factors such as project size,
> team size, and commit size.

------
hellofunk
This article is dated October 2017 and claims to be the first large-scale
evidentiary study. But I have definitely seen either this exact study or
another one nearly identical, also using GitHub and also having similar
results for the languages, and that was at least 1 year ago. So perhaps this
article is a re-print of a prior study?

~~~
lomnakkus
Here you go:
[http://web.cs.ucdavis.edu/~filkov/papers/lang_github.pdf](http://web.cs.ucdavis.edu/~filkov/papers/lang_github.pdf)

Seems to be from 2014 and has the exact same list of authors, I think. (It's
hosted under ~filkov, which I'm _assuming_ is the public web directory of
Vladimir Filkov.)

No idea what changed since then.

~~~
hellofunk
Not just the same list of authors; the article and this PDF have nearly the
same verbatim wording.

------
neilwilson
Interesting that the social element isn't mentioned.

Smarter programmers are more likely to be able to get their heads around the
strict requirements of functional languages, and they are the ones using those
languages at the moment.

Java, on the other hand, is pretty much the COBOL of this generation.

~~~
ismail
Maybe I am missing something key, but that sounds like an assumption? Is
there any evidence/details you can point me to?

~~~
banachtarski
There's evidence, but not the type you're looking for. The OP has provided
evidence that the OP uses functional languages and not Java.

------
hellofunk
There are several points made about typing in another related post here
yesterday:

[https://news.ycombinator.com/item?id=15378800](https://news.ycombinator.com/item?id=15378800)

------
monster2control
Basically, they didn't figure out shit. What a waste of a read.

