“Most notably, it does appear that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages.”
But how did they determine that?
They then looked at commit/PR logs to figure out how many bugs there were for each language. As far as I can tell, open issues with no associated fix don’t count towards the bug count; only commits detected by their keyword-search technique were counted.
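Presumably something like the following sketch, although the exact keyword list and matching rules here are my guess, not the authors':

```python
import re
import subprocess

# My guess at a keyword list; the paper's actual list may differ.
BUG_KEYWORDS = re.compile(
    r"\b(bug|fix(es|ed)?|defect|error|fault|patch)\b", re.IGNORECASE
)

def count_bug_commits(repo_path: str) -> int:
    """Count commits whose subject line matches a bug-fix keyword."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sum(bool(BUG_KEYWORDS.search(subject)) for subject in log.splitlines())
```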
After determining the number of bugs, the authors ran a regression, controlling for project age, number of developers, number of commits, and lines of code.
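As a rough sketch of what that regression step might look like, assuming a negative binomial model with log-transformed controls (the column names and transforms below are my assumptions, not the paper's exact setup):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def fit_defect_model(df: pd.DataFrame):
    """Negative binomial GLM of bug-fix counts on language plus controls.

    Assumed columns: bug_commits, language, age, devs, commits, sloc.
    """
    df = df.copy()
    for col in ["age", "devs", "commits", "sloc"]:
        df[f"log_{col}"] = np.log(df[col])
    formula = ("bug_commits ~ C(language) + log_age + log_devs"
               " + log_commits + log_sloc")
    return smf.glm(formula, data=df,
                   family=sm.families.NegativeBinomial()).fit()
```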
That gives them a table (covered in RQ1) that correlates language to defect rate. There are a number of logical leaps here that I’m somewhat skeptical of. I might believe them if the result is plausible, but a number of the results in their table are odd.
The table “shows” that Perl and Ruby are as reliable as each other and significantly more reliable than Erlang and Java (which are also equally reliable), which are in turn significantly more reliable than Python, PHP, and C (which are similarly reliable), and that TypeScript is the safest language surveyed.
They then aggregate all of that data to get to their conclusion.
I find the data pretty interesting. There are lots of curious questions here, like why are there more defects in Erlang and Java than Perl and Ruby? The interpretation they seem to come to from their abstract and conclusion is that this intermediate data says something about the languages themselves and their properties. It strikes me as more likely that this data says something about community norms (or that it's just noise), but they don’t really dig into that.
For example, if you applied this methodology to the hardware companies I’m familiar with, you’d find that Verilog is basically the worst language ever (perhaps true, but not for this reason). I remember hitting bug (and fix) #10k on a project. Was that because we had sloppy coders or a terrible language that caused a ton of bugs? No, we were just obsessive about finding bugs and documenting every fix. We had more verification people than designers (and unlike at a lot of software companies, test and verification folks are first-class citizens), and the machines in our server farm spent the majority of their time generating and running tests (1000 machines at a 100-person company). You’ll find a lot of bugs if you run test software that’s more sophisticated than QuickCheck on 1000 machines for years on end.
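For anyone who hasn't run into the reference: QuickCheck-style (property-based) testing generates random inputs and checks invariants rather than relying on hand-picked cases. A toy sketch using Python's hypothesis library, my choice of stand-in here:

```python
from hypothesis import given, strategies as st

# Properties that must hold for any input list; hypothesis generates the lists.
@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)

@given(st.lists(st.integers()))
def test_sorting_preserves_length(xs):
    assert len(sorted(xs)) == len(xs)
```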
If I had to guess, I would bet that Erlang is “more defect prone” than Perl and Ruby not because the language is defect prone, but because the culture is prone to finding defects. That’s something that would be super interesting to try to tease out of the data, but I don't think that can be done just from GitHub data.
From Conservation of Expected Evidence:
> If you try to weaken the counterevidence of a possible "abnormal" observation, you can only do it by weakening the support of a "normal" observation, to a precisely equal and opposite degree.
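To spell out the math behind that: by the law of total probability, P(H) = P(H|E)·P(E) + P(H|¬E)·P(¬E). The prior P(H) is a weighted average of the two posteriors, so if observing E would raise your confidence in H, then observing ¬E must lower it; you can't set things up so that both outcomes count as evidence for the same conclusion.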
It really seems like a stretch to say that higher bug fix counts aren't due to higher defect rates, or that higher defect rates are a sign of a better language. Language communities have a ton of overlap, so it seems unlikely that language-specific cultures can diverge enough to drastically affect their propensity to find bugs.
> It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size.
So I dismissed the paper as "not conclusive", at least for the time being. I wouldn't have been surprised if their findings were mostly noise, or the result of a confounding factor they missed.
By the way, I recall another paper finding that code size is the single most significant predictor of nearly everything else. That would mean more concise and expressive languages, which yield smaller programs, also reduce the time to completion as well as the bug rate. But if this study corrects for project size while ignoring the problems being solved, it overlooks one of the most important effects of a programming language on a project: its size.
It may be simply that MySQL bugs actually get fixed. The work behind this paper counts bug fixes, not bug reports. Unfixed bugs are not counted.
* Most-starred projects mean these are all successful projects. Much software is not successful.
* A successful project means there are likely more-experienced-than-average engineers coding.
* Most-starred projects will be older, more stable code-bases than most software.
* Open Source development is a small slice of software development.
* GitHub is a subset of Open Source development.
And I expect there are a myriad of holes to poke elsewhere. In general I distrust any research that surveys GitHub and tries to make claims about software development in general. It is lazy.
I have no experience with Erlang, but one reason I'd expect Java to have more defects than Ruby and Perl is that Java is more verbose, i.e. it takes more code to get something done. One would naively expect to find an association between the size of commits and their propensity to contain errors.
The practical result is that language choice doesn't matter since the effects are very small.
Maybe because Erlang and Java are used for projects of higher complexity (larger scope, more interacting components, etc.)? Did the authors try to address this issue at all?
Notice how top projects in popular languages have a tendency to have more fixes than top projects in more obscure languages. Perhaps these projects simply have more users, leading to more reported bugs and community pressure? Interesting paper, but it is all too easy to jump to conclusions.
The data indicates that functional languages are better than procedural languages; it suggests that strong typing is better than weak typing; that static typing is better than dynamic; and that managed memory usage is better than unmanaged.
Obviously not the last word, but interesting study nonetheless.
The results for TypeScript are completely wrong. Bitcoin, Litecoin, and qBittorrent do not have any TypeScript code: http://bitcoin.stackexchange.com/questions/22311/why-does-gi...
Double facepalm!
EDIT: The Procedural/Scripting split seems overdetermined. All of the procedural languages have static typing, and all the scripting languages have dynamic typing.
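A toy illustration of why that matters for the regression (made-up data, not from the paper): if the paradigm indicator and the typing indicator agree on every project, they are the same column, the design matrix is rank-deficient, and the two effects cannot be separated.

```python
import numpy as np

paradigm = np.array([1, 1, 0, 0])   # 1 = procedural, 0 = scripting
static_t = np.array([1, 1, 0, 0])   # 1 = static, 0 = dynamic (identical!)
X = np.column_stack([np.ones(4), paradigm, static_t])
print(np.linalg.matrix_rank(X))     # 2, not 3: perfectly collinear
```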
Refactoring doesn't necessarily even mean that the original code was structurally bad. When new features are added, code gets more complicated, requiring more abstraction. I like to abstract code via refactoring when we need it, not before. Then refactoring changes good code to good code. It's just that the new situation presents different demands to the code than the old situation.
For example, what caused a typo in a string somewhere, so that it now contains "foo" instead of "bar"? Likely cognitive overload, because the code was too complex to keep entirely in memory; i.e., the author was too busy processing the code in his head to notice the typo he just made. Therefore code with lower cognitive load is likely to have fewer bugs like this, or even fewer bugs overall.
Some programmers have learned that there is a way to prevent string typos with appropriate test cases, and some programmers keep their code as simple as possible, i.e. with the lowest cognitive load. So we should expect correlations with which languages such programmers prefer and which languages encourage such coding practices.
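One minimal sketch of that first practice, with a hypothetical Tag enum standing in for raw string literals (all names here are made up for illustration):

```python
from enum import Enum

# Centralizing the literals means a typo becomes a loud AttributeError
# at the use site instead of a silently wrong string.
class Tag(Enum):
    FOO = "foo"
    BAR = "bar"

def label(is_foo: bool) -> str:
    return (Tag.FOO if is_foo else Tag.BAR).value

def test_label():
    # A plain test case would also catch a "foo"/"bar" mix-up directly.
    assert label(True) == "foo"
    assert label(False) == "bar"
```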
This is a complex subject that has much more to do with psychology, than with technology. And it should be studied as such. Trying to study bugs without touching psychology is pretty much bs.
I don't get this statement, though: "The enrichment of race condition errors in Go is likely because the Go is distributed with a race-detection tool that may advantage Go developers in detecting races." I thought that including a race detector would reduce race condition errors. Am I missing something?
It was also interesting to see it rank low in the security correlation ranking.
And the authors at least noted that "The enrichment of race condition errors in Go is likely because the Go is distributed with a race-detection tool that may advantage Go developers in detecting races."
E.g., are some of the gains lost because the developer is asleep at the wheel? Some things like this are established in other domains; with cars and bikes there is some evidence that improved safety equipment increases risky behaviour.
Even a sample this size is too small:
> Once you factor out the level of commit activity, language influence is actually quite large.
Once you factor out commit count, the impact of language turns out to be quite large. A Haskell project can expect to see 63% of the bug fixes that a C++ project would see; I don't call a 37% drop in bugs "small".
Anyhow, I don't think this says anything about anything.
However, it screams "well, we did not find anything much". (Which is good science anyway, so props to them.)
Choice of language contributes just 1% to the variance in bugs; that is, if you want to improve your project's quality, don't bother looking at language choice.
It's an interesting result, but it does rather raise the question: what drives the other 99%?
gitolite is a Perl project.
showdown? https://github.com/showdownjs/showdown It has some Perl code in it, but it is a JS project.
rails-dev-box? It's a Rails dev box; do you agree with that classification?
So I will not read the paper.