
Linux code is the 'benchmark of quality,' study concludes  - jeffreyfox
http://www.pcworld.in/news/linux-code-benchmark-quality-study-concludes-98752013
======
SEMW
Coverity has been reporting bugs it finds in the Linux kernel since ~2000.[1]

That makes this comparison complete nonsense, surely? "Bugs found by static
analyser X" is only useful as a metric for comparing software projects insofar
as it's representative of wider code quality. That may well be true normally,
but it breaks down if you report those bugs, wait for them to be fixed, then
run the analysis again and compare with results from software projects you
didn't do that for!

[1] See <http://www.coverity.com/library/pdf/linux_report.pdf> . At one point
it listed all the Linux bugs found at <http://linuxbugs.coverity.com/> . Example
bug report on LKML from last month: <https://lkml.org/lkml/2013/4/5/297>

~~~
abc_lisper
Props for saying this.

------
ctdonath
Then there's the seL4 OS kernel, which was formally verified: proven free of
implementation bugs relative to its specification.
<http://www.theengineer.co.uk/news/safer-software/312631.article>

------
jiggy2011
This is fairly meaningless: what is classed as a "defect", for example? Not
all bugs are equal.

What about comparing code that has similar requirements and similar numbers
of users?

Of course Linux is going to fare well against BS corporate software that was
made primarily to satisfy some middle manager.

Likewise open source is going to include a lot of stuff written by college
students that nobody actually uses.

It would be more interesting to compare Linux with similar parts of the NT
core, for example.

~~~
gd1
Even worse, not all lines of code are equal. Across languages, but also within
them. C/C++ can be terse or verbose depending on style and the subset of
features employed.

------
zwieback
I'm a big fan of Coverity but it takes a lot of babying to make it useful. The
code bases where I've brought Coverity bugs down to an acceptable ratio
included a lot of markups and comments like

"// doing it like this to make Coverity happy"

The other issue is that the high-quality closed source codebases are probably
inaccessible precisely because the amount of investment it takes to get the
defect count low is also the reason they are closed.

------
S_A_P
Ok, so my problem with this is what qualifies as a "bug" in their scans. If
their scans are so good at finding these bugs then we need to pay them all
tons of money and make our software bug-free by using their scanning tool.

~~~
plorkyeran
My experience with their free open source scan is that of the 50 or so
"defects" found, one was an outright bug, one was a false-positive, one was
intentional (a deliberate crash in some debug code), and the rest were
basically style complaints.

------
16s
The OpenBSD devs have been very critical of it. That may just be sour grapes,
but they may have some valid criticisms as well.

------
codex
I've often wondered what the defect rate is for various projects as measured
by hours spent. I've often seen commercial code written quite quickly, while
open source code, being written more for self-actualization, artistry, and
social proof, tends to be written more slowly (and carefully). On the other
hand, good engineers tend to write code more quickly than bad ones.

------
_pmf_
I'd rather suggest PostgreSQL's sources:
* it's user-space code
* they are much more uniform
* they are much more readable

~~~
npsimons
Or how about sqlite? It's the epitome of ruthless testing.

------
derrida
Linux! Ha! I am sure the device driver for that 256MB Hello Kitty USB thumb-
drive is poetry. Linux is not the first place I'd look for high-quality code.
I think I'd start with Minix.

~~~
sp332
The standards for admission into the mainline Linux kernel tree are pretty
high.

~~~
derrida
True. And I wonder if Minix is higher. But from a security point of view,
you're only as strong as your weakest link. Linux runs device drivers in
kernel space, so your OS is as strong as the Hello Kitty driver. Think this is
hard to exploit? You can create a USB device that searches a Linux host for
exploitable device drivers, then imitates the matching device. [1]

Suddenly, the Hello Kitty USB drive matters. That code is running in kernel
space.

Minix on the other hand runs device drivers in user-land. [2]

Given that device drivers contain 3-7 times as many bugs as other kernel
code,[3] a conclusion you may reach is that Linux contains more bugs per line
than Minix.

[1] <https://www.youtube.com/watch?v=D8Im0_KUEf8>

[2] <http://www.minix3.org/other/reliability.html>

[3] <http://www.osnews.com/story/15960>

ps. Sure as hell I can't code to the standard of getting a non-trivial patch
accepted to the Kernel :-)

~~~
jiggy2011
Interesting, though I guess there is a non-trivial performance cost to
userspace drivers; it seems hard to believe you could reasonably drive a GPU
from userland. I remember John Carmack saying something about how driver
overhead was one of the biggest bottlenecks when developing modern games.

Driver quality is of course something which will always significantly rock the
boat when it comes to stability but that is going to be the same with any
operating system. To an extent driver quality should be a factor when choosing
hardware. If you don't build your kernel with Hello Kitty support you never
have to worry about that code.

I guess that is one of the reasons Apple has a better reputation for software
reliability: for the most part they get to choose the hardware that will be
used with the OS.

~~~
justincormack
I think Apple USB drivers are userspace. USB was slow until 3.0.

~~~
jiggy2011
I guess that would make sense, since USB peripherals are likely less
performance sensitive in terms of latency and are also the place where you are
going to get the widest variety of devices.

------
nolok
If by benchmark of quality you mean code with the lowest defect density
("Defect density refers to the number of defects per 1000 lines of software
code.").

~~~
shabble
and if by 'defect' you mean 'thing flagged by coverity analysis tools'.

------
sjs1234
This guy at Microsoft Research studies code defects in the context of
Microsoft software.
<http://research.microsoft.com/apps/mobile/showpage.aspx?page=/en-us/people/nachin/publications.aspx>

------
rtkwe
Benchmark of open source quality perhaps. Over everything I'd say NASA would
probably take that prize. Of course we don't have their sources to analyze but
their practices are well known and the results seem pretty strong.

------
aortega
Nonsense. I work long hours debugging various kernels, including Windows,
Linux and the *BSDs. The quality of the OpenBSD kernel is amazing. Maybe Minix
is better, but that's because it's educational code.

------
jrochkind1
How does this scan service work? Is it just BS?

If software can automatically find code defects... why are there any code
defects at all anymore? Just fix whatever their scan says to fix.

~~~
tobiasu
It's a relatively sophisticated static analyser. Nothing new, but quite
useful.

Open source projects can register and get reports for free; commercial
companies have to pay. Coverity uses e.g. Linux to test and compare their
product against, and writes various marketing pieces such as this one to
raise awareness of their product.

<http://scan.coverity.com/>

------
FollowSteph3
"Defect" is too broad a term in this study. The headline is a
sensationalization of one metric, which is itself too broad. And does it
really measure what it states?

------
alexchamberlain
Presumably, this is only kernel space code... That's only a fraction of a
modern Linux OS.

Good study though... even if the article doesn't do a great job of qualifying
its facts.

~~~
npsimons
I've often felt that more developers (or even interested power users) should
be running with things like MALLOC_CHECK_=3
(<http://www.novell.com/support/kb/doc.php?id=3113982>) enabled by default for
everything. On top of that, when we have plenty of FLOSS static analysis tools
(<https://news.ycombinator.com/item?id=4545188>), plus things like valgrind,
gprof and gcov, I don't understand why more people don't use them. As for
compiler flags, if we can build a whole distro around optimization (Gentoo),
why can't we build a whole distro around debugging (-fstack-protector-all,
-D_FORTIFY_SOURCE=2, -g3, etc.)? I realize some distros already enable things
like this, but usually they are looking to harden things, not necessarily
diagnose bugs.

~~~
amboar
You may want to look into Hardened Gentoo which does things along the lines
you suggest, amongst other hardening techniques.

<http://www.gentoo.org/proj/en/hardened/>

------
kunai
It would be very interesting to see how the quality of BSD code compares to
Linux's.

------
asloobq
I would love to know how Linux fares against any of the Windows OSs in this
scenario.

~~~
alexchamberlain
We would only hear about it if Linux lost.

------
cooldeal
Compared to what? Proprietary corporate CRUD code? How about comparing to BSD,
Hurd, Haiku, Mach etc.?

Edit: This article has better details.
<http://gcn.com/blogs/pulse/2013/05/linux-leads-in-open-source-quality-but-risky-defects-lurk.aspx>

"The finding is based on an analysis by the Coverity Scan Service, which for
more than seven years analyzed 850 million lines of code from more than 300
open-source projects, including those written in Linux, PHP and Apache."

"In general, Coverity found the average quality of open-source software was
virtually equal to that of proprietary software. Open-source projects showed
an average defect density of .69, the study found, a dead heat with the .68
for proprietary code developed by enterprise customers of the service.

Although the average rates of defects in the two types of code are nearly
identical, researchers did find a difference in quality trends based on the
size of the development project.

For instance, as proprietary software coding projects passed 1 million lines
of code, defect density dropped from .98 to .66, a sign that software quality
rises in proprietary projects of that size.

That trend reversed itself in the case of open-source code, researchers found.
Open source projects between 500,000 and 1 million lines of code had a defect
density of .44, which grew to .75 when those projects went over the 1 million
line mark."

~~~
toyg
Could it be that over-1m-LOC proprietary projects are, in fact, fossilised?
Once a project is large enough, deep changes are discouraged because their
cost (and risk) to the business gets too high.

Meanwhile, open source projects like to refactor (somebody would say _reinvent
the wheel_ ) forever and ever, constantly ripping out old code for new, so
defect density is stable and simply rises in line with overall complexity
(which obviously rises with project size).

I'd be curious to look also at developers' turnover rates: once you leave a
company you can't keep hacking on their code, which is something you can
actually do with open source. As old developers leave, their code lies
untouched for fear of breaking anything, and again gets fossilised.

~~~
sliverstorm
You could probably also speculate about the impact of the corporate projects.
For example, if the project is over 1M LOC, can we surmise it is very likely
that project is their bread-and-butter (and thus gets much more attention and
resources)?

------
ctdonath
I'd make some snarky comment about the quality of one small but important part
of Linux noted for ongoing consternation - sound - but seems that'd be poking
a hornet's nest _again_. <https://news.ycombinator.com/item?id=5664202>

~~~
to3m
Is this really relevant here?

(I'm sure the sound code has few statically-detectable defects... even if it
fails to produce anything audible for most people.)

~~~
ctdonath
Intended relevance was raising the issue of what constitutes defects, the
range of effects, and longevity thereof. As others noted, some of what were
counted as "defects" were little more than semantic inconsistencies or obscure
flaws rarely seen (if ever); counting each of those as "1 defect" on the same
scale as something that pesters the heck out of a large percentage of users
(or drives away many prospective users) isn't quite right.

My poor wording was an attempt to raise the point without eliciting the
hundred-plus responses it drew the last time it came up.

