Size is the best predictor of code quality (vivekhaldar.com)
225 points by gandalfgeek on Sept 26, 2011 | 123 comments

Size is the best predictor of many things about software projects.

Note that 'size' is a dimensionless quality; we can only approximate it with certain proxy metrics (KSLOC, Function Points, Budget allocation).

Edit: and gzipped size, and token counts, and logical lines, and Halstead metrics, and cyclomatic complexity, and object points, and ... and ... and ...

For example, project size is the best predictor of whether a project will meet its initial budget/time/feature/quality goals (Boehm, Standish). It totally swamps staff quality, programming language, programming process, tools, libraries, everything in this respect (Boehm).

Per (Standish), a project with a budget > US$10 million at launch has a 98% probability of not meeting its goals and from memory < 50% probability of avoiding cancellation.

In fact I have a totally untested hypothesis that agile "works" because it's mostly applied by small teams to small projects.

(Boehm): Barry Boehm, Software Cost Estimation with COCOMO II

(Standish): The Standish Group CHAOS Report.

> In fact I have a totally untested hypothesis that agile "works" because it's mostly applied by small teams to small projects.

I think that agile works more as an interface and contract helper between customer and supplier. A few years ago it was very difficult to convince a customer to follow the iterative way. The customer just wanted all the features in time.

On the other hand, small project sizes make this possible at all. For a very complex project you have so many potential users and customers that a single agile team cannot deal with all of them.

Size has its own unique problems.

Agile "works" because neither "agile" nor "works" is well defined. (Some agile methodologies are well-defined, of course.)

The issue is that budget size requires more code (e.g., if you have a budget, you need to spend it), and spending it requires hiring programmers. You can't take on a $10 million project and tell people you're hiring 4 programmers and that the project will be done in a few months. You need to hire 100 programmers and tell people it will take a year.

Basically, it's a symptom of the idea that work expands to fill time. Agile works, IMHO, because it avoids spending time that doesn't need to be spent.

This is one of those chicken-and-egg, correlation-is-not-causation problems.

Does big 'size' cause a big budget, or do big budgets cause blooming 'size'? A bit of both I'd wager.

That reminds me of a little fable I've always found humorous: The Parable of the Two Programmers http://www.csd.uwo.ca/staff/magi/personal/humour/Computer_Au...

One of the big advantages of agile is that it reduces scope. Traditionally the incentive is to think of everything at the beginning because no changes will be allowed during the project. The result is overproduction of features. Agile methods try to focus on the most important features, reducing the size of the project.

The composable programming evangelists (an offshoot of the movement to minimize mutable state) argue that the highest-quality software is always a small program that composes other small programs, recursively, until all the desired functionality is accumulated.

I just pulled Steve McConnell's (must read!) CODE COMPLETE: A Practical Handbook of Software Construction off my bookshelf. The section How Long Can a Routine Be? references some surprising (but perhaps dated) studies that suggest the evidence in favor of short routines is "very thin" and the evidence in favor of longer routines is "compelling". These studies are probably biased to desktop and corporate software written in C during the 1980s.

The consensus is that routines should have fewer than 200 LOC, but that routines shorter than ~30 LOC are not correlated with lower cost, fault rate, or programmer comprehension. btw, the longest function I've seen in commercial software I've worked on was 12,000 LOC! I will not name names. :)

* A study by Basili and Perricone found that routine size was inversely correlated with errors; as the size of routines increased (up to 200 LOC), the number of errors per LOC decreased (1984).

* Another study found that routine size was not correlated with errors, even though structural complexity and amount of data were correlated with errors (Shen et al. 1985)

* A 1986 study found that small routines (32 LOC or fewer) were not correlated with lower cost or fault rate (Card, Church, and Agresti 1986; Card and Glass 1990). The evidence suggested that larger routines (65 LOC or more) were cheaper to develop per LOC.

* An empirical study of 450 routines found that small routines (those with fewer than 143 source statements, including comments) had 23% more errors per LOC than larger routines (Selby and Basili 1991).

* A study of upper-level computer-science students found that students' comprehension of a program that was super-modularized into routines about 10 lines long was no better than their comprehension of a program that had no routines at all (Conte, Dunsmore, and Shen 1986). When the program was broken into routines of moderate length (about 25 lines), however, students scored 65% better on a test of comprehension.

* A recent [sic!] study found that code needed to be changed least when routines averaged 100 to 150 LOC (Lind and Vairavan 1989).

* In a study of the code for IBM's OS/360 operating system and other systems, the most error-prone routines were those that were larger than 500 LOC. Beyond 500 lines, the error rate tended to be proportional to the size of the routine (Jones 1986a).

* An empirical study of a 148 KLOC program found that routines with fewer than 143 source statements were 2.4 times less expensive to fix than larger routines (Selby and Basili 1991).

Judging from my experiences, several of these studies may be missing a major confounding factor: the complexity of the problem the code solves. All my longest methods are rather stupid output formatting stuff or if/else cascades that handle tedious but mostly trivial distinctions. But for code that solves hard problems I often write many small functions for independent steps of the solution.

So the studies that measure method length/bug count correlation within a single code base or in code written within a single organization might only measure the fact that code that requires no thinking contains fewer bugs than code that does. Paging Captain Obvious. Some of the other studies address that (e.g. Shen 1985 and the code comprehension studies), but as is so often the case in quantitative studies of things related to programmer productivity, we lack repeated measurements where the only variable is the independent factor whose influence is studied.

That's an interesting point. There's certainly a cap on the complexity of code that can be put into a single long function. (Unless, I suppose, it has inner functions that call one another, like how people do OO in Javascript; such things can be just as complex as whole programs.) Usually it's implementing some conceptually unified thing. Even if that thing is a complex algorithm, it's still cohesive enough to be able to say what it is. And implementing even a very complex algorithm is not particularly complex at a system level.

I've mentioned those findings here before. It's great to have them listed. It's fascinating that, even with all their limitations and age, they fall so completely on one side of the question - the opposite side to what a lot of sophisticated programmers believe.

The last time we debated this on HN, there was a disagreement about how much complexity the interactions between functions add to a program. To me, complex call graphs are even worse than complex code inside a function. I was surprised to learn that an opposing view even existed.

>>complex call graphs are even worse than complex code inside a function

Hear hear.

I have a problem specific to that, here. Some programmers follow the "no documentation should be needed, the code is obvious" dogma in a non-typed scripting language -- and you have to look many levels up in the call graph before you even find out about the damn function parameter's types... :-(

Ideally you have tests which let you know how every function can be (ab)used.

If I wrote 2-20 line functions, then I wouldn't document/test every one, either.

I think the idea is that the short functions call other (short) functions at a lower conceptual level to create a large amount of functionality. By testing that function, you're testing how they work together.

This is different from short functions being short because they don't do much.

One of the very hard to quantify effects here is how many of the bugs found in 'short' routines would have gone unnoticed if the routines were longer. In other words, were all the bugs that were there (both in short and in long functions) actually found?

In trivial flow (short routine) bugs tend to stand out whereas in complex flow (longer routine with multiple levels of nesting) bugs can be much harder to spot.

It may actually be good that more errors were found in shorter routines. After all, that is exactly what it is about: finding errors, not making errors.

I wonder how these 1980's results relate to object oriented code and other things that have happened since then. Even 25 LOC would be a huge method in Smalltalk-style OO for example.

Other than OO, we also have much better tools for navigating code now. That may have changed how we approach and understand unknown code.

Another reason (in addition to those already mentioned) why these numbers are not easy to interpret:

  small routines [..] had 23% more errors per LOC
If the smaller routines had, on average, 100/1.23 = 81% or fewer of the lines of the larger routines, then they still had fewer errors per routine.
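To make that break-even point concrete, here is a minimal sketch. Only the 23% figure comes from the quote; the routine sizes and the baseline error density are made-up numbers for illustration.

```python
# Hypothetical baseline density for the larger routines; only the ratio matters.
LARGE_DENSITY = 1.0
SMALL_DENSITY = 1.23 * LARGE_DENSITY  # "23% more errors per LOC"


def errors_per_routine(loc, errors_per_loc):
    """Expected number of errors in one routine of the given size."""
    return loc * errors_per_loc


def small_routine_wins(small_loc, large_loc):
    """True if the small routine has fewer expected errors in total."""
    return (errors_per_routine(small_loc, SMALL_DENSITY)
            < errors_per_routine(large_loc, LARGE_DENSITY))


# The small routine must be shorter than this fraction of the large one.
break_even = 1 / 1.23  # roughly 0.813

print(small_routine_wins(80, 100))  # 80 * 1.23 = 98.4, below 100
print(small_routine_wins(90, 100))  # 90 * 1.23 = 110.7, above 100
```

So the per-LOC figure alone says nothing about which style yields fewer errors per routine unless the size ratio is also known.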

Good writeup. McConnell is right in that there's a vast abyss between the researchers and practitioners in our trade.

Another good book by McConnell, which also discusses size with summaries of studies, is Software Estimation: Demystifying the Black Art.

Smaller functions tend to be more reusable, leading to fewer LoC per codebase, which could overcome the bugs/LoC penalty.

Also, considering the date, this research could be influenced by the bug-prone-ness of parameter passing in C as a non-memory-managed language.

For anyone interested in more discussion of this I suggest grabbing a copy of "Making Software", in particular chapter eleven on Conway's Corollary, which is centered on this paper from 2008: http://research.microsoft.com/pubs/70535/tr-2008-11.pdf

The meat:

  Table 4: Overall model accuracy using different software measures
  Model                     Precision  Recall
  Organizational Structure  86.2%      84.0%
  Code Churn                78.6%      79.9%
  Code Complexity           79.3%      66.0%
  Dependencies              74.4%      69.9%
  Code Coverage             83.8%      54.4%
  Pre-Release Bugs          73.8%      62.9%
Or in plain terms if people mess with code they don't normally mess with, you can bet real money (with a higher probability than other metrics) it introduces bugs.

Edit: I have been meaning to make a git tool that would analyze the history of a project to create predictions on what bit of code is the most buggy using this model, but just haven't done it yet. It would be cool to integrate it with GitHub's bugs api to see how correct it might be. If someone does make it let me know!
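A rough starting point for such a tool might look like the sketch below. It is not the model from the paper: it only computes two crude proxies per file (total lines changed and number of distinct authors) from text assumed to come from `git log --numstat --format=@%an`. The `@` prefix on author lines and the ranking heuristic are assumptions made for this example.

```python
from collections import defaultdict


def churn_and_authors(log_text):
    """Parse `git log --numstat --format=@%an` output.

    Returns {path: (lines_changed, distinct_author_count)}.
    """
    churn = defaultdict(int)
    authors = defaultdict(set)
    author = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            author = line[1:]  # author line emitted by --format=@%an
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator lines
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files carry no line counts
        churn[path] += int(added) + int(deleted)
        authors[path].add(author)
    return {path: (churn[path], len(authors[path])) for path in churn}


def riskiest(stats, top=3):
    """Rank files by distinct authors, then churn (an assumed heuristic)."""
    return sorted(stats, key=lambda p: (stats[p][1], stats[p][0]),
                  reverse=True)[:top]


sample = (
    "@alice\n\n10\t2\tcore.py\n1\t0\tREADME\n"
    "@bob\n\n5\t5\tcore.py\n-\t-\tlogo.png\n"
)
stats = churn_and_authors(sample)
print(riskiest(stats))
```

Taking the log as a string keeps the parsing testable without a repository; a real tool would feed it the output of the git command and would need something much closer to the paper's organizational measures.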

> However, I still haven’t found any studies which show what this relationship is like. Does the number of bugs grow linearly with code size? Sub-linearly? Super-linearly? My gut feeling still says “sub-linear”.

interesting, my gut says exponential, which is why cost and likelihood of project cancellation shoot up in the largest projects.

edit: i have been corrected below, i concur with quadratic.

Are you sure you mean exponential? I think quadratic is a more reasonable guess.

He might be using "exponential" to mean "superlinear", which seems to be the sense a lot of my students try to use it in (as well as non-technical people).

Now that you mention it, I guess non-technical people do tend to use that definition. I had never thought about it before. It's unfortunate, considering there's a big difference between, say, quadratic and exponential growth. In fact, it tends to be a more impactful difference than between linear and quadratic, especially regarding algorithms.

I am not aware of the difference. Is it that exponential means something like c^n, superlinear is n^c, and linear cn?

What Locke1689 said. If you're not familiar with big O notation, linear is like f(x) = 2x, quadratic is like f(x) = x^2, and exponential is like f(x) = 2^x. There's a huge difference between quadratic and exponential: exponential grows significantly faster as x increases. For computing, the difference is significant. Cobham's thesis says that polynomial algorithms (which include quadratic ones) are reasonable to perform, while exponential algorithms aren't.

Superlinear is anything above linear. This includes quasilinear (e.g., O(n log n)). Exponential is O(m^n). Linear is O(n).

Edit: And polynomial is O(n^m).
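For a feel of how far apart these classes are, here is a small sketch that tabulates the three example functions mentioned above (2x, x^2, 2^x). The sample points are arbitrary.

```python
def linear(x):
    return 2 * x


def quadratic(x):
    return x ** 2


def exponential(x):
    return 2 ** x


# Print one row per input size: x, 2x, x^2, 2^x.
for x in (1, 10, 20, 40):
    print(f"{x:>3} {linear(x):>4} {quadratic(x):>6} {exponential(x):>15}")
```

At x = 40 the quadratic value is 1600 while the exponential one is already above a trillion.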

without straining my brain, to me it's reasonable to measure complexity by number of interacting components, which is a combination, which is geometric, which is a discrete exponential, right?

The number of edges in a complete graph is n*(n - 1)/2, so by that metric it would be quadratic.

Yes but the number of nodes is n, and an "interaction" among m nodes might not be decomposable to a chain of pairwise interactions, which raises the ceiling back to exponential (actually factorial, which is worse).

What corresponds to interactions that aren't decomposable as pairwise interactions? Race conditions? Resource utilization? Real bugs (and the nastier ones), but probably a minority. So factorial with a relatively small constant in front of it.
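One way to put a number on the non-pairwise case, as a sketch: if an "interaction" is taken to be any unordered group of two or more components (an assumption; counting ordered sequences instead would grow faster still), the count is 2^n - n - 1, which is exponential in n.

```python
from itertools import combinations


def interacting_groups(n):
    """Count unordered groups of two or more components, by enumeration."""
    return sum(1
               for size in range(2, n + 1)
               for _ in combinations(range(n), size))


def group_count_formula(n):
    """All subsets (2**n) minus the empty set and the n singletons."""
    return 2 ** n - n - 1


for n in range(1, 8):
    assert interacting_groups(n) == group_count_formula(n)

print(group_count_formula(4))   # 11
print(group_count_formula(10))  # 1013
```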

But obviously the real answer is to write a program that will correctly verify all other programs.

It can be quadratic. Think about drawing all possible lines between points (possible components in software):

Make a table with columns being # points and the # lines you can draw between them.

  points  lines
  1       0
  2       1
  3       3   (triangle)
  4       6   (box with a criss cross)
  5       10  (5 point star, with everything connected)
  ...

This relation is n * (n - 1) / 2
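The relation can be checked by brute force. This sketch enumerates every pair of points and compares the count with n * (n - 1) / 2; the function names are just for this example.

```python
from itertools import combinations


def pairwise_links(n):
    """Count the lines between n points by enumerating every pair."""
    return sum(1 for _ in combinations(range(n), 2))


def pair_count_formula(n):
    """Closed form for the number of pairs among n points."""
    return n * (n - 1) // 2


print([pairwise_links(n) for n in range(1, 6)])  # [0, 1, 3, 6, 10]
```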

Yes, combination, which is factorial, which is yet larger than exponential.

Not sure what you mean by "geometric" in this context.

The Art of Unix Programming by esr covers just this question[1]. Nowadays module/file size is one of the most important factors in how I design and write my code.

[1]: http://catb.org/~esr/writings/taoup/html/ch04s01.html

Isn't it the case that small code bases contain fewer features? So, really, isn't this result merely that the bug rate per feature is a constant?

That is not necessarily the case, no. For example, implementing the functionality of printf yourself is rather involved, but calling printf from the standard library is literally a one-liner. In both cases you get the functionality of printf.

Similarly, there are brief ways to write things and verbose ways. Sometimes what is commonly written as large factory classes and interfaces in one language works out to a simple higher-order function or macro in another language. See the old "evolution of a programmer" joke for some extreme examples.

( I HATE that you are downmodded. )

I would argue that it isn't just bugs per feature, but bugs per interaction-point. the more features there are, the more interaction points between those features.

Not at all.

Consider a very high quality code base with a lot of features. If you leverage every best practice by using advanced techniques such as advanced functional programming (using closures, macros, monads, etc.), aspect-oriented programming, and domain-specific languages as necessary, then you could still have a very small code base.

Most code bloat is due to things like unnecessary duplicate code (such as common error handling, logging, or thread management idioms being inlined and "unrolled" everywhere rather than tucked away behind nice abstractions), different code using different sub-components that are very similar (e.g. every team using their own hand-rolled string class in C++), working around limitations of whatever language is used (stretching limited languages too far), working around design defects, and such like. Anyone who has had even a tiny exposure to functional programming techniques can appreciate the enormous power it has to reduce code quantity.

I have two questions for y'all on this.

I've heard it said for years that studies show the number of bugs grows roughly linearly with code size and that this holds true across any programming language. (It's repeated, for example, at http://c2.com/cgi/wiki?LinesOfCode) I think I've even seen references to such studies, but I don't remember where. So, HN: what are these studies? Anybody know? (I mean besides the one referenced by the OP on class size. I believe this meme goes further back than that.)

My second question is about how to measure code size. PG said a few years ago: why not just count tokens? I've thought about this ever since and I don't see what's wrong with it. Raw character count is obviously a lame metric, and LOC isn't much better. But token count seems like an apples-to-apples comparison that is easy to measure objectively and leaves out noise such as name length and whitespace. So: what's wrong with token count as a measurement of code size?

Here's one study showing that complexity grows exponentially with the size of the program:


Here's a study showing addressing changed requirements or fixing program defects requires a program maintenance effort that is directly proportional to the size of a program:


Thanks, I'll take a look. This point from the abstract of the latter is highly reminiscent of the paper cited by the OP:

"Repair maintenance is more highly correlated with the number of lines of source code in the program than it is to software science metrics."

Edit: I read it. This must certainly be one of the studies people are referring to; it covers exactly the question of interest. The major limitation is that all the programs were in PL/I, so it says nothing about language-independence. The important finding is that line count was the most highly correlated variable with bug count of those studied, quite a bit more so than more complex metrics were (Halstead's E). It's also interesting that although the authors write, "There are some very large programs in this set," the largest program was in fact only a very modest (by our standards) 6572 lines of PL/I.

In the Computer Language Benchmarks Game, they compare GZip'ed code size. I disagree with removing comments, since comments tend to be important, and comments generally document weirdness and bugs inherent in the platform/library/language.


Les Hatton has some of what you are looking for. See "Re-examining the fault density - component size connection":


...and some of his other work:


Thanks. I'll take a look at those too.

> So, HN: what are these studies?

If you have access to the publication libraries of the ACM and IEEE, you'll find that they publish most of this literature.

(It costs money for both, unfortunately).

> My second question is about how to measure code size. PG said a few years ago: why not just count tokens?

Some schemes do so, it depends on how you define "SLOC". The classic problem is if-thens.

How many lines is this?

    if ( foo ) then bar else baz
Or this?

    if ( foo )
    then bar
    else baz
Or this?

    if ( foo ) then
In the literature you'll see a distinction between "physical" lines of code and "logical" lines of code. The latter is close to token-based.

> So: what's wrong with token count as a measurement of code size?

I vaguely recall that some properties correlate with physical lines and others with logical. I can't recall what and which, sorry.

> If you have access to the publication libraries of the ACM and IEEE, you'll find that they publish most of this literature.

I know where to find research literature. I'm asking for specific citations. Is the claim an urban legend? If "studies show" X, one ought to be able to point to the studies.

> How many lines is this?

All three of your examples have the same number of tokens, so to judge by them alone, token count is not just a good measurement of code size, it's a perfect one. My question is what's wrong with it.

I don't see how "logical lines", whatever that is, can possibly be simpler than counting tokens. In fact I don't see how anything can be simpler than counting tokens, since it's easy to know what it means, a tokenizer is always available, and everything irrelevant to the program is by definition dropped.
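As an illustration of the "a tokenizer is always available" point, here is a sketch using Python's standard `tokenize` module on Python source (the if/then examples above are not Python, so the two snippets below are stand-ins). Which token types count as layout, and the decision to drop comments, are choices made for this example.

```python
import io
import tokenize

# Token types that reflect layout or commentary rather than program content.
LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
          tokenize.DEDENT, tokenize.COMMENT, tokenize.ENDMARKER}


def token_count(source):
    """Count the tokens in Python source, ignoring layout and comments."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type not in LAYOUT)


# The same logic laid out on two physical lines and on four.
compact = "if foo: bar()\nelse: baz()\n"
spread = "if foo:\n    bar()\nelse:\n    baz()\n"

print(token_count(compact), token_count(spread))  # 11 11
```

The physical line count doubles between the two layouts while the token count stays the same.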

  I don't see how "logical lines", whatever that is
At least by the definition I'm accustomed to, a logical line is a statement or series of statements which directly and logically belong together, for example a function call or an arithmetic operation.

Logical lines are independent from physical lines as each logical line can be split over multiple physical lines (eg splitting up a long string IO operation), or one physical line can contain multiple logical lines (this is a bad idea in most cases though).

Since the definition hinges a bit on what somebody considers as "logically belonging together", the whole concept is a bit fuzzy. Consider this string formatting operation (Python):

    somestring.split(somechar)[-1].replace("foo", "bar")

Do you consider this one logical line? Or would your logical lines look more like this:

    somestring = somestring.split(somechar)
    somestring = somestring[-1]
    somestring = somestring.replace("foo", "bar")

Both are valid interpretations of logical lines, but they are visually and conceptually quite different, which makes it - imho - a bit problematic to try and use them as a measure of code quality.
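If "logical line" is pinned down as "one statement node in the parse tree" (just one possible definition, chosen for this sketch), Python's `ast` module can count them, and the two versions above come out differently:

```python
import ast


def logical_lines(source):
    """Count statement nodes in Python source (one possible definition)."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.stmt) for node in ast.walk(tree))


chained = 'somestring.split(somechar)[-1].replace("foo", "bar")\n'
stepwise = (
    "somestring = somestring.split(somechar)\n"
    "somestring = somestring[-1]\n"
    'somestring = somestring.replace("foo", "bar")\n'
)

print(logical_lines(chained), logical_lines(stepwise))  # 1 3
```

The code is only parsed, never run, so the undefined names do not matter.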

You make sense, but the concept itself seems so fuzzy and hard to nail down that I marvel at how it ever arose, given that there already exists an unambiguous and ubiquitous way to distill the logical structure of code free of textual artifacts.

I think token count hasn't gained traction because of languages with syntax and types. "Surely all the boilerplate in defining a class doesn't make it more complex," goes the argument.

I prefer to turn the debate around on its head. Rather than argue complexity metrics (boring) I say I prefer languages without boilerplate because they make complexity harder to camouflage.

One can argue that boilerplate doesn't add to complexity (though I don't agree). But no one can argue that it doesn't add to code size. The studies cited in this thread show either linear or superlinear growth of bug count with code size. If those studies are correct, doesn't that rather settle the issue?


One way that I could see where actual lines is a more useful measurement is if there is any correlation between what percentage of the code you can see on-screen at any time and bugs.

Not saying that there is such a correlation, just that there may be cases where it is useful to measure by line count.

> I'm asking for specific citations.

None to hand right now.

> I don't see how "logical lines", whatever that is, can possibly be simpler than counting tokens.

I wasn't rejecting token counts per se. I think that it's a useful metric too.

What I was trying to convey is that "logical lines" is the term used in the literature. Logical lines can cover token counts if you define 1 token = 1 logical line. Or it might not. Either way, you have to settle on a definition.

A common LOC metric for languages in the C family is to count semicolons ("real" lines of code).

I used to use that all the time. It's great because it's quick, and a pretty damn good sloppy metric. I remember joking that the extra semicolons in for loops and the lack of semicolons in if/while statements balance each other out. People tend not to use semicolons in comments, either, except in commented-out code (which some of us have a pet peeve against anyway).

It's very easy to correct for both for loops and if/while.

Yes, but then the metric is no longer so charmingly trivial.
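A sketch of both variants, the trivial count and the corrected one. The correction assumed here is one fewer per `for` header (two semicolons, one statement) and one more per `if`/`while` header (no semicolon, one statement). It ignores semicolons inside strings and comments and double-counts `do ... while (...);`, so it is still a sloppy metric.

```python
import re


def semicolon_loc(c_source, corrected=False):
    """Rough statement count for C-family code, based on semicolons."""
    count = c_source.count(";")
    if corrected:
        # A for-header has two semicolons but is a single statement.
        count -= len(re.findall(r"\bfor\s*\(", c_source))
        # if/while headers are statements without a semicolon of their own.
        count += len(re.findall(r"\b(?:if|while)\s*\(", c_source))
    return count


sample = """
int sum(int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] > 0)
            s += a[i];
    }
    return s;
}
"""

print(semicolon_loc(sample), semicolon_loc(sample, corrected=True))  # 5 5
```

On this sample the two effects really do cancel out, which matches the joke above.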

I actually think raw byte-count is a pretty good metric. Documentation size and variable name length are also symptoms of complexity. Specifically I think wc -c $(find . -type f) (how many total bytes) and find . -type f | wc -l (how many files) make good metrics. Obviously you need to filter out data files and such.

This has some problems (e.g. spaces vs. tabs, utf8, etc.), but all of these size metrics will be pretty loose.
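The same two numbers can be collected with the filtering built in. This is a sketch; the list of extensions treated as source code is an assumption and would need adjusting per project.

```python
import os


def tree_size(root, extensions=(".py", ".c", ".h")):
    """Return (total_bytes, file_count) for source files under root."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):  # skip data files and such
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
                file_count += 1
    return total_bytes, file_count


if __name__ == "__main__":
    print(tree_size("."))
```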

But this penalizes programmers who like to use long readable names. I'm not one of them (though I used to be), but they have a strong case here.

Take any program. Replace all the names with the smallest possible character sequences. Have you made the program simpler? Or smaller in any meaningful way? Surely not. I'd say what you've done is left its logical structure precisely intact (another way of saying that token count is a good metric) while reducing its readability.

This metric relies on the assumption that people are trying to produce readable code. IMHO long variable names are much more helpful in complex codes than simple ones.

Ok, but now I'm wondering if we have opposite views of code size. In my view, code size is bad bad bad. More code means more complexity. Any time you add code, you're subtracting value; it's just that (if it's good code) you're adding more value than you're subtracting. So a higher score in a code size metric is a bad thing to aspire to, and we should greatly favor approaches to writing software that -- all other things being equal -- lead to smaller programs. I don't think that programmers who use long names for readability should have their programs discounted as longer (and thus more complex). Just because their names are longer doesn't mean their programs are.

No no no. My logic is this: Take tight, readable code with short names and replace them with long names, and you'll have worse code. The converse isn't true because complex (bad) codes are more readable with long variable names.

  Complexity -> Code Size
  Code Size -> Long Variable names (win for big codes)
  Complexity is bad

Therefore long variable names are a symptom of a problem, but not the problem themselves. Long variable names aren't bad, but they are still a good predictor of badness. Since size metrics are meant to predict badness, long identifiers should increase size metrics.

Oh, I see. You sound like an APLer. We have similar tastes, but many good programmers disagree, so I doubt that long variable names are a predictor of program badness. Not every long name is FactoryManagerFactoryManagerFactory.

Consider a language like K, in which variables usually have one-letter names. The real code-size win for K is not that. It's that the language is so powerful that complex things can be expressed in remarkably compact strings of operators and operands. (Short variable names, I'd argue, are an epiphenomenon. It's because the programs are so small that you don't need anything longer, and longer names would drown out the logical structure of the program and make it harder to read.) Token count is a good metric here. Both line count and byte count come out artificially low, but token count can't.

I came back to say I've thought about your argument a couple more times and I think you're on to something there. The idea that long variable names, even when they add to readability, are a secondary indicator of code badness (because the code is too complex to get away with short names) is a subtle and interesting way to frame the problem. I'm surprised it didn't get more pushback from the 95+% of programmers who take the opposing view. I suppose this little corner of the thread is a quiet enough backwater that nobody noticed.

But I still don't see how you get around the objection that, according to your preferred metric, if you replace all the names with arbitrarily small character sequences, you get significantly smaller code - yet clearly not better code.

Also reduced its maintainability.

One metric I've seen is gzip-compressed size, which has the nice property that it identifies the size of the incompressible elements -- ie it discounts repetitive boilerplate.
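A minimal sketch of that metric using Python's standard `gzip` module. The repeated getter is an invented piece of boilerplate, just to show how little it contributes once compressed.

```python
import gzip


def gzip_size(source):
    """Size in bytes of the gzip-compressed UTF-8 encoding of source."""
    return len(gzip.compress(source.encode("utf-8")))


# 200 copies of the same line: large in raw bytes, tiny once compressed.
boilerplate = "int get_x() { return x; }\n" * 200

print(len(boilerplate), gzip_size(boilerplate))
```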

Another interesting set of metrics is Halstead's "software science" metrics[1]. They fell out of favour because initially they were hard to count and didn't seem to correlate with anything else.

[1] http://en.wikipedia.org/wiki/Halstead_complexity_measures
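For reference, the basic Halstead measures are simple once the counting is done. This sketch takes the four counts as inputs (how to classify operators and operands is the hard, language-specific part and is not attempted here); the example numbers are invented.

```python
import math


def halstead(n1, n2, N1, N2):
    """Halstead measures from operator/operand counts.

    n1, n2: distinct operators and distinct operands.
    N1, N2: total operator and total operand occurrences.
    """
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    return {"vocabulary": vocabulary, "length": length, "volume": volume,
            "difficulty": difficulty, "effort": effort}


# Invented counts: 4 distinct operators and operands, each used 8 times total.
print(halstead(n1=4, n2=4, N1=8, N2=8))
```

The effort value (E) is the one compared against line count in the PL/I study discussed earlier in the thread.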

I never understood the gzip one. Repetitive boilerplate is bad; why hide it?

You're trying to understand the "true" size of the software in spite of the idiosyncrasies of a given language.

As I noted somewhere above, "size" is an abstract, dimensionless quality. It can only be approached through proxies. The more the merrier, I reckon, especially if they turn out to correlate with different things.

In the case of most projects, copy/paste code is not just because of the language. It's because of lousy programmers. I've seen large codebases which are made up of a full 40% duplicate code. There's no way to blame that on the language.

You're missing the point of the parent: There is no 'one true' metric. If you use different metrics (actual lines, logical lines, gzip'd size, etc) you may well find different correlations.

But repetitive boilerplate is exactly the last thing that should get away scot-free in a measurement of code size.

It depends on what you want to know. Pure physical lines is one thing, "size" is another.

I want a way to measure how complicated a program is that's independent of language and obviously extraneous things like line length.

You may find that the Halstead metrics I mentioned are closer to what you're after.

I've changed my mind. I'm interested in what I originally said: what's the best way to measure code size, and what are those studies (if they exist). Otherwise we get into debates about size vs. complexity, which is actually less interesting IMO. Size as a proxy for complexity is good enough for me.

"I've heard it said for years that studies show the number of bugs grows roughly linearly with code size and that this holds true across any programming language."

I have absolutely no reason to doubt this, but I suspect this does not look deeply enough at the process.

Bugs, normally, get fixed, especially towards the end of a project, and it is much easier to eradicate bugs, and verify eradication, in a small project, than it is in a large one.

Poor bug-fixing, in the late stages of a large, buggy project, may well introduce further bugs, as well as discovering latent bugs, masked by the original ones.

The Mythical Man-Month

That's a marvelous book-length essay, but not a formal study. Or does Brooks cite research on this?

Writing as little code as possible to fully accomplish a goal has recently become a fundamental principle that I live by.

If I can write a piece of code in fewer lines, I'll do it. That's pretty obvious, we all would. But I try to take it a step further and consciously seek solutions that lead to fewer lines of code. Chopping down a large block of code is an incredibly gratifying feeling for me.

I find that writing less code while maintaining expressiveness usually leads to simpler solutions and, IMO, it is simplicity that reduces the bug count.

I find that in a team environment I would prefer my co-workers write more lines of code, and longer lines at that.

While, given a moment or two, I can unpack a dense list comprehension (a one liner), I would rather read several statements that add up to the same thing.

Of course there are many times when you could do the same thing with fewer lines, in a more elegant, straight-forward way. However I have a hard time believing that just having fewer lines is a sufficient goal for flexible, maintainable code.

Then again I'm pretty new at this :)

In my experience, it's the size of individual methods/functions that determines the number of bugs. More than 50 to 75 lines of code per routine greatly reduces its maintainability and increases the number of bugs (often bugs that are difficult to track down).

At my school (Epita, France) our C coding style standard mandated that all functions be <= 25 lines.

Even though some lines didn't count (like those containing a single curly brace), it was very tough, but always possible. This applied to all projects, small and large. They made us write mostly Unix apps, like an FTP server, a command-line NNTP client, a POSIX shell (I still remember how meticulous you had to be when reading all the man pages to implement process control and terminal control correctly!). Plus the code had to be portable across all 3 Unix OS running at the school: NetBSD, Solaris, Digital Unix. This was in 2000-2001.

For example I just checked the FTP server I wrote for one of the assignments (I still have a copy): 3123 lines and all the functions are <= 25 lines of code. Such rigorousness definitely shaped the quality of the code I now write professionally, 10 years later...

That's awesome! I'm guessing it fits within what I defined: you look at the starting and ending line numbers of a function's body, and the difference should be under 50 to 75 LOC. That includes inline comments (notably not function definition comments). Code clarity should also be prevalent - meaning, nothing fancy! ;-) Don't cheat the system with single-line if statements (for example). It's a really, really simple rule that works! People have argued with me, saying they needed more LOC for a routine, but not once has that proven to be true - at least not in the code I reviewed. And I'm, by far, not the sharpest tool in the shed. If I can do it, anyone can!

Our coding standard was very strict. It was not possible to cheat and save lines by writing, eg:

  if (func()) a = 1;

You had to write:

  if (func())
    a = 1;

Writing very complex C programs with functions <= 25 lines is definitely possible. All Epita students were routinely doing it!

I was thinking more along the lines of:

if (func()) { a=1; }

/* curly braces didn't count in your allocation of LOC, but they would in mine. */
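
To make the difference between the two counting rules concrete, here is a minimal sketch of a line counter for a single function body. The function name and the flag are hypothetical, and skipping blank lines is my own assumption; the thread only says that lines holding a single curly brace didn't count under the Epita rule.

```python
def counted_lines(function_body: str, count_braces: bool = False) -> int:
    """Count the lines of one function body under an assumed length rule.

    Blank lines are never counted (an assumption of this sketch).
    Lines holding only a curly brace are skipped unless count_braces is True.
    """
    total = 0
    for line in function_body.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped in ("{", "}") and not count_braces:
            continue
        total += 1
    return total


# Hypothetical function body used only to exercise the counter.
SAMPLE = """\
{
  if (func())
    a = 1;
  return a;
}
"""

lenient = counted_lines(SAMPLE)                     # lone braces ignored
strict = counted_lines(SAMPLE, count_braces=True)   # lone braces counted
```

For this sample the lenient rule counts 3 lines and the strict rule counts 5, so the same function can pass one standard and fail the other.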

If you read the abstract of the paper that the post refers to, it actually says that the size of a _class_ affects the number of bugs _in that class_. That's something very different to the size/bugs correlation for a whole application.

This is a hugely important overlooked point. When the measurements have a systematic bias, a product optimized to those measurements will have systemic problems. In this case: large, easy-to-understand classes that are excessive in number and completely fail to interoperate correctly.

Size by itself is a very elusive metric and varies hugely depending on the language used. At least for a single method/procedure, I found Cyclomatic Complexity (http://en.wikipedia.org/wiki/Cyclomatic_complexity) to be the best predictor of maintainability, if not quality.
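
As a rough illustration of what the metric counts, here is a sketch that approximates cyclomatic complexity for Python source using the standard `ast` module: one plus the number of decision points. The set of node types treated as decision points is an assumption of this sketch, and real tools differ in the details.

```python
import ast

# Node types treated as decision points; this selection is an assumption.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic(source: str) -> int:
    """Approximate cyclomatic complexity as 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a or b or c' adds two extra paths: one per additional operand.
            complexity += len(node.values) - 1
    return complexity


# Hypothetical function used only to exercise the counter.
EXAMPLE = """
def classify(n):
    if n < 0 or n > 100:
        return "out of range"
    for d in (2, 3):
        if n % d == 0:
            return "divisible"
    return "other"
"""

example_complexity = approx_cyclomatic(EXAMPLE)
```

Here the count is 5: the base path, two `if` statements, one `or`, and one `for` loop.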

The metrics involved here are specifically OO metrics - inheritance depth, number of children and so on, but there's at least one, Weighted Method Count, which uses cyclomatic complexity as a weighting factor. It's described on page 7 of the paper.

It seems counter-intuitive to me too (that the complexity of a method doesn't matter), but perhaps if you have to have that complexity in your application, it's best to have it in one method rather than trying to spread it around with inheritance or some sort of pattern?

It is a result of the paper that once you control for lines of code, Cyclomatic Complexity no longer predicts defect rate.

This theory emphasizes the importance of service-oriented architectures. Note: I specifically am not endorsing WS-*.

Decoupling a larger application into smaller applications with well-defined interfaces should reduce cognitive load and, per this theory, perhaps defects as well.

The fact that this is apparently not painfully obvious is almost as pathetic as the fact that people actually try to use it to justify programming languages that reduce the lines of code instead of learning how to write code that isn't terrible. A giant, bloated codebase written in erlang is still a giant, bloated codebase.

That isn't to say erlang or any other language isn't worth learning (they are); it's just that no language can save you from bad programming.

I don't think the argument is that some language will completely "save you from bad programming" but rather that it will encourage bad programming less than some other language.

Some languages like Java require more lines of code and provide fewer mechanisms for dealing with complexity than other languages (say Erlang), which means that you're more likely to write bad code in Java than in Erlang.

All languages are equal, but some are more equal than others.

I always had a bit of a problem with this sort of study because you can never know for sure that you uncovered all the bugs in a program (short of doing a formal proof of correctness). This in turn introduces biases. Case in point, it could be that a program with a higher number of LOC is actually more likely to have a real user base which then leads to a larger number of bugs being discovered.

"Bigger is just something you have to live with in Java. Growth is a fact of life. Java is like a variant of the game of Tetris in which none of the pieces can fill gaps created by the other pieces, so all you can do is pile them up endlessly."

I am reminded of http://qntm.org/files/hatetris/hatetris.html

I had a theory that code maintainability ultimately depends on the number of things a programmer can keep in their head simultaneously. That number is usually between 4 and 12; call it N below. So if a function has more than N parts, it will be divided into two functions, and if a class has more than N methods, it will be divided, and so on. Anything that has more than N parts will be divided. We must also consider that N is not the same for everybody: what is maintainable for one programmer may not be for another.

How do you measure your N? It could be done like in the movie Rain Man: you throw toothpicks and must quickly count them. Start with a large number and remove some each round until you consistently count the toothpicks correctly.

As an extension of this, there is a limit to the number of parts that one person can control. That number is N^N.

The unfortunately named CRAP method of measuring code quality uses a metric based on Cyclomatic complexity (http://en.wikipedia.org/wiki/Cyclomatic_complexity) and code coverage to estimate change and maintenance risk of code (this page has an equation http://www.artima.com/weblogs/viewpost.jsp?thread=215899). The paper cited in Vivek's article emphasises that code length decreases cognitive complexity. I would bet that Cyclomatic complexity also correlates to bugs and maintainability on the same basis.
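
For reference, here is a small sketch of the CRAP score as I understand the published formula: CRAP(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m), where comp is the cyclomatic complexity of a method and cov is its test coverage in percent. Check the linked page before relying on it; the function name and input validation are my own.

```python
def crap_score(complexity: int, coverage_percent: float) -> float:
    """CRAP score for one method: comp^2 * (1 - cov/100)^3 + comp."""
    if complexity < 1:
        raise ValueError("cyclomatic complexity is at least 1")
    if not 0.0 <= coverage_percent <= 100.0:
        raise ValueError("coverage must be a percentage between 0 and 100")
    uncovered = 1.0 - coverage_percent / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity


fully_tested = crap_score(10, 100.0)  # collapses to the complexity itself
untested = crap_score(10, 0.0)        # complexity squared plus complexity
half_tested = crap_score(5, 50.0)
```

With full coverage the score is just the complexity (10 here), and with no coverage it grows to complexity squared plus complexity (110 here), so untested complex methods are penalized hardest.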

More insights about defect prediction in Thomas Zimmermann's publications at:


A lot of his papers are freely available as pdf.

After rummaging around code metrics, I've come to the conclusion that kLOC is the best estimator for 'hardness' of a codebase. There are a few ways to slice it, of course (no-comment source only? statements only? semi-colons only?), but, fundamentally, I do not see any pragmatic use for code quality metrics besides "How many pages is this". Everything boils down to the number of moving components in the system.

If you want to simplify it even further there is this: https://github.com/technomancy/bludgeon

    Bludgeon is a tool which will tell you if a given
    library is so large that you could bludgeon
    someone to death with a printout of it.

No code has no bugs.

(A koan.)

Conversely: No code has no features.

Inversely: This feature has no code: http://instantzendo.com (view source to see the code)

I remember that that exact program (written in C!) won a prize at one of the IOCCCs.

Admittedly, I'm pretty sure it won the prize for the "Most Egregious Abuse of the Rules", but it won nonetheless.

Yeah, but there's code there to return the no code page.

No features has no bugs.

So, does this mean that adding tests actually adds bugs to your overall program (main code + tests)? One might hope that well-designed tests at least push the bugs from the main code to the test... wonder if any research has been done on this particular question.

Does this just mean character count or statements/expressions? I've worked with APL-like languages which produce enormous ratios of character to statement/expression but they never seemed particularly easier to debug than say, Python.

I wonder if real-life organizations/bureaucracies obey the same law.

It'd be interesting to look at relative code quality vs. size on a per-language basis. Some languages take a whole lot less boilerplate to accomplish the same thing.

I'd love to know how this is affected when you include whitespace, and if code quality is measurably affected by how much can actually fit on the screen.

There is probably a proof out there for why fewer lines of code is better; but everyone who believes it is too busy pumping out features to bother articulating it.

I would say fewer AND human readable lines are better. The more both can be achieved, the better.

[...] my hypothesis that the number of bugs can primarily be predicted only by the total lines of code [...] I still haven’t found any studies which show what this relationship is like. Does the number of bugs grow linearly with code size? Sub-linearly? Super-linearly? My gut feeling still says “sub-linear”.

That's an interesting question. I'd bet the other way - for example, a 500 kLOC program having more total bugs than the sum of ten 50 kLOC programs.

But even if the sub-linear hypothesis were true, there's the yield problem that semiconductor manufacturers know well. Suppose you have, statistically, one fatal defect per 500 kLOC. That means it's hard to get a functional 500 kLOC program done. But you could get eight or nine out of ten 50 kLOC programs right ...
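
The yield arithmetic can be sketched with a Poisson model, which is my assumption and not something stated in the comment: if fatal defects occur independently at an average rate of one per 500 kLOC, the chance that a program of a given size has none is exp(-expected defects). The function and variable names are hypothetical.

```python
import math


def prob_no_fatal_defect(kloc: float, defects_per_kloc: float) -> float:
    """Poisson probability of zero fatal defects in a program of this size."""
    return math.exp(-kloc * defects_per_kloc)


RATE = 1 / 500  # one fatal defect per 500 kLOC, as in the comment above

p_big = prob_no_fatal_defect(500, RATE)    # e^-1, roughly 0.37
p_small = prob_no_fatal_defect(50, RATE)   # e^-0.1, roughly 0.90
expected_working_small = 10 * p_small      # roughly 9 of the ten programs
```

Under that assumption the single 500 kLOC program works a bit more than a third of the time, while about nine of the ten 50 kLOC programs come out clean, which matches the "eight or nine out of ten" estimate.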

The 500 kLOC program will have more bugs. Suppose that, at best, it can be divided into ten 50 kLOC components, each with the same average number of bugs as a 50 kLOC program. The fact that the components must interact correctly will introduce more bugs. How many more is pretty unclear, but I am certain there will be more.

I wonder if they correlated it with Programming Language. Because this would suggest that the more terse a language is, the less prone to errors it is.

I just cut my Visual Studio font size down to 6, but my code runs the same.

Effects are not instant. You must keep your code at font size 6 forever.
