Note that 'size' is a dimensionless quality; we can only approximate it with certain proxy metrics (KSLOC, Function Points, Budget allocation).
Edit: and gzipped size, and token counts, and logical lines, and Halstead metrics, and cyclomatic complexity, and object points, and ... and ... and ...
For example, project size is the best predictor of whether a project will meet its initial budget/time/feature/quality goals (Boehm, Standish). It totally swamps staff quality, programming language, programming process, tools, libraries, everything in this respect (Boehm).
Per (Standish), a project with a budget over US$10 million at launch has a 98% probability of not meeting its goals and, from memory, less than a 50% probability of avoiding cancellation.
In fact I have a totally untested hypothesis that agile "works" because it's mostly applied by small teams to small projects.
(Boehm): Barry Boehm, Software Cost Estimation with COCOMO II
(Standish): The Standish Group CHAOS Report.
I think that agile works more as an interface and contract helper between customer and supplier. A few years ago it was very difficult to convince a customer to follow the iterative way; the customer just wanted all the features, on time.
Size has its own unique problems.
Basically, it's a symptom of the idea that work expands to fill the time available; agile works, IMHO, because it avoids spending time that doesn't need to be spent.
Does big 'size' cause a big budget, or do big budgets cause blooming 'size'? A bit of both I'd wager.
The consensus is that routines should have fewer than 200 LOC, but that pushing routines below ~30 LOC is not correlated with lower cost, lower fault rate, or better programmer comprehension. btw, the longest function I've seen in commercial software I've worked on was 12,000 LOC! I will not name names. :)
* A study by Basili and Perricone found that routine size was inversely correlated with errors; as the size of routines increased (up to 200 LOC), the number of errors per LOC decreased (1984).
* Another study found that routine size was not correlated with errors, even though structural complexity and amount of data were correlated with errors (Shen et al. 1985)
* A 1986 study found that small routines (32 LOC or fewer) were not correlated with lower cost or fault rate (Card, Church, and Agresti 1986; Card and Glass 1990). The evidence suggested that larger routines (65 LOC or more) were cheaper to develop per LOC.
* An empirical study of 450 routines found that small routines (those with fewer than 143 source statements, including comments) had 23% more errors per LOC than larger routines (Selby and Basili 1991).
* A study of upper-level computer-science students found that students' comprehension of a program that was super-modularized into routines about 10 lines long was no better than their comprehension of a program that had no routines at all (Conte, Dunsmore, and Shen 1986). When the program was broken into routines of moderate length (about 25 lines), however, students scored 65% better on a test of comprehension.
* A recent [sic!] study found that code needed to be changed least when routines averaged 100 to 150 LOC (Lind and Vairavan 1989).
* In a study of the code for IBM's OS/360 operating system and other systems, the most error-prone routines were those that were larger than 500 LOC. Beyond 500 lines, the error rate tended to be proportional to the size of the routine (Jones 1986a).
* An empirical study of a 148 KLOC program found that routines with fewer than 143 source statements were 2.4 times less expensive to fix than larger routines (Selby and Basili 1991).
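If you want to see where routines in your own code fall relative to these thresholds, something like this rough Python sketch works (the file paths come from the command line; counting blank and comment lines inside a body is my own simplification, and the studies above of course measured other languages entirely):

import ast
import sys

def routine_lengths(path):
    """Print the physical line span of every routine in a Python file.

    Only a crude proxy for the 'routine size' used in the studies above:
    it includes blank lines and comments inside the body.
    """
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno requires Python 3.8 or newer
            length = node.end_lineno - node.lineno + 1
            print(f"{path}:{node.lineno} {node.name}: {length} lines")

if __name__ == "__main__":
    for path in sys.argv[1:]:
        routine_lengths(path)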
So the studies that measure method length/bug count correlation within a single code base, or in code written within a single organization, might only be measuring the fact that code that requires no thinking contains fewer bugs than code that does. Paging Captain Obvious. Some of the other studies address that (e.g. Shen 1985 and the code comprehension studies), but as is so often the case in quantitative studies of programmer productivity, we lack repeated measurements where the only variable is the independent factor whose influence is being studied.
The last time we debated this on HN, there was a disagreement about how much complexity the interactions between functions add to a program. To me, complex call graphs are even worse than complex code inside a function. I was surprised to learn that an opposing view even existed.
I have a problem specific to that, here. Some programmers follow the "no documentation should be needed, the code is obvious" dogma in an untyped scripting language -- and you have to look many levels up the call graph before you even find out the damn function parameters' types... :-(
This is different from short functions being short because they don't do much.
In trivial flow (a short routine) bugs tend to stand out, whereas in complex flow (a longer routine with multiple levels of nesting) bugs can be much harder to spot.
It may actually be good that more errors were found in shorter routines; after all, that is exactly what it is about: finding errors, not making errors.
Besides OO, we also have much better tools for navigating code now. That may have changed how we approach and understand unfamiliar code.
small routines [..] had 23% more errors per LOC
Another good book by McConnell, which also discusses size with summaries of studies, is Software Estimation: Demystifying the Black Art.
Also, considering the date, this research could be influenced by the bug-proneness of parameter passing in C as a non-memory-managed language.
Table 4: Overall model accuracy using different software measures

  Precision   Recall   Model
  86.2%       84.0%    Organizational Structure
  78.6%       79.9%    Code Churn
  79.3%       66.0%    Code Complexity
  74.4%       69.9%    Dependencies
  83.8%       54.4%    Code Coverage
  73.8%       62.9%    Pre-Release Bugs
Edit: I have been meaning to make a git tool that would analyze the history of a project to create predictions on what bit of code is the most buggy using this model, but just haven't done it yet. It would be cool to integrate it with GitHub's bugs api to see how correct it might be. If someone does make it let me know!
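As a starting point, the code-churn half of it (the second-strongest predictor in the table above) can be roughed out from git alone. A minimal sketch in Python; the repo path and the "added plus deleted lines" definition of churn are my own assumptions, not anything from the paper:

import subprocess
from collections import Counter

def churn_by_file(repo="."):
    """Rank files by total lines added + deleted over the whole history.

    A crude stand-in for the 'Code Churn' measure above; real models also
    weight recency, number of authors, and so on.
    """
    out = subprocess.run(
        ["git", "-C", repo, "log", "--numstat", "--format="],
        capture_output=True, text=True, check=True,
    ).stdout
    churn = Counter()
    for line in out.splitlines():
        parts = line.split("\t")
        # --numstat lines look like: "<added>\t<deleted>\t<path>"
        # (binary files show "-" and are skipped here)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            added, deleted, path = parts
            churn[path] += int(added) + int(deleted)
    return churn.most_common()

if __name__ == "__main__":
    for path, lines_changed in churn_by_file()[:20]:
        print(f"{lines_changed:8d}  {path}")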
Interesting; my gut says exponential, which is why cost and likelihood of project cancellation shoot up in the largest projects.
Edit: I have been corrected below; I concur with quadratic.
Edit: And polynomial is O(n^m).
But obviously the real answer is to write a program that will correctly verify all other programs.
Make a table with columns being # points and the # lines you can draw between them.
3 3 (triangle)
4 6 (box with a criss cross)
5 10 (5 point star, with everything connected)
This relation is n * (n - 1) / 2
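A quick check of that formula against the table, in Python (the function name is just for illustration):

def interaction_points(n):
    """Number of distinct pairs among n features/points: n choose 2."""
    return n * (n - 1) // 2

# Matches the table above: 3 -> 3, 4 -> 6, 5 -> 10, and grows quadratically.
for n in (3, 4, 5, 10, 100):
    print(n, interaction_points(n))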
Not sure what you mean by "geometric" in this context.
Similarly, there are brief ways to write things and verbose ways. Sometimes what is commonly written as large factory classes and interfaces in one language works out to a simple higher-order function or macro in another language. See the old "evolution of a programmer" joke for some extreme examples.
I would argue that it isn't just bugs per feature, but bugs per interaction point. The more features there are, the more interaction points between those features.
Consider a very high quality code base with a lot of features. If you leverage every best practice, using advanced techniques such as advanced functional programming (closures, macros, monads, etc.), aspect-oriented programming, and domain-specific languages as necessary, then you could still have a very small code base.
Most code bloat is due to things like unnecessary duplicate code (such as common error handling, logging, or thread management idioms being inlined and "unrolled" everywhere rather than tucked away behind nice abstractions), different code using different sub-components that are very similar (e.g. every team using their own hand-rolled string class in C++), working around limitations of whatever language is used (stretching limited languages too far), working around design defects, and such like. Anyone who has had even a tiny exposure to functional programming techniques can appreciate the enormous power it has to reduce code quantity.
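To make the error-handling/logging point concrete, here's a hedged sketch in Python (the retry counts, the logger name, and the fetch_report function are all invented for the example) of folding one such idiom behind an abstraction instead of inlining it at every call site:

import functools
import logging
import time

log = logging.getLogger("example")  # logger name is purely illustrative

def with_retries(attempts=3, delay=1.0):
    """Wrap a function with the retry-and-log idiom once, instead of
    repeating the same try/except/sleep block at every call site."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    log.exception("%s failed (attempt %d/%d)",
                                  func.__name__, attempt, attempts)
                    if attempt == attempts:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator

@with_retries(attempts=5, delay=0.5)
def fetch_report():
    ...  # whatever unreliable call would otherwise be wrapped by hand

The duplicated idiom disappears from every call site, which is exactly the kind of shrinkage described above.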
I've heard it said for years that studies show the number of bugs grows roughly linearly with code size and that this holds true across any programming language. (It's repeated, for example, at http://c2.com/cgi/wiki?LinesOfCode) I think I've even seen references to such studies, but I don't remember where. So, HN: what are these studies? Anybody know? (I mean besides the one referenced by the OP on class size. I believe this meme goes further back than that.)
My second question is about how to measure code size. PG said a few years ago: why not just count tokens? I've thought about this ever since and I don't see what's wrong with it. Raw character count is obviously a lame metric, and LOC isn't much better. But token count seems like an apples-to-apples comparison that is easy to measure objectively and leaves out noise such as name length and whitespace. So: what's wrong with token count as a measurement of code size?
Here's a study showing that addressing changed requirements or fixing program defects requires a maintenance effort that is directly proportional to the size of the program:
"Repair maintenance is more highly correlated with the number of lines of source code in the program than it is to software science metrics."
Edit: I read it. This must certainly be one of the studies people are referring to; it covers exactly the question of interest. The major limitation is that all the programs were in PL/I, so it says nothing about language-independence. The important finding is that line count was the most highly correlated variable with bug count of those studied, quite a bit more so than more complex metrics were (Halstead's E). It's also interesting that although the authors write, "There are some very large programs in this set," the largest program was in fact only a very modest (by our standards) 6572 lines of PL/I.
...and some of his other work:
If you have access to the publication libraries of the ACM and IEEE, you'll find that they publish most of this literature.
(It costs money for both, unfortunately).
> My second question is about how to measure code size. PG said a few years ago: why not just count tokens?
Some schemes do; it depends on how you define "SLOC". The classic problem is if-thens.
How many lines is this?
if ( foo ) then bar else baz

if ( foo )
then bar
else baz

if ( foo ) then
    bar
else
    baz
> So: what's wrong with token count as a measurement of code size?
I vaguely recall that some properties correlate with physical lines and others with logical. I can't recall what and which, sorry.
I know where to find research literature. I'm asking for specific citations. Is the claim an urban legend? If "studies show" X, one ought to be able to point to the studies.
All three of your examples have the same number of tokens, so to judge by them alone, token count is not just a good measurement of code size, it's a perfect one. My question is what's wrong with it.
I don't see how "logical lines", whatever that is, can possibly be simpler than counting tokens. In fact I don't see how anything can be simpler than counting tokens, since it's easy to know what it means, a tokenizer is always available, and everything irrelevant to the program is by definition dropped.
I don't see how "logical lines", whatever that is
Logical lines are independent of physical lines: each logical line can be split over multiple physical lines (e.g. splitting a long string IO operation across several lines), or one physical line can contain multiple logical lines (though this is a bad idea in most cases).
Since the definition hinges a bit on what somebody considers as "logically belonging together", the whole concept is a bit fuzzy. Consider this string manipulation (Python):

somestring = somestring.split(somechar)[-1].replace("foo", "bar")

Do you consider this one logical line? Or would your logical lines look more like this:
somestring = somestring.split(somechar)
somestring = somestring[-1]
somestring = somestring.replace("foo", "bar")
Both are valid interpretations of logical lines, but they are visually and conceptually quite different, which makes it - imho - a bit problematic to try and use them as a measure of code quality.
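One reproducible (if equally debatable) definition for Python is "one statement node in the AST is one logical line". A rough sketch, where that equivalence is entirely my own assumption:

import ast

def physical_vs_logical(source):
    """Compare the physical line count with the number of statement nodes."""
    physical = len(source.splitlines())
    logical = sum(isinstance(node, ast.stmt) for node in ast.walk(ast.parse(source)))
    return physical, logical

# One statement wrapped over several physical lines:
wrapped = (
    "result = some_function(\n"
    "    first_argument,\n"
    "    second_argument,\n"
    ")\n"
)
print(physical_vs_logical(wrapped))  # (4, 1): 4 physical lines, 1 logical line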
I prefer to turn the debate around on its head. Rather than argue complexity metrics (boring) I say I prefer languages without boilerplate because they make complexity harder to camouflage.
Not saying that there is such a correlation, just that there may be cases where it is useful to measure by line count.
None to hand right now.
> I don't see how "logical lines", whatever that is, can possibly be simpler than counting tokens.
I wasn't rejecting token counts per se. I think that it's a useful metric too.
What I was trying to convey is that "logical lines" is the term used in the literature. Logical lines can cover token counts if you define 1 token = 1 logical line. Or it might not. Either way, you have to settle on a definition.
This has some problems (e.g. spaces vs. tabs, utf8, etc.), but all of these size metrics will be pretty loose.
Take any program. Replace all the names with the smallest possible character sequences. Have you made the program simpler? Or smaller in any meaningful way? Surely not. I'd say what you've done is left its logical structure precisely intact (another way of saying that token count is a good metric) while reducing its readability.
Complexity -> Code Size
Code Size -> Long Variable names (win for big codes)
Complexity is bad
Therefore long variable names are a symptom of a problem, but not the problem themselves. Long variable names aren't bad, but they are still a good predictor of badness. Since size metrics are meant to predict badness, long identifiers should increase size metrics.
Consider a language like K, in which variables usually have one-letter names. The real code-size win for K is not that. It's that the language is so powerful that complex things can be expressed in remarkably compact strings of operators and operands. (Short variable names, I'd argue, are an epiphenomenon. It's because the programs are so small that you don't need anything longer, and longer names would drown out the logical structure of the program and make it harder to read.) Token count is a good metric here. Both line count and byte count come out artificially low, but token count can't.
But I still don't see how you get around the objection that, according to your preferred metric, if you replace all the names with arbitrarily small character sequences, you get significantly smaller code - yet clearly not better code.
Another interesting set of metrics is Halstead's "software science" metrics. They fell out of favour because initially they were hard to count and didn't seem to correlate with anything else.
As I noted somewhere above, "size" is an abstract, dimensionless quality. It can only be approached through proxies. The more the merrier, I reckon, especially if they turn out to correlate with different things.
I have absolutely no reason to doubt this, but I suspect this does not look deeply enough at the process.
Bugs, normally, get fixed, especially towards the end of a project, and it is much easier to eradicate bugs, and verify eradication, in a small project, than it is in a large one.
Poor bug-fixing, in the late stages of a large, buggy project, may well introduce further bugs, as well as discovering latent bugs, masked by the original ones.
If I can write a piece of code in fewer lines, I'll do it. That's pretty obvious; we all would. But I try to take it a step further and consciously seek solutions that lead to fewer lines of code. Chopping down a large block of code is an incredibly gratifying feeling for me.
I find that writing less code while maintaining expressiveness usually leads to simpler solutions and, IMO, it is simplicity that reduces the bug count.
While, given a moment or two, I can unpack a dense list comprehension (a one-liner), I would rather read several statements that add up to the same thing.
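For example (Python, with data invented purely for illustration), both of these build the same list; the first is the dense one-liner, the second is the unpacked version I mean:

orders = [("alice", 120), ("bob", 15), ("carol", 80)]  # made-up sample data

# Dense one-liner:
big_spenders = sorted(name.title() for name, total in orders if total >= 50)

# The same thing as several plain statements:
big_spenders = []
for name, total in orders:
    if total >= 50:
        big_spenders.append(name.title())
big_spenders.sort()

# Both leave big_spenders == ['Alice', 'Carol']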
Of course there are many times when you could do the same thing with fewer lines, in a more elegant, straightforward way. However, I have a hard time believing that just having fewer lines is a sufficient goal for flexible, maintainable code.
Then again I'm pretty new at this :)
At school we had a hard rule that no function could be longer than 25 lines. Even though some lines didn't count (like those containing a single curly brace), it was very tough, but always possible. This applied to all projects, small and large. They made us write mostly Unix apps, like an FTP server, a command-line NNTP client, a POSIX shell (I still remember how meticulous you had to be when reading all the man pages to implement process control and terminal control correctly!). Plus the code had to be portable across all three Unix OSes running at the school: NetBSD, Solaris, Digital Unix. This was in 2000-2001.
For example I just checked the FTP server I wrote for one of the assignments (I still have a copy): 3123 lines and all the functions are <= 25 lines of code. Such rigorousness definitely shaped the quality of the code I now write professionally, 10 years later...
if (func()) a = 1;

if (func())
{
    a = 1;
}

/* curly braces didn't count in your allocation of LOC, but they would in mine. */
It seems counter-intuitive to me too (that the complexity of a method doesn't matter), but perhaps if you have to have that complexity in your application, it's best to have it in one method rather than trying to spread it around with inheritance or some sort of pattern?
Decoupling a larger application into smaller applications with well-defined interfaces should reduce cognitive load and, per this theory, perhaps defects as well.
That isn't to say Erlang or any other language isn't worth learning (they are); it's just that no language can save you from bad programming.
Some languages, like Java, require more lines of code and provide fewer mechanisms for dealing with complexity than other languages (say, Erlang), which means that you're more likely to write bad code in Java than in Erlang.
All languages are equal, but some are more equal than others.
I am reminded of http://qntm.org/files/hatetris/hatetris.html
How do you measure what your N number is? It could be done like in the movie Rain Man: you throw toothpicks and must quickly count them; start with a large number and, in the next round, remove some, until you consistently count the number of toothpicks correctly.
As an extension of this, there is a limit to the number of parts that one man can control; this number is N^N.
A lot of his papers are freely available as PDFs.
Bludgeon is a tool which will tell you if a given library is so large that you could bludgeon someone to death with a printout of it.
Admittedly, I'm pretty sure it won the prize for the "Most Egregious Abuse of the Rules", but it won nonetheless.
( http://homepage.ntlworld.com/richard.leedham-green/ )
That's an interesting question. I'd bet the other way - for example, a 500 kLOC program having more total bugs than the sum of ten 50 kLOC programs.
But even if the sub-linear hypothesis were true, there's the yield problem that semiconductor manufacturers know well. Suppose you have, statistically, one fatal defect per 500 kLOC. That means it's hard to get a functional 500 kLOC program done. But you could get eight or nine out of ten 50 kLOC programs right ...
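Roughly, and purely as an illustration (assuming fatal defects land independently and uniformly at that 1-per-500-kLOC rate):

P_FATAL_PER_LOC = 1 / 500_000  # assumed rate: one fatal defect per 500 kLOC

def p_clean(loc):
    """Probability that a program of `loc` lines has zero fatal defects,
    assuming defects are independent and uniformly distributed."""
    return (1 - P_FATAL_PER_LOC) ** loc

print(f"one 500 kLOC program ships clean:     {p_clean(500_000):.1%}")  # ~36.8%
print(f"a single 50 kLOC program ships clean: {p_clean(50_000):.1%}")   # ~90.5%
print(f"expected clean programs out of ten 50 kLOC programs: {10 * p_clean(50_000):.1f}")  # ~9.0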