
Norris numbers - johndcook
http://www.teamten.com/lawrence/writings/norris-numbers.html
======
npsimons
The really interesting corollary here is that no matter what level you are at,
you will probably want to leverage a language that's as powerful as possible,
since you will be able to accomplish more with fewer lines of code[1].

A few months back, I expressed disappointment with Google Maps' dropping of
support for displaying KMZs from other servers, and my intent to put together
a replacement. When I finally got around to it, I found a JS library[2] (plus
plugin[3]) that allowed me to replicate what I needed in about 10 lines of
JS[4]. Of course, this is more the power of leveraging a library (and public
tile servers), but I think it's still instructive to my point.

[1] -
[http://www.paulgraham.com/power.html](http://www.paulgraham.com/power.html)

[2] - [http://leafletjs.com/](http://leafletjs.com/)

[3] - [https://github.com/mpetazzoni/leaflet-gpx](https://github.com/mpetazzoni/leaflet-gpx)

[4] -
[http://hardcorehackers.com/~npsimons/photos/2014/07-13:%20Owens%20Point/gpstrack.html](http://hardcorehackers.com/~npsimons/photos/2014/07-13:%20Owens%20Point/gpstrack.html)

~~~
smsm42
I think this is not the right takeaway, since LOC here is being used as a
proxy for complexity, but a language that packs more complexity into a single
line would not necessarily reduce the complexity. It might, but it might not -
e.g. it is true that you can implement Conway's Game of Life as a one-liner in
APL [1], but does it become much simpler than a multi-line implementation of
the same in a more mainstream language? I would not really say so.

[1] [http://catpad.net/michael/apl/](http://catpad.net/michael/apl/)
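To make the comparison concrete, here is one deliberately plain, multi-line
version of a single Game of Life generation - a Python sketch, not a
translation of the linked APL:

```python
def life_step(alive):
    """Compute one generation of Conway's Game of Life.

    `alive` is a set of (x, y) coordinates of live cells; the board is
    treated as unbounded, so there are no edge cases to handle.
    """
    # Count live neighbours of every cell adjacent to a live cell.
    counts = {}
    for (x, y) in alive:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0):
                    cell = (x + dx, y + dy)
                    counts[cell] = counts.get(cell, 0) + 1
    # A cell lives next generation if it has exactly 3 live neighbours,
    # or exactly 2 and it is currently alive.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in alive)}

# A "blinker" oscillates between a horizontal and a vertical bar.
blinker = {(0, 1), (1, 1), (2, 1)}
```

Many more lines than the APL, but each rule of the game is visible as a
named step - which is arguably the point of the parent comment.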

------
DanielBMarkham
_The real trick is knowing when a new feature adds linear complexity (its own
weight only) or geometric complexity (interacts with other features)._

This is why so many folks love DSLs. Fully abstract and get your geometric
features out early; test them and make them flexible without having to re-code
anything.
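As a sketch of the idea (the domain, rules, and names below are all invented
for illustration), a tiny internal DSL in Python: the interpreter absorbs the
"geometric" part - how rules interact and combine - once, so each new feature
is a single declarative entry rather than new control flow:

```python
# A toy internal DSL for discount rules. The interpreter (`price`)
# decides once how rules stack; adding a rule is one declarative line.

RULES = [
    # (predicate on the order, discount fraction)
    (lambda order: order["quantity"] >= 10,    0.10),
    (lambda order: order["customer"] == "vip", 0.05),
    (lambda order: order["total"] > 1000,      0.02),
]

def price(order):
    """Apply every matching rule; discounts stack multiplicatively."""
    total = order["total"]
    for applies, discount in RULES:
        if applies(order):
            total *= 1 - discount
    return round(total, 2)

order = {"quantity": 12, "customer": "vip", "total": 500}
# 500 * 0.90 * 0.95 = 427.5
```

The interaction rules live in one place (`price`), so each added feature
carries roughly linear, not geometric, complexity.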

~~~
sp332
Alan Kay is heading up a project to write a whole OS - network stack, graphics
and all - in 20,000 lines of code.
[http://vpri.org/html/work/ifnct.htm](http://vpri.org/html/work/ifnct.htm) The
whole project is basically made out of DSLs.

~~~
renox
Is the project still alive?

The last progress report about the project was in 2011.

~~~
sp332
Well, they're still writing papers about it at least.
[http://vpri.org/html/writings.php](http://vpri.org/html/writings.php)

~~~
agumonkey
Recent papers mention STEPS, but it seems to be a finished experiment.

------
sgt101
There is shockingly little data on this available. The most systematic
research I have found was led by Lehman at Imperial in the 90's:
[http://www.eis.mdx.ac.uk/staffpages/mml/feast2/papers/abstracts.html](http://www.eis.mdx.ac.uk/staffpages/mml/feast2/papers/abstracts.html)

Interestingly, I am in a position to revisit the evolution of one of the
systems in the FEAST study. The key insight from FEAST was the idea of an
"S-curve" where complexity overwhelms the drive for features. Initial
development progress is slow because the infrastructure and framework for the
system are not in place; then comes rapid development, then a slowdown as
complexity kicks in.

The reviewed system behaved like that in the study (I checked the data) and
for a few years after. But then a period of explosive growth occurred, halted
only by a strategic decision to move away from the platform for technology
management (obsolescence of mainframes) reasons.

------
NathanKP
I've never coded a piece of software that needed 3 million lines of code. I'm
at the 200,000 LOC level right now.

However, from my software architect experience I imagine that any project in
the millions of lines of code would be best broken into smaller services that
communicate through a common backbone.

This type of architecture would allow a large dev team to be broken into
smaller groups that each focus on smaller, manageable subsets of the large
code base with each subset being a service which then communicates with other
subsets through a backbone.

It would require a lot of internal documentation and back and forth
communication between internal teams to get the services to integrate with
each other flawlessly but it shouldn't be too difficult with proper care and
talent.
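A minimal sketch of that architecture (the bus, topic names, and services
here are all invented for illustration): each service talks only to the
backbone, never to another service directly:

```python
from collections import defaultdict

class Bus:
    """Toy in-process pub/sub backbone standing in for a real
    message queue or network protocol."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)

# "Billing" service: knows nothing about who emits orders.
def billing_service(bus, invoices):
    bus.subscribe("order.placed",
                  lambda msg: invoices.append(msg["total"]))

# "Orders" service: knows nothing about billing.
def place_order(bus, total):
    bus.publish("order.placed", {"total": total})

bus = Bus()
invoices = []
billing_service(bus, invoices)
place_order(bus, 99.0)
```

In a real system the backbone would be a queue or an RPC layer, but the
coupling property is the same: teams share message schemas, not code.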

~~~
chipsy
Yes, I point to how we accomplished networking. "Internet-scale" code is
basically a lot of interacting systems of protocols. Protocol development
seems like the "final step" in scaling a system.

------
walshemj
Is this 1500 lines in a monolithic program with no structure?

Certainly I found no problems working with some of the billing systems I
worked on at BT, which went way, way beyond 2k LOC - probably around 200k.

I recall 1400-line if blocks in one bit of Fortran 77 code :-)

But we did have a real genius as team leader.

~~~
adamtj
Genius?

A weak man can lift a small rock. A strong man can lift a large rock. But to
lift a ten ton boulder, you need a crane. Even a weak man can operate a crane,
if he knows how.

Your genius is like a strong but stupid man. Maybe he or she can lift a 1400
line 'if' rock, I mean block, but that's probably the limit. Any more would
require a different mode of operation -- the programming equivalent of using a
crane.

The point is that people who haven't broken the 1500 line wall will write code
with no structure. That's _why_ they can't break through the wall. That's the
heaviest code they can lift without assistance.

Certainly there are well structured 1500 line programs. But those
overwhelmingly come from people who have broken through that limit. The people
who know how to use a crane will tend to use them even for smaller projects.

Why would I go to all the trouble to use a giant crane just to lift a smallish
20 kg (40 lbs) rock? Because then I won't get tired or strain my back. So I
can lift another. And another, and another. I can do this all day without
breaking a sweat. The strong man is tired and sore and worn out from just a
few dozen small rocks, while I'm still fresh, still going, and still able to
lift multi-ton boulders.

~~~
walshemj
And your point is?

We were doing MR back in the early 80's on the largest cluster of superminis
in the UK - damn right Cliff was a genius; the project would not have worked
without him.

~~~
adamtj
My point? What, are you feeling insulted? If so, stop. After all, I don't know
him or you or anything about the project you were on. Anybody reading my
comment would know that.

I thought maybe you missed the point of the article, that's all. Read it
again, more closely. Perhaps then you'll see Cliff in a different light.

If you missed the point of the article and he wrote a 1400 line 'if'
statement, then it's at least _plausible_ that you're both stuck behind that
first or second wall. I could easily be wrong. Then again, maybe this article
has the key you need to catapult yourself far ahead of Cliff. He may well be a
genius, but you could potentially be far more effective and productive. You
may even be able to accomplish things that Cliff would fail at.

So, give the article another read.

~~~
metaphorm
Fortran 77, though. Pretty limited set of organizational tools to work with
there. Procs and modules is about it, I think (though I'm no expert). Many of
the patterns we're used to using for code organization just wouldn't be
available.

So how would you deal with a really big if statement without more
sophisticated tools? The business logic must have had a very large number of
conditional branches. Even if you encapsulate the instructions of each branch
in their own proc, it still doesn't change the fact that you're going to have
a large number of conditions.

This isn't particularly a solved problem even in modern languages. We can use
all kinds of layers of abstraction to make individual units of code smaller,
but it's sort of an illusion. Splatting your business logic across a dozen or
more files/modules/classes whatever might make it easier to look at in a text
editor but it doesn't make it easier to develop or test. If you've got complex
business logic you're going to have a complex program.
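For what it's worth, one common "crane" for a long if/elif chain over record
types in a modern language - not something Fortran 77 offered - is a dispatch
table. The record types and handlers below are invented for illustration;
note that it doesn't reduce the number of cases, it just isolates each one
into a small, independently testable unit:

```python
# Each handler takes a billing record (a dict) and returns
# (record type, charge). Rates here are made up.
def handle_call(record):
    return ("call", record["minutes"] * 0.05)

def handle_sms(record):
    return ("sms", 0.10)

# The "1400-line if" becomes a data table mapping type -> handler.
HANDLERS = {
    "call": handle_call,
    "sms": handle_sms,
}

def bill(record):
    handler = HANDLERS.get(record["type"])
    if handler is None:
        raise ValueError(f"unknown record type: {record['type']!r}")
    return handler(record)
```

The complexity of the business logic is unchanged, as the parent comment
says - but it is now laid out as data rather than control flow.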

~~~
walshemj
The system was broken down into a lot of sub-processes. From memory, the
1400-line branching if was just an if/then/else chain - most weren't as bad.

A core part of the documentation was a large number of A1 sheets which
covered an entire wall of our office ;-)

The core of the system, working out what to do with all the log records, was
a collection of PL/1G.

We did have to build a lot of extra stuff you get for free nowadays: we had a
custom build system, written in Prime's JCL, that you could use to build any
part of the system or the whole thing.

------
GregBuchholz
How many lines of code is our DNA? Could there be a similar effect at work in
other systems, concerning network size and orthogonality that limits their
size? Things like the amount of complexity in DNA of organisms, or the size of
cells, etc.?

~~~
philh
The size of our (non-junk) DNA is limited by its mutation rate. (Though I
don't know how the limit grows as a function of mutation rate, and I don't
know if we've hit that limit as a species.)

~~~
GregBuchholz
I thought that "junk DNA" was now a discarded misnomer, and that while there
are sections that don't code for protein synthesis, it still has a biological
function. It looks like it is still up in the air as to what percentage of
non-coding DNA is biologically vital.

[http://en.wikipedia.org/wiki/Noncoding_DNA](http://en.wikipedia.org/wiki/Noncoding_DNA)

~~~
philh
If junk DNA doesn't exist, the limit applies to the whole genome.

(I lean towards thinking that some non-junk DNA has been previously mistakenly
classified as junk, but that junk DNA still exists in a meaningful sense. But
I'm a layman, and I don't even have a strong idea of what the expert consensus
is.)

------
jamesdutc
This is a very interesting phenomenon, but I think one should be wary of using
it for decision-making in the absence of distinguishing principles. "Keep
things simple": yes, of course, and I also look both ways before I cross the
street.

Believing in this phenomenon has promotional value - those other guys just
don't "get it." I worry that deeply internalising it bears great risk of self-
delusion. Acknowledgement of this phenomenon, irrespective of whether it
exists or not, may prove detrimental.

One need not heed the arguments of those lesser 2,000-liners; only a
200,000-liner possesses the _je ne sais quoi_ to know the right choices.

------
simgidacav
I'm not convinced by the "number of lines" measure unit. It's clearly
something which depends on the programming language, so one should gather
different statistics depending on the language. Not to mention the fact that
different projects might be prone to get messed up in different ways...

------
habitue
I don't see how "refuse to add features" allows complexity to scale past 20K.
That seems more like a strategy to stay under a given number of LOC, dodging
the issue (which isn't to say it's a bad strategy).

~~~
cfallin
I see it more as "refuse to add the wrong features". The key to scaling IMHO
is to avoid the ball-of-mud where every feature interacts with every other
feature, making the design impossible to disentangle and modify sanely. In
software engineering terms, we want low coupling. So we choose the right
_primitives_ at the lowest level, such that additional requirements/feature
requests can be built by composing primitives or attaching add-ons or using a
well-defined hook interface or somesuch. This keeps the core clean and easy to
reason about, and avoids surprises.
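A toy sketch of that shape in Python (all names invented for illustration):
the core defines one well-defined hook interface, and features attach through
it without touching the core or each other:

```python
class Pipeline:
    """The core primitive: runs text through registered hooks, in order.
    It never needs to change as features are added."""
    def __init__(self):
        self.hooks = []

    def register(self, hook):
        self.hooks.append(hook)
        return hook  # returning the hook lets this act as a decorator

    def run(self, text):
        for hook in self.hooks:
            text = hook(text)
        return text

pipeline = Pipeline()

# Two "features", neither of which knows about the core's internals
# or about each other:
@pipeline.register
def strip_whitespace(text):
    return text.strip()

@pipeline.register
def shout(text):
    return text.upper()
```

New requirements become new hooks; the core stays small enough to reason
about completely.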

~~~
mechsin
I don't quite see it as just refusing to add the wrong features either. While
that is definitely part of it, the article states not to add something
"unless you need it right now, and need it badly". That seems like quite a
high threshold in our minds sometimes, but it really isn't. I think it just
points to needing to think about what you're adding, and how you're adding
it, before you just go and do it. Stuff like: do I really have to do this
with a tall stack of if statements and check each case, or is there a better
way, or what happens if I modify this class method, etc. As pointed out, the
advice to move from 2,000 to 20,000 is thoughtful classes and proper
packaging. Those are what most would consider just best practice, but if
you're a novice whose major experience is writing script-type things, you
don't normally consider things like how your process is going to fit into an
overarching framework - and even if you do, you may not have the raw
experience necessary to let you see the shape your project is going to take
in the future.

------
chipsy
I have to wonder if "breaking the wall" is correlated with being able to write
more densely and create more functionality with less code.

~~~
not_kurt_godel
It's not about density. In fact, the opposite. It's about making each piece of
the code dead-simple in terms of readability and functionality. When every
class does exactly what you would expect it to, you can build complex
structures reliably (which themselves act as dead-simple abstractions on top
of the building blocks, and so on).

~~~
chipsy
I disagree. Simple things are _necessarily_ dense in their implementation
because they're so exacting. Recall Simple Made Easy[0].

[0] [http://www.infoq.com/presentations/Simple-Made-Easy](http://www.infoq.com/presentations/Simple-Made-Easy)

------
agzam
2000 lines of C++ and 2000 lines of Haskell code are definitely not the same
thing

------
ExpiredLink
The article has some valid points. But a person who needs to emphasize his
superiority over novice programmers in such a way lacks personal maturity.

~~~
interpol_p
I didn't get that from the article. It felt like the author was emphasising
experience, not superiority. We all went through these stages, the author
acknowledges that.

The article doesn't say anything negative about novice programmers, just that
these are the things we go through when we learn. I certainly went through the
stages described, and I'm hitting the near-million-line projects now and
wonder what happens next.

