
Norris Numbers – Walls you hit in program size (2014) - dhotson
https://www.teamten.com/lawrence/writings/norris-numbers.html
======
jacquesm
If he had that insight in 2011 then he was pretty late to the party. By then
the 1K to 2K line limit for 'beginner programmers' was very well established.

To go beyond that you need a few tricks of the trade: structured
programming and avoiding global scope will get you to 10K and up.

After 50K or so you will need even more tricks: version control will become a
must, naming conventions matter and you're probably well into modularization
territory. Some actual high level documentation would also not be a bad idea.

By the time you hit 100's of thousands of lines you are probably looking at a
life-time's worth of achievement by a single programmer, or more likely you
are looking at teamwork. And so you'll need yet more infrastructure around
your project to keep it afloat.

You can see quite a bit of this at work when you browse through open source
repositories on GitHub: those are roughly the points at which plenty of
projects end up being abandoned, as often through a lack of insight into
how to organize a larger project as through the demotivation that comes from
lack of adoption.

~~~
sasa_buklijas
Agreed, but it also depends on the programming language.

I do not see how 1K lines of C and 1K lines of Python are the same.

~~~
lmm
As I understand it the evidence is, surprisingly, just the opposite: defect
rate per line of code is the same, independent of which language you use.

~~~
a1369209993
I don't think that's particularly surprising, and in fact that's the other
half of the point: a 2K 'beginner program' in Haskell or Lisp could do all the
work of a 50K C or Java program, but because it's so much smaller, you
have/need much less support structure and, yes, far fewer total defects.

~~~
lmm
My intuition is that code with 2kloc "worth" of language constructs would have
the same defect rate across languages (and you can get a lot more done in n
constructs in some languages than others), but that code using shorter
identifiers or symbolic operators should have the same defect rate per
"construct", i.e. a higher defect rate per line. But AIUI the evidence doesn't
bear this out, and the per-line defect rate in APL or Perl is the same as in
other languages.

~~~
Joeri
As programmers we like our lines to have a given level of cognitive load,
depending on personal taste, so regardless of the language you’ll end up with
on average the same number of language constructs per line. In high level
languages those constructs do more than in lower level languages, but since
they are already debugged the defect rate per construct is not higher, ergo
the defect rate per line is also stable across languages.

~~~
lmm
> regardless of the language you’ll end up with on average the same number of
> language constructs per line

I'm saying this isn't true. APL is the extreme example, but languages vary
quite a lot in how many constructs you can and do cram onto a single line.

------
maximexx
This can be true for a beginner, for a developer who never hit that wall
before or for a developer without enough talent to learn how to do it right
after all. But once you start using the right methodologies, making things
highly modular and as much as possible independent from each other, you can go
quite a distance before new "walls" arise.

Sometimes I take more time to think of a proper and scalable name for
variables and methods than it actually took to write the implementation.
While refactoring I might do another round of thinking about naming, and
make sure the implementation has no possible side effects, or an absolute
minimum of them, etc.

I've been coding for almost 30 years now. I'm still learning and still make
mistakes of course, but most of the time they occur because I rush for some
reason. Writing good code takes time, especially to rethink what you're
doing, refactor, and make the right adjustments so it complements the
codebase. Before I complete the beta release of a codebase I've made
thousands and thousands of decisions, where only one wrong decision can
cause a terrible amount of trouble later on.

For me it's a creative process. It goes in waves. I cannot always be a top
performer; I've accepted that. When I recognise I was in a low during some
implementation, I might do a total rewrite of it or apply some serious
refactoring (and force myself to take the time for that).

I still experience coding to be much harder to get right than I ever expected
it to be. For me a codebase is a highly complex system of maybe hundreds of
files with APIs working together, not just a bunch of algorithms.

------
gmoes
I feel that there is some naiveté in this perspective, although the OP does
touch on it somewhat. A novice most likely would write their code in a very
monolithic fashion. That same approach fails significantly with larger code
bases.

As a seasoned developer I have come to realize that one of the most important
things to be a good developer is organizational skills. Unfortunately it seems
that ways to organize code bases, including things like naming, mutable state,
modularization, cohesion/coupling, etc., are not as well developed or
understood in general in our industry as they should be.

While understanding and knowing the right algorithms is important, I
sometimes wonder if our emphasis on knowing algorithms off the top of your
head in the interviewing process contributes to putting the emphasis on the
wrong things in software development.

~~~
ksk
>Unfortunately it seems that ways to organize code bases, including things
like naming, mutable state, modularization, cohesion/coupling, etc., are not
as well developed or understood in general in our industry as they should be.

True, but I think any attempt to standardize or to make it another engineering
discipline would mean the end of high programmer salaries. Just follow
guidelines and standard procedure in a book and you'll end up with a
reasonable solution that is reasonably good and reasonably reliable at a
reasonable cost. Most businesses would jump at that...

And I'd argue that would be a good thing for all the sectors where programming
is important but which no good programmer actually wants to join because it's
not sexy (coal mining, medical equipment, etc.).

> I sometimes wonder if our emphasis on the knowing algorithms off the top of
> your head interviewing process contributes to putting the emphasis on the
> wrong things in software development.

Yes, but no business actually cares about creative solutions, unless algorithms
are core to their business (and even then, other human factors outweigh
finding the optimal, bestest, fastest solution). They simply want to use
computers to solve a business problem. They want a runner to run from A to B,
not an Olympic sprinter who is going to break a world record. Do you know a
reasonably decent sort algorithm? Good, just use it. Profiling? Optimizing?
That's for Olympic sprinters. Design patterns? Blindly apply a GoF design
pattern that approximates your problem, etc etc.

~~~
wellpast
I think our industry can come to _understand_ and _articulate_ the skill set
without having to standardize/certify it.

Civil architects can be wildly creative and artful in their industry which
understands its own domain deeply.

~~~
ksk
Right, but when you hire an architect to design a random office building,
you're not expecting a piece of art. My point is the vast majority of
programming projects are random office building # 23.

~~~
wellpast
Then we’re in the realm of automated or at least codified/systematic process,
or we _should_ be, no?

~~~
ksk
Yes, and we should be encouraging more of that. But as programmers (well, to
my mind anyway) it nags us when we look at inefficient solutions, even those
which are reasonably robust/adequate. We tend to scoff at people using Visual
Basic to solve a business problem, when it's probably the best language for a
large chunk of the problem space.

------
montrose
"Absolutely refuse to add any feature or line of code unless you need it right
now, and need it badly."

As I've gotten older I've seen the value of this rule. Though it sounds merely
negative, it can produce effects that seem little short of genius.

~~~
reificator
Excepting those features which allow you to cull large chunks of code. Those
are my favorite features, rare as they are.

~~~
kelihlodversson
In my experience this is often possible by consolidating duplicated
functionality into a shared implementation. That is, by exchanging multiple
features of linear complexity for a single shared implementation. Due to
being shared, that implementation will tend to have geometric complexity.

That means the duplication has to be highly significant and the variations
need to be few for it to pay off. ... And you'd better hope you have a good
set of tests to verify everything is still working.

In most cases it's still a good idea, but it's still a case where you need to
argue that you really need it, so it's no exception at all.
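As a minimal, hypothetical sketch of the trade-off described above (the function names are invented): two near-duplicate features collapsed into one shared, parameterized implementation.

```python
# Before: two near-duplicate "features", each simple on its own.
def export_csv(rows):
    return "\n".join(",".join(str(v) for v in row) for row in rows)

def export_tsv(rows):
    return "\n".join("\t".join(str(v) for v in row) for row in rows)

# After: one shared implementation; the variation becomes a parameter.
# Every caller now depends on this single function, so any change here
# ripples out to all of them -- hence the need for a good test suite.
def export_delimited(rows, delimiter=","):
    return "\n".join(delimiter.join(str(v) for v in row) for row in rows)

assert export_delimited([[1, 2]], "\t") == export_tsv([[1, 2]])
```

The win is obvious with two callers and one axis of variation; it erodes quickly as delimiters, quoting rules, and escaping each become their own parameter.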

~~~
chewbacha
Sounds like the perennial redundancy vs dependency struggle [0]. I do like the
explanation in terms of linear vs geometric complexity though.

[0] [https://yosefk.com/blog/redundancy-vs-dependencies-which-is-...](https://yosefk.com/blog/redundancy-vs-dependencies-which-is-worse.html)

------
aaavl2821
I'm a novice just building my first 2k+ line program, and the thing that seems
crazy to me is that based on what you learn in intro courses, things like
namespaces and organization and modularization not only aren't emphasized,
but seem like unimportant distractions from the core task of programming.

Based on all the blogs and the comments here, that couldn't be further from
the truth. Are these just things you learn from experience / mentors?

~~~
dahart
> things like namespaces and organization and modularization not only aren't
> emphasized, but seem like unimportant distractions from the core task of
> programming

There's an analogy I've heard, and I can't remember where. It might be a
Feynman quote I'm recalling...

It'd be crazy to start design work on an airplane, if you're hungry and want
to go to the corner store a block away for chips. It'd also be crazy to try to
walk to China, since it's a long way, and also you might drown. The approach
you take to solving the problem of getting from here to there depends
completely on where here and there are, and what else you want to carry with
you.

I don't use namespaces and build modules for an Arduino project, or a simple
command line tool. But for large projects, I literally can't live without
namespacing and modularization.

The core task of programming constitutes _very_ different activities at
different scales, as different as comparing walking to building jet engines.
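As a tiny illustration of why namespaces stop being optional at scale (Python standard library only): two modules can each own a function with the same short name without ever colliding, because the module name acts as the namespace.

```python
# json and pickle both define `loads`, but the qualified module name
# keeps them apart -- no collision, no renaming needed.
import json
import pickle

data = json.loads('{"a": 1}')            # json's parser
blob = pickle.loads(pickle.dumps(data))  # pickle's, same short name
assert blob == {"a": 1}
```

In a 500-line script you would never notice the problem; in a codebase with hundreds of files, qualified names are what keep two subsystems' `load`, `init`, or `parse` from stepping on each other.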

------
fredley
Does this depend on language? 20k lines of Perl is a different beast from 20k
lines of Python, or 20k lines of verbose Java, I would expect. If not, does it
suggest we should be aiming to use more expressive languages, that can get
more done in fewer lines?

~~~
ryanmarsh
_20k lines of Perl is a different beast from 20k lines of Python, or 20k lines
of verbose Java_

Very good point. We’re all familiar with the “I can do ${lambda} in ${n} lines
of ${lang}” claim.

It seems to take me 10x more lines to do something in Java vs. Python, and 10x
more lines to do something in Python vs. Perl. Except that the Java is
laborious to read, the Python is nicely readable, and the Perl will never be
understood by another programmer.

I had great fun writing Perl many years ago but I used to joke that Perl was a
“write only” language.

~~~
dublin
I had lunch with Eric Raymond here in Austin some 15-20 years ago, and he was
lamenting the fact that he couldn't read _his own_ Perl programs less than a
year after writing them, so he was rewriting them all in Python. I never get
religious about languages or platforms, but Python, which has been described
as "executable pseudocode", does a nice job of balancing power and
readability. Idiomatic Perl is powerful, but opaque enough to be effectively
unmaintainable. Since errors are pretty much proportional to LOC, it _is_ best
to use languages that offer "high leverage": Python, Tcl, and Lua come to
mind. Note that C is still almost the only choice for real embedded work -
it's unsurpassed at register bit-banging. (Well, except for assembly, but that
requires skills that are no longer taught - interestingly, the same ones that
make good embedded C programmers...)

~~~
ryanmarsh
_he couldn't read his own Perl programs less than a year after writing them_

A year? I often lost a half day trying to understand Perl I’d written less
than a week before. I’m not exaggerating.

I’m not sure what that says about me as a programmer but I’m terrified to find
out.

------
watmough
I hadn't come across this before, and sure enough, looking at the current
personal project I'm working on, it's composed of 3 main files of C++ - a
Windows program, a parser [1] and a custom OpenGL control [2] - each 500-600
lines.

For me, I use Stepwise Refinement [3] to get to this point, but to get
further, I have to start breaking out a more abstract approach. I found this
definition of Structured Programming that puts it very nicely [4].

There's also a perhaps self-imposed wall, where you might trial a solution and
implement a prototype, then use that to realize that there is no point in
pressing on without some serious redesign of a key component. For my example
above, the rendering part is old-style OpenGL, which works pretty nicely at
60 fps, but I'm holding off doing more until I can slot in and benchmark a
better approach using vertex buffers and shaders, with the goals of enabling
me to shape and scale the renderings in an abstracted coordinate system, and
of scaling to rendering hundreds of files instead of just one.

[1]
[https://twitter.com/watmough/status/962470455037841409](https://twitter.com/watmough/status/962470455037841409)

[2]
[https://twitter.com/watmough/status/965007110391128064](https://twitter.com/watmough/status/965007110391128064)

[3] [http://www.informatik.uni-bremen.de/gdpa/def/def_s/STEPWISE_...](http://www.informatik.uni-bremen.de/gdpa/def/def_s/STEPWISE_REFINEMENT.htm)

[4] [https://www.encyclopedia.com/computing/dictionaries-thesauru...](https://www.encyclopedia.com/computing/dictionaries-thesauruses-pictures-and-press-releases/structured-programming)

------
dang
Discussed at the time:
[https://news.ycombinator.com/item?id=8072730](https://news.ycombinator.com/item?id=8072730)

and in 2015:
[https://news.ycombinator.com/item?id=10191540](https://news.ycombinator.com/item?id=10191540)

------
cortesoft
I avoid these walls entirely by never using newlines in my code.

~~~
HumanDrivenDev
That's webscale af

------
psyc
Interesting. My main project is getting near the 20k range. I haven't hit a
wall, but it is getting noticeably harder to stay fluent in every part of the
program. When I switch from one major system to another, there's a few days'
ramp-up. I'm still happy with readability and complexity.

~~~
photojosh
Same. There are some parts of the code that I haven't touched in a few years,
and when I do there's a significant effort to refamiliarise myself... and a
whole lot of "what stupid idiot wrote this", oh yeah, that was me three years
ago. :)

I just finished the Python 2 -> 3 upgrade on it, that was fun.

------
theSage
Are there projects we can undertake to intentionally hit those walls and
measure ourselves?

~~~
kybernetikos
Like the equivalent of 'memory-hard' problems - 'lines-of-code' hard problems.

I suppose that's close to what
[https://en.wikipedia.org/wiki/Kolmogorov_complexity](https://en.wikipedia.org/wiki/Kolmogorov_complexity)
is.

~~~
theSage
Ah, no. I meant: what task can we undertake to see this in action? For
example, writing your own compiler is bound to make you hit the 1k wall. If
you're able to write it, chances are you passed the 1k wall at some time in
your past. That kind of measurement.

~~~
kybernetikos
Part of the problem is that people who are good at dealing with large
codebases tend to write fewer lines of code if they can at all get away with
it.

------
Paul_S
A bit overdramatic. I remember Unreal 3 was close to 2 million lines, and
that's the normal region for games (and has been for a decade), and trust me,
gamedev studios are _not_ staffed by seasoned programmers.

~~~
avinium
Is that just scripting, or does that include the engine?

If it's just game scripting, then I could imagine it's possible to squeeze 2
mil lines out of junior devs. I assume there are very few things that can go
drastically "wrong".

~~~
Paul_S
"Scripting" vs coding is not a clear-cut distinction. Also, a lot of studios
(back in the day, that is, since now the Unreal model is different and
everyone has access to the source) had source access, so games would usually
make changes to the engine. 2 mil is just the base engine. There's of course
loads of middleware and plugins and integration for it all, probably doubling
the size, and of course the "scripting", which adds complexity like anything
else and relies on support for things in the main codebase. Customising the
engine is not a matter of tweaking variables but of extending the base
classes. And for game logic you might be writing completely separate systems
with their own architecture.

Repos were big. Really big. Especially since we're talking about people who
would check FMVs into source control.

Ah... what a horrible world.

------
agumonkey
what about pg's On Lisp and leveraging nested macros?

what about Kay's VPRI efforts to reduce a full system to 100K lines (OMeta)?

------
g5095
shard your problem space.

~~~
mrweasel
Indeed. Microservices can be a pain in the butt to debug, due to the
communication between them, but they can help you to view the world as a
collection of smaller, easily understood programs, rather than one huge
monolith.

~~~
dublin
Not exactly a new idea:
[https://en.wikipedia.org/wiki/Unix_philosophy](https://en.wikipedia.org/wiki/Unix_philosophy)

