
Well Crafted Code, Quality, Speed and Budget - struppi
http://devteams.at/well_crafted_code_quality_speed_budget
======
zenogais
Curious if the author has read anything by Watts Humphrey? He spent his entire
career at SEI arguing this point and attempting to develop the tools and
processes to bear it out. This article reads like a repetition of his
arguments from A Discipline for Software Engineering (1995).

~~~
nickpsecurity
I learned in a discussion with "kragen" that "software engineering" sort of
took on a life of its own that turned into a nightmare of management-focused
stuff harmful to programming. Most programmers, especially the mainstream
crowd, have an aversion to the term from that association. I started saying
CompSci and focusing on prior techniques rather than "engineering" due to that
permanent scarring. As in my other comment, I just focus on specific
techniques and processes that help programmers do their job, with evidence
that they work.

That's my suggestion to you. I'm going to try to dig up and read that work out
of curiosity, as I know many useful things came out of software engineering
research. Most people won't, though, the second they see "SEI" or "software
engineering." So, it might be best for us to just drop that term and focus on
improvements to "programming" or "development" with specific techniques.

------
nickpsecurity
I'm mixed about the post, esp the science part. The science of developing
robust software is great and pretty consistent going back decades, varying
mostly in specific tools and tactics. Mainstream programming just _doesn't
apply it_, although there has been more adoption of key techniques in the past
decade. Here's some computer science from the 1960's-1980's used in robust and
secure system development (esp Orange Book B3 or CC EAL6) that people might
want to copy. I'm taking an empirical route where I reference techniques that
were applied to many real-world projects, with lessons learned in papers or
studies that were consistent. That's all one can do with limited data, and
these aren't in order of importance.

1\. Formal, non-English (eg math/logical) specifications of requirements or
abstract design. English is ambiguous, and misreadings of it caused countless
errors, even back then. CompSci researchers tried formal specs, starting from
English and moving to precise notations (eg Z, VDM, ASM's, statecharts) for
clarity on the specifics. The result was many inconsistencies caught in highly
assured systems and protocol specs before coding even began.
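
To make the ambiguity point concrete, here's a minimal sketch. Plain Python
stands in for a real notation like Z or VDM, and the account example and names
are hypothetical, not taken from any of those projects:

```python
# Hypothetical example: the English requirement "withdrawals may not
# overdraw the account" leaves boundary cases open (is draining to exactly
# zero allowed?). A precise predicate forces those questions up front.

def withdrawal_allowed(balance_cents: int, amount_cents: int) -> bool:
    """Spec: allowed iff amount > 0 and balance - amount >= 0."""
    return amount_cents > 0 and balance_cents - amount_cents >= 0

# The spec doubles as an oracle for tests or later verification.
assert withdrawal_allowed(100, 100) is True   # draining to zero: permitted
assert withdrawal_allowed(100, 101) is False  # overdraw: rejected
assert withdrawal_allowed(100, 0) is False    # zero-amount withdrawal: rejected
```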

2\. High assurance stuff often used mathematical (formal) verification.
Whether that worked or made sense was hit and miss. More on it later. Yet,
virtually all of them said there was benefit in restrictions on the specs,
design, and coding style to fit the provers' limitations. Essentially, they
used boring constructs that were easy to analyse and this prevented/caught
problems. Don't be too clever with design or code. Wirth and Hansen applied
this to language design to bake safety & comprehension in with minimal to low
loss in performance.

Note: Led to Nick P's Law of Trustworthy Systems: "Tried and true beats novel
or new." Always the default.

3\. Dijkstra's THE project showed that modular, layered design with careful
attention to interfaces (and interface checks) makes for the most robust and
maintainable software. Later results confirmed this: each module should fit in
your head, and a control graph that's pretty predictable with minimal cycles
prevents all kinds of local-becomes-global issues. Many systems that were
flawless (or nearly so) in production were built this way. Dijkstra correctly
noted that it was _very hard_ to do this even for smart people, and the
average developer might screw the structuring up a lot. Solid prediction...
but still worth striving for improvement here.
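
Rough sketch of the layering idea, with hypothetical names and Python just for
brevity (THE was an operating system, so obviously nothing like this): each
layer calls only the one below it, and interface checks at the boundary keep a
local mistake from becoming a global one.

```python
# Hypothetical sketch of THE-style layering: each layer only calls the layer
# directly below it, and every public entry point checks its interface
# assumptions so a local mistake stays local.

class StorageLayer:               # layer 0: knows nothing about layers above
    def __init__(self) -> None:
        self._blocks: dict[int, bytes] = {}

    def write_block(self, block_no: int, data: bytes) -> None:
        if block_no < 0:
            raise ValueError("block_no must be non-negative")
        if len(data) > 512:
            raise ValueError("block size limit is 512 bytes")
        self._blocks[block_no] = data

class FileLayer:                  # layer 1: uses only StorageLayer's interface
    def __init__(self, storage: StorageLayer) -> None:
        self._storage = storage

    def append_record(self, block_no: int, record: str) -> None:
        encoded = record.encode("utf-8")
        self._storage.write_block(block_no, encoded)  # downward call only

fs = FileLayer(StorageLayer())
fs.append_record(0, "hello")
```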

4\. Fagan ran empirical studies at IBM that showed a regular, systematic code
review process caught many problems, even ones the tests missed. He turned
that into formal inspections, with the periodicity and prioritization tuned
per organization for the right cost-benefit. This was generalized to the whole
SDLC by others in high-robustness areas. It improved every project that used
it from then on. Exactly what parameters to use is still open-ended, but
periodically looking for well-known flaws with a reference sheet always works.

5\. Testing for every feature, code path, prior issue (even ones from outside
the code base), and common use case. All of these have shown repeated
benefits. There's a cut-off point for each that's still an open research
problem. However, at a minimum, usage-based testing and regression testing
helped many projects achieve zero or near-zero _user-facing_ defects in
production. That's a very important differentiator, as 100 bugs the user never
experiences are better than 5 that they hit regularly. Mills' Cleanroom
process combined simple implementation, code review, and usage testing for
insanely high, statistically certifiable quality even with amateur teams.
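
Here's a tiny sketch of the regression and usage-based pieces; the
parse_quantity function and the past bug it pins down are hypothetical:

```python
# Hypothetical sketch: a regression test pinned to a previously reported
# defect, plus usage-based tests that spend effort on the inputs users
# actually produce rather than exotic ones.

import unittest

def parse_quantity(text: str) -> int:
    """Parse a user-entered quantity (hypothetical function under test)."""
    value = int(text.strip())
    if value < 1:
        raise ValueError("quantity must be at least 1")
    return value

class QuantityTests(unittest.TestCase):
    def test_regression_leading_whitespace_bug_stays_fixed(self):
        # Regression test: a past user-facing defect can never come back.
        self.assertEqual(parse_quantity("  3"), 3)

    def test_most_common_usage(self):
        # Usage-based: check the inputs users type most often first.
        for text, expected in [("1", 1), ("2", 2), ("10", 10)]:
            self.assertEqual(parse_quantity(text), expected)

if __name__ == "__main__":
    unittest.main()
```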

6\. By around the 60's-70's, it became clear that the language you choose has
a significant effect on productivity, defects, maintenance, and integration.
Numerous studies were run in industry and the military comparing various ones.
Certain languages (eg Ada) showed vastly lower defects, equal or better
productivity, and great maintenance/integration in every study. I haven't seen
many such studies since the 90's, and most aren't constructed well enough to
eliminate bias. However, it's grounded in science to claim that certain
language choices prevent common negatives and encourage positives. So, it
follows to adopt languages that make robust development easier.

7\. By the 80's or 90's, it was clear that computers were better at finding
certain problems in specs and code than humans. This gave rise to
methodologies that put models of system or code into model-checkers and
provers to show certain properties always hold (the good) or never show up
(the bad). Used successfully with high-assurance safety and security critical
systems with results ranging from "somewhat beneficial" to "caught stuff we'd
never see or test for." Back then it was unclear how applicable it was. Recent
work by Chlipala, Leroy, et al shows near-perfect results in practice when the
specs/proofs are right, and much wider applicability than before. Lots of
tooling and prior examples mean this is a proven way of getting extra quality
_where the high stakes are worth the cost_ and where core functionality
doesn't change often. The CompCert C compiler, Eiffel's SCOOP concurrency
scheme, and the Navy team's EAL7 IPsec VPN are good examples.
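
A toy sketch of the mechanism, nowhere near CompCert-grade: exhaustively walk
every reachable state of a hypothetical two-process lock protocol and assert
the safety property in each one.

```python
# Toy model checker: enumerate all reachable states of a two-process lock
# protocol and confirm "never both in the critical section" holds everywhere.
# The protocol and state encoding are hypothetical.

def successors(state):
    """All next states. state = (proc0, proc1, lock_holder or None)."""
    p0, p1, holder = state
    out = set()
    for who, me in ((0, p0), (1, p1)):
        if me == "idle":
            out.add(_with(state, who, "waiting"))
        elif me == "waiting" and holder is None:
            out.add(_with(state, who, "critical", new_holder=who))
        elif me == "critical":
            out.add(_with(state, who, "idle", new_holder=None))
    return out

def _with(state, who, value, new_holder="keep"):
    procs = list(state[:2])
    procs[who] = value
    holder = state[2] if new_holder == "keep" else new_holder
    return (procs[0], procs[1], holder)

def check():
    initial = ("idle", "idle", None)
    seen, frontier = {initial}, [initial]
    while frontier:
        state = frontier.pop()
        # Safety property: mutual exclusion must hold in every reachable state.
        assert not (state[0] == "critical" and state[1] == "critical"), state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    print(f"explored {len(seen)} states; mutual exclusion holds in all of them")

check()
```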

8\. Static analysis, aka "lightweight formal methods," was devised to deal
with the specialized skills and labor the above requires. Getting to the
point: tools like Astree Analyzer or SPARK Ada can prove the absence of common
flaws with little to no false positives, without the company needing
mathematicians. Just a half dozen of these tools by themselves found tons of
vulnerabilities in real-world software that had passed human review and
testing. Enough said, eh?
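
A trivial taste of the flavor using a checker most people already have (mypy
here, which is nowhere near Astree or SPARK in depth; the example below is
hypothetical): the possible crash is reported without ever running the
program.

```python
# Hypothetical sketch: a lightweight static checker (mypy) flags the possible
# None dereference below at analysis time, before any test executes it.

from typing import Optional

def find_user(user_id: int) -> Optional[str]:
    users = {1: "alice", 2: "bob"}
    return users.get(user_id)       # may legitimately return None

def greeting(user_id: int) -> str:
    name = find_user(user_id)
    # mypy reports this line: 'name' may be None, so .upper() can crash.
    return "Hello, " + name.upper()
```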

9\. Software that succeeded at testing often failed when random stuff came at
it, especially malware. This led to various fault-injection methods, like fuzz
testing, to simulate that and find breaking points. The huge number of defects
found via this method, esp in file formats & protocol engines, argues for its
effectiveness in improving quality. It ties in with the stuff above in that
well-written code that validates input at the interface and preserves
invariants throughout execution should simply disregard (or report) such
erroneous input.
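
Minimal sketch of a fuzz harness; the "file format" and the planted bug are
hypothetical, but the loop is the whole idea: random input in, crash means
defect, clean rejection is fine.

```python
# Hypothetical minimal fuzz harness: throw random bytes at a parser and
# record any input that makes it crash instead of rejecting cleanly.

import random

def parse_header(data: bytes) -> dict:
    """Toy parser under test; made-up format: [length][payload][checksum]."""
    if len(data) < 1:
        raise ValueError("too short")
    length = data[0]
    payload = data[1:1 + length]
    checksum = data[1 + length]     # bug: can index past the end (IndexError)
    return {"length": length, "payload": payload, "checksum": checksum}

def fuzz(iterations: int = 10_000, seed: int = 0) -> list[bytes]:
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        blob = bytes(rng.randrange(256) for _ in range(rng.randrange(32)))
        try:
            parse_header(blob)
        except ValueError:
            pass                    # clean rejection is the desired behaviour
        except Exception:           # anything else is a real defect
            crashes.append(blob)
    return crashes

print(f"{len(fuzz())} crashing inputs found")
```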

10\. Interface errors themselves posed something like 80+% of problems. This
was noted as far back as the 60's in the Apollo project, when Margaret
Hamilton invented software engineering, fault-tolerance, and specification
techniques to fight it. Dijkstra and Hoare pushed for pre- and post-conditions
plus specific invariants to document the assumptions of code across procedure
calls. The modern version is called Design by Contract in Eiffel, Ada, and
numerous other languages (even asserts in C). Many deployments and tests
showed such interface checks caught many issues, esp assumption violations
when new code extended or modified legacy code.
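
Quick sketch of the contract idea with plain asserts; Eiffel and Ada give you
real language support for this, and the transfer example and names here are
hypothetical:

```python
# Hypothetical Design-by-Contract sketch using plain assertions: the
# preconditions state the caller's obligations at the interface, the
# postcondition states what the procedure guarantees in return.

def transfer(accounts: dict, src: str, dst: str, amount: int) -> None:
    # Preconditions: obligations of the caller.
    assert amount > 0, "precondition: amount must be positive"
    assert accounts[src] >= amount, "precondition: sufficient funds"

    total_before = accounts[src] + accounts[dst]

    accounts[src] -= amount
    accounts[dst] += amount

    # Postcondition: money is moved, never created or destroyed.
    assert accounts[src] + accounts[dst] == total_before, \
        "postcondition: total balance preserved"

accounts = {"a": 100, "b": 50}
transfer(accounts, "a", "b", 30)
assert accounts == {"a": 70, "b": 80}
```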

11\. Concurrency issues caused all kinds of problems. Techniques were devised
by Hansen (Concurrent Pascal) and later Meyer et al (SCOOP) to mostly immunize
against them at language level with acceptable performance. Languages without
that, especially Java, later got brilliant tooling that could reliably find
race conditions, deadlocks, or livelocks. Use of any method inevitably found
problems in production code that had escaped detection. So, using prior,
proven methods to immunize against or detect common errors in concurrency is A
Good Thing. Note that shared-nothing, event-driven architectures also emerged
but I have less data on them outside that some (NonStop, Erlang) worked
extremely well.
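
A small sketch of the shared-nothing style (hypothetical worker, obviously
nothing like NonStop or Erlang internally): workers talk only through queues,
so there is no shared mutable state to race on.

```python
# Hypothetical shared-nothing sketch: the worker owns all its state and
# communicates only via message queues, eliminating data races by design.

import queue
import threading

def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    while True:
        item = inbox.get()
        if item is None:            # sentinel: shut down cleanly
            break
        outbox.put(item * item)     # all state stays local to this worker

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

for n in range(5):
    inbox.put(n)
inbox.put(None)
t.join()

print([outbox.get() for _ in range(5)])   # [0, 1, 4, 9, 16]
```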

The above are just a few things that computer science established, with
supporting evidence from real-world projects, so long ago that Windows didn't
exist. Anyone applying these lessons got benefits in terms of code quality,
security, and maintainability. The rare few applying most or all of them,
mainly the high assurance community, got results along the lines of the space
shuttle control code, with extremely low or even zero defects in production.
So, given the past and present _results_ of these methods every time they're
put to the test, I'm irritated every time another person talks like there's no
good science to quality software. I just listed a bunch of it; it's been
tested in production thousands of times, as the scientific method requires,
tweaked probably hundreds, and the core approaches remained even as the
tactics got modified.

Now people can feel free to use and improve on the science. CompSci continues
to do so in every area I listed, with a chunk of proprietary and FOSS
developers using a subset of the techniques. We just need more uptake. Use
what's proven. And do note that there are plenty of examples of specific
design and implementation decisions for common types of functionality. Many
things were shown to work or not work that could be encoded in libraries,
DSL's, templates, whatever. No excuse, except for our field's continual
failure to learn and hand down the lessons from the past.

~~~
struppi
I think we agree more than you think ;) In my original article, I quoted some
papers and studies. And on Twitter, somebody accused me, saying that "All
science is crap". So I tried to write another article with only _my_ opinion
on the subject - the article linked here on HN.

~~~
nickpsecurity
Reading it instead of skimming it shows there's indeed plenty of agreement,
even without the science part. The "Minimizing F_n" part reflects well what
the science established.

As far as defects, problems people report are definitely a start. The Fagan
Software Inspection Process had the nice idea of standardizing, in a list, the
kinds of issues in code that would be treated as defects and fixed by
priority. They could be a lack of a bounds check, a code style issue, whatever
the team thought was important. So, my definition for defect in such a scheme
was "unacceptable code."

"Low defect potential." Interesting metric. Often a function of complexity,
coupling, amount of state being thrown around, and handling of interface
assumptions. Scientists should come up with ways of assessing stuff like that
then see what effect it had on defects during changes among many codebases. Of
course, the minimizing f_n stuff should help here.

"When you find a defect in production a week after deploying the code, it is
probably still rather cheap to fix: All the original developers are still
there, they still remember what they did, the documentation is still
accurate."

That's actually a really good point. I don't remember seeing anyone else say
it. I'll have to remember that.

"Well crafted code is self documenting. When I read the code together with
it's tests, I want to be able to understand what is going on. Without reading
some external documentation (wikis, ...)"

The well-crafted code section is overall good, but I'm not sure about this.
I've seen code that can work like this. Yet, I've also seen high assurance
work where there were abstract specs of what a module does, with pre- and
post-conditions, that were easier to understand for _using_ the module rather
than modifying it. They could also be fed into analysis tools. Some modern
languages include that stuff in the code itself and can export interface
documentation.

That's not even getting into issues like consistently using a filesystem,
where all kinds of extra commands are needed for non-obvious reasons that need
a whole article to describe. A 1-line operation suddenly takes more like 10.
So, I'd say make the code self-documenting where possible, but there are
potentially good reasons to have the other things, if they're needed or
useful.

"Good software design is, in my opinion, less subjective than some other terms
I described above."

I agree. You did alright explaining that.

"It's Not Much Slower Anyway The Microsoft study on TDD"

You're conflating quality and testing here. They're not the same. There's
plenty of low-quality software with lots of testing. Likewise, Cleanroom (an
extreme example) normally achieved high quality from the app/code structure
and verification via code reviews before testing. Same with Fagan inspections
and OpenBSD's auditing process. All of those had vastly higher quality than
TDD projects as far as I've seen. And that's without arguing against testing;
as you know, I'm for it.

Maybe rename anything like that in your articles to focus specifically on the
benefit or cost of testing. So, this would be "TDD/Testing is not much slower
anyway." The arguments you made for it are good and actually apply to all
quality techniques. I think you intended that but set it up with respect to
testing. So, maybe it's not the title, just something about that section in
particular. Sounds like nitpicking, but my reaction is more that we need to
always convey quality as a combo of prevention, code review, and testing. Code
review & testing at the least. Gotta keep putting it into their heads until
they can't forget. ;)

"Let's look back to "Minimizing F_n (For Large n)":"

"Executable specifications (a part of the test suite) make sure everybody
understands the current functionality that is implemented. And that
"documentation" cannot be outdated, because otherwise the tests would fail."

All good advice, but I counter that this long-running problem is a management
and SCM/VCS problem. One can implement a policy and rule that any change to
code requires an update of the documentation. Then enforce that. A test suite
is also a good idea but not a requirement for dealing with this problem. It's
not even enough, as other verification methods (eg DbC, static analysis) were
needed to catch problems in this area that testing misses.
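
For what it's worth, here's a tiny sketch of the executable-spec idea anyway;
the shipping rule and names are hypothetical. The test reads like the
requirement and fails the moment the behaviour, and therefore this bit of
"documentation," drifts:

```python
# Hypothetical executable specification: the business rule is stated in the
# test's name and body, so it cannot silently go out of date.

import unittest

def shipping_cost(order_total_cents: int) -> int:
    """Made-up rule: orders of 50.00 or more ship free, otherwise 4.99."""
    return 0 if order_total_cents >= 5000 else 499

class ShippingSpec(unittest.TestCase):
    def test_orders_of_fifty_dollars_or_more_ship_free(self):
        self.assertEqual(shipping_cost(5000), 0)

    def test_orders_below_fifty_dollars_pay_flat_rate(self):
        self.assertEqual(shipping_cost(4999), 499)

if __name__ == "__main__":
    unittest.main()
```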

"At some point, your users will recognize that it takes longer and longer
until they get software that contains the new features they requested. And
they have to pay more and more for it."

A really good point that few mention or argue satisfactorily. It should be one
of the top points. We need more ways to illustrate this in numbers or visually
so less technical people get it. Maybe just delivery time and cost themselves
as a graph going up over time. ;)

"putting it all together"

Strong, sound conclusions. The science supports it up to a certain quality
point where ROI diminishes. You're advocating a realistic one, though.

"A former colleague once said TDD would not make sense for them "because our
iOS app only consists of a user interface and some server calls, and you
cannot really test those. There is no logic in between. And it's really easy
to test manually." Well, maybe you can get away with it when you have an app
like that."

You can counter that software is rarely that simple. There are assumptions
built into various components, failure modes to account for (esp the network),
and effects of state build-up if it isn't fully stateless. Hell, tell them a
non-bloated web server is "simple" with an "interface and some server calls."
Then show them the release notes detailing all the bug fixes. Murphy's Law
says the "simple things are always hard." Not always, but often enough to
justify verification activities.
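
Here's a small sketch of what testing such a "simple" app could look like; the
client, the screen function, and the names are all hypothetical, and Python
stands in for whatever the iOS stack would actually use:

```python
# Hypothetical sketch: even a "UI plus some server calls" client has failure
# modes worth testing. Here a simulated network error must not crash the app.

import unittest
from unittest import mock

class ApiClient:
    def fetch_profile(self, user_id: int) -> dict:
        raise NotImplementedError("the real implementation does the HTTP call")

def load_profile_screen(client: ApiClient, user_id: int) -> str:
    """Returns the text the (hypothetical) screen should display."""
    try:
        profile = client.fetch_profile(user_id)
        return f"Hello, {profile['name']}"
    except ConnectionError:
        return "Offline - please try again"

class ProfileScreenTests(unittest.TestCase):
    def test_network_failure_shows_offline_message(self):
        client = mock.create_autospec(ApiClient, instance=True)
        client.fetch_profile.side_effect = ConnectionError("no route to host")
        self.assertEqual(load_profile_screen(client, 7),
                         "Offline - please try again")

if __name__ == "__main__":
    unittest.main()
```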

It just popped into my head that it also helps when the platform (eg iOS)
breaks something with a change and you didn't know about it. Proper tests can
isolate that immediately.

"Also, your actions now will come back and bite you later."

Very important. Most "short-term" jobs end up lasting a long time and even
being reused in some way. Besides, getting in the habit of doing it right
makes doing it right more effortless. The delay between short-term and
long-term should get smaller as verification activities become natural. I also
like the "unlimited downside" concept haha.

"And in many organizations, you will have a hard time to argue throwing away
"perfectly working code" when the time pressure gets bad..."

Always worth remembering, too.

Overall, now that I've been more thorough in the review, you have a good
write-up with only a few things I take issue with. As you predicted. :)

~~~
struppi
Thank you so much for your detailed reply! That's a lot to think about :)

Thanks for the hints about defects. I like "unacceptable code", but it's
probably still a definition not everyone can live with :)

"It's Not Much Slower Anyway The Microsoft study on TDD"

"You're conflating quality and testing here. They're not the same."

You are right. My arguments here are only for TDD, and I actually did not want
to write a TDD article :) I'll think about how to improve that section. Also,
CleanRoom is really interesting, but I don't know very much about it and never
experienced it.

"At some point, your users will recognize that it takes longer and longer..."
"Should be one of the top points."

I was thinking about the order of the points a lot, and no particular order
seemed _really_ right. It now comes at a point where I think I have
established all the preconditions: Speed means lead time, cost depends on
speed, high external quality means faster, high internal quality means faster.
But you're right, the main points I want to bring across come very late in the
article. As for the illustrations: Maybe I'll write another article about how
users care about internal quality - Because I've worked in so many legacy code
projects that I think I have a good view on that :)

I like your points about the iOS stuff and the short-term jobs. I'll think
about them some more. I'm still not sure if I believe the "unlimited downside"
idea 100% (because only very few things are truly unlimited), but it's an
interesting concept.

~~~
nickpsecurity
" Thank you so much for your detailed reply! That's a lot to think about :)"

You're very welcome. :)

"Thanks for the hints about defects. I like "unacceptable code", but it's
probably still a definition not everyone can live with :)"

I kind of came up with it off the top of my head. Point is that the code will
be rejected on relatively arbitrary criteria. It might be known defects,
coding style, what libraries were used... anything. Hard to find a word or
phrase covering all that which people will agree with.

"Also, CleanRoom is really interesting, but I don't know very much about it
and never experienced it."

This presentation...

[http://groups.engin.umd.umich.edu/CIS/course.des/cis376/ppt/...](http://groups.engin.umd.umich.edu/CIS/course.des/cis376/ppt/lec21.ppt)

...summarizes it for you in about 5 min of reading. The first attempts to make
software in a scientific way, engineered software I call it, were done by
Margaret Hamilton (founder of our field) at NASA, plus academics like Dijkstra
and Hoare. They all produced nearly defect-free systems in a systematic, often
mathematical way. Harlan Mills took the lessons to industry when developing
Cleanroom: a combo of lightweight formal methods, design constraints,
spec/code verification, incremental development, and usage-driven testing. He
wanted SW products to be as low-defect as HW clean rooms. The result was code
with an extremely low defect rate, consistent enough for statistical
certification. While it was popular, Cleanroom teams actually _warrantied_
their software.

Altran/Praxis is another company I know that (a) truly engineers software and
(b) warranties it at specific defect rates. Defect rates are usually better
than the Linux kernel and close to the Space Shuttle's control code. Here's a
case study of their Correct by Construction process applied to a
high-assurance certificate authority:

[http://www.anthonyhall.org/c_by_c_secure_system.pdf](http://www.anthonyhall.org/c_by_c_secure_system.pdf)

So, there you have two methods that already maxed out quality (and/or
security) on real-world projects. Altran charges an acceptable premium for
quality, but Cleanroom often cost nothing extra or saved money, for the same
reasons your article gives. It appears modern programmers still have lessons
to learn in software quality from the 1980's. Like I tell people, the old
wisdom is often pretty solid even after you fix the outdated parts. Our field
is just terrible at passing it down. I try to fix that with posts like this so
the smart, young people get a boost toward whatever great things they create.
Also to reduce the endless rediscovering of fire and reinventing the wheel. :)

"I was thinking about the order of the points a lot, and no particular order
seemed really right. It now comes at a point where I think I have established
all the preconditions: Speed means lead time, cost depends on speed, high
external quality means faster, high internal quality means faster. "

I'm going to let you keep working on that formula. You've got the right
ingredients. As far as order, there might not be any. Many things in
engineering or business are holistic, where the parts all feed into each
other. The goal is an emergent property of that. So, rather than an order,
you'd visualize it like a bubble chart or something like that, with
connections between the components and labels for the effect they have. Just
something that makes it clear that they're all connected, all feed into each
other, and ignoring one breaks the process for achieving the goal. Just my dos
centimos.

"Maybe I'll write another article about how users care about internal quality
- Because I've worked in so many legacy code projects that I think I have a
good view on that :)"

You should. Matter of fact, you should do a lot more work on figuring out how
to measure and convey the importance of eliminating technical debt. To
laypersons, not geeks. Not enough work here despite its importance. Everyone
is instead making arguments for geeks that already get it cuz they're knee
deep in the crap or picked the job that didn't have it.

Great attempt here:

[http://tech.ticketmaster.com/2015/06/30/what-ticketmaster-is-doing-about-technical-debt/](http://tech.ticketmaster.com/2015/06/30/what-ticketmaster-is-doing-about-technical-debt/)

There's a tool for metrics here that was interesting. I haven't tried it,
though.

[http://swreflections.blogspot.com/2012/02/technical-debt-how-much-is-it-really.html](http://swreflections.blogspot.com/2012/02/technical-debt-how-much-is-it-really.html)

" I'm still not sure if I believe the "unlimited downside" idea 100% (because
only very few things are truly unlimited), but it's an interesting concept."

I just thought it sounded catchy. Not committing to it myself. ;)

