
The top bug predictor is not technical, it's organizational complexity - keyP
https://augustl.com/blog/2019/best_bug_predictor_is_organizational_complexity/
======
ChrisMarshallNY
This is a no-brainer.

As a development manager for a quarter-century, and an active software
developer for a lot longer than that, I can definitely say that every place
there's a "meeting of the minds" is a place for bugs.

In the software itself, the more complex the design, the more of these
"trouble nodes" (as I call them) there are. Each interface, each class, each
data interface is a bug farm.

That's why I'm a skeptic of a number of modern development practices that
deliberately increase the complexity of software. I won't name them, because I
will then get a bunch of pithy responses.

These practices are often a response to the need to do Big Stuff.

In order to do Big Stuff, you need a Big Team.

In order to work with a Big Team, you need a Big Plan.

That Big Plan needs to delegate work to all the team members, usually by
giving each one a specific domain and specifying how they will interact with
each other in a Big Integration Plan.

Problem is, you need this "Big" stuff. It's crazy to do without it.

The way that I have found works for me is to have an aggregate of much more
sequestered small parts, each treated as a separate full product. It's a lot
more work, and takes a lot more time, with a lot more overhead, but it results
in a really high-quality product, and also has a great deal of resiliency and
flexibility.

There is no magic bullet.

Software development is hard.

~~~
jiggawatts
So, just a day ago, I got dragged into a meeting where many people were
involved in a discussion about the new Cloud Enterprise Application
Architecture Template. Or whatever.

It had a 3-tier architecture.

I asked: Why?

And they answered: Why not?

I answered: Because layers must only be introduced if needed. Is there a need?

They answered: The standard design is the need.

I clarified: Is there a _technical requirement?_ Or perhaps an organisational
one, such as disparate teams working on the two components?

They answered: No! Of course not! It's a unified codebase for a single app
written by a single person! But it is not Enterprise enough! It must be split
into layers! And then, you see, it will match our pattern and belong.

I verified the insanity: Are you saying that this finished, working
application isn't currently split into layers, but you want it split into
layers simply so that it can have layers?

They chorused: Yes.
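
To make it concrete, here is a minimal sketch (Python, with invented names) of
what layering for its own sake buys you: each pass-through tier is a new seam
where defaults, conversions, and signatures can silently drift, and nothing
else.

    # One honest function that does the job:
    def get_invoice(invoice_id):
        # stand-in for the real lookup
        return {"id": invoice_id, "total": 100.0}

    # The same app after mandatory "Enterprise" layering: two tiers
    # that only forward arguments, adding nothing but places to drift.
    class InvoiceService:
        def fetch_invoice(self, invoice_id):
            return get_invoice(invoice_id)

    class InvoiceFacade:
        def __init__(self):
            self._service = InvoiceService()

        def get(self, invoice_id):
            return self._service.fetch_invoice(invoice_id)

    print(InvoiceFacade().get(42))  # three hops to do one thing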

~~~
hyperman1
The 3-tier architecture was a reaction to VB and RAD-like tools, where things
like data validation and database I/O were coupled directly to the input
component. It was common for these frameworks to not even have an object for
data transfer sitting between the UI and the DB.

This was the timeframe when more and more manual work was being automated.
Hence it was a common situation that input which used to be entered by a
human now came from another application. The simplest way to do that kind of
retrofit was to drive the UI from the application: the application fills in
its own GUI fields, which triggers the validation, then simulates a click on
OK.

This caused all kinds of ungodly messes: you needed a GUI for background
processes, reliability was low, etc. The 3-tier architecture was a way to say
'never again' to this style of programming. Forcing people into it was
necessary.
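
A toy sketch of that retrofit (Python, invented names), contrasting
validation welded to a form with the rule pulled out where a 3-tier design
would put it:

    # RAD-era pattern: validation lives in the form's click handler.
    class OrderForm:
        def __init__(self):
            self.qty_field = ""  # stand-in for a GUI text box

        def on_ok_clicked(self):
            qty = int(self.qty_field)  # validation welded to the UI
            if qty <= 0:
                raise ValueError("qty must be positive")
            print("saved", qty)

    # The retrofit described above: a background process puppets the
    # GUI because that's the only way to reach the validation logic.
    form = OrderForm()
    form.qty_field = "3"   # fill in our own GUI field...
    form.on_ok_clicked()   # ...then simulate the click on OK

    # The 3-tier fix: pull the rule into its own layer so UI and
    # batch callers alike can use it without a form in the middle.
    def validate_qty(raw):
        qty = int(raw)
        if qty <= 0:
            raise ValueError("qty must be positive")
        return qty

    print("saved", validate_qty("3"))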

But that was another time. Mindlessly applying an architecture without
understanding why is of course dumb. But not applying an architecture without
understanding its pros and cons is just as dumb. It all depends on the
quality of the architects in question.

Not that I want to call you dumb, of course. IT today is different from 20
years ago.

~~~
specialist
n-tiered architectures were born during the transition from "workgroup" to
"client/server". This upheaval caused an industry-wide loss of programming
lore.

It wouldn't have been too bad, except the ODBC interface inadvertently led to
abandoning schema-aware programming models like VB's ADO, Paradox, FoxPro,
etc.

At the same time, object-oriented programming became fashionable.

So we ended up with ORMs, ActiveRecord, and various offshoots.

Mostly because no one remembers life before client/server.

~~~
pjc50
I was always a bit hazy on what these terms actually meant, and what we lost.
It does seem like there was a brief golden age of easy "rapid application
development" tools like FoxPro where you could wire some UI fields to a
database without too much trouble. Now we have people trying to do the same
on the web, badly.

~~~
specialist
I also had no idea what client/server meant (at the time). My current
oversimplified distinction:

Workgroup: I/O thru file system, clients responsible for locking, concurrency,
etc.

Client/Server: I/O thru DB's protocol (e.g. TDS), server responsible for
locking, concurrency, etc.
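
A rough sketch of that distinction in code (Python; the file path is
arbitrary and the connection snippet is hypothetical):

    # Workgroup: every client opens the shared data file itself and
    # is responsible for locking and concurrency (Unix-only example).
    import fcntl

    with open("/tmp/orders.dat", "ab") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # the *client* takes the lock
        f.write(b"new order record\n")
        fcntl.flock(f, fcntl.LOCK_UN)

    # Client/server: the client never touches the file; it speaks the
    # server's wire protocol (e.g. TDS) and the server owns locking.
    #
    #   import pyodbc  # hypothetical DSN, for illustration only
    #   conn = pyodbc.connect("DSN=orders")
    #   conn.execute("INSERT INTO orders (note) VALUES (?)", "new order")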

--

As for what was lost, I spent way too long (10+ years) trying to figure that
out, trying to fulfill the desire to recover the ADO (ActiveX Data Objects)
programming paradigm. I think I succeeded, more or less, and am currently
reorienting my life to work on it full-time.

------
habosa
This rings true for anyone who has ever worked at a big tech company (I work
at Google).

At Google when your project begins to scale up you can ask for more money,
more people, or both. Most teams ask for both.

What you can't ask for is _different_ people. You can't solve your distributed
systems problems by adding 5 more mid-level software engineers to your team
who have not worked in the domain. Yet due to how hiring works, this is what's
offered to you unless you want to do the recruiting yourself. Google views all
software engineers as interchangeable at their level. I have seen people being
sent to work on an Android app with hundreds of millions of users despite
never having done mobile development before. That normally goes about as well
as you'd expect.

So you end up with teams of 20 people slowly doing work that could be done
quickly by 5 experts. In some cases all you lose is speed. In other cases this
is fatal. Some things simply cannot be done without the right team.

~~~
natalyarostova
I see the same thing, and the only way I can reconcile it is that the benefit
to sr leadership of treating SDEs as fungible is so massive that it is still
worth the massive productivity loss from assuming exchangeability.

~~~
amznthrowaway5
What are the benefits to treating SDEs as fungible?

At Amazon, Sr. Leadership and HR love to pretend that all SDEs at a given
level are interchangeable, that level actually indicates competence, and that
leetcoding external hires with zero domain knowledge are worth far more than
internal promos. All of the above assumptions seem completely insane to me
and have resulted in the destruction of many projects.

~~~
natalyarostova
Yeah me too. Also at amazon. And yet amazon is obscenely successful, as are
other big tech companies which take similar strategies. It seems that matching
expertise to projects is so fucking hard that just giving up at the start and
accepting it’s impossible is the optimal strategy.

Honestly I don’t know. I agree it’s weird. But these companies keep succeeding
doing it this way, so I’m not sure what to make of it.

~~~
amznthrowaway5
> But these companies keep succeeding doing it this way, so I’m not sure what
> to make of it.

That doesn't necessarily mean anything. The fact that a system might be
working doesn't mean it's anywhere near optimal. I think these companies are
successful in spite of these types of policies, not because of them.

------
MontyCarloHall
Looking at the metrics used in the publication[0], it seems most of them focus
on the absolute number of engineers working on a given component. This makes
sense — more engineers touching a component introduces more opportunities for
bugs. (Edit: as other commenters have pointed out, total lines of code, highly
correlated to number of engineers, is likely the best first-order predictor of
bugginess.)

I bet we can improve predictive power by considering the degree of
_overengineering_, i.e., the number of engineers working on a task (edit: or
lines of code) relative to the complexity of the task they’re working on. 100
people working on a task that could be accomplished by a single person will
result in a much buggier product than 100 people working on a task that
actually requires 100 people. The complexity of code expands to fill available
engineering capacity, regardless of how simple the underlying task is; put 100
people to work on FizzBuzz and you’ll get a FizzBuzz with the complexity of a
100-person project[1]. Unnecessary complexity results in buggier code than
necessary complexity because unnecessary components have inherently unclear
roles.

Edit: substitute "100 people" with "10 million lines of code" and "1 person"
with "1000 lines of code" and my statement should still hold true.

[0] [https://www.microsoft.com/en-us/research/wp-content/uploads/...](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2008-11.pdf)

[1]
[https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...](https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpriseEdition)

~~~
ezzzzz
My own (current) personal hell is essentially the inverse of this assumption.
I work in an area that was essentially run by one extremely overworked
developer. The result? Now there are 40 people (including management, PMs,
Scrumlords, QA, and devs) doing the work originally done by a single person.
The tech debt and cruft is unbelievable (considering the domain is also quite
complex). Every decision made before the 40 people were hired was made just
to check off 1 of the 1000 things on this person's plate... I could probably
write a book about this if we are ever able to turn things around, which
requires explaining to management why the things that have been 'working' for
over a decade are no longer working.

The sad part is, it would seem like all the engineers we have are overkill,
but in my little silo, we could easily split our work into even more sub-
teams, hire 12 more people, and still keep churning just to stay afloat. Sorry
for the rant, I'm not sure exactly what I'm driving at. I guess I'm just
trying to give a cautionary example of how not to manage large-scale software
projects.

~~~
spookthesunset
Oftentimes, I've found situations like the one you describe (where you need
to throw infinite developers at a problem to fix it) to be a smell that what
your "product" does isn't in line with how your business delivers value. Such
things are almost always candidates to be replaced by third-party software of
some kind.

Maybe I'm wrong though. When you are charting new ground, building new shit
that has never been built before--which is what your product teams _should_
be doing--you don't have years-long backlogs, because you can't see that far
out. Good, productive feature work is iterative.

If you can see with a high degree of clarity what you will be working on 5
years from now, it probably means it's been done before and you are better off
cutting a check for it.

Hopefully this makes sense :-)

~~~
ezzzzz
I agree. I've even suggested to my manager that we'd be better off utilizing a
3rd party. Problem is, this is a corporate gig. We don't have a 'product',
we're just a cost center for the business. Even if we outsourced to a 3rd
party, that would likely take years to coordinate.

------
he0001
Isn’t this aligned with Conway’s law? I mean, a complex business model most
likely (or eventually) requires a complex solution. If not, the two systems
are at odds, and the computer system is even more complex/buggy, since it
doesn’t follow the organization's complexity and doesn’t do what the
organization needs it to do.

That at least is my experience anyway.

~~~
tekmaven
I was surprised that Conway's law was not mentioned.

~~~
MontyCarloHall
In the original publication that’s the subject of the article, it was:
[https://www.microsoft.com/en-us/research/wp-content/uploads/...](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2008-11.pdf)

------
thenewnewguy
Just skimmed over the post, so it's possible they pointed this out and I
didn't notice - but I think this is misleading. The title makes it sound like
organizational complexity _causes_ bugs, but in reality I think both are
simply effects of a deeper underlying cause.

Larger and more complicated software both requires a bigger team (therefore
more organizational complexity) and is more likely to contain bugs.

~~~
kitd
Steve McConnell identified it as the number of lines of communication in the
team or department creating the module, including dependents and dependers.

It's why Conway's Law exists, and points towards the importance of well-
designed and -specified APIs.

~~~
daveslash
If you consider people on a team as nodes in a graph, and lines of
communication as edges, then a team of _n_ people has _n(n-1)/2_ potential
lines of communication. I try to express to people that the more potential
lines of communication you have, the greater the chance of miscommunication.
I think this is also called out in Brooks' _The Mythical Man Month_.
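
A quick worked version of that formula (Python):

    # Potential lines of communication in a team of n people: n(n-1)/2
    def comm_lines(n):
        return n * (n - 1) // 2

    for n in (2, 5, 10, 50):
        print(n, "people ->", comm_lines(n), "potential channels")
    # 2 -> 1, 5 -> 10, 10 -> 45, 50 -> 1225: the channels (and the
    # chances to miscommunicate) grow quadratically while the team
    # grows linearly.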

~~~
kitd
Indeed. There's even a law named after him:

[https://en.wikipedia.org/wiki/Brooks%27s_law](https://en.wikipedia.org/wiki/Brooks%27s_law)

------
Merrill
>Organizational Complexity. Measures the number of developers working on the
module, the number of ex-developers that used to work on the module but no
longer do, how big a fraction of the organization as a whole works or has
worked on the module, the distance in the organization between the developer
and the decision maker, etc.

After one of the early big software project failures (maybe Multics?) there
was a quote about software projects going around (maybe John R Pierce?) that
"If it can't be done by two people in six months, it can't be done."

One of the functions of good software design is to break the system down into
pieces that a couple of people can complete in a reasonable length of time.

------
hos234
Herbert Simon is who you read first when you're thinking about orgs -
[https://en.m.wikipedia.org/wiki/Satisficing](https://en.m.wikipedia.org/wiki/Satisficing)

That will take you to healthy and productive places.

------
wycy
The article examines this through the lens of Windows Vista. Am I the only
person who actually did like Vista and didn't have any problems with it? I
gathered that most of the issues people had with it were caused by
incompatible third-party software and hardware.

~~~
kryptiskt
I also had little trouble with Vista and liked it well enough. But I had
plenty of RAM, I believe it performed badly with 1 GB or less. And of course
people got hung up on UAC.

~~~
swiley
That’s kind of crazy. Firefox and LibreOffice don’t do well in 1 GB of RAM,
but at least they’re _doing something._ You can run a decent DE and a few
decent apps in 1 GB just fine with most Linux distros.

~~~
barrkel
And Microsoft Word ran perfectly well on 16MB of RAM in Windows 95, back in
the day; and even better on NT with 32 or 64MB, productive and more than
comfy.

I guess my point is that running decent software on what today would be
considered very little hardware is a solved problem, but it's not what the
economy is optimized for.

~~~
swiley
The difference here is that LibreOffice is still maintained (in fact, the
math typesetting in Microsoft Office is practically unmaintained at this
point and is just miserable to use). Old Windows (and old Linux) have serious
problems that modern OSes don’t have. You can run _modern_, _good_, and
_compatible_ software in very little RAM, and the only reason not to is that
someone else is forcing you or you’re just not aware.

------
zubairq
I guess this is why startups exist: it is almost impossible for larger firms
to execute good ideas, even if they think of the ideas themselves.

~~~
moretai
What is the underlying reason goliaths don't execute ideas well? Too many
chefs in the kitchen?

~~~
gfs78
Lots of freeloaders and incompetent power trippers in the middle layers and
above. Small orgs and startups cannot afford these kinds of workers.

In my current project (big co.) we have a technical PM, a non-technical PM, a
non-programmer dev lead, a scrum master, and a lead business analyst, all
involved in managing the work of a team of two and a half (a sr BA/QA guy, a
part-time ssr dev, and me). Wasted work is probably around 90%.

~~~
spookthesunset
> Lots of freeloaders and incompetent power trippers in the middle layers and
> above.

Not just middle layers and up. Freeloaders and gatekeepers everywhere.

------
neilobremski
I think all current and ex-Microsoftees can agree (and probably other workers
in Big Tech Corp) that this is not only obvious but ongoing and dastardly
resilient to getting solved! At some level this must be a sociological thing,
because humans seem to be hardwired to repeat this mistake.

This happens at smaller companies in smaller ways but the effect is the same.

It's worse than the "Mythical Man Month" in that production is not simply
slowed down but slowly made rotten until it gets burned, buried, or passed
off to outsourced maintenance.

------
kabes
The article ends with a mention of the book 'Accelerate'. Accelerate is your
typical management book, based on some surveys with bad statistics and worse
conclusions. It's weird that he mentions this book right after discussing the
proper, replicated study that Microsoft Research did.

------
ChrisSD
Previous discussion (which got flagged):

[https://news.ycombinator.com/item?id=21795462](https://news.ycombinator.com/item?id=21795462)

~~~
tyingq
I can't quickly tell why it was flagged. Was the study highly flawed? Or
promoted in a dishonest way? Or was it really the comments as a whole that got
it flagged?

~~~
augustl
Probably because the title was originally extremely linkbait-y.

------
mirekrusin
...which is likely a proxy for the complexity of the task - the more
difficult the task, the more likely it is to have bugs - not an exciting
revelation.

Their "code complexity" means nothing. A simple "todo web app" will have
orders of magnitude more "code complexity" than, for example, a sha256 hash
implementation.

~~~
TheCoelacanth
And rightly so. It is much easier to write a bug-free sha256 implementation
than to write a bug-free To-Do app.

------
sorokod
Conway's law strikes again.

------
nelsonic
Does this mean that a solo developer working on a SaaS product can avoid bugs?

~~~
_ZeD_
Surely this solo developer will avoid bugs caused by "communication
misunderstanding" between team members.

~~~
AnimalMuppet
I'm not totally sure about that. "Me, yesterday" has to communicate with "me,
today", and communication misunderstandings can absolutely happen.

Now, sure, such things happen _less_ when it's me talking to me than when it's
me talking to you. They still happen, though.

------
johnwatson11218
The use of the term 'p-value' seems off; it looks like he is referring to a
different concept but using an overloaded term.
------
kozak
I have a Microsoft keyboard that has a dedicated Calculator button. In older
versions of Windows, I used to press that button and start entering numbers
right away. But in newer updates of Windows 10, I now have to press the
Calculator button, WAIT FOR THE CALCULATOR TO LOAD, and then start typing my
digits. I think this is ridiculous.

~~~
chapium
I don't share this experience. I think you have an underlying issue.

~~~
kozak
This started happening when the default calculator changed from a classic
native app to the new "modern" app.

------
chiefalchemist
Code/technology is nothing more than a tool. A toolbox is only as good as the
person or people who pick it up. That is, a great tool will not save a
disorganized organization. A great tool will not make a crap product better.

Blaming the means for the ends is a classic n00b mistake. A mistake that's
being made over and over and over again.

------
amelius
From the article:

> In the replicated study the predictive value of organizational structure is
> not as high. Out of 4 measured models, it gets the 2nd highest precision and
> the 3rd highest recall.

