The top bug predictor is not technical, it's organizational complexity (augustl.com)
388 points by keyP on Dec 18, 2019 | 179 comments

This is a no-brainer.

As a development manager for a quarter-century, and an active software developer for a lot longer than that, I can definitely say that every place there's a "meeting of the minds" is a place for bugs.

In the software itself, the more complex the design, the more of these "trouble nodes" (what I call them) there are. Each interface, each class, each data interface, is a bug farm.

That's why I'm a skeptic of a number of modern development practices that deliberately increase the complexity of software. I won't name them, because I will then get a bunch of pithy responses.

These practices are often a response to the need to do Big Stuff.

In order to do Big Stuff, you need a Big Team.

In order to work with a Big Team, you need a Big Plan.

That Big Plan needs to have delegation to all the team members; usually by giving each one a specific domain, and specifying how they will interact with each other in a Big Integration Plan.

Problem is, you need this "Big" stuff. It's crazy to do without it.

The way that I have found works for me, is to have an aggregate of much more sequestered small parts, each treated as a separate full product. It's a lot more work, and takes a lot more time, with a lot more overhead, but it results in a really high-quality product, and also has a great deal of resiliency and flexibility.

There is no magic bullet.

Software development is hard.

So, just a day ago, I got dragged into a meeting where many people were involved in a discussion about the new Cloud Enterprise Application Architecture Template. Or whatever.

It had a 3-tier architecture.

I asked: Why?

And they answered: Why not?

I answered: Because layers must only be introduced if needed. Is there a need?

They answered: The standard design is the need.

I clarified: Is there a technical requirement? Or perhaps an organisation one, such as disparate teams working on the two components?

They answered: No! Of course not! It's a unified codebase for a single app written by a single person! But it is not Enterprise enough! It must be split into layers! And then, you see, it will match our pattern and belong.

I verified the insanity: Are you saying that this finished, working application isn't currently split into layers, but you want it split into layers simply so that it can have layers?

They chorused: Yes.

The 3 tier architecture was a reaction to VB and RAD-like tools, where things like data validation and database I/O were coupled directly to the input component. It was common for these frameworks to not even have an object for data transfer sitting between UI and DB.

This was the timeframe where more and more manual work was automated. Hence it was a common situation where input used to be given by a human, but now comes from another application. The simplest way to do that kind of retrofit was to drive the UI from the application: The application fills in its own gui fields which triggers the validation, then simulates a click on OK.

This caused all kinds of ungodly messes. You needed a GUI for background processes, reliability was low, etc. The 3 tier architecture was a way to say 'never again' to this style of programming. Forcing people into it was necessary.
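
For anyone who hasn't lived through this, here is a rough Python sketch of the separation the three tiers were after. Every name in it (Order, validate_order, OrderForm, import_orders) is hypothetical and invented for illustration; none of it comes from VB or any real framework. The only point is that validation lives in a plain function, so a background process can call it directly instead of filling in GUI fields and simulating a click on OK.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Plain data-transfer object sitting between UI and DB (hypothetical)."""
    customer_id: str
    quantity: int


def validate_order(order: Order) -> list[str]:
    """Business-rule tier: returns a list of problems, empty if valid."""
    errors = []
    if not order.customer_id.strip():
        errors.append("customer_id is required")
    if order.quantity <= 0:
        errors.append("quantity must be positive")
    return errors


class OrderForm:
    """Presentation tier: only parses raw input and reports errors."""

    def __init__(self, customer_field: str, quantity_field: str):
        self.customer_field = customer_field
        self.quantity_field = quantity_field

    def on_ok_clicked(self) -> list[str]:
        try:
            quantity = int(self.quantity_field)
        except ValueError:
            return ["quantity must be a number"]
        # The form delegates to the shared rules instead of owning them.
        return validate_order(Order(self.customer_field, quantity))


def import_orders(rows: list[tuple[str, int]]) -> list[Order]:
    """Background process: reuses the same rules with no GUI involved."""
    accepted = []
    for customer_id, quantity in rows:
        order = Order(customer_id, quantity)
        if not validate_order(order):
            accepted.append(order)
    return accepted
```

Whether this is worth a whole extra tier for a single-screen app is exactly the question being argued in this thread; the sketch only shows what the layering buys when input stops coming from a human.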

But that was another time. Mindlessly applying an architecture without understanding why is of course dumb. But not applying an architecture without understanding its pros and cons is just as dumb. It all depends on the quality of the architects in question.

Not that I want to call you dumb, of course. IT today is different from 20 years ago.

n-tiered architectures were born during the transition from "workgroup" to "client/server". This upheaval caused an industry-wide loss of programming lore.

It wouldn't have been too bad, except the ODBC interface inadvertently led to abandoning schema-aware programming models like VB's ADO, Paradox, FoxPro, etc.

At the same time, object oriented became fashionable.

So we ended up with ORMs, ActiveRecord, and various offshoots.

Mostly because no one remembers life before client/server.

I was always a bit hazy on what these terms actually meant, and what we lost. It does seem like there was a brief golden age of easy "rapid application development" tools like FoxPro where you could wire some UI fields to a database without too much trouble. Now we have people trying to do the same in the web, badly.

I also had no idea what client/server meant (at the time). My current oversimplified distinction:

Workgroup: I/O thru file system, clients responsible for locking, concurrency, etc.

Client/Server: I/O thru DB's protocol (eg TDS), server responsible for locking, concurrency, etc.


As for what was lost, I spent way too long (10+ years) trying to figure that out, trying to fulfill the desire to recover the ADO (ActiveX Data Objects) programming paradigm. I think I succeeded, more or less, and am currently reorienting my life to work on it full-time.

I used to loathe FoxPro; now I actually miss it. It was simple enough to quickly get the job done. In the 2000s I always thought that something better, more powerful and even easier to use would come along. Looking back, I feel that a lot of the time in technology we reinvent the wheel and nothing better actually comes out of it.

FileMaker may be an alternative, but I never understood why it hasn't caught on.

And I am not even sure there is any similar product that is good on the Web.

I assume that's the Windows 3.11 era? It was a simpler time, with small non-hostile networks, less critical software, and more humans in the loop. While 'workgroup' mostly means PC/DOS to me, the mainframes of the day had a comparable story unfolding.

Funny thing, I remember a lot of similar apps in enterprise businesses until the late 2000s. That only started to change when web apps and cloud-hosted software became the norm.

Can you talk more about workgroup?

Not the GP, but I used to work in desktop development for enterprise businesses.

Most multi-user apps were two tiered: client and database. The way they worked was by connecting directly to the database and/or a shared filesystem in the local network. All validation happened client-side.

Database credentials and fine-grained file/directory permissions were the only security measures. That and the NAT. ;)

There's still software in the market that uses this model, like some niche ERPs from the 90s and software catered to small/mid businesses.

> Database credentials and fine-grained file/directory permissions were the only security measures. That and the NAT. ;)

It baffles me that the CCNA materials are still based around these use cases.

> The simplest way to do that kind of retrofit was to drive the UI from the application: The application fills in its own gui fields which triggers the validation, then simulates a click on OK.

> Not that I want to call you dumb, of course. IT today is different from 20 years ago.

The meteoric rise of RPA within IT shops is recreating the same situation. The more things change, the more they stay the same.

I think RPA is the merging of the functional and the technical, which is what all the debate is revolving around... Change can't happen without change

What is RPA?

This link should resist time rot: https://www.google.com/search?q=top+rpa+vendors

"Robotic process automation" was indeed what came up in my web search for the term, but it seemed an unlikely fit for generic distributed app development. Why is robotic process automation sweeping "IT shops"?

Because rather than upgrading software and building APIs that integrate at a low level, IT shops are building software and APIs on top of UI automations. Some new fancy service desk platform the suits want may not have a connector to your 15 year old heavily-customized SAP instance (which is run by another team you have no influence with), and budget / time constraints leave RPA as the only option.

It's pretty easy to see why this would cause problems, but the consulting companies have been pushing hard on RPA because when it blows up in 5 years, who are you going to call? I say this as a consultant who has to sell this awful crap because "partnerships".

Thanks for the context. Seems like the latest incarnation of scripting 3270/mainframe terminals, telnet/expect sessions or web/selenium/headless browsers.

Copy&paste is the enterprise API/data integrator of last resort. Image/video is another integration point. iOS can screen capture full-page images of web pages, with tools for human annotation. Soon the local ML/bionic processor and AR toolkit can perform text recognition on those images, which means they can be live edited, re-composed and fed into another system.

> fancy service desk platform the suits want may not have a connector to your 15 year old heavily-customized SAP instance (which is run by another team you have no influence with)

This intersects with DRM and the title of the OP story. When OrgA and OrgB fail to partner/cooperate (e.g. no formal integration) or are actively hostile (implement DRM to prevent data movement between OrgA and OrgB products), it creates pain for customers and new business opportunity for OrgC and OrgD.

Which is why scraping and reverse engineering are never going away, they are society's last line of defense against vendor org dysfunction.

Oh it's so much worse than that. Swivel-chair / copy-paste integrations are often a better solution than RPA in today's world.

It was one thing scripting mainframe terminals, but the equivalent today are SaaS apps. The major enterprise vendors like Salesforce are pretty good about roadmaps and release schedules, but a lot of the smaller ones work on more of a continuous deployment model. This means your RPA integrations are constantly breaking, and suddenly you have to hire a whole bunch of RPA analysts to deal with fixing them. Or you can just hire a few more data entry people to do it manually.

I ... I feel bad saying this, but I can totally see a use for RPA.

One, and more often than not, applications embed a ton of business logic in the client which is not easily available in the API.

Two, and honestly I feel dirty writing this, the UI is usually a lot better tested. Or tested at all.

I'm just the messenger, here. We were very adamant about NOT doing this in my previous company or our current projects, but I can totally understand the "it's good enough for humans, it's good enough for machines" attitude. It makes me sad though.

Is MVC a good thing?

You're lucky you got some form of discussion out of the people you work with. Whenever I try to steer conversation towards "technical accountability" where I ask what I think are legitimate questions concerning application design and architecture, the mood is as if no-one else believes it's of any importance and I am that weird guy putting spokes in the wheels of a great project. My arguments aren't refuted, they're conveniently ignored. People are honestly dumbfounded and prefer to change subject at best, at worst they lash out with what seems to be simplistic thinly-veiled "culture" attacks -- "oh, please, not with that abstraction/encapsulation argument again, we've got work to do"-sort of utterances.

> I answered: Because layers must only be introduced if needed. Is there a need?

It might have gone better if you had also stated why that is the case, e.g. "every additional layer exponentially increases the likelihood of bugs being introduced, so their introduction must be worth that risk, or the higher cost of mitigating measures".

Of course, the challenge will still be that "likelihood of bugs" is rather abstract, and often people believe they can be prevented just by paying more attention, and assume that that will happen of its own accord.

I went through this, more or less, several years ago. But now I see the uselessly layered app being almost effortlessly retrofitted with new top and bottom layers, keeping only the original middle layer.

If they hadn't layered it, the new retrofitting wouldn't even be possible and the company wouldn't have that contract.

Was it insanity?

Then you do the layering when you need to. One of my biggest gripes about many engineers is their desire to fix technical debt for no business purpose.

In your example, the team tasked with retrofitting would have introduced the layers when the need to make changes arose.

What actually happens in a “business purpose focused” company: You tell the boss you need to double the estimate they gave to the customer, so that you can pay the tech debt and add those layers we need. Boss refuses, we need that change now so get it done without fixing the debt, why is the team so incompetent that they didn’t add layers to start with, why didn’t you just fix the tech debt on your own time in quiet periods, we can’t have a ticket for that since it doesn’t add business value, blah blah...


That's why there's no final answer to "layer: yes or no?". It depends. And as a developer it's your job to guess which decision is more likely to save you time (= money) in the future. There's no way to know the right answer for sure because it depends on further (business) decisions that aren't made yet.

What you're describing sounds to me like poor departmental relationships and the typical challenges of lots of people with lots of goals working together.

I sure hope all of y'all's companies are 'business purpose focused' or you're gonna be looking for a new job pretty soon.

are you claiming this same boss would have given you the time to do it at some other point, thereby delaying something else?

Ironically yeah ime - when doing it upfront. These places sometimes expect so much business value from Cool New App that they let it spiral out of the agreed timespan and you get lots of time to ‘add layers’ (i.e. plan for the future or overcomplicate, however you see it). Once Cool New Thing is live and expectations have cooled, they just want to make some ‘small’ changes, nobody wants to let you restructure the architecture.

That's a tall order in a 100k LOC app, for a small team with budget constraints. I'm pretty sure they couldn't do it your way, whereas the prelayered way set them up for success.

Prelayered raises initial complexity while lowering eventual complexity. It also protects the core layers where the IP is (presumably) from regressions during retrofit.

I did a project like that at my old job. It was a single screen with a single purpose (log in the user via OAuth SSO, then collect a couple of pieces of input and submit them to a backend webservice), so it was assigned to me rather than creating a project team. To make it match the rest of our application architecture I created a REST layer in C# that communicated via SOAP to an ESB that forwarded the SOAP request to a Java webservice that finally forwarded it to the backend system. I did depart from our organizational norm by doing the frontend with plain HTML+JS (plus Bootstrap, IIRC) rather than AngularJS.

Yes, it was complicated, but I think there is a benefit: it's very clear where certain functionality should live. The C# REST layer was application-facing, so it took care of SSO and basic validation. The Java webservice contained the business logic to validate things from a broader enterprise perspective. The ESB was a piece of trash that did provide authnz so the Java webservice didn't have to.

Was it worth the complexity? Probably not, in this case. But those sorts of applications tend to have long lifespans and evolving requirements, so the standardization can be helpful.

> Yes, it was complicated, but I think there is a benefit: it's very clear where certain functionality should live

Off-topic, but your post reminds me of this Michael Feathers article [1], where he argues that while programming languages have tools to support encapsulation, they don't really work, but the thing about microservices (or layers in this case) is that they actually force us to encapsulate our code.

[1] https://michaelfeathers.silvrback.com/microservices-and-the-...

That's a really good article—thanks for sharing it. I totally agree that forced encapsulation (ideally with tightly-governed contract-first APIs) is the most enticing selling point of microservices.

I would say that this could very easily be justified as standardizing and de-skilling development and production operations. That's a very good business reason to standardize architecture patterns, because it's cheaper to use a pattern that is sometimes overkill than it is to have a bespoke architecture for every app.

I actually like DDD, where code is executed per domain with a bounded context.

Not for a single person though, but it will force kin developers to think more deeply.

Otherwise it could easily result in a code-mesh/hell.

I’m trying an experiment in my new role where I deliberately avoid these types of projects.

They usually don’t deliver on time and are too stressful to work with. It’s not worth it both personally and from a career perspective.

However if they manage to deliver a complex design on time, I’ll have lost a great career opportunity. It’s a gamble either way, but high complexity both in organization and design, usually yields a high failure rate on just about every metric I can think of except the metric of “I’m going to use this complex design to get another job and bail myself out of the dumpster fire I’m creating before I have to deliver anything of value”.

What is your avoidance mechanism if you don't mind me asking?

Figure out how to support it without being involved in actual deliverables. Or transfer to a new team/project. Or both.

I’ve had those “well, your project is fucked” meetings. The ones where all the leads are architecture astronauts inventing requirements as they go. “We have to have the ability to pass the original authentication the entire way down the stack including the rabbitmq instance too.”

... astronaut proceeds to fill two whiteboards with ungrounded nonsense.

All you can do is smirk and just wait until the whole project gets mysteriously canceled because no mortal could ever implement it successfully.

Honestly as long as I don’t need to directly depend on them, these are some of my more favorite meetings. So surreal. That and when the team infights the entire time.

The most impressive architecture I've ever seen was from these two consultants from a big-4 consultancy that had been sitting at a desk at the customer site for literally years. They had lunch together, never talked to anyone that I could see, and I didn't even know what they did for the first couple of years.

I walked past their desk one evening after hours and noticed a printed A3 page of their high-level design diagram. It was... spectacular.

It had over a hundred tiny icons, each representing various systems. Triangles for directory systems, cylinders for databases, and little arrows connecting these things.

You have to picture a spider web of connections between dozens of each type of system.

The notion that this could be implemented was absurd beyond all comprehension. Just one of the tiny little arrows was connecting SAP to a custom system vaguely similar to Siebel. This arrow represented on the order of thousands of tables and API endpoints that need to be hooked up. Another arrow connected an Active Directory with a million accounts to an Oracle directory of the same scale. Another arrow represented synchronisation between a cloud-hosted payroll service to an on-premises equivalent product.

Half the systems didn't exist. Three quarters of the connections didn't exist. Most would have to be written as bespoke code. Some of the arrows would in turn require load balancers, distributed systems, and change tracking databases of their own. We're talking thousands of man years of effort to implement this thing.

It was audacious in the breadth of its scope to the point of going past insane into the brilliant daring art that's only possible if you can appreciate it at the right level of understanding.

That understanding was that these two brilliant people had been collecting something like $4K/day each for years and produced something that dares you to call them on their bluff. But nobody dared say that the emperor has no clothes. They pulled it off.

I was truly impressed.

The nice thing about those kinds of bozos is it helps you filter out other adjacent bozos as well. Anybody who thinks said bozos are on the right track can immediately go into your "yeah, they don't know what they are talking about" bucket.

That being said, I took a picture of the two whiteboards after this dude filled it up with their god-like wisdom. It was truly a marvel of ungrounded architecture.

The number of code repositories I've seen where the database layer was (leakily) abstracted away "in case we ever need to migrate" is WAY greater than the number of code bases where there was actually a need to migrate.

Yet they all still introduced the headaches of having to update the abstraction layer whenever you wanted to make schema changes.

What would you do instead?

Are these three tiers respectively called "model", "view", and "controller", by chance?

Decoupling and modular design is pointless?

In this case, yes. There is such a thing as pointless busywork that has no tangible benefit to the business, user, or the future codebase maintainer.

What do you propose instead?

> That's why I'm a skeptic of a number of modern development practices that deliberately increase the complexity of software

Maybe I'm nitpicking, but the article points out that the #1 predictor of software bugs is not the complexity of the software but that of the organization itself. A single person can make a hugely complex piece of software, and a relatively large team can make a conceptually simple system.

As for software complexity itself, there's an interesting research result that the thing that matters the most is line count. Not cyclomatic complexity, not the type system, not modularity or test coverage, not the programming language -- looking at the line count alone trumped all the other metrics in predicting flaws. (I can't look for this paper now, but I'm sure with a bit of googling anyone can).

> A single person can make a hugely complex piece of software, and a relatively large team can make a conceptually simple system.

In practice I think you tend to hit Conway’s law -- organizations build software that mirrors their own organizational structure. So it’s hard for large teams to make simple designs.

I’m very skeptical about that line count metric; in my observation bugs tend to sit between modules due to bad interface design. But I could certainly believe that bad modularisation is correlated with line count (in the form of excessive boilerplate).

You can be skeptical, but this has been tested empirically. Defect density per line of code is relatively constant (this is a well-known empirical result, which you can also google), which means more lines predict more bugs. When researchers contrasted line count vs bad modularization, line count was the better predictor (precision & recall).

Maybe it's anti-intuitive, but it's what reality shows :)

Maybe it's something as banal as there being a high enough chance of mistakes per LOC that this factor ends up dominating the other factors.

Maybe. The key takeaway of this for me is: if you want to introduce a bug-predicting metric, you must do better than simple LOC count. If you can't do better, then ditch your metric :)

>if you want to introduce a bug-predicting metric, you must do better than simple LOC count

Agreed. As I pointed out in another post[0], a good candidate for this metric is the amount of unnecessary code, which is often a proxy for unnecessarily large/complex teams.

[0] https://news.ycombinator.com/item?id=21824110

There's also just reading/remembering costs per LOC that make it harder to track specifics about APIs and your own code.

Smaller code fits in your head better, and stays more predictable. You don't have as many weird if-conditions to remember.

> You don't have as many weird if-conditions to remember.

More LOC doesn’t necessarily mean more if-conditions, though, at least not for code written in different styles.

It’s a strong statement that LOC and LOC alone is the best known predictor of bug rate, but that seems to be what several people in this thread are saying.

For example, I think most people (though not all) would agree that very dense code full of tricky one-liners (think Perl Golf) is more likely to be buggy and hard to maintain. But if so, “number of control structures used” or some such ought to be a better metric than plain LOC (I assume we’re using “LOC” literally here, not as a shorthand for something else).

Maybe there’s some second-order effect going on? Like very dense code discouraging modifications, so it gets less maintenance, and therefore accrues fewer bugs over time?

This is an interesting topic! Any research links appreciated.

> You can be skeptical, but this has been tested empirically.

Yeah, I wasn’t claiming to be right, more noting my reaction! I’ll look up the research -- any links appreciated.

> in my observation bugs tend to sit between modules due to bad interface design

Yep, and when you start ripping the monolith apart into separate version controlled projects and deployment pipelines without addressing the interface issues you've significantly increased the complexity of your work products.

Keep in mind that size doesn't necessarily equal complexity. Where I have seen Conway's law most clearly is with dysfunctional teams regardless of size.

I was on a team once where one of the senior people was such a jerk no one wanted to work with him. This led to him and the team carving out a piece of the system that he alone worked on and interfaced with the rest of the system through a single queue. This was certainly not the best design and added all sorts of unnecessary complexity.

Another team I was on had a person who was a good programmer but tended to blow off design meetings. The organization rarely reprimanded this person. In turn, this led to various APIs being built that were close, but never quite right.

Big company examples abound. Contrast an Apple keynote with a Google one. Sometimes I wonder if the people presenting at the Google one even work at the same company.

That makes sense. Every line is a potential bug.

But it's not the only factor, and, quite frequently, it's a matter of correlation, as opposed to causation.

That kind of thing can be very tricky to determine.

When I write software, my first stab at a function tends to be a fairly linear, high-LoC solution, which I then refactor in stages; reducing LoC each time, and ensuring that the quality level remains consistent, or improves.

As far as quality goes, my first, naive stab, was just fine, and I have actually introduced bugs during my refactoring reduction.

Note that I'm not arguing about underlying reasons, just saying what the empirical results show. That's reality.

So if you try to predict software bugs using modularization (or lack thereof) or "if-then-else" branches, or whatever complexity metric you can think of, you'll get one result. If you try to predict them using simple line count, you'll get another result. The second one will have better precision & recall. So no metric so far has been shown to be better than simply counting lines. That's not an obvious result, but it's the truth.
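
Since "precision & recall" carries the whole argument here, a small Python sketch of what comparing two predictors that way looks like may help. Everything in it is an assumption for illustration: the file names, the numbers, the "other metric", and the rule of flagging the top k files are all invented, and they are not from the study being discussed. The toy output says nothing about which metric wins in reality; it only shows the mechanics.

```python
def precision_recall(predicted: set[str], actual: set[str]) -> tuple[float, float]:
    """Precision = share of flagged files that had bugs;
    recall = share of buggy files that were flagged."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def flag_top(scores: dict[str, float], k: int) -> set[str]:
    """Flag the k files with the highest metric value as 'likely buggy'."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


# Toy numbers, invented purely to show the mechanics; NOT from any study.
line_count = {"a.py": 900, "b.py": 120, "c.py": 640, "d.py": 80}
other_metric = {"a.py": 4, "b.py": 9, "c.py": 7, "d.py": 2}
files_with_bugs = {"a.py", "c.py"}

for name, metric in (("line count", line_count), ("other metric", other_metric)):
    p, r = precision_recall(flag_top(metric, k=2), files_with_bugs)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

The claim upthread is that, on real codebases, nobody has shown a metric that beats plain line count on both numbers at once.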

Sadly, you'll have to believe me because I cannot find the studies right now.

Any results/findings for development speed or new hire on-boarding?

> Every line is a potential bug.

But there are many code lines that don't contain bugs. If only one could somehow make software only from those...

I'd take a slightly more mathematical approach to the code lines predictor: zero lines of code contain zero bugs, an infinite number of lines of code contains an infinite number of bugs. It follows immediately that yeah, LOC is massively important and no, we are not interested in that, what we do want to know is how everything else is influencing the derivative.

(and on that digression about longish linear solutions: I completely stopped feeling bad about writing long, linear functions for long, linear problems. I've come to greatly prefer those over the indirections of forced subdivision. If there's a reason to subdivide other than "long is bad", great, go for it, but never subdivide just for having smaller parts. Use assign-once, nested scope etc to make the length more palatable)

Pretty sure when the tests are green you're just meant to reach for the next sticky note ;)

Do you know if the paper controlled for all the variables in tandem? For example, I wouldn't think line count is necessarily mutually exclusive to modularization. That is to say, I'd expect well modularized code to reduce line count generally.

I found a video about the paper: the presenter claims the author of the paper (not the same person) did a double-barrelled correlation and that no metric had better predicting value than "simply doing wc -l on the source code".
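
For reference, the "wc -l" baseline really is about this simple. A minimal Python sketch, assuming you only care about files with a given suffix; the function name count_lines and the default suffix are my own choices, not anything from the paper or the talk:

```python
from pathlib import Path


def count_lines(root: str, suffixes: tuple[str, ...] = (".py",)) -> dict[str, int]:
    """Return {relative path: newline count} for source files under root."""
    counts = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # Like wc -l, count newline bytes rather than parsing the code.
            counts[str(path.relative_to(root))] = path.read_bytes().count(b"\n")
    return counts
```

Calling count_lines(".") from a project root gives the per-file numbers you would then rank files by. No parsing, no notion of comments or blank lines, which is what makes it such an awkward baseline to lose to.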

See minute 39:30: https://vimeo.com/9270320

Thanks for finding that.

You're welcome. I've watched the whole video and it's pretty interesting. Not being Canadian or Australian I missed most of the jokes though :P

I think it did, that's the thing. Unfortunately since I cannot find the paper now, I can't tell you :(

> In order to do Big Stuff, you need a Big Team.

Depends on your definition of Big Stuff. If you mean send a rocket to Mars, then yes. But the vast majority of us are working on simple web apps that might call a few APIs, yet these seem to require Big Teams. Compare that to what a single game developer might produce, and compare the complexity and performance of the product.

I think we need Big Teams for Small Stuff precisely _because_ of these 'modern development practices' that you mention. Getting things done in these paradigms takes _forever_, so you need a Big Team.

That's true. Like I said, I can only speak from my experience.

I do think that we are in a sort of "dependency hell," that is sorting itself out. In the end, a few really good dependencies will still be standing in the blasted wasteland.

Dependencies mean that a small team can do Big Stuff, but that relies on the dependency being good.

"Good" means a lot of things. Low bug count is one metric, but so is clear documentation, community support, developer support, and even tribal knowledge. It doesn't necessarily mean "buzzword-compliant," but sometimes aligning to modern "buzz" means that it benefits from the tribal knowledge that exists for that term, and you can deprecate some things like documentation and training.

People often think that I'm a dependency curmudgeon. I'm not. I am, however, a dependency skeptic.

I will rely on operating system frameworks and utilities almost without question, but I won't just add any old data processor to my project because it's "cool." I need to be convinced that it has good quality, good support, and a high "bus coefficient," not to mention that it meets my needs, and doesn't introduce a lot of extra overhead.

Nothing sucks more than building a system, based on a subsystem that disintegrates a year down the road. I suspect many folks that have built systems based on various Google tech, can relate. I have had that experience with Apple tech, over the years (Can you say "OpenDoc"? I knew you could!).

> I think we need Big Teams for Small Stuff precisely _because_ of these 'modern development practices' that you mention.

Perhaps. But what I've also seen is the head count of a given project is a direct reflection of the intra-org status of the person heading the project.

There's a belief - that's a myth - that if 3 ppl is good then 6 is twice as good and time will be cut in half. I think we also know - with rare exception - that productivity slides as heads increase.

That's a given.

Then there's also a belief - again a myth - that some mod dev practices can fix the increased head count issue. It might mitigate it here and there. But MDP can only do so much to fix a dysfunctional org/group.

Ultimately it's a leadership/management issue. Process and technology are too often lipstick on a pig.

> There's a belief - that's a myth - that if 3 ppl is good then 6 is twice as good and time will be cut in half. I think we also know - with rare exception - that productivity slides as heads increase.

This goes back to Brooks and has been true since longer than most of the programming industry has been alive. I do wonder why people are so resistant to learning from the past and just assuming "the way we do things now" must be an improvement.

> that if 3 ppl is good then 6 is twice as good

This 6 person development team is promoted by the Agile Industry. They say 6 people is a sweet spot, so that if someone goes on vacation then some other developer can "cover" them.

Yes. I should have said 6 and 12 or more. That said, it was only a quick reference to the fact that production drags as head count increases.

In order to get Big VC Money, before doing a Big IPO, you need a Big Team doing a Big Plan. Nobody's going to give you a billion dollars for something that's simple enough for one person to do.

You may say, "but this problem doesn't need a billion dollars!", to which I say "your corporate ownership structure isn't complicated enough, you need to make sure that as much of the billion dollars sticks to your hands as possible after you fail". WeWork passim.

I interviewed at a well funded company for engineering manager. The interview centered around how they were going to build a 100 person team so the job would include lots of interviewing and hiring.

I assume that some investor was told that "We are going to have 100 developers while our competitor has only 20" and the investor bought into that plan.

Plus managers who define their worth by easily measurable values like how many 0s on their p+l statement, and lines to them on the org-chart. Success of the project is much more subjective.

I agree with everything you have said. But the trends in "The Enterprise" are to eschew these ideas in favor of overly complex serverless architectures with poorly specified interfaces where any pluggable developer/resource can perform hit and runs on any component. What I've seen this lead to is the most brittle, bug ridden, low quality software in my career. And the irony is it's led to none of the perceived benefits of serverless architectures but has enhanced all the drawbacks of monolith architectures.

My best guess as to why things have become this way is that middle management in "The Enterprise" reckoned "Agile" as an opportunity to commoditize software development.

You might be ignoring the productivity gains in software dev over the last 10 years.

With open-source, languages like go/rust, excellent IDEs and basically free compute, the amount that a single developer can produce is 10x/20x more.

This is pure hogwash. All that productivity and more has been available for two decades with Java and C#. It's just that hipsters rejected it wholesale because those are their parents' programming languages.

Tell me about it.

I write in Swift. I love it.

I started with Machine Code (not ASM -Machine Code).

Also, all those lovely system frameworks are wonderful.

I used to use MacApp (Google it), and PowerPlant (Same).

AppKit and UIKit knock them into a cocked hat. SwiftUI shows promise, but it may be a year or two before it can really match the standards.

> This is a no-brainer.

It is NOT obvious to someone who hasn't thought about it for a while. Suppose someone is trying to persuade another person and just assumes that they already realize the costs of organisational complexity. There's a good chance they'll run into a wall and not get the message across.

If you think realizing it is a no-brainer, then your 25 years of experience is showing.

There's a name for the bias where something seems obvious once you've been told about it, but I can't remember what it's called (I don't think it's hindsight bias...)

Egg of Columbus?

> I'm a skeptic of a number of modern development practices that deliberately increase the complexity of software. I won't name them, because I will then get a bunch of pithy responses.

Bring it on. Share your experience with youngsters. And let the elite confront with methodologies you maybe didn't have experience with.

I'd also be interested. Some drivers of complexity I would think of:

* Using many of the GoF/OOP patterns, because you may need extensibility at some point. Basically YAGNI.

* Complex, hard to mentally map, build systems (e.g. CMake).

* Designing for purity over simplicity (I'm actually big on FP, here I'm thinking of the Haskell crowd which IMHO sometimes overdoes it).

* Writing a complex architecture without prototyping. Often your prototype will tell you what you need. If you start architecting too much beforehand then you often waste time on some details that don't matter, and even worse, afterwards you try to force it into your architecture which doesn't actually fit the problem. The beauty of software is that it's easy to change things. Architecture on buildings is different because you need to make sure that you're not building the wrong thing. In software building the wrong thing can give you the right insights and still be faster than planning for every eventuality.

> Writing a complex architecture without prototyping. Often your prototype will tell you what you need.

One million times yes. In my experience using a prototype as an input to a specification works much better than the other way around.

> Complex, hard to mentally map, build systems (e.g. CMake).

Also yes. It's almost to the point where one has to understand every detail of how CMake works to get it to do one (1) specific thing you need in your build process.

* Splitting requirements documents into user stories and then evolving the document further. This practice can be improved upon by adding redundancy with attaching the document to user stories and enterprise wiki. Further optimization can get to the org to CMMI level 5 but requires investment in diversity of the most critical requirements in other locations and protection of these assets from being linked.

Nah, that's OK. Thanks.

Just to clarify. I have been down this road. I am not interested in sacred cows or third rails.

I'm trying to do all my writing and commenting, based only on my own experience and insight.

I'm done with fighting on the Internet. I don't have the energy for it anymore.

I can respect that. Although I would have liked to hear your take on it, I know what you mean.

Agile and Scrum can add a lot of meetings and processes and metrics into a development team. For non-technical managers this is great because they can create nice charts with story points for upper management.

The above comment can devolve into a flame war because non-technical managers see Agile and Scrum in a different light. They believe that without proper management developers will be unproductive.

I can't believe I forgot this one. Scrum is so terrible, I don't know why it's still being introduced.

I'm always happy to chat (pontificate in an "OK Boomer" kind of way, I guess), but not in public forums. My screen name applies in a lot of places.

I've started to see it that way too. Every place where there is source code, the complexity grows instantly - it's like a virus. The only firewall you can have for this is if you have it all in different projects that need to justify themselves on their own on all merits, financially, technically, etc.

> an aggregate of much more sequestered small parts, each treated as a separate full product

What does it mean exactly? I feel you are trying to share a nice idea but I can't comprehend it. What are those small parts? Classes? Modules? Services? What does it mean to treat them as a separate full product within an organization?

One simple way of hermetically sealing organizational complexity: microservices.

Many people don't seem to understand when to use microservices. They're not for small teams.

I believe the real benefit of them is that you can have a team at say, Amazon, who works on their product prediction engine. They have well defined input data, and they have well defined data consumers need as an output. Beyond that, they just have to coordinate within their own team to build what they need to.

They don't have to meet with stakeholders across the organization and get into debates with ten other guys in other departments about adding a database field. They have their own database, of their own design, and they do with it what they want. If they need more data they query some other microservice.

Not OP, but to me this is the Unix philosophy of having many small tools that work well and interact well.

Even if your modules are very separated, if you can't individually use and play around with them they become a part of a big blob of software. Services may be products, but only if they're independently usable.

If you have a small product that's useful in and of itself (e.g. git) you can much more easily make it work well and then integrate with other good tools and replace those if necessary (e.g. if you have problems with Bitbucket/Jira/Confluence, you can switch them out for other solutions, e.g. Gitea).

But if you have a huge complex product then at some point it becomes organizationally impossible to move away from it.

The Unix philosophy is all well and good, but it's very limiting. There are major types of valuable software products that simply can't be built out of small tools or services.

While there are certainly many ways to build complex software, I haven't found composing a complex system from many simple and independent parts limiting.

Exactly. It's just one way of doing things, and I definitely admit that some very good things have come from much more complex systems.

However, bug control is absolutely vital with these systems, and you definitely need some kind of quality-assurance system, or you will be hatin' life.

Care to cite some examples?

Anything performance related I'd say, e.g. stuff like games.

There are some overheads involved, and you lose optimisation possibilities.


TL;DR: Lots of small modules (they could be drivers or classes), with narrow, well-defined, highly agnostic intersections.

Most of my coding work has been done as a lone programmer. Even when I was working as a member of a large team, I was fairly isolated, and my work was done within a team of 3 or fewer (including Yours Troolie).

I have also been doing device interface development for most of my career, so I am quite familiar with drivers and OS modules.

When I say "sequestered," I am generally talking about a functional domain, as opposed to a specific software pattern.

Drivers are a perfect example. They tend to be quite autonomous, require incredible quality, and have highly constrained and well-specified interfaces. These interfaces are usually extremely robust, change-resistant and well-understood.

They are also usually extremely limited; restricting what can go through them.

The CGI spec is sort of an example. It's a truly "opaque" interface, completely agnostic to the tech on either side.

There are no CGI libraries required to write stuff that supports it, there's no requirement for languages, other than the linking/stack requirements for the server, etc.

It's also a big fat pain to work with, and I don't miss writing CGI stuff at all.

It is possible to write a big project this way, but it is pretty agonizing. I've done it. Most programmers would blanch at the idea. Many managers and beancounters would, as well. It does not jibe well with "Move fast and break things."

But there are some really big projects that work extremely well, that don't do this. It's just my experience in writing stuff.

When you write device control stuff, you have the luxury of a fairly restricted scope, but you also have the danger of Bad Things Happening, if you screw up (I've written film scanner drivers that have reformatted customer disks, for instance -FUN).


Based on my understanding a common name would probably be Slimlane, or Blade...

> Problem is, you need this "Big" stuff. It's crazy to do without it.

Sometimes you do. But many times big stuff gets written for reasons other than need. One of the best wins in our industry is to recognize that the big stuff isn't needed and to never start the project in the first place.

This rings true for anyone who has ever worked at a big tech company (I work at Google).

At Google when your project begins to scale up you can ask for more money, more people, or both. Most teams ask for both.

What you can't ask for is different people. You can't solve your distributed systems problems by adding 5 more mid-level software engineers to your team who have not worked in the domain. Yet due to how hiring works, this is what's offered to you unless you want to do the recruiting yourself. Google views all software engineers as interchangeable at their level. I have seen people being sent to work on an Android app with hundreds of millions of users despite never having done mobile development before. That normally goes about as well as you'd expect.

So you end up with teams of 20 people slowly doing work that could be done quickly by 5 experts. In some cases all you lose is speed. In other cases this is fatal. Some things simply cannot be done without the right team.

I see the same thing, and the only way I can reconcile this is that the benefit to sr leadership in terms of treating SDEs as fungible is so massive that it is still worth the massive productivity loss from assuming exchangeability.

What are the benefits to treating SDEs as fungible?

At Amazon, Sr. Leadership and HR love to pretend all SDEs at a given level are interchangeable, level actually indicates competence, and leetcoding external hires with zero domain knowledge have far more worth than internal promos. All of the above assumptions seem completely insane to me and have resulted in the destruction of many projects.

Yeah me too. Also at amazon. And yet amazon is obscenely successful, as are other big tech companies which take similar strategies. It seems that matching expertise to projects is so fucking hard that just giving up at the start and accepting it’s impossible is the optimal strategy.

Honestly I don’t know. I agree it’s weird. But these companies keep succeeding doing it this way, so I’m not sure what to make of it.

> But these companies keep succeeding doing it this way, so I’m not sure what to make of it.

That doesn't necessarily mean anything. The fact that a system might be working, doesn't mean it's anywhere near optimal. I think these companies are successful in spite of these types of policies, not because of them.

Looking at the metrics used in the publication[0], it seems most of them focus on the absolute number of engineers working on a given component. This makes sense — more engineers touching a component introduces more opportunities for bugs. (Edit: as other commenters have pointed out, total lines of code, highly correlated to number of engineers, is likely the best first-order predictor of bugginess.)

I bet we can improve predictive power by considering the degree of overengineering, i.e., the number of engineers working on a task (edit: or lines of code) relative to the complexity of the task they’re working on. 100 people working on a task that could be accomplished by a single person will result in a much buggier product than 100 people working on a task that actually requires 100 people. The complexity of code expands to fill available engineering capacity, regardless of how simple the underlying task is; put 100 people to work on FizzBuzz and you’ll get a FizzBuzz with the complexity of a 100 person project[1]. Unnecessary complexity results in buggier code than necessary complexity because unnecessary components have inherently unclear roles.

Edit: substitute "100 people" with "10 million lines of code" and "1 person" with "1000 lines of code" and my statement should still hold true.

[0] https://www.microsoft.com/en-us/research/wp-content/uploads/...

[1] https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...

My own (current) personal hell is essentially the inverse of this assumption. I work in an area that was essentially run by 1 extremely overworked developer. The result? Now there are 40 people (including management, PMs, Scrumlords, QA and devs) doing the work originally done by a single person. The tech-debt and cruft is unbelievable (considering the domain is also quite complex). Every decision made before the 40 people were hired was made just to check off 1 of 1000 things on this person's plate... I could probably write a book about this if we are ever able to turn things around, which requires explaining to management why the things that have been 'working' for over a decade are no longer working.

The sad part is, it would seem like all the engineers we have are overkill, but in my little silo, we could easily split our work into even more sub-teams, hire 12 more people, and still keep churning just to stay afloat. Sorry for the rant, I'm not sure exactly what I'm driving at. I guess I'm just trying to give a cautionary example of how not to manage large-scale software projects.

Oftentimes, I've found situations like the one you describe (ones where you need to throw infinite developers at the problem to fix it) to be a smell that what your "product" does isn't in line with how your business delivers value. Such things are almost always candidates to be replaced by third party software of some kind.

Maybe I'm wrong though. When you are charting new ground, building new shit that has never been built before--which is what your product teams should be doing--you don't have years-long backlogs because you can't see that far out. Good, productive feature work is iterative.

If you can see with a high degree of clarity what you will be working on 5 years from now, it probably means it's been done before and you are better off cutting a check for it.

Hopefully this makes sense :-)

I agree. I've even suggested to my manager that we'd be better off utilizing a 3rd party. Problem is, this is a corporate gig. We don't have a 'product', we're just a cost center for the business. Even if we outsourced to a 3rd party, that would likely take years to coordinate.

That's super interesting! What would be a good way to measure the complexity of a task in some objective way?

Also, the study doesn't really take "tasks" into account at all, it seems. Just modules and data relating to the modules.

>What would be a good way to measure the complexity of a task in some objective way?

From an existing codebase this would be very difficult to objectively assess. I think you’d have to study it empirically — come up with a set of tasks (“A”) that each takes a single programmer “P” on average a week to complete. Then come up with a set of tasks (“B”) that each takes a team “T” of 10 programmers on average a week to complete (ensure that 10 programmers is a lower bound, i.e. decreasing the number of coders causes the project to take longer). Across multiple solo programmers and teams, compare the quality of the code produced by programmers P on tasks A, teams T on tasks B, and teams T on tasks A. I’d bet P/A > T/B > T/A.

I wonder if number of engineers is strongly correlated with lines of code. This would indicate maybe lines of code is still the best predictor... will read the report to see if they bring it up!

I agree that lines of code is probably the overall best predictor to first order (didn't mean to imply otherwise; I've edited my original post to clarify). I just meant that x lines of overengineered code will almost always be buggier than x lines of non-overengineered code.

I like the "surface area" metaphor. Everywhere I've worked took a divide & conquer approach.

Some day, I'd love to participate in the NASA / JPL style. Everyone reviews the entire code base together. Bugs are assumed a failure of process. I guess the thinking is all bugs are shallow given enough eyeballs.

Realizing now that I'm a hypocrite (again). I hate pair programming. But do kind of enjoy code reviews. Now I don't know what I believe.

P.S. I’m an idiot who can’t keep orders of magnitude straight. “10 million LoC” should be “100k LoC.”

How do you measure/quantify over engineering?

Isn’t this aligned with Conway’s law? I mean, a complex business model requires, most likely or eventually, a complex solution? If not, the two systems are at odds, and the computer system is even more complex/buggy since it doesn’t follow the organization's complexity, it doesn’t do what the organization needs it to do.

That at least is my experience anyway.

I was surprised that Conway's law was not mentioned.

In the original publication that’s the subject of the article, it was: https://www.microsoft.com/en-us/research/wp-content/uploads/...

Just skimmed over the post, so it's possible they pointed this out and I didn't notice - but I think this is misleading. The title makes it sound like organizational complexity _causes_ bugs, but in reality I think both are simply effects of a more underlying cause.

Larger and more complicated software both requires a bigger team (therefore more organizational complexity) and is more likely to contain bugs.

Steve McConnell identified it as the number of lines of communication in the team or dept creating the module, incl dependents and dependers.

It's why Conway's Law exists, and points towards the importance of well-designed and -specified APIs.

If you consider people on a team as nodes in a graph, and lines of communication as edges, then a team of n people has up to n(n-1)/2 potential lines of communication. I try to express to people that the more potential lines of communication you have, the greater the chance of miscommunication. I think this is also called out in Brooks' The Mythical Man-Month.
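To make that arithmetic concrete, here is a minimal Python sketch of the pairwise count. The function name `communication_lines` and the sample team sizes are just illustrative choices, and it assumes every pair of people is a potential line of communication:

```python
def communication_lines(n: int) -> int:
    """Potential pairwise lines of communication in a team of n people.

    Assumes every pair of team members could talk to each other,
    i.e. the number of edges in a complete graph on n nodes.
    """
    if n < 0:
        raise ValueError("team size must be non-negative")
    return n * (n - 1) // 2


# Print how quickly the count grows with team size.
for size in (3, 6, 12, 20, 100):
    print(f"{size:>3} people -> {communication_lines(size):>4} potential lines")
```

Doubling a team from 3 to 6 people takes the count from 3 to 15, so the number of places a message can go wrong grows much faster than the head count does.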

Indeed. It's even a law named after him:


Upvote for mentioning Steve McConnell.

I think that's a fair point. But organizational complexity is compared to other measures, such as the complexity of the software itself, the number of dependencies, etc - i.e. the size of the software itself - and the study found that organizational complexity is still the #1 predictor.

I agree with this assessment. It's basically an observation of parallel occurrences.

Big projects require big teams, and also have a lot more "trouble nodes," so there are many more places for bugs.

The big team is not the cause. It is simply a natural coincidence.

If you have a big project that needs a big team, then the big project is the cause, and the big team is only coincidentally there.

But if you have a medium project, and you put a big team on it, then I suspect that you will get big team problems (communication issues causing bugs) even though the project itself didn't cause the issues.

Conway's principle

There surely also exists some large and complicated software that was not developed by large and complex organizations.

>Organizational Complexity. Measures number of developers working on the module, number of ex-developers that used to work on the module but no longer does, how big a fraction of the organization as a whole that works or has worked on the module, the distance in the organization between the developer and the decision maker, etc.

After one of the early big software project failures (maybe Multics?) there was a quote about software projects going around (maybe John R Pierce?) that "If it can't be done by two people in six months, it can't be done."

One of the functions of good software design is to break the system down into pieces that a couple of people can complete in a reasonable length of time.

Herbert Simon is who you read first when you're thinking about orgs - https://en.m.wikipedia.org/wiki/Satisficing

That will take you to healthy and productive places.

The article examines this through the lens of Windows Vista. Am I the only person who actually did like Vista and didn't have any problems with it? I gathered that most of the issues people had with it were caused by third party incompatible software and hardware.

Microsoft tried not to increase the minimum specs required over XP very much. This led to certifying a bunch of underpowered hardware as "built for Vista" which led to many, many people having super slow computers. On top of that, the UAC used to be a security boundary and it would prompt very often because Windows software had been written to behave like it was on a single-user system and mess with system files all the time, so it was really annoying.

I also had little trouble with Vista and liked it well enough. But I had plenty of RAM, I believe it performed badly with 1 GB or less. And of course people got hung up on UAC.

That’s kind of crazy. Firefox and libreoffice don’t do well in 1 Gig of ram but at least they’re doing something. You can run a decent DE and few decent apps in 1gig just fine with most Linux distros.

And Microsoft Word ran perfectly well on 16MB of RAM in Windows 95, back in the day; and even better on NT with 32 or 64MB, productive and more than comfy.

I guess my point is that running decent software on what today would be considered very little hardware is a solved problem, but it's not what the economy is optimized for.

The difference here is that LibreOffice is still maintained (in fact, the math typesetting in Microsoft office is practically unmaintained at this point and is just miserable to use.) Old windows (and old Linux) have serious problems that modern OSes don’t have. You can run modern good and compatible software in very little ram and the only reason not to is because someone else is forcing you or you’re just not aware.

Obviously you're not the only person, but the data points to a flawed release.

For me, Vista was slow as molasses, which was enough to upset me and make me hate it. For a lot of people, it also had driver problems.

No. It was Windows Longhorn. Microsoft abandoned it.

I guess this is why startups exist, because it is almost impossible for larger firms to execute good ideas even if they think of the idea themselves

I don't think that's true, but oftentimes when you work in a large org actually building ANYTHING is like pulling teeth.

Every decision you make is second guessed by 8 other people, and anything you do impacts 5 other teams. It's infuriating unless you're the type of person who loves working through people problems, and a lot of developers, including myself, aren't that kind of person.

Also, maybe it has to do with the fact that failure is an expected part of startups, but in the business world there's perverse incentives to build something mediocre, expensive and ultimately useless just to give the appearance of success.

With a startup, you're free to do whatever the hell you want. If it works, it works, and if it doesn't it doesn't.

I guess it would be possible to give in-house teams free rein to try out new ideas without all the organizational friction, but the startup model works, so why not just throw some money at some kids and see what happens?

> I guess it would be possible to give in-house teams free rein to try out new ideas without all the organizational friction

The traditional name for this is a skunkworks project:


They tried something like it somewhere I worked but messed it up. Everyone needed to pitch ideas then the 'approved' projects were the pet ideas of various suits, put under the same restrictions as regular projects, and choked to death.

Sorry, I realized that we were both saying the same thing. When I say that large companies can’t execute I mean they do try, but 99.9% of the time they fail because of internal large company structures. I meant to say can’t execute successfully, not won’t try :)

What is the underlying reason goliaths don't execute ideas well? Too many chefs in the kitchen?

Lots of freeloaders and incompetent power trippers in the middle layers and above. Small orgs and startups cannot afford these kinds of workers.

In my current project (big co.) we have a technical PM, a non-technical PM, a non programmer dev lead, a scrum master and a lead business analyst, all involved in managing the work of a team of 2 and a half (a sr ba/qa guy, a part-time ssr dev and me). Wasted work is probably around 90%.

> Lots of freeloaders and incompetent power trippers in the middle layers and above.

Not just middle layers and up. Freeloaders and gatekeepers everywhere.

In my experience it's risk aversion. They're more worried about losing out from doing the wrong thing or breaking what they have, than a slow death from decaying market relevance. It's much easier to stay the course than to stick your neck out asking to change things.

Political land grabs for the "new hotness".

I think all current and ex Microsoftees can agree (and probably other workers in Big Tech Corp) that this is not only obvious but ongoing and dastardly resilient to getting solved! At some level this must be a sociological thing because humans seem to be hardwired into repeating this mistake.

This happens at smaller companies in smaller ways but the effect is the same.

It's worse than the "Mythical Man Month" in that production is not simply slowed down but it is slowly made rotten until it gets burned, buried, or passed off to out-sourced maintenance.

The article ends with a mention of the book 'Accelerate'. Accelerate is your typical management book, based on some surveys with bad statistics and worse conclusions. Weird he mentions this book after talking about what a proper, replicated study Microsoft Research did.

Previous discussion (which got flagged):


I can't quickly tell why it was flagged. Was the study highly flawed? Or promoted in a dishonest way? Or was it really the comments as a whole that got it flagged?

Probably because the title was originally extremely linkbait-y.

...which is likely proxy for complexity of the task - the more difficult the task, the more likely it'll have bugs - not an exciting revelation.

Their "code complexity" means nothing. If you compare simple "todo web app" it'll have orders of magnitude more "code complexity" than, for example, sha256 hash implementation.

And rightly so. It is much easier to write a bug-free sha256 implementation than try to write a bug-free To-Do app.

Conway's law strikes again.

Does this mean that a solo developer working on a SaaS product can avoid bugs?

Well, in theory anyone can avoid bugs. It's just really costly and requires a very structured approach to doing anything.

It's very unlikely, however, that you figure out every single combination of variables while writing tests. So in practice, I don't think it's possible to avoid bugs, unless you have a really tiny code base, maybe.

That's actually very interesting IMO! Now that you said that, it seems obvious to me that there's a sweet spot somewhere, and that it's higher than 1 and (probably much) lower than 100.

Days of yore when Bill Joy would rewrite / update Unix and userland over the weekend.


Surely this solo developer will avoid bugs regarding "communication misunderstanding" between the team elements.

I'm not totally sure about that. "Me, yesterday" has to communicate with "me, today", and communication misunderstandings can absolutely happen.

Now, sure, such things happen less when it's me talking to me than when it's me talking to you. They still happen, though.

The use of the term 'p-value' seems off. It looks like he is referring to a different concept but using an overloaded term.

I have a Microsoft keyboard that has a dedicated Calculator button. In older versions of Windows, I used to press that button and start entering numbers right away. But in newer updates of Windows 10, I now have to press the Calculator button, WAIT FOR THE CALCULATOR TO LOAD, and then start typing my digits. I think this is ridiculous.

Agreed. The market for super snappy user experience seems to be underserved. Maybe not enough demand? Or maybe a symptom of the "everything is free and paid for with advertising" model we have today.

The calculator team has a GitHub discussion issue open tracking launch performance investigations if you're interested: https://github.com/microsoft/calculator/issues/209

If you were using Linux, solving your problem would be trivial. Practically, I can only recommend using a non-local calculator such as a search engine, since despite network latency that codepath is more heavily optimized and people now spend so much time in browsers. CTRL+L (address bar) then type "100+50/25=" and press enter. Bingo.

I don't share this experience. I think you have an underlying issue.

This started happening when the default calculator changed from a classic native app to the new "modern" app.

Code/technology is nothing more than a tool. A toolbox is only as good as the person or people who pick it up. That is, a great tool will not save a disorganized organization. A great tool will not make a crap product better.

Blaming the means for the ends is a classic n00b mistake. A mistake that's being made over and over and over again.

From the article:

> In the replicated study the predictive value of organizational structure is not as high. Out of 4 measured models, it gets the 2nd highest precision and the 3rd highest recall.
