“My emails produced only well-worded refutations. They explained quite factually why the setup is the way it is, and implicitly therefore why it could not change”
This landed so truly for me, it felt like a punch in the stomach.
I wouldn’t dare count the number of times I’ve been told the technical details of why something is the way it is, without anyone ever saying the reason why we actually wanted it to be this way. My thesis was usually: we don’t.
In my career I feel like I have seen hundreds of examples of me saying the systems equivalent of “lets put the dining table indoors?” to be told that the dining table is outside because the original budget meant the front door could only be yay wide so we had to leave the table in the yard and put a tent over it. And I’m just left standing there agape at how we eat in a cold wet tent every night instead of fixing it.
Except it’s usually more like: why do we have to spend $9k on a commercial dishwasher repair contract? Because we have a commercial dishwasher … to get the rust off the silverware … because we eat outdoors every night … because the front door was too small to get the dining table in the house.
Somehow, when the real examples of this stuff are clever engineering around build / docker / polyrepo / release / feature flags / third party bugs, the cleverness makes people think the existence of the workaround should be tolerated. It’s infuriating to join a new team held hostage by years and years of band aids because they never suffer the bigger picture consequences.
The whole article was fantastic. I hope the author has the engineering leadership role they deserve. We need more people like this.
If you spend all your time refactoring, cleaning up after legacy design constraints, fixing ossified errors, then you run out of time and fail to write actual meaningful income generating features. Conversely if you make none of those improvements, eventually the weight of bad architecture slows all progress to a halt and no new income generating features get delivered.
One of the hardest parts about advancing as a developer, in my opinion, is being able to tell when you should refactor versus just leaving the old working mess alone.
With your example, it's like if it turns out that the dining table is embedded into the ground with concrete because it kept blowing over, and moving it indoors would require getting a carpenter to create new legs. And also that because the dining table has been there for so long, someone decided to run electricity cables through it, so rerouting it requires an electrician and will shut down the factory at the bottom of the garden for half a day. We could buy a separate table for indoors, to try and slowly migrate to the new table, but then we'd have two tables to maintain and we all know how that usually goes.
At a certain point, you look at it and go well the commercial dishwasher is just $9k and we can focus our efforts on building that loft conversion for now.
This is a naive question from someone who hasn't been involved in project management in almost a decade, but why can't feature development and refactoring be split across two different ownership groups?
The first group writes the initial version and all iterations (extra features) up to the point where the expected returns from quickly pushing out future iterations is less than the amount of required effort.
The second group then comes in and does a complete refactor, without changing the look or feel of anything that the customer actually wants. Meanwhile, the first group moves onto "the next big thing".
There’s a parallel comment warning of the second system syndrome, but I’d like to point out the problem of “who gets the credit?”
Google famously suffers from this: the pm who launches a product gets promoted; the person who adds a feature gets some credit in the employee review and the person who fixes bugs is judged to have wasted their time.
I suspect Apple has this problem too (they definitely prefer reimplementation rather than evolution in many cases) but their processes are more opaque.
Who would want to be on the cleanup team when the glory goes to the path breakers?
> Who would want to be on the cleanup team when the glory goes to the path breakers?
This and the second system syndrome are both organizational issues. Why couldn't an organization simply freeze the customer-facing portion of the application (so no UX changes or added features) and tell a group of developers responsible for the refactoring that they will be judged on a set of achievable metrics, such as decreased infrastructure costs or better performance?
This would fall squarely within common managerial frameworks (it's basically Tuckman's group development model, or what you see at many startups that launch an MVP), except that the initial application development is handled by a different group of 'high performing' developers.
1) The reason why such a refactor might be necessary is from things the first group tried but didn't quite work as intended, or that the users used the system differently to intended. The first group has that knowledge, the second group doesn't. So the first group will do the refactor better than the second group.
2) Beware of Second System Syndrome https://en.wikipedia.org/wiki/Second-system_effect where everyone tries to put in every feature that was missing from the first system, simply because there is no urgency around the second system, because the first system is already running.
That's a great elaboration, very accurate, and funny! But in reality that table would always get fixed, the only reason such absurdities will remain in IT systems, is because it's not immediately visible.
I disagree completely - most people know about the problems and they've accepted them instead of trying to gin up the organizational effort to fix them, because the last time they tried that they either became responsible for the cleanup or got smacked down by someone who should have been but got it wrong.
The choice between refactoring and money-generating work is a false dilemma. There are other options, and the developer doesn't have to make that decision or carry out the work all on their own.
If the code has turned to spaghetti then how do you manage to change code quickly (due to e.g. Corona rules) so you can follow where the market went and not get competed out of business?
When the company is in startup mode and has no customers, it's easy to just throw more mud on the wall.
But when you have an existing business based on 1M lines of code and you want to keep being in the market when the market changes quickly, then spaghetti code can be death. Being ready means having cleaned up code beforehand so it is easy to change it.
At an organisational level, you have to make a decision on how much time you spend doing one or the other. It might be that some developers never do any refactoring but someone is always going to end up doing it. Or nobody does it, and the code slowly decays.
Unless you're saying that you don't have to do refactoring at all in the organisation, but the only way to do that surely is always get it right the first time, which isn't hugely practical. You may sometimes encounter a situation where the quickest way to build a feature is to fix some old ugly code, but that's certainly not the case every time.
My whole point is that you don't, because there aren't always just two options. That's the false dilemma logical fallacy.
I'm saying you can fix problems without dropping everything and redoing work. You're allowed to problem solve and work with people to create a third option. And you can prevent new ones by learning and strategizing.
Well whether you drop everything or clean as you go or whatever other strategy, fixing stuff takes time. Even if it's just the mental effort of designing a better way and consensus building.
I'm just using simple analogies for the sake of explanation, but it is nearly always the case that expanding the scope of work to fix previous architectural decisions that were either flawed or no longer relevant will take considerably longer than just fixing the problem at hand.
There may be the odd time, particularly in a large, well defined piece of work, where you can say actually tidying up this other stuff will save time overall. Or perhaps you can batch a bunch of improvements in the same system together into a larger, more thoughtful architectural improvement. All of that is great if you can do it, but it's often not possible.
As far as preventing future architectural issues by learning and strategizing, I feel like that's what we spend our entire career trying to get better at doing ;). But alas I, and everyone else, seem to continue making decisions that don't pan out long term. Even if you did make a perfect decision at the time, often the world/business/third party dependency changes, and what was an excellent decision in the past becomes a pain point a few years later.
It used to be the case that we tried to design infinitely extensible software so future requirements could always be incorporated, but that makes the software unmaintainable. So the pendulum swung to YAGNI and only designing for exactly what was right in front of you, but that leads to major architectural overhauls every few months. True answer is somewhere in the middle, but learning where is something that only seems to come with decades of experience.
Unfortunately older programmers all seem to be forced out of developing and into management or other careers for some reason.
I'm still trying to challenge your assumptions. Why does a different solution necessarily require expanding the scope of work? Like you said, that's where experience helps to have those skills in your toolbox. Doing things better doesn't have to be harder.
It doesn't always require expanding the scope of work, but very often does. I even suggested a few situations where it doesn't, but in many cases fixing the true underlying problem involves expanding the scope of work.
It's hard to argue the nitty gritty without examples so here's a real world one from quite a long time ago, in a company that went bust after the death of the owner.
--
We had a system that had a significant quantity of code written in a custom language that would be compiled by an internally written compiler. This compiler was in some ways a work of genius, written in the 80s, but it had a lot of very deep architectural flaws in the optimiser that meant certain patterns of code would generate invalid output. We didn't write much new code in this language but had a pretty large body of code that needed to continue running.
So during a server hardware refresh, we found that almost everything was crashing. Turns out, a compiler optimiser flaw meant that any time a loop had a number of iterations that wasn't a multiple of the number of CPUs, generated programs would segfault.
We investigated what it would take to fix the underlying issue but it would have been a week or more of work just to understand why it was happening. Porting all the old code would have taken even longer.
Instead what we did was, using a pre-existing AST manipulation library we had written, add a prebuild script that hacked all of the files to include a CPU count check then pad out the number of iterations with NOPs. Took a few hours and unblocked the server upgrade.
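For anyone curious, the arithmetic behind that padding is tiny. Here's a minimal sketch in Python, purely illustrative: the real thing was an AST rewrite in our custom language, and the function names below are made up for this comment.

```python
def padded_iterations(iterations: int, cpu_count: int) -> int:
    """Round an iteration count up to the next multiple of cpu_count."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    # Integer ceiling division: -(-a // b) equals ceil(a / b) for b > 0.
    return -(-iterations // cpu_count) * cpu_count


def nop_padding(iterations: int, cpu_count: int) -> int:
    """Number of no-op iterations needed to reach that multiple."""
    return padded_iterations(iterations, cpu_count) - iterations
```

So a 10-iteration loop on a 4-CPU box gets 2 NOP iterations appended, and a loop that is already a multiple of the CPU count is left alone.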
--
Another, perhaps less esoteric and more recent example:
A third party open source library we use had an issue where a particular function call would sometimes get stuck in an infinite loop due to incorrect network code in the library interacting badly with our network hardware.
We submitted a bug report and fix, but maintainer wouldn't accept a fix unless we also changed a bunch of other related code, added a bunch of tests etc. which we didn't have time to do. We considered a fork but that would involve keeping it up to date, rebuilding packages and so on.
We worked around the issue by running it in a different process and monitoring CPU usage. If CPU usage goes beyond a certain threshold, we kill the process and try again.
Workaround was quick and has been working fine for over a year now. Contributed patch is still languishing in an open PR with various +1s from other users.
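Roughly what that kind of watchdog looks like, as a minimal Python sketch. Big caveat: this version uses a wall-clock timeout as a stand-in for the CPU-usage threshold we actually monitored, because the standard library makes a timeout trivial. `run_with_watchdog` and its parameters are hypothetical names, not our real code.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s, max_attempts=3):
    """Run cmd in a child process; if it hangs, kill it and try again.

    Assumption: a wall-clock timeout replaces the CPU-usage check
    described in the comment above.
    """
    for _attempt in range(max_attempts):
        try:
            # subprocess.run kills the child itself when the timeout expires,
            # then raises TimeoutExpired.
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_s
            )
            return result.stdout
        except subprocess.TimeoutExpired:
            continue  # child was stuck and has been killed; retry
    raise RuntimeError(f"command still stuck after {max_attempts} attempts")
```

Calling `run_with_watchdog([sys.executable, "-c", "print('ok')"], timeout_s=5.0)` returns the child's output, while a command that spins forever gets killed on each attempt and ends in a `RuntimeError`. The point is that the flaky call is isolated in a process you can throw away.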
I think your examples agree with my point: You found minimal-time solutions that haven't caused continuous suffering afterwards, and can be easily removed when the root cause is fixed. That's a good result.
That’s a straw man argument. Of course nobody will allocate 100% of their time to clean things up instead of delivering features. That’s the path to bankruptcy. What you should do is to allocate (say) 10% to improving the system. For example, on a team of 10 software engineers allocate one to refactor/improve/simplify/remove pain from the process itself. It will pay itself back many times over in the long term because of improved productivity. And even better: take turns. Each developer will get 10% of their time to improve/speed up the things that are slowing them down or are painful/frustrating. The morale boost and increased productivity is worth much more than the time spent.
> In my career I feel like I have seen hundreds of examples of me saying the systems equivalent of “lets put the dining table indoors?” to be told that the dining table is outside because the original budget meant the front door could only be yay wide so we had to leave the table in the yard and put a tent over it. And I’m just left standing there agape at how we eat in a cold wet tent every night instead of fixing it.
I have, too. And then I usually haven't managed to put the dining table indoors. And then new people came in and asked the same question you ask, and by then I was one of the people who tried to put the dining table indoors, and explained how it wouldn't fit through the front door, and how I tried to get it in through the window. And then the new people try to put the table indoors and fail and next thing you see they're either leaving the house or explaining to the newcomers why the table is outdoors.
Ultimately, I've realized that talk like this is cheap, unless you can actually improve things. That requires leadership skills and some political capital in your organization. I don't think the author of the article deserves an engineering leadership role simply for complaining about things. (They might still deserve an engineering leadership role for other reasons, what do I know...)
With apologies to Antoine de Saint-Exupéry; if you want to build a better system, don't drum up Jira tickets to gather user stories, make sprints and divide the work and give orders. Instead, teach them to yearn for a system that's not total bullshit.
Simply complaining is tiresome. Writing a well-reasoned internal blog post that explains the faults, gets traction for improving things, and gets people excited for your brave new world, even though it's not arrived yet; that blog post is what engineering leadership looks like.
> Instead, teach them to yearn for a system that's not total bullshit.
I worked for an organization full of such people. Intelligent, competent, and worked hard. And yet... the system, ehh...
(I've read the Citadelle on a long bus ride many years ago. It was exactly what I needed to read back then, I enjoyed it very much. Thank you for reminding me of it.)
One of the biggest systems-level failures in recent memory is the Boeing 737 MAX story. I read this article and your comment and then went looking for an autopsy, found this:
"The Boeing 737 MAX: Lessons for Engineering Ethics (2020)"
It's an example of a workaround that should not have been tolerated:
> "The Maneuvering Characteristics Augmentation System (MCAS) software was intended to compensate for changes in the size and placement of the engines on the MAX as compared to prior versions of the 737."
Rather shockingly this wasn't even an engineering problem workaround; it does seem that it was solely designed to avoid an aeronautical reclassification of the aircraft that would have required pilots to undergo an expensive retraining program on flight simulators, which might have caused lost orders.
This does look like a systems-level failure, but one at an organizational level: the system went from a state where engineering took priority, to a state where financialization took priority. In systems thinking, this could be called a state transition: a fluctuation takes place, and afterwards the system settles down to a new (apparently) stable state quite different from the old state:
> "One factor in Boeing’s apparent reluctance to heed such warnings may be attributed to the seeming transformation of the company’s engineering and safety culture over time to a finance orientation beginning with Boeing’s merger with McDonnell–Douglas in 1997 (Tkacik 2019; Useem 2019). Critical changes after the merger included replacing many in Boeing’s top management, historically engineers, with business executives from McDonnell–Douglas and moving the corporate headquarters to Chicago, while leaving the engineering staff in Seattle (Useem 2019). According to Tkacik (2019), the new management even went so far as “maligning and marginalizing engineers as a class”."
> It’s infuriating to join a new team held hostage by years and years of band aids because they never suffer the bigger picture consequences.
What’s even more infuriating is seeing new engineers join that team, question why the hell something insane is insane, and then slowly grow used to the insane thing. Only for the cycle to repeat when the next new person joins.
I make a huge point of confirming to them, that yes we do acknowledge it’s insane. Even if management doesn’t care.
> In my career I feel like I have seen hundreds of examples of me saying the systems equivalent of “lets put the dining table indoors?” to be told that the dining table is outside because the original budget meant the front door could only be yay wide so we had to leave the table in the yard and put a tent over it. And I’m just left standing there agape at how we eat in a cold wet tent every night instead of fixing it.
Oh wow, that hits home. To be fair, the historical context for the decision can be valuable information, the problem is the next step. Even if you can't fix it right now, you might make steps towards that. Or you might say: Now that we have this heavy table outside, why not attach more things to it?
I agree, the impasse that I often see is between people who think a change must happen “now” and those who think it should happen “never”. There is a lot of space between those positions, an optimal usually exists in there.
It’s just like re-factoring heavily interdependent code, except without the advantage of the dependencies being written down
> There is a lot of space between those positions, an optimal usually exists in there.
Yeah but finding, or rather estimating, this optimal is a lot of work, and requires you to have one foot in both camps, and some kind of process/authority to make a decision, and some incentive to make a short term sacrifice for long term gain. That's just not going to happen in a weekly sprint planning, in a company that's aiming for the next quarterly report.
everyone's worked in companies like this. did you try formulating very specific, actionable migration plans at any of these jobs? It's one thing to say, "this is stupid! we should use XYZ" and expect everyone to say "wow! you're right, we'll do that right away", and another (quite another) to actually formulate the superior architecture concretely, break it into digestible migration steps, sell the organization on if not the whole architecture at once, then at least on the first several migration steps, and to guide the organization into that new architecture for real.
obviously this is more or less possible given specific organizations and personalities at said organizations. my point is more that the magnitude of the task of migrating the ossified organization to a better architecture, even with a fully pliant staff totally on board with changing, should not be underestimated.
... because we forgot that you can take the legs off of the table ... because the screwdriver we needed to take off the legs was in use elsewhere when the table arrived ...
Don’t ask for permission, just fix the stuff so that it works for you and maybe your small team and then announce it.
There is no documentation and no planning? Just start writing documentation, just start planning. If you need permission, I grant this to you. I’ve seen too many internal projects not even having a README, so this is now something I start whenever I have to debug something and wished to have documentation.
Someone needs you to do something? Ok, I’ll share my screen, start asking questions, and write all the important things down.
And now you try to suggest using a screwdriver, but your suggestion gets immediately attacked and buried because the team already has an established culture around the impossible table, and they don't want to change their ways, or be made to look incompetent by the revelation that there was an easy solution all along.
Or it can be as simple as someone's (or a whole team) job is to maintain those workarounds and bandaids and they're very invested in keeping their job (or they hold all the IP in their head).
Not just an established culture but a product with a long history of success despite its jank and problems. I appreciate the enthusiasm of new hires but often they don't understand priorities or that the goal is profit and not perfection (for most of us).
There are two of me. The me of now, and the me of hindsight. Hindsight me is way smarter, and able to criticise every decision we made leading us into the mess that is now. Now me needs to make quick decisions based on imperfect information, budgets and tight timelines.
If it were up to hindsight me, we'd have carefully designed and orchestrated every past decision. We'd do full design qualification and change control on projects and purchases, researching carefully to never make a mistake. We'd sit as a team and brainstorm every possible implication. We'd write, execute and document tests for everything to prove ourselves. Any sniff of inefficiency and we would stop everything and fix it, no matter the cost. We'd take time to document, investigate and follow through. It would be a glorious cavalcade of plans and CAPAs, qualifications and tests. And reports! Binders and binders of wonderful validation reports everywhere!
If it were entirely up to hindsight me, we'd run our little widget company like we're building a space shuttle. Of course, we'd never make any money. But we'd be doing it right, by gum!
Most companies need to be somewhere in the middle. No hindsight and you end up a tangled mess of short-sighted kludges, all hindsight and you can't move forward. Either way you risk ending up a lead balloon.
If you find yourself in the kludge company territory, then here's some advice from a talk I recently attended[1]:
Start by training yourself and your team in Root Cause Analysis. Empower them to start thinking deeply and critically about what's really causing the problems and inefficiencies you encounter. Understanding root causes naturally translates into solutions that aren't just bandaids. Use these skills in your day to day, and you'll start building a culture of quality around your systems.
> I wouldn’t dare count the number of times I’ve been told the technical details of why something is the way it is, without anyone ever saying the reason why we actually wanted it to be this way. My thesis was usually: we don’t.
In my experience, at the time the decision was made, folks did want it that way. The organization has lost that context as to why and has only documented the technical design.
A curse shared by less effective engineers I've worked with is to rage at legacy decisions unable to convince the organization to revise them. They lack the ability to understand the various stakeholders involved and to come up with a plausible plan. A systems engineer (as referenced in the blog) would understand the various sub-systems that make up an organization and be able to drive the change they desired (the conclusion that it's irreparably broken or you lack the expertise to fix it would be fine too).
Maybe it's just over my head, but the article was quite a letdown after your great analogy. It may have valuable information in it, but I don't see how I could share the article with the people who might need to hear it and have them understand or care. It's preaching to the choir.
I'm reminded of back when I was studying pattern recognition for a system that would become an Expert System (this was before that term was used). I would read many articles saying what techniques would work. I had the urge to ask "But this doesn't show how you got there. Show me your discarded solutions that didn't work." I would like to see your wastebasket.
Similarly, I am inclined to ask someone who acts as an expert to tell me five ways that won't work.
> Somehow, when the real examples of this stuff are clever engineering around build / docker / polyrepo / release / feature flags / third party bugs, the cleverness makes people think the existence of the workaround should be tolerated. It’s infuriating to join a new team held hostage by years and years of band aids because they never suffer the bigger picture consequences.
The only logical reason to do this is because it has no impact on the business. Or at least, smaller impact than a total rewrite/refactor would have.
If an engineer presented a case where fixing an underlying issue resulted in better business outcomes vs. a short term band-aid, then I don’t know anyone that would tell them no. Businesses want to succeed. They want to make money. If you can help me make more money I’ll let you do whatever you want (within the confines of the law and civil society).
I think half the time there was no need for a dining table in the first place, just a shiny solution waiting for the question to be figured out afterwards.
I work in a small organization which is kinda chaotic, meaning we have a lean process. Very agile, short standup each morning, but nearly no sprint planning.
I think, it's liberating to work in such a company as an experienced developer. You get difficult tasks sometimes, where you need to be the System Engineer and understand the system fully. But you're free to approach the problem in any way you see fit. This requires a lot of trust from the company and your team, but I believe that's a good thing.
It's also nice for customers, who sometimes have crazy requirements, but they still get results in a reasonable amount of time.
On the other hand, this approach completely destroys newcomers. It's nearly impossible for them to approach a System which they can't fully grasp and is in constant flux. We mentor them, give them easy and introductory tasks, review their code, but still... It takes a massive amount of time. And I think, it's one of the reasons a company like this has a hard time growing.
And there's the risk of the 'not so competent programmer', who knows how to fix stuff, without thinking far enough.
I don't know, if I'm a System Engineer, but I think it aligns with the description in the article. However, I fear the day, we all agree to overhaul our processes, after which we need to plan and document and review everything.
It seems to be a necessary evil for a company to grow, but at the same time we would also destroy a lot of the liberty we currently have and I don't know which side is worse.
> And there's the risk of the 'not so competent programmer', who knows how to fix stuff, without thinking far enough.
I would actually claim that the majority of non-seasoned programmers fall into this category.
It's rare to see an engineer who would actively set out to destroy or cripple the system they are working on - but it is incredibly common to see one fix the problem they have at hand, using what happens to be available, and make the overall system just a tiny bit less pleasant to work with. Or understand. Let alone maintain.
Repeat the above a hundred times and your codebase would make Lovecraft take note. If each such modification reduces the quality just 0.5%, after 100 rounds you're looking at an aggregate damage of 40%.
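If you want to check that figure, here's a quick sketch. It assumes the losses compound multiplicatively (each change removes 0.5% of whatever quality is left), which is my reading of the claim. The function name is just for illustration.

```python
def remaining_quality(loss_per_change: float, changes: int) -> float:
    """Fraction of quality left after compounding a fixed fractional loss."""
    return (1.0 - loss_per_change) ** changes


remaining = remaining_quality(0.005, 100)  # 0.995 ** 100
damage = 1.0 - remaining
print(f"{damage:.1%}")  # prints 39.4%
```

So "40%" is a fair rounding. Simply adding the losses up would give 50%; compounding softens it a little, but not by much.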
> It seems to be a necessary evil for a company to grow
Yep, the needs of a company change when and if it grows. Early days, a small team "getting shit done" is probably what you want to find your place in the market, make early customers happy, etc. But as you scale on all axes: time, headcount, headcount turnover, customer count, feature count and so on, that early pile of shit that got done can start to bog you down. Even worse if no one recognises the need to change and continues adding to the pile.
I've seen a few really brilliant engineers struggle with this in various ways:
1. Start hating their job but being confused because they love the company -- not realising that the company has become a very different beast to the early days.
2. Struggling to get out of the "get shit done" mentality and just floundering. Even becoming a net negative contributor in some cases.
Worth remembering this I think in an industry where rapidly changing companies are so common. Your company might be a good fit _now_ but that does not mean it will stay that way!
I think the biggest issue with the grandparent comment's approach is that it makes things extremely difficult for newcomers as was already mentioned. Turnover is inevitable in an organization and that means that eventually there will be a point where the person who wrote the initial code is no longer at the company. Touching that code is a pretty big liability if no one really understands how it works and that bogs down productivity a ton. Another reason why sometimes rewriting something is the right answer.
In my experience, lack of planning and docs strictly reduces liberty, because everyone is afraid to do anything but the most conservative thing, even when more radical work is clearly needed.
Matches my experience. I've lost count of the number of times I've seen people ask "why did they decide to do it this way" or "does this value in the test meet some requirement, or is it a bug?", without finding an answer because it is not documented and whoever did it left the company years ago.
It just takes time, a lot of time. And meetings, endless meetings about what to make and how. Constant reviews of the progress and the code. And documenting everything.
I don't think, that this is necessarily bad. I even think, that this is a must for a large team. But it changes the way you code.
Right now I have the time and opportunity to 'form' the code the way I want. I can try different approaches, rip out or rewrite bad code and add features as I see fit. That's what I call liberty. I don't feel like a raw coder. Instead I feel like an artist in my own universe where I'm in charge.
This feeling can get lost, when you need to request, plan, control and review every single change in coordination with multiple team members. You're not in charge anymore. You're not the artist, but a cog in the wheel.
I think some people like one way or the other. But it's hard to transition.
To my way of thinking documenting the work is vitally important, even more important than planning in some respects. Or maybe we could say planning is just another form of documenting what we're doing (or trying to do).
Sure, writing code is the stimulating creative part. And purifying code, making it elegant, polishing the creation to a brilliant sheen is highly rewarding. By contrast, documenting what's been done is tedious, boring, a gigantic pain, a total drag. Yet if it's not adequately documented all that creative effort will inevitably amount to nothing but a reason to hate its creator.
Furthermore, is it really possible to write superb documentation for lousy code? The effort to document the work is a kind of quality check on the work.
Can't say how many times it's come back to bite me where it hurts. Looking at stuff I finished 6 months, or God forbid, 6 years ago, poorly documented programs invariably prompt a shameful recognition that I have no idea what the hell I did. At least when someone else wrote it I can justify feeling indignant at the "mess" I have to figure out. But when it's my own, that's just sad.
Of course the work we leave behind (at the end of a job, retirement, etc.) will usually be resumed by someone. Excellent documentation is our finest legacy. Consider how it supports moving a project forward when the code's author isn't there to "explain" how it works.
And if we're honest with ourselves, writing cogent documentation is hard, often much harder than writing code which is why it's such a brutal task. OTOH doing something hard is a good reason to tell oneself "A big accomplishment! Good for me!" Something to feel good about!
A rule I aim to live by: the workday isn't over until I record what I just did and the reasons for doing it the way it was done.
I find it easier to document everything when I picture myself returning to it, and not remembering how to even run it locally. I'm not a fan of extensive documentation (neither when writing nor when reading), but I try to always communicate with future self how to proceed. Strangely enough, this seems to help others too. :-) On code reviews I also push back on anything that I can't use from reading the docs. If I, as a reviewer, don't know how to use an API, what chance does the next dev have?
Sounds like a solid approach. No surprise your documentation helps everyone including yourself; all of us are truly in the same boat. Now the trick is convincing other programmers to do what you do. If they did, the world would indeed be a much better place. ;)
That's what the second part (code reviews) is about. "Can you explain to me how to run this? Actually, can you just write it down and put it in this README?" But yes, it is an ongoing struggle. :-)
> Instead I feel like an artist in my own universe where I'm in charge.
But, but: it's not actually your universe, and you aren't actually in charge.
In practice, rewriting bad code and adding features "as you see fit" is roughly the same as going rogue. If the bad code works, then whether to rewrite it (and when) comes down to cost/benefit, and some engineer exercising their "liberty" is unlikely to get that right. The features some engineer decides to implement may be features that neither the organisation nor its customers need; but someone will still have to be designated to maintain (or remove) them.
> If you want liberty, start your own organisation.
Over the years, I’ve discovered that what I really value is fully grasping a problem domain, deciding what customer problems are worth solving, and then aligning a team on a common vision of appropriate systems architecture to solve those problems. As a result I’ve ended up doing exactly this: I tend to move to organizations where I am at liberty to do this while still having the role of an individual contributor.
I work within an organization resembling this, and yet it has remained profitable for multiple decades. I forever wonder when the technical naked short options[0] will get called, but they never have in a serious way. My suspicion is that the chaos is S.O.P. among peer companies, so the market has never experienced a competitor not so hobbled. It may also be that controlling chaos is so expensive it's a non-obvious competitive advantage not to do so.
[0] technical debt is when you have a plan to pay it down. A technical naked short option is when your plan is to hope the subsystem goes away before its deficiencies have catastrophic effects.
I think the underlying reason why IT companies are so chaotic and inefficient is that they simply wouldn't need many people if they were well structured, did quality work and used simple and powerful tools/solutions.
You could run a super high valued company on just a handful of people, and society is just not ready for that. It's a case of technology advancing faster than human organisation and economic system, and the multiples are just too large, it's a big challenge for society to handle the impact.
So it's better for the organisations to be less efficient and chaotic, and to use dumbed down tools, overcomplicated solutions etc. This will fill the void created by the advancement in technology, so that the organisation can continue to function by at least resembling the traditional model.
You would need to significantly shorten the working hours in order to enable more efficient companies. Otherwise you'd get even more hyper concentration of wealth in a very short time.
> You could run a super high valued company on just a handful of people, and society is just not ready for that.
Here is the hidden truth. So much of the current information sector is just daycare for grownups. Then there's a secret nucleus of people who actually do the real work. The secret to happy employment rests on being able to determine who's actually cutting lumber versus who's just playing dress-up.
If we want to gradually move towards a system of universal basic income, maybe we could help sell it by funneling larger sections of society into IT, and just give them a bullshit job where they can fingerpaint all day to get their paycheck. Eventually you can let them stop fingerpainting and just give them the paycheck.
> The factory of the future will have only two employees, a man and a dog. The man will be there to feed the dog. The dog will be there to keep the man from touching the equipment.
> You could run a super high valued company on just a handful of people, and society is just not ready for that.
My version of this is that those handful of people become critical to the organization, which must be avoided at all costs. Every place I've been that's weighed down by huge IT staffs, undervalues tech, absolutely hates to give coders raises, and sees tech as a cost center, even while preaching how important technology is.
Sadly, this seems to be the case so often that the only companies that really succeed with good tech have builders in the founder's chair who _still code_, at least until the model is firmly established (but even after is good too). Pretty much everybody else resents software devs and thinks they're overpaid.
Weirdly a smaller, better paid team can often be resented _more_ (because of course, you can't see all the money that's being _saved_ by having a small highly-powered team).
I think badly run tech organisations still work fine. Most of the time it doesn’t matter whether your problem is solved with spaghetti and 300 hours, or 1 hour and a fantastic design, since the gains are still orders of magnitude higher than the cost.
It’s just that such an organisation is unpleasant to work for.
> I forever wonder when the technical naked short options[0] will get called, but they never have in a serious way.
What does «called» mean in this analogy? That the subsystem doesn’t go away?
So your experience is that: Stuff tends to stick around? That’s my experience. I have never encountered a throw-away MVP that was actually thrown away if the product continued in the same direction (and didn’t pivot).
In the analogy to the naked short, the option is "called" when the known deficiency causes catastrophic consequences. Example: The file system is so large it takes 60 days for the backup to finish, and thus the "call" would be a multiple drive failure requiring a full restore. Example: NFSv3 with "system" security, i.e., whatever the client says its uid and gids are, we accept that, and thus the "call" would be a malicious client declaring arbitrary unauthenticated uid/gids. See, e.g., possible publicity stunt for the movie "Hackers". These are naked shorts because there are no plans to fix them.
How long have you stuck around at a company? Maybe your company or business type is different, but I have certainly seen apps get rewritten or replaced over the course of 10 years. Some are built solely for some migration period. Some have been redesigned as part of a different system after a few years. Some major systems get rewritten after the continuous addition of integrations over 10 years demands a better architecture.
Complexity never really goes away, it can merely be transformed from one type to another. This organization has apparently decided to deal with the complexity of not having processes rather than the complexity of maintaining them. Which is a valid choice: look at India. A society totally chaotic to a stranger but one that has stuck around for millennia.
This is only true if the complexity comes from the underlying problem that must be solved or goal that must be achieved.
Complexity that exists for historical reasons, derived from technology or platform choices, solves a problem you don’t really have or don’t need to have, relates to the structure of the organisation, etc. etc. can of course all be made to go away completely.
In my experience a small fraction of the complexity in most organisations and software is truly, fundamentally unavoidable.
I don't know why you are bringing in references to a country. A country is not a company; running a country is vastly more complex than running any organisation.
> the complexity of not having processes rather than the complexity of maintaining them
There are always processes. Sometimes they are explicit and well-understood; sometimes they are hidden, and not open to improvement because nobody knows what they are. Implicit processes tend to disadvantage newcomers, as well as reinforcing social assumptions and conventions (that tend to disadvantage those that are already disadvantaged).
Explicit processes don't have to be heavyweight and bureaucratic.
There was a good essay on the problems caused by informal/implicit processes, from about 20 years ago. I've spent 45 minutes searching for it, but I'm afraid I can't find it.
> the curse of systems thinkers is to be correct, but never valued.
"Systems" are a mixed blessing, but system thinking is almost always
good and useful. We can add value but not bask in it. As someone
invested in the _idea_ here are a few of my favourite quotes that
illustrate:
- People don't like systems. Especially new ones.
- Systems ossify and become the problem themselves.
- The ideal system exists only in the mind of its designer.
- The ideal systems designer is invisible and can never take credit.
"I mistrust all systematizers and avoid them. The will to a system is a
lack of integrity." -- Nietzsche
"The English have a system, which is *no system*, which is also a
system, only better." - (?? British political philosopher c 1900 -
does anyone know this one?)
"A complex system that works is invariably found to have evolved
from a simple system that worked. A complex system designed from
scratch never works and cannot be patched up to make it work. You
have to start over with a working simple system" -- Gall
Overall, I think the thing is that systems are brilliant, until you
try to actually build them and encounter _people_, who have other
ideas. Neither the force of the better argument, nor punishment,
reward, bribery, or flattery will move things. This is neither the
fault of systems thinkers nor people but the misunderstanding that
(outside the immediacy of war) systems can be imposed. Working
systems evolve and are, if the individuals are mentally healthy and
motivated by good attitude, generally such that people are doing the
thing they would naturally be doing anyway were a formal system not
there.
A good system is like a cat that falls off a tall building and by luck
lands on its feet in a box of wool, and licks itself as if to say -
sure I meant to do that.
> "The English have a system, which is no system, which is also a
system, only better." - (?? British political philosopher c 1900 -
does anyone know this one?)
I don't, but that quote was used in a comment [1] about five or six weeks ago, and the commenter's relevant bit was:
Nietzsche said it best:
I mistrust all systematizers and avoid them. The will to a system is a lack of integrity.
Or maybe (I think Sidgwick):
The English system is "No system", Which is also a system, only better.
Ha! The reason I remembered the original comment was because, at the time, I wanted to find out who'd said it - it's such a great quote. So I did the usual DDG, then Alexandria, Google Scholar, Dogpile, then Kagi plus a few esoteric search engines and I found nothing. (But there were mentions similar to it when referring to "common law" in England.)
It's frustrating, but as far as search terms go (in the typical tf/idf indexing model), the most unique word is, um, system ... and that doesn't help.
I know only two philosophy people, and I am going to ask them - and get them to ask their friends. Yeah, I shouldn't let stuff like this bug me, but it just does :)
Not sure why you say that. Having read a great deal of both I can
confidently assure you they are deeply related. Nietzsche doesn't use
process notation or equations for calculating closed loop feedback
gain. Shannon, Wiener or Weaver don't directly address the question of
whether the cultural software of our society lies within its
institutions or individuals. Nonetheless they are both talking about
the same subject - one that is highly apropos the precarious situation
we find ourselves in today with respect to our failing social
cybernetics.
Systems engineering, as an engineering discipline, deals with different things than Nietzsche did. Unless you go to philosophical meta questions about "systems", I have yet to encounter those in real life.
Your last quote reminds me so much of the Gaia X project. The short version of that project is: a standardized protocol of privacy preserving data exchange. But suffering from design by committee and wanting to throw in everything in existence (including blockchains, cause why not) makes it into something that on release (if ever) will just crumble under its own weight.
> The problem here is not including people in your “system”.
Absolutely. All too common. And once you do include them each is not
merely a new variable capable of assuming wildly different values, but
a whole system in itself capable of interacting with every other such
system within your system. That's why reductionists like to try
factoring them out as interchangeable cogs. Pretty much the entire
edifice of modern industrial economics since Adam Smith and Henry Ford
is built on that model simplification/efficiency.
Makes sense. I think where the interchangeable cogs model breaks down (or at least becomes less effective) is in the design/engineering space. To belabor the analogy with Ford, the designer of the Model T is not interchangeable in quite the same way as the nuts, bolts, and assembly line workers.
I think cross-discipline training is really under-rated and important. For instance, as a 3.5th year civil engineering student, I’ve been taught systems engineering and project management multiple times and in multiple different contexts. These are integral to ‘physical’ engineering, but seem (to me) to be missing from software engineering. I’m dabbling in programming and software eng now and I’m constantly surprised by the lack of standardisation and the sort of ‘wild west’ approach to things. This is fine for getting things done, but in terms of liability and responsibilities (like what the post talks about), it seems that many jobs are ill-defined and poorly scoped.
Overall, I think software and ‘physical’ engineers should swap experience. Physical engineering could use a tech-injection, and software could use a ‘structure’-injection.
Electronics Engineering student here. My Semiconductor Devices Professor said it best (this is paraphrasing):
"What you gotta understand is that in the beginning of Electronics, people were pretty much trying to put two materials together they thought worked and then tried modelling it. It was pretty much trial and error, experimentation..." and then he hits me with the most "holy s*" moment of my academic life: "... much like programming and software engineering is today. You write some code, run it, see if it works. Works, ok, go ahead, make sense of it, explain in the documentation, next task".
I had NEVER thought of software like this, it just hit me like an atomic bomb in the head, I felt like I understood where in the history of software engineering we are right now. Structure is coming, slowly but surely.
It’s shocking to be able to see it this way. Thank you for sharing this insight. It’s easy to see programming as something uber-sophisticated, because in most senses it is.
The reality is that we’ve been programming (we, the wider public) for a few decades of our millions of years of consciousness. Of course it doesn’t have the scientific rigour of something we’ve been doing for millennia, like construction.
With that said though, I think programming and computers in general are the closest we will ever come to ‘magic’ and wizardry. The fact that we can now affect reality with a few keystrokes is magical.
Before clicking, I thought the article is about someone who only thinks in "system" and not able to deliver concrete things. How wrong I was.
"Give yourself permission to let the organisation fail".
I agree. As someone in a similar situation, the job is to let the "design of system" be heard, be debated and be implemented once green-lit. You cannot convince the decision making body (be it the CEO, management, or a design committee) that your idea is the right one, for 3 reasons IMO:
- idea could be wrong
- People won't know what you are talking about unless they had the first hand experience. (A la you don't know what it is like to be a bat)
- Your meritocracy is limited to a small group of decision making body
If you see the "system" broadly enough, you will see a market. It is essentially preaching your "system" to a wider audience. Your system may be wrong, in which case your start-up will die or you devise another system. Or you reach a whole bunch of people who understand you (the initial niche), and your living does not depend on a single decision body but on a market.
Convincing one body is hard, but if you broadcast whatever you believe, some will respond eventually. This is why a start-up is great.
>Steve recommended investigating Systems Engineering as a distinct subject. Specifically, reading the engineering histories of the Gemini and Apollo projects, and especially about the culture clash between the experimental aircraft guys who built Mercury, and the ICBM teams.
Not a book, but MIT OCW has a class on Aircraft Systems Engineering 16.885J (focusing on the Space Shuttle) that is an excellent example of this. The class starts by looking at the requirements for the system, then examines every single subsystem on the Shuttle, and explains, in detail, why it was built the way it was. Almost always the answer is "because of the constraints placed on the system by the rest of the vehicle" + it has to be as light as possible.
I initially watched the lectures because of an interest in aerospace and it is a fascinating historical series in its own right with some incredible speakers. The lessons for systems engineers are numerous too.
Wow, this whole thing sounds exactly like me. I'm definitely at the stage where I've given the organization permission to fail and am just drudging on now.
I mean, the system I started working on 2 years ago was using a business text field for processing decisions. That forces changes to the system whenever the business wants to make changes. If they do make changes, the reporting queries have to be modified to look for all the historical versions of that text for audit reasons. If the team asking for that change forgets to tell one of the numerous other systems that also use that field, then there are errors. I proposed we add a field with a code that represents this field so that the business can change the display text without affecting the systems that currently use it. It's been at least 18 months, and nothing has changed. You would think that this is a basic design best practice that should have been implemented from the beginning...
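For anyone who hasn't run into this pattern, here is a minimal sketch of the proposal as I read it: decision logic and reporting key off a stable code, while the display text lives in a separate lookup the business can edit. All names here (`DecisionCode`, `DISPLAY_TEXT`, `Record`, the example wording) are hypothetical and not taken from the commenter's actual system.

```python
from dataclasses import dataclass
from enum import Enum


class DecisionCode(Enum):
    """Stable, machine-facing codes (hypothetical names)."""
    APPROVED = "APPROVED"
    REFERRED = "REFERRED"


# Human-facing wording lives in one place and can change freely.
DISPLAY_TEXT = {
    DecisionCode.APPROVED: "Approved",
    DecisionCode.REFERRED: "Referred to underwriting",
}


@dataclass
class Record:
    record_id: int
    decision: DecisionCode


def needs_manual_review(record: Record) -> bool:
    # Processing logic depends only on the code, never on the wording.
    return record.decision is DecisionCode.REFERRED


def count_by_code(records: list[Record], code: DecisionCode) -> int:
    # A reporting query keyed on the code needs no list of historical texts.
    return sum(1 for r in records if r.decision is code)


def display_text(record: Record) -> str:
    # Only the presentation layer looks up the current wording.
    return DISPLAY_TEXT[record.decision]
```

With this split, rewording "Referred to underwriting" touches only `DISPLAY_TEXT`; the processing rule and the audit counts stay the same, and no downstream system has to be told about the change.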
This rang so true. I recently left a large multi-national aerospace company where I and my team had developed local processes that put systems engineering first.
Unfortunately our main contract was developing a component of a larger system being developed by our parent organisation who didn’t have a concept of systems engineering. We tried for years to educate them and I watched aghast as their program costs and schedule continued to spiral out of control.
Basically a bomb burst of engineers all doing what they thought was the right thing but no one owning the system design and saying no to good ideas.
In the words of their chief engineer “it’s like 10 different black boxes, I don’t know what I’m getting and I don’t know when it’ll be finished!”
There’s much to dive into here — ty — from Steve’s “Systems … where it is all about interfaces and trade-offs” to points you make throughout supporting his conclusion
> we define a stable interface
> reproduce behaviour reliably
> Predictability is great
Within being believed and valued for protecting engineers and organizations, so much progress applying these principles relies on individuals’ and teams’ readiness to adopt and advocate for them. Hope to see your experience with that in part II.
[My colleague] would appear to see time spent in planning and writing documents as essentially wasted time, since he asserts that without process we are faster. We are not faster in reality. We are merely faster to say we're finished. That is not the same thing as actually being finished: we can all think of examples of that. And the uncertainty induced by the unknown magnitude of correction required is, IMHO, the biggest contribution to our inefficiency and ineffectiveness.
I've spent my entire career working for a medium sized organization and in the past few years it has tried to become "agile". Most of this push is predicated on the idea that we will be able to go faster this way. As a result we have deconstructed ourselves. What's weird is that now we aren't actually even "faster to say we are finished." No, now we are only faster to say that we are going to be faster to be finished. We still end up taking a long time but as long as the slide deck says we are going to be done in a short amount of time, then all is believed to be running smoothly. It's very strange/depressing/unsettling.
Systems thinking is great, but the way we reason about organizations and systems is pretty medieval.
Most systems are described in a simple way: box and line charts which are just voodoo.
We need to develop an algebraic notation for how to represent the various configurations/states of an organization, along with its operations.
With a notation, you can describe what you “feel” and share the knowledge and improve on it. You can also define the organization's desired state and track your progress. You can plop a new manager in it and they will know what to do. You can also do basic engineering like pick a configuration that has the desired cost, throughput and latency that you need.
The author is Irish, and as an Anglo-Saxon who has lived in Sweden for the last 17 years, I can relate. We get more systems engineers in the Anglo-Saxon world in part because our education at university tends to be more generalist and in part because our more hierarchical organization does not involve everybody in decision making, so people at the bottom let their "mother hen" boss take more responsibility. In strong engineering cultures like Sweden and Germany, we tend to have more specialization and consensus building. There will always be systems engineers, but they are more prevalent in some cultures (Anglo-Saxon).
> so people at the bottom let their "mother hen" boss take more responsibility. In strong engineering cultures like Sweden and Germany
I'm not sure if I understood it correctly or there was a typo, I am surprised because my experience is almost opposite.
My experience with Swedish and German multinationals is that it's all about consensus, to a point where the best decision would be rejected if it lacked support, or where the off-ramp would be no decision (which is also a decision). This I found in stark contrast to French, Italian, Spanish, UK/US/Australian organizations, that tend to value the ego driven hero who saves the day. Also these former locations seem to do better when dealing with chaos as they know how to "think on their feet".
I'm curious from your experience would these companies be start-ups or medium sized firms, or could there be other reasons I'm missing that our experience is so different?
The most successful companies I have ever worked for hired smart developers and got out of the way. The least successful company I have worked for spent a huge amount of time in meetings, planning and documenting everything, while failing in the market place against competitors that didn’t. The more structure you impose, the less competitive you are. It is OK if you are a monopoly (NASA) but if not then be careful what you wish for.
Thought it was a post arguing AGAINST Systems Thinkers. Was slightly provoked and curious to read it, since I’m a fan of systems thinking. Found out it was arguing FOR Systems Thinkers…
consider a slow-growth company free of a title system (e.g. where every engineer is just a "member of engineering"). the less-credentialed nature means you don't have to be as specialized to float ideas. the slow-growth nature allows the company to prioritize process over chasing chaotic payoffs to stay afloat. YMMV and i'm not sure all slow-growth, titleless engineering organizations turn out this way, but there's some correlation.
You mean I can get another if I break it, so a clay cup trumps a grail? The attribution is to Mirza Asadullah Khan Ghalib, classical Urdu and Persian poet from the Mughal Empire. I can't see the same meme in the bible, although there does seem to be passing mention of clay there somewhere. No doubt all the ancient cultures valued pottery as it was such an amazing invention!
On the flip side: I am not saying this is the case with the author, but in my experience a great number of believers in systems and processes are people who struggle to solve any problem, and therefore want to have a system where most of the problems will be solved by the system. But that's not really possible, since if you struggle to solve a problem in any way, you will not be able to create a problem solving system, which is a next level problem. Systems thinking is a good idea, but at the end of the day, things need to be done by the people and this can be messy and chaotic, and people are different so making a system that works for everybody is impossible.