Everyone is still terrible at creating software at scale (margint.blog)
546 points by caspii on April 10, 2021 | 379 comments



Sometimes I feel like there are two worlds in IT: the one you encounter on a daily basis, and the one you read about on HN. The only thing consistently "terrible" about both of them is the often expressed lack of humility towards solutions you don't "like", or more often, actually, solutions you don't understand.

Gottfried Leibniz suffered no small amount of flak for his conclusion that the reason the world looks the way it does is because it's already "the best of all possible worlds": you may think you see an improvement, but you lack divine understanding and therefore you don't see how it would actually make things worse.

While this is in my opinion a silly idea, I think most people who work in tech could use a bit of humility influenced by this line of thought: if a solution exists today, and it has yet to be replaced, it's at least possible that it's overall the best solution, and the reason you're not seeing that is because of a lack of understanding.

Of course this is not always true, but if nothing else it would lead to more interesting discussions than those stemming from someone saying "if you're not using Kubernetes you're a moron" and another replying "learn how to sysadmin and you'll realize you don't need Kubernetes in the first place".


You do have an excellent point for some of the cases.

We should IMO also entertain one more possibility: namely pressure from the top.

A lot of people who I knew as sympathetic and calm before they took management roles turned into something I could code in one minute: namely a program that asks "how much is this going to take?" and if your answer is above N hours/days then they say "no, we're not doing it". And that's not because they are stupid or suddenly non-sympathetic. It's because their bosses optimize for those metrics and measure them (and their salary, and their bonuses, and even their vacation days) by the same metrics.
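
In literal code, the sketch would look something like this (the threshold N is of course hypothetical, whatever number their own boss happens to be measured on):

  # the "one-minute manager program" described above; the threshold is made up
  N_DAYS = 3

  def manager(estimate_days):
      return "no, we're not doing it" if estimate_days > N_DAYS else "fine, go ahead"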

Perverted incentives play a big part in the current state of things where everyone just looks baffled at thing X and is like "why in the holy frak isn't this working better?".

Truth is, MANY, and I mean really A LOT of people know that their work is sub-optimal but pressure from above prevents them from making it better.

And people are not independent -- for most people outside IT and other prestige / high-skill professions, standing up to the boss means 3 to 9 months of unemployment and maybe no food on the table. Hence everybody learns to suck it up and do the bare minimum that keeps them employed, perpetuating the cycle.

Multiply that across at least 90% of our world and it's IMO quite easy to understand why things are the way they are.


Some of what you say about management is true. But if you really want to fully understand, then you need to keep following that trail of incentives. Your boss's boss doesn't optimize for those metrics "just because". There are reasons, and persuading them requires understanding those reasons. A company has only so much revenue. Wasting that revenue in the best case reduces profits. In the worst case it kills the company. Changing a perverse incentive requires understanding whether it is actually perverse or not and then proposing a compelling alternative that results in better results for the company.


I am sure that what you say is very true for no small number of businesses.

My issue is mostly with my profession -- programming. Sometimes we are so micromanaged and our time is so sliced and diced that all of this short-term thinking easily becomes 5x more expensive within a short time frame of anywhere from 3 to 6 months.

This has been observed numerous times, yet IT managers never seem to learn.

It's OK to try and save money. But saving money short-term in order to have a huge expense down the line is just irresponsible on the part of those managers.


Having led a team for about a year now let me tell you it's not easy. You are always trying to balance many priorities and even with over a decade of coding experience, it's not always obvious which refactorings or tech debt stories or architectural changes are worth prioritizing now, which can wait till later, and which are engineer pipe dreams that I need to gently but firmly kill as quickly as possible.

I have had similar managers to what was described above, in those types of organizations. It may be a lost cause but it may be an opportunity to work on your influence skills; being able to explain and sell your solution is perhaps even more important than coming up with it in the first place.


This is really difficult. Perhaps the best thing is to track the proposed technical debt stories, and not discard them when they are rejected. Instead, also keep track of how much pain is encountered that can be traced back to that particular technical debt.

3 years into a major project, my team finally got prioritized to work on an improvement that will take us a couple of weeks, one we were not able to work on previously because we could not justify its value. But for the past 3 years the organization has lived with pain that this would have resolved. So now we can justify it. And the thing is, it's not like we didn't know 3 years ago: when we said we wanted to make this happen, we also said things were going to be terrible if we were not able to prioritize it. But in many organizations it's not good enough to just say how bad things will be. People need to actually have felt the pain.

Honestly, I think that's okay on its face. My bigger issue is that frequently there is no or very little trust around engineers who predict what the major problems will be and advocate for them, and then it's not like management listens to them the next time once they're proven right. I think that's the real problem.


> My bigger issue is that frequently there is no or very little trust around engineers

This is a problem. But at any given time management hears programmers arguing for both sides - the "if you're not using Kubernetes you're a moron" and the "learn how to sysadmin and you'll realize you don't need Kubernetes in the first place" (quotes from the top of the thread). In many cases folks in senior management do not have any idea of these things, and they rely on one of these two opinions, resulting in the other half claiming here on HN that management is all bull.

What the profession lacks is a code of practices - like a Joel Test accepted by every accredited engineer.

I am not sure that is a net positive for such a rapidly evolving space. Even now people are debating whether unit tests are needed, among other basic topics. The only "settled" thing about the profession seems to be that gotos are bad.


I find that the best senior engineers don't say "if you're not using Kubernetes you're a moron" and "learn how to sysadmin and you'll realize you don't need Kubernetes in the first place", but instead say "Kubernetes is fantastic for this use case because of these reasons..." and "sysadmin would be easier to use than Kubernetes for this problem because of these reasons..."

I'm not sure how one gets software engineers to justify and back up their decisions like this on a regular basis, but I suspect it would help management trust their engineers more.


> I'm not sure how one gets software engineers to justify and back up their decisions like this on a regular basis, but I suspect it would help management trust their engineers more.

More balanced senior engineers (I know anecdotally that there are lots of them) speaking up and challenging their fellow-engineers to justify their extreme positions will be a good start IMO.


(Bad) Management chooses to listen to engineers that say agreeable things, not true things.


If your manager listens to the engineer who says "Kubernetes is awesome and does all the things!" and doesn't listen to the engineer who says "I think Kubernetes is a bad idea for this project because x, y, and z", then yeah, it's time to find a new manager. Good managers listen to good advisors.


Anecdotally, we had the opportunity to get better gear for developers than an outdated single 24” 1080p screen and a slow computer. Higher management invited a few engineers to share their ultimate setup. These varied widely, between one large ultra-widescreen and three small screens. The developers started arguing vehemently in the meeting and in the end they couldn’t agree, so all developers were left with the same stuff. If you were enterprising enough you could gather two of those 1080p screens, until someone from workplace IT saw that and restored the desk to spec.


It was a mistake to give an option to discuss this to start with. It's always "if not this then that", "brand X is better than Y", etc. What matters: is it a Windows/Linux/Mac shop?

No matter the brand, the base unit comes with enough RAM, processing power and a spacious SSD.

Everybody gets the same brand, as it's easier to support from the IT perspective.

Keyboard, mouse and monitors should be an individual choice. Some want to ruin their vertebrae by staring at a tiny 13" screen and be happy about it, while others need 3 monitors and 2 keyboards to feel the power -- why not.

Not sure why some companies are so tight when it comes to the equipment for developers,as it's always a small fraction of the overall cost of the position.


That is copy-book Sayre's Law: https://en.wikipedia.org/wiki/Sayre%27s_law


This is definitely a very difficult spaghetti to untangle, agreed. But IMO a very good start would be to never accept such polar statements in the first place.

"If you use extremist and simplistic generalizations you are not worth listening to in a discussion" would be the first policy I'd put into a team.

Balanced arguments with pro/con analysis are what make an engineer senior and a contributor to an argument. Everybody else shouldn't participate the next time around.


Agreed. But the balanced people are the least vocal. As Bertrand Russell noted:

  "The whole problem with the world is that fools and 
  fanatics are always so certain of themselves, and 
  wiser people so full of doubts."
I am someone who moved from a programmer role into management. My appeal to the balanced senior engineers is to speak up more. We don't have to swing as much as the economists who frustrated President Truman into saying:

  "Give me a one-handed Economist. All my economists 
  say 'on hand...', then 'but on the other..."
But we can bring more nuance into discussions - especially calling out those among us with polar positions.


As a manager, you need to find a way to pull information from people: with some it's a team meeting, with others it's over a meal during lunch, etc. I even do this sometimes during management meetings, when I see people clearly not quite happy about something but reluctant to speak, so I ask them directly what's on their mind, and that's often enough for them to tell way more than you could have gotten by just expecting them to speak up.


All of that is sadly true and it's a fundamental problem with humanity. I'd say that the only solution I am seeing is that the managers must proactively shut down the extremists early and hard.

If you demonstrate that you will not submit to extremists, I've found that a good chunk of the more balanced people are then encouraged to speak up.

---

TL;DR: Silence the bad and loud voices, and the wiser and quieter voices will start speaking.


> at any given time management hears programmers arguing for both sides

I think it was Andrew Grove (of Intel / High Output Management) who boiled this down to the following algorithm:

(1) Invite all sides to a meeting. (2) Ask them what their decision is. (3a) If they agree, implement the decision. (3b) If they disagree, tell them to discuss it, then give you their decision at the next meeting. GOTO (1)

His point being that managers are the least informed (by virtue of organizational pyramids), and therefore the least effective at picking solutions. Build a team that can arrive at consensus, then trust the team.
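
As a toy sketch (purely illustrative, not from the book), the loop is roughly:

  # Grove-style decision loop: keep sending the question back to the group
  # until they converge; the offline "discussion" is faked here by people
  # drifting toward the current majority opinion.
  import random

  def grove_decide(opinions):
      """opinions: dict mapping person -> preferred option (toy model)."""
      rounds = 0
      while len(set(opinions.values())) > 1:            # (3b) still no consensus
          rounds += 1
          prefs = list(opinions.values())
          majority = max(set(prefs), key=prefs.count)   # group hashes it out
          for person in opinions:
              if random.random() < 0.5:                 # some people are persuaded
                  opinions[person] = majority
      return rounds, next(iter(opinions.values()))      # (3a) implement the consensus

  print(grove_decide({"alice": "k8s", "bob": "plain VMs", "carol": "k8s"}))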


In these cases, wouldn't it have been better to work on it slowly as a "side project" outside of the team's normal priorities? I always tell my team that if they can fit it in and not hinder the current sprint, by all means go ahead and do it on the side.

I know this goes against the norm, and people want to attach "right now" value (hours it took reflected on the paycheck), but going beyond the call can also give you leverage when your yearly review comes up, for example.


I'll also add that I only have so much energy in the day, and after a couple of years of fighting against problems I couldn't do anything about, managers who couldn't make the changes, and customers that wanted it now, I burnt out. Instead of spending a useless zoom meeting working on an improvement, I now use it to make maps for the D&D campaign I am running.


I've seen several examples of this being the only way to get out of never-prioritized debt cleanup that the team can't be bothered to explain to management in non-technical business-value terms. But the reality is that a sprint is always over-packed with stuff to do, with zero room for side projects, and expecting people to work weekends on side projects to solve company business needs is not exactly a sound strategy.

What one can do is deliberately allot time for the team to work on any side project, as long as it is somehow attached to the business. There are ways to do this in a more organized fashion, for example innovation weeks or the famous Google Fridays.


That can work assuming the team isn't already overloaded. But when people are being asked to work 60h a week or more to accomplish their tasks, it becomes untenable.


That's exactly what I am doing, and aside from a few wrist slaps people have been okay with such an approach. I always tell them I did this for a few hours during the weekend, or while I was waiting for our DevOps to resolve an outage for a particular 2 hours last Friday. Seems to work really well.


> People need to actually have felt the pain.

Just like those that have never been hacked don't care about security. Or those that have never lost data don't care about backups. Or those who have never had more than 10 simultaneous users don't care about performance. Basically we do stuff in order to avoid pain.


I wholeheartedly agree that we the engineers must work on our non-tech talking skills!

I've been working on it for a few years now and it's REALLY difficult but even the small wins feel very rewarding and are motivating me to refine my comm skills further.


Changing a perverse incentive requires understanding whether it is actually perverse or not and then proposing a compelling alternative that results in better results for the company.

While that's true, considering those incentives and changing them if necessary is part of my boss's job, not mine. My job is only to report the misaligned incentives I perceive to my boss. If my boss isn't able to make me understand his decisions, that's my cue to find a new boss. And I concede, those bosses are rare. I've only had two in twenty years of employment.

It takes time to recognize the qualities you need in a boss to perform well. I know myself well enough in that department to actively look for those qualities. Even to the point where I prefer a less challenging job under the right boss than the perfect job under the wrong leadership.


I agree that often there are good business reasons for killing projects, and plenty of so-called technical debt is actually just the result of pragmatic decision-making. Imo developers should get used to maintaining that stuff, because that's a huge part of the job in most profitable companies.

But there is also a flip-side where I see the opposite happen - especially in startups and VC-funded companies. These companies can take massive losses while at the same time investing huge amounts into elaborate solutions that appear to primarily exist for the purpose of creating impressive bullet points on the resumes of careerist employees. In these companies, even in management, the work seems to be driven by everyone trying to maximize the impact of their own, personal "tour of duty", then making a profitable exit. These incentives are not the same as those of people working at more legacy/boring companies.


> These companies can take massive losses while at the same time investing huge amounts into elaborate solutions that appear to primarily exist for the purpose of creating impressive bullet points on the resumes of careerist employees.

Yep, that's one of the things I had in mind when I said "perverse incentives".


It also takes time to turn a big ship, and by the time one realizes one needs to make the turn it may be too late.

To extend this to non-tech examples, failing big companies, like Sears, do not disappear overnight; the decline was in full view for over a decade but ultimately the ship could not be turned. With tech the problem may be even worse; Uber still has not turned a net profit, but with tech investors like to throw money at losing businesses. Some turn around, but that's not guaranteed.


> Your boss's boss doesn't optimize for those metrics "just because".

Your bosses optimize for those metrics because it's the one they can communicate higher up in the short time they can get the higher-ups' attention.

If it were because the metric was any good, there wouldn't be so many cases of companies bankrupted by technical and cultural debts piling up.


> Your bosses optimize for those metrics because it's the one they can communicate higher up in the short time they can get the higher-ups' attention.

That's their problem and not mine, isn't it?

A good manager doesn't just reflect the lightning bolt to their underlings. They absorb part of the shock and process / transform what they will transfer down below. If they aren't doing that then they aren't managers; they are uneducated opportunists in suits.

> If it were because the metric was any good, there wouldn't be so many cases of companies bankrupted by technical and cultural debts piling up.

Really well articulated, kudos.


There's an implicit assumption here that everyone is "doing a bad job" because of "perverse incentives".

But I'd argue that most 'computer nerds' have pretty deeply seated perfectionism flaws.

To take a simplified example, we tend to want perfect security, and like to prove that we're smarter than each other by understanding complicated attacks.

When it comes to running a business, though, a 10 person startup cannot afford to act like their threat model includes defending from hostile nation state actors willing to do Stuxnet-levels of effort to hack them. If they do that much work on security their product will never get off the ground.

The goal of most companies when it comes to security is never to have perfect security, but to have good enough security to stay out of the headlines.

It is like tradeoffs that have evolved in nature. You don't see organisms that maximize one feature at the expense of everything else. That energy always needs to go into competing concerns.

So sometimes as technical people we fetishize the perfect solution to some problem in the Enterprise, when actually doing that solution would be horrible for the Enterprise due to the costs.

(Of course the flip side is sometimes management only cares about window dressing the company so they can bail out with some golden parachutes, and that has nothing to do with running a healthy company -- but not every example of a company not being perfect is due to deeply cynical and perverse incentives like that; some of it may be entirely rational.)


> But I'd argue that most 'computer nerds' have pretty deeply seated perfectionism flaws.

My goal here absolutely was not to turn this sub-thread into a ping-pong of "but managers bad at X" and "but but but programmers bad at Y".

I am very aware of the nuance you introduced, and I've been guilty of being a tech perfectionist many more times than I'd be comfortable sharing. But I finally learned.

(Granted, many haven't and maybe never will. In this regard you do have a solid point.)

> So sometimes as technical people we fetishize the perfect solution to some problem in the Enterprise, when actually doing that solution would be horrible for the Enterprise due to the costs.

Absolutely. Not denying it. It's just that what gets me is the reaction to some of the engineers' flaws is to go to the other extreme.

Which is just as bad a state of things. Both extremes are unproductive.


"Hence everybody learns to suck it up and do the bare minimum that keeps them employed, perpetuating the cycle."

In my point of view, that's how you lose solutions in order to stay in the comfort zone, and I'm not talking about the people who actually "just obey orders" but about those who give them.

As I gain more experience, with age and some wisdom (I hope), I see that most top-down decisions are actually the worst way to decide things; people who are "at the bottom" may not understand how an entire system works, but they really do know (and suffer from) how their work is sub-optimal.


"Top-down" has become an epithet that disgruntled engineers throw around, but the reality is that there are problems at all layers of an org, and they should be solved at the appropriate level.

Trying to solve every problem top-down manifests as micro-management that beats the morale and initiative out of strong ICs. But on the other hand, trying to address everything bottom-up leads to local maxima, no standardization across tech or product, and warring factions within the company.

There is real art to this. Being able to grok the incentives and structures of a vertical slice through the org (C-level, VP, Director, Mgr, IC) as well as the tech (Brand, Product, Pages, Components, APIs, Middleware, Storage, Infra) is a rare and extremely valuable talent that allows solving problems that are impenetrable by any traditionally defined role.


Everything you said is true. And it still doesn't change the fact that most organizations [I've been in] utilize the top-down approach on 99% with the occasional 1% of bottom-up.

That's an unbalanced state of affairs that completely loses very valuable feedback everywhere along the chain.

Part of the job of a manager is to find this perfect balance on the level they are working at. And most don't. They just pass the pressure put on them, down below to their underlings.


That's evidently correct, and all the empirical results point to its correctness too. As always I recommend reading Deming, or go directly to the Toyota Production System. The best working system is one where management sets the goals and workers decide the means, with both trusting each other to do their job.

https://en.wikipedia.org/wiki/Toyota_Production_System

But somehow, management never accepts this reality. Even the Japanese auto industry is regressing into Taylorism.


There may not be that much difference between Lean (Toyota) and Taylorism. Don't forget that Toyota (Toyoda) is not that far from Fordism - "you get whatever color you want as long as it is black". Employees get to tweak the process but the fundamental model stays the same. They don't get to change the assembly line, nor the parts, nor what is assembled. And Toyota guaranteed life-long employment when it was implemented.

I've worked the line twice - first time as a teen looking for any work, the second time as an undercover analyst hired to examine why Lean was failing at a plant, both doing the same type of job: electrical component assembly. I got physical injuries (minor) both times. The majority of the assembly line workers had repetitive stress injuries, including losing fine motor skills and wrist control due to use of impact wrenches and exposure to bad ergonomics. If a human delaminates, you can't glue them back.

Although a rational (Taylorism, Management by Objectives, MBO, OKR, Milestones, management by goals) vs. normative (Lean / Toyoda, human-centric, Agile, humanist, culture) division of management styles/rhetoric has been proposed, they should perhaps be seen through the prism of Creative Destruction (not disruption; and, surprisingly, CD comes from Marx).

In a vulgar approximation, imagine capital-backed entities served by the manager class, which are stressed by short deadlines and performance-tied bonuses and careers. They are constantly shifting basic approximations of solutions and perspectives, without challenging the system.

Give employees more autonomy, and output and managerial control suffer. Instill more rigid control, and risk losing the ability to innovate and "knowledge producing" (Drucker of MBO fame) talent such as programmers.

In an antagonistic system, whatever you come up with has its own roots of destruction on top of built-in conflict (Marx again). Hence the back-and-forth re-framing between "rational" (Taylorism) and normative ("culture, baby, Holacracy!") so fast that today it is a zombie that has both and neither.

You end up with projectification, corporate "culture" that demands the soul and meeting arbitrary "milestones" from employees, but the corporation itself can only guarantee arbitrary termination.

There is a great and influential analysis "Design and Devotion: Surges of Rational and Normative Ideologies of Control in Managerial Discourse" by Barley and Kunda. I would suggest reading "Management fashion" by Abrahamson as well. Both can be found through G.scholar, including no-registration copies.

https://www.jstor.org/stable/2393449 https://journals.aom.org/doi/abs/10.5465/amr.1996.9602161572


Sorry, but if all you got from Toyotism was Lean, then you got it somewhere already fully geared into Taylorism. There is much more to it, and yeah, one of the tenets you will get from Deming is that short deadlines and performance-tied bonus and careers can't work. He also goes to great lengths about empowering employees to change things like part sourcing, but yeah, not the final product.

(I don't know about worker health problems, and I imagine that poor as Japan was at the time, any empiric data on it would be useless. But, anyway, I don't think the management structure is the correct place to fix that one problem.)

And I'll have to break into your Marxist narrative, but all that was once incredibly successful in a fully capitalist framework. It never became non-competitive, it was only replaced by order of top-down command structures.


I would appreciate an expansion on the "break into".

The approach I outline can explain and predict why the two Ts rotate, including at Toyota (Toyota going Taylor, or rather normative - rational).

I prefer not to base my experience on what Deming or anyone else wrote, but how their writing relates to real-life practice (though of course as a critic of Taylorism as represented by MBO and other Performance Evaluation rational schools, he is a welcome breath of fresh air).

Have you observed, preferably both as a participant and independent, Deming idealism in a functional real-life system? I have only seen a shift towards projectification and cultural hegemony.


What breaks the narrative is that the system is clearly not antagonistic. Giving the correct kind of autonomy to the workers increases every single item that management should care about, at least in a capitalist system. And both logic and evidence point that way.

> Have you observed, preferably both as a participant and independent, Deming idealism in a functional real-life system?

As a participant, never. As an independent, we are talking about the mentor of the movement that taught Japanese management how to go from an industry that didn't produce anything good into the highly developed one that outcompeted everybody all over the world.


> What breaks the narrative is that the system is clearly not antagonistic

So why the shift to Taylorism at Toyota, the heart of the Toyota system? Why the shift to projectification from life-long employment at Japan? How can one explain chronic injury if workers get to plan and improve the workflow (or are wasted humans not considered waste)?

And the Toyoda principles at the heart of Toyota predate Deming. (Ohno, Taiichi (March 1988), Just-In-Time For Today and Tomorrow, Productivity Press, ISBN 978-0-915299-20-1 )

The list of paradoxes can continue indefinitely, or one can elect to look for the root cause.


> from Deming is that short deadlines and performance-tied bonus and careers can't work.

Well, for the performance pay, Deming is more subtle: it’s not that it doesn’t work, it’s that the performance must be something meaningfully under the employee’s control for performance pay based on it to work (otherwise it can’t influence anything), it must be a holistic measure of company value (otherwise you get perverse incentives around micro-optimizations), and your measure of performance must take into account the degree of uncontrolled variation in the process (otherwise you give false negative and positive signals due to random variation rather than real problems/improvements).
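
A toy simulation of that last point about uncontrolled variation (all numbers made up):

  # Rank ten equally skilled "employees" on a noisy metric: the resulting
  # "top" and "bottom" performers are pure process variation, not skill.
  import random
  random.seed(1)

  true_skill = [100] * 10                                    # everyone is identical
  measured = [s + random.gauss(0, 15) for s in true_skill]   # uncontrolled variation
  ranking = sorted(range(10), key=lambda i: measured[i], reverse=True)
  print("'best' employee:", ranking[0], "| 'worst' employee:", ranking[-1])
  # Rewarding or punishing based on this ranking responds to randomness,
  # not to anything the employees actually control.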


Pushing down the decision making is important for that reason. The “makers” know best most if not all of the time. The manager is there for support and for facilitating the learning. E.g. With decisions we make mistakes. As a manager it’s important to ensure that the team reviews past decisions and learns from them to improve the decision making process.


> A lot of people who I knew as sympathetic and calm before they took management roles turned into something I could code in one minute: namely a program that asks "how much is this going to take?" and if your answer is above N hours/days then they say "no, we're not doing it". And that's not because they are stupid or suddenly non-sympathetic. It's because their bosses optimize for those metrics and measure them (and their salary, and their bonuses, and even their vacation days) by the same metrics.

I had this exact same thought recently when reflecting on my behavior in my new role as a "technical product owner". All of it was reflexive, as if I suddenly forgot all of the software engineering knowledge I accumulated over the years and became a deadline-driven cog.

I don't have a solution yet; I think it comes down to that I don't yet speak the same language that people I report to do, and thus I feel like I can't defend my position well enough. It comes with experience, I guess!


Part of what makes you an enabler of programmers -- and not just a guy who screams "faster! harder!" -- is to be able to push back a little.

There's sadly an inherent conflict of interest between the people doing the job on the ground (programmers) and the rulers (managers).

Your job is to find the middle ground -- not to completely side with your higher-ups.

That would be a good starting point for you IMO.


I’ve been reading these Dilbert rants for twenty five years. I probably made some of them on slashdot.

We as an industry tried the all engineers thing and it didn’t work.

Have you considered the possibility that the boss’ bosses have a point? That the intelligent people you once respected have been convinced rather than intimidated?


Have you considered the possibility that the boss’ bosses have a point?

It's the organization that shapes the individual, not the other way around. Changing a cog in an existing gear system only works if the cog is of a particular shape, and it will get chafed and dented if not sufficiently so. If the cog is too different, it will break and be replaced again.

Occam's razor says that if many different people fail within the same organizational structure, the problem is not with those people.


That's just begging the question, though, since the comment you replied to was essentially asking whether the promoted SEs are actually unthinking cogs, vs people who've been convinced by their new perspective.


But my point was that that's a false dichotomy: where is the option that they were thinking cogs that found themselves unable to change the shape of the gear system around them?


I would be ashamed to call myself a thinking human if I never considered it.

And as said in other comments, for some companies and businesses they are right indeed.

But not in programming / IT in general. At least not where I worked (and where many of my former colleagues worked).

It's all optimizing for short-term and never the long-term. And I have seen people playing the long game gaining a lot so I also know I am not imagining it.


Predictability in the short term has a definite business value in and of itself, regardless of whether it makes development slower - or perhaps, a value to management. Certainly the degree matters.


You know, I am actually VERY okay with short-term thinking but not when it's 100% of all decision making.

Make it 75% short-term objective and give the other 25% to longer-term planning and/or simply playing with ideas and I'll be very happy to work in such a place.

I am not a fanatic -- and most programmers are actually quite understanding (if a bit too timid sometimes) people.

We just want the pendulum to not be on one of the sides 100% of the time. That's it really.


> We as an industry tried the all engineers thing and it didn’t work

Example of when this was tried?



I think the OP's point applies to organizational management as much as to programming:

> if a solution exists today, and it has yet to be replaced, it's at least possible that it's overall the best solution, and the reason you're not seeing that is because of a lack of understanding.


This. It's also really difficult to build a mastery of things when you're constantly moving from one piece of code to another according to shifting priorities.

And once you build mastery of $fwk/$lib/$tool, it's deprecated and you migrate to another one. GOTO START


> It's also really difficult to build a mastery of things when you're constantly moving from one piece of code to another

Conversely, it's surprisingly difficult to get engineers to master things they aren't intrinsically interested in. We've had a stable tech stack and core architecture for 4+ years and one of my major challenges is to stop employees constantly trying to introduce $new/$shiny or major refactor X because it wasn't built how they personally would have done it and get them to just learn and understand the stack we already have - which addresses all our needs at most 5% less optimally than $new/$shiny.


When someone’s future salary is tied to acquiring $new/$shiny skill then you are going to get this sort of rational behaviour from the engineers. Smart people respond to incentives.


This is very logical, because engineers know they can be fired any minute with zero recourse. Inevitably, they want to invest part of their current work time into being able to apply for the next job.

If people have a good social safety net and a good job security, the depth of their skills improves by a lot.


You make a really good point about the relationship between job security and motivation and commitment. In my case we actually do employ people under medium to long term contracts, so this is less of a factor. But I think the phenomenon you mention is widespread and endemic in the IT industry, which definitely contributes to the technology fetishism and resume-building type of dysfunctional behaviours that plague us.


Sadly yes. I really want to... stand still for a bit and learn things properly, you know?


If the manager is a former engineer, they might just be following a rule of thumb, that bigger tasks are impossible to manage, and above a certain size threshold, carry an existential risk for the business.


I’m not sure I’d call the incentives perverted (only perverted in the sense that “perfect” software is not the goal).

Management and owners are a far from perfect group that make plenty of bad strategic decisions for a host of reasons, but they also have to deal with the real/perceived constraints of time, budget, market, competition, etc. that are easy to not appreciate when you are focused on the design/build portion of the business.


It reminds me of that Instagram thing: you look on Instagram and see beautiful people living perfect lives and feel completely inadequate in comparison.

Then you look a little deeper and realise most of them are pretending: they're photoshopping their selfies, renting their expensive cars for a day and taking their selfies on other people's property.

HN can be a bit like that, you can feel completely inadequate compared to these super-geniuses doing everything perfectly, when really you're doing ok.


There’s a bell curve of software engineering ability in the world. There’s a few super geniuses, a bunch of smart cookies and a whole lot of people in the middle.

HN attracts a broad range of ability, but generally upvotes the smartest content. So if you sample from “upvoted HN articles and comments”, the content you read is heavily biased toward things written by unusually smart people. But there’s a range.

By contrast, most companies will attract and hire people like their current employees. So the variance at any particular company is much smaller than in the general pool. Everyone ends up working alongside people like them.

So if you work at a big bank or something, you probably don’t interact with many people like the super geniuses who write interesting articles that get upvoted a lot on HN. They’re over represented amongst HN content, but most programmers are average and getting by just fine. And remember, hotshots are equally ungrounded. Lots of good engineers you meet at Google were hired straight out of college. They have no idea how the sausage is made anywhere else. Lots of folks here don’t have your experience. They have no idea what it takes to do software at a company that sees their job as a cost centre. Or what it takes to get good work done in spite of political pressure to stagnate.

Having a variety of voices gives all of us something to learn and something to teach.


> There’s a bell curve of software engineering ability in the world. There’s a few super geniuses, a bunch of smart cookies and a whole lot of people in the middle.

We see the effects of that throughout our industry and yet it's never publicly acknowledged.

> And remember, hotshots are equally ungrounded. Lots of good engineers you meet at Google were hired straight out of college. They have no idea how the sausage is made anywhere else.

One of the most eye-opening experiments one can run is to 3x the compensation for a software role and see what caliber of engineers it attracts. I remember one company that did that for college hires and they were shocked. They had been whining for a while about the "talent shortage" and that it was hard finding folks in the local market. Turns out the folks they really wanted to hire were there all along, just invisible.


> HN attracts a broad range of ability, but generally upvotes the smartest content. So if you sample from “upvoted HN articles and comments”, the content you read is heavily biased toward things written by unusually smart people. But there’s a range.

Does HN really upvote the “smartest” content? Citation needed, please.


Indeed. HN is an echo chamber like most things of its ilk. It's just like the Emmy (etc.) awards; actors praising actors for acting; the rest of the world is largely orthogonal.

The amount of pretense and smug here isn't any less about its domain than any other self-selected community.


I'm also skeptical about that. It depends on whether or not the smartest content aligns with people's financial interests. Sometimes the smartest content goes against most people's financial interests so it is heavily downvoted.

HN is a financially-oriented tribe. It's like in the Bitcoin community. Only the pro-Bitcoin content gets upvoted and anti-Bitcoin content gets downvoted. The smartness of the content hardly matters at all. Sometimes the smartest content is at the bottom of the list.

Usually, I read both the top and bottom parts of HN then I get some of the best arguments on both sides.


What's a good argument from the bottom of this thread?


The one from dalbasal on the last page of comments is pretty lengthy and good.

Also, user ailmanki (also last page) said:

>> Every system requires sooner or later another system on top. https://en.m.wikipedia.org/wiki/Chaos_theory

That would explain what is happening.

The user narrator (also last page) also raises an interesting point:

>> I'm surprised this article didn't mention test driven development for building large pieces of software!

^ He then makes a good case for TDD.

TBH I don't see much improvement in quality at all reading the top comments. Some are dead wrong, for example, the user snidane says (which is/was in the top three on the first page if you ignore nested sub-comments):

>> The tooling is simply not there, so every software project keeps pushing the boundary of what is possible in its own unique fragile way.

Not only is that comment dead wrong (the tooling is definitely there, I've worked on some of these scalability open source tools), it contradicts another comment just above it (by user the_duke) which received just a few more votes:

>> At scale software stops being a technology problem and becomes a people problem. And groups of humans don't scale.


I am not certain about this either. My experience has been that the newer posts on the new tab are frequently more interesting than the posts on the leaderboard.


For instance, the OP is not very well written or informative, IMHO. Great discussion though


Isn't this typical HN though: the quality is most often in the discussion, rather than the original link?


Not necessarily, although I do appreciate the overall quality of discussions here. Reminds me of a very good 00's forum (yes I am a youngling)


> There’s a bell curve of software engineering ability in the world.

Probably not; across both the world as a whole and the profession specifically, a power law distribution is a lot more believable.

Even in a particular firm, it's probably closer to a power law, or maybe an offset bell-like curve where the median is above the mode and there is a much longer high tail, than a common bell curve.


The simple solution is not to look on Instagram (which is a bit trite).

The more complex solution is to develop a worldview where you only compete against yourself, i.e. is what I'm doing now going to make me a better person in X time.

It's up to each person to decide for themselves what better means, it could be an emotional thing "will it make me more empathetic?" or a financial thing "will this allow me to earn more money?" or something else "will it make me a better programmer, chess player, cyclist, *happier*?".

For all the size of the world and its complexity it collapses down to a local optimisation problem for you as a person.


It doesn't even require them to pretend (though I do realise many do pretend). All it requires is people to be more likely to post their successes, or if mistakes then at least mistakes on the path to success.

Even if everyone were to post the mundane it wouldn't generally be what would get upvoted or liked in a given medium enough to counter the successes.


This is my thinking, too. On Instagram, we're seeing the best photos of the people who are on their one vacation of the year. Even if we ourselves post a vacation photo and it gets shared and liked, we won't think of it as one of "those" beautiful photos, because we just got lucky with that one. We remember our embarrassingly bad photos.

And on HN, comments are filled with insight backed up by decades of deep experience. But the Perl expert in one thread is unlikely to be the Russian historian in the other. There are few masters-of-all-trades (though they do exist, as if to inspire and mock us). Not to mention the majority of readers whose experience is rarely relevant enough to comment.


AA saying: "don't judge your insides by other people's outsides"


A thing I have been realizing a lot is that in all aspects of life, a person of even average intelligence can easily imagine how things ought to be. The licensing office should be more organized, the software should start instantly, the government should be more efficient, etc., etc.

However, we are humans, only recently evolved from something a bit like an ape, hampered by emotions and constant fits of irrationality. Actually achieving the correct behaviors that we can easily imagine is very hard and very rare.

So we are correct in our criticisms but could be more calm and understanding about the fact that everything is terrible, and strive to make even very small improvements and enjoy those.


A thing I have been realizing a lot is that in all aspects of life, a person of even average intelligence can easily imagine how things ought to be. The licensing office should be more organized, the software should start instantly, the government should be more efficient, etc., etc.

I don't believe this is true; an average person can easily imagine a local maximum, but everyone's local maximum is different.

For example it might be very convenient for person A if the licensing office were open 24 hours but probably not so convenient for person B who would have to cover the night shift every other week or person C whose taxes would need to pay for it. And so on.

Nearly every sub-optimality from one perspective is an optimality from another.


True, but don’t forget that the frequency and severity of those fits of irrationality depends tremendously on how accepted irrationally is (and what personal consequences it carries). I know it sounds like a contradiction, but I think we need both high standards and an understanding that working together on intellectual tasks is hard.


We are at the very beginning of the history of computing. The age of print lasted 500 years, and there's every sign the age of computers will be longer. The notion that anything we are doing now has been perfected is hard to swallow. Especially given the way each innovation creates new opportunities for improvement.

I'm willing to assume that this is the best of all possible worlds, that everybody did the best they could in the moment. But having all that history behind us gives us lessons such that tomorrow's best possible world is surely better than today's.

I agree that we should have less technical arrogance and jackassery in the future. But that emotional immaturity is also part of today's best possible world.


Chesterton's fence [1] captures this idea: one should not tear something down until they understand it.

[1] https://en.m.wikipedia.org/wiki/Wikipedia:Chesterton's_fence


In my head, I've started to understand the suck in terms of timescales. Leaving aside products that are too complicated for the people who build them, even if you have the most competent and well-oiled team onboard, there are two things that together guarantee their output will suck:

- Usually, the "right way" to do something is only apparent halfway through doing that thing, or after you finished that thing.

- There's rarely time to redo things with benefit of hindsight, because the priority is always to work on business goals first.

That's how things were everywhere I've worked or looked, so I believe this is likely a general phenomenon. But the consequence of that is strong path dependence. It's not that using Kubernetes makes you a moron or a smart person. It's that somebody already deployed Kubernetes for a project that's too simple for it (or they were too inexperienced), but business pressure makes it impossible to spend a couple of weeks undoing it, so everyone is stuck with the pain. Perhaps the reason for the initial Kubernetes deployment was that it was the fastest way to fix some other issue, and the company could not allocate enough time for engineers to do it right.

To be clear, this isn't me ranting about "stupid managers with their priorities". Pursuing business goals is the raison d'être of a software company (it's pretty much a tautology). I'm just saying, it's hard to improve things without lots of slack, and slack is a competitive disadvantage.

Perhaps another facet to this is: business lifecycle is too short for software lifecycle. If you were to spend time on redoing things properly, by the time you were done, you'd be out of business.


People are afraid of change and uncertainty. Basically, all our solutions are the best we could do without making any leaps into the unknown from our current position. Just because we couldn't have done better given our starting position, that doesn't mean we couldn't have done better if we'd started over with a new initial position.

Think of it like gradient descent. The location you end up is the best location in the neighborhood of your starting position, but it's almost certainly a local minimum, and if you started over from a different initial position there's a good chance you'd find a lower minimum.
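
A tiny illustration of that, with a toy 1-D function (entirely made-up numbers):

  # Gradient descent on a bumpy function: where you end up depends on where
  # you start, and only one of the two local minima is the deep one.
  def f(x):    return x**4 - 3*x**3 + 2*x
  def grad(x): return 4*x**3 - 9*x**2 + 2

  def descend(x, lr=0.01, steps=5000):
      for _ in range(steps):
          x -= lr * grad(x)
      return x

  for start in (-1.0, 0.0, 3.0):
      end = descend(start)
      print(f"start {start:+.1f} -> x = {end:.2f}, f(x) = {f(end):.2f}")

Starting from -1.0 or 0.0 you settle in the shallow minimum near x = -0.4; only starting from 3.0 do you find the deeper one near x = 2.1.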


My favourite Walking Dead character and comic idol is King Ezekiel, because "embrace the contradiction."

Since we can't avoid thinking in generalizations, and they're usually false, the only option is to accept contingent thinking. Sometimes, respecting "the way things are" is the right way. Sometimes, rejecting the way things are is the right way. It's often how things get better. There isn't a true "way" you can live by in all circumstances.

There are hints though. If you're using disrespect/challenge to write a good essay (like this author) or to try and change something.... rejecting "the way things are" might be good. If you're just whining, probably not.

Humility good. Complacency bad. These are definitions, not value statements. They're not necessarily contradictory, if you care about such things... but they can have a Hillel/Shamai^ dynamic to them. Sometimes, an adversarial discussion like "Kubernetes" vs "Learn How to Sysadmin, You Moron" is enlightening. Sometimes it's know-it-all assholery. I don't think there are universal rules governing the distinction.

In any case, I don't think this author is being "terrible." He does recognize that scaling is a problem that he doesn't fully understand and meanders through a lot of possible reasons. None of it is mean. The title is a little click-bait, but I don't think it's a foul. He's quite respectful, in fact.

^Like political left/right, but more bearded.


> There isn't a true "way" you can live by in all circumstances.

It's like applying principles like DRY. If you push that principle to the extreme, your code will be shit. DRY is a good principle, but not a rule to follow blindly.
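
A contrived before/after of what pushing it too far tends to look like (the names are made up):

  # One "reusable" function that has sprouted a flag for every caller...
  def render_message(user, kind, urgent=False, html=False, footer=True):
      prefix = "URGENT: " if urgent else ""
      body = {"reset": "click here to reset your password",
              "digest": "here is what you missed this week"}[kind]
      text = f"{prefix}Hi {user}, {body}." + (" -- The Team" if footer else "")
      return f"<p>{text}</p>" if html else text

  # ...versus a little honest repetition that each caller can change freely:
  def password_reset(user):
      return f"Hi {user}, click here to reset your password."

  def weekly_digest(user):
      return f"Hi {user}, here is what you missed this week. -- The Team"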


When you say that, I'm reminded of Feynman's book[1] describing the challenger accident investigation.

Basically he's investigating and goes and talks to the software guys and quickly concludes everything is awesome, and moves onto investigating hardware.

I was initially taken aback by this part. But he later he wrote:

"To summarize, then, the computer software checking system is of highest quality. There appears to be no process of gradually fooling oneself while degrading standards, the process so characteristic of the solid rocket booster and space shuttle main engine safety systems."

It makes sense, I mean they can fly simulations over and over before the shuttle is even built, let alone launched.

[1] The first book was "Surely You're Joking Mr Feynman", the second with the challenger investigation was, "What do you care what other people think?"


This flies in the face of how progress occurs. There’s a reason why movie theater seats are full leather, reclinable, with cup holders. It’s that absurd push for imagining ‘better’.

Large systems in general are hard to refactor. Our impulse in tech is to always consider the refactor because we see how in other systems a refactor is impossible or takes blood (our governments and countries, institutions like education/healthcare). Those in tech have seen many systems to compare against (what went wrong, what went good, even on the most basic projects).

The first thing they tell you about capitalism is that humans haven’t encountered a better system, which is true. Make that argument to a programmer. They either saw a shittier system, a better one, a more fitting one, a proper one, a terrible one, etc. You pretty much can’t tell us anything because we’ve seen it somewhere.

Being content with systems is bad for all of us. Being content with your life is good for all of us.


Wow, a lot of responses, this one will get buried, but here I go anyway.

In my opinion, what makes most of the wrongs we see is that computing is very young and we haven't had the time to set up half decent solutions for anything.

There is a huge demand for any solution, because a bad solution is better than no solution. The Kubernetes vs sysadmin is like the Diogenes story. There are two versions. In one of them someone tells Diogenes "if you praised the king, you wouldn't need to eat lentils" and he replied "if you ate lentils you wouldn't need to praise the king". In the other version it's just the same story only Diogenes talks first.

I don't like lentils. I don't like to praise the king either. Both are terrible solutions, but until we have kingpraiseless burgers, we need to choose something that somehow works.


> I think most people who work in tech could use a bit of humility influenced by this line of thought: if a solution exists today, and it has yet to be replaced, it's at least possible that it's overall the best solution, and the reason you're not seeing that is because of a lack of understanding.

It also doesn’t have to be the best, and probably isn’t – the space of all possible bit strings large enough to encode Kubernetes is quite big. Being “good enough” is sufficient, and often the discussion really comes down to differing opinions on what that means. There is no uniquely and clearly defined objective function here. People need to stop thinking that theirs is the one and only.


> doesn’t have to be the best, and probably isn’t

Precisely. And building the truly best solution is often a mistake, in that it can take pointlessly long.

Good enough is good enough, and others might find things to improve, and be correct that it’d have been better their way — the important thing is instead to realize that actually making any improvement one might have found can be far down on the priority list. Maybe there’ll always be other more important things to do, so it’ll never happen.

> > "at least possible that it's overall the best solution"

can be rewritten as:

> > "at least possible that it's a good enough solution"

And maybe it can make sense to make improvement suggestions, and then, when the others say "yea but there's too much else to do", then, ok, maybe forget about it and be happy anyway.


I've seen this a lot when new people join a software project and immediately start telling people the "right" way to do everything and that all their solutions are far from optimal. Sometimes this can be useful if the project was a mess, but more often than not the new guy is failing to take the time to understand the subtle reasons behind the team's various decisions. It's likely that the solutions were actually closer to optimal than meets the eye when you consider time constraints, developer resources, company politics, etc.

I've been on both sides of this, and look back and cringe when I think about projects I've criticized before trying to understand them. I've also had the experience of being early at a startup and having later employees come in and constantly complain about how suboptimal everything is. The problem is they're missing the time crunch that was in place, and the fact that if subpar solutions weren't used the company would have never gotten to market and they would have never gotten a job there. Software isn't created in a vacuum.


I think software is relatively young and evolves faster than any branch I've seen, simply because the compute platform changes so much overnight. Just imagine that the PC I started coding on was a P1 machine with 16MB RAM. My smartwatch has 10x the compute power. Cloud computing, IoT, GPGPU, SSD, multicore - major underlying constraint shifts in how we do compute.

Even the best possible solution is only best in the time it was created, and because it's good it will get entrenched and keep getting used, even when the constraints it was created in no longer apply even remotely and it's actually a very suboptimal solution given new constraints.

Everything from programming languages, OS kernels, file systems, distributed computing frameworks - so much of it has ideas and solutions that are technically better, but it's hard to overcome the inertia.


The "PC" I started on was a 1Mhz z80 with 1KB of RAM - the first computer I owned had 48KB.

I'm sat in front of a machine that is literally billions of times faster and many orders of magnitude more powerful in every other axis (storage, bandwidth).

And yet, strip away the fancy IDE, the container tooling and it's all still my brain, some text on a screen and my understanding of the problem.


But the problem isn't "how to do this in 48KB of ram" these days it's "how do I do this on a distributed compute cloud with high level services".

Neural networks were a hypothetical approach, nowadays you can rent a ML pipeline for a couple of bucks and download prebuilt models.

So taking an approach from that era (for example C or UNIX) has obvious legacy baggage and isn't optimal in modern context but is still used because of inertia.

Also at this point it's less about your brain and understanding and more about being able to search and mash together. It's not just having a fancy IDE, internet and search fundamentally changed how you learn and solve problems.


Agreed on all counts.

The primitives we use are at the wrong level of abstraction for the problems we are solving but..well I haven't seen anyone come up with ones better enough they displace the existing solutions.


> Gottfried Leibniz suffered no small amount of flak for his conclusion that the reason the world looks the way it does is because it's already "the best of all possible worlds": you may think you see an improvement, but you lack divine understanding and therefore you don't see how it would actually make things worse.

https://en.m.wikipedia.org/wiki/Pareto_efficiency

https://en.m.wikipedia.org/wiki/There_ain't_no_such_thing_as...


While I do agree on the humility and lack of understanding points, my perspective is more that the current solution is probably not the best, but what the best solution would be is pretty much unknowable for any non-trivial system. Instead, it is more likely that you are approaching a local maximum. So the question is whether it makes more sense to optimize your current approach or to seek another maximum, which implies making things worse for a while before they get better, and you might get stuck halfway there.

Either might be the better choice for a given situation, and due to lack of information and changing environments both carry their own risks.


A lot of things exist for entirely historical reasons that don't apply anymore.


if you can enumerate these reasons and demonstrate that they don't apply any more, then you are in a good position to change the thing.

If you don't know what these reasons are, it is hard to know with certainty that they no longer apply.

see above reference to Chesterton's fence.


I don't see it that way. People are creatures of habit and prefer to torment themselves with existing problems that they have learned to deal with rather than get involved in something new.


People also love to create and get new and shiny things. Combine that with force of habit, and you get truckloads of engineers that tear down and replace what was working and only needed maintenance.

Both forces exist, the important part is to balance them properly.


>>People also love to create and get new and shiny things.

Ah yes, the novelty seeking magpie developer


> Gottfried Leibniz suffered no small amount of flak for his conclusion that the reason the world looks the way it does is because it's already "the best of all possible worlds": you may think you see an improvement, but you lack divine understanding and therefore you don't see how it would actually make things worse.

Isn't this line of thought disproven by a single improvement? People before the improvement could have argued along the same lines and then be proven wrong.


Interesting. This kind of seems like the software version of the efficient-market hypothesis from financial economics. It's pretty much obvious that markets have inefficiencies. But at the same time, they're the best we've got and constructing something that's better in general is really hard.


Good analogy. I'd say the efficient markets hypothesis would be better interpreted as "you cannot deterministically identify an inefficiency", i.e. you're always assuming some amount of risk with any trade. You could probably make a similar case for improvements more abstractly - you cannot be perfectly confident a priori that something will be net value additive over the previous configuration.


The recent HN article about how our brains prefer additive solutions over subtractive ones also seems to point in the direction of ever-increasing complexity.

Maybe the field of software can really mature only when looking for subtractive solutions becomes part of the computing culture.


> ... I think most people who work in tech could use a bit of humility influenced by this line of thought: if a solution exists today, and it has yet to be replaced, it's at least possible that it's over all the best solution, and the reason you're not seeing that is because of a lack of understanding.

Agree. See: https://fs.blog/2020/03/chestertons-fence/

That said, we may not get far if no one makes an attempt to tear down those fences. Tech is like art in that respect: it lets practitioners experiment like no other engineering discipline.


The other thing is nobody likes working on problems that don't challenge them. They want to work on problems that are +1 level above their current skill. The problem is that the 'first solution' they develop/engineer is very likely to be shitty (and they probably won't even know how or why it's shitty) purely because they're not experienced.

Ideally you want 'seasoned vets' for whom this level of engineering/development is old hat and have them churn out a solid product that handles all the edge cases that you only know about when you have long-term experience with the technology/domain.


Chesterton’s fence as applied to software: don’t replace a bad software solution until you understand why it is that way.

Still, looking back we can see how software solutions were not optimal for their age, and how amazing things can be coaxed out of old hardware using new insights. It would be pretty remarkable that precisely now is when we finally achieved optimal software. More likely, we’re still getting it wrong, and this is just one more step along the line.

The truth can be a little of both: yes, we can do better than what exists now, but we’re not guaranteed to unless we understand what came before.


It would also be nice to look at the efficiency of software engineering and business practices from 10 years ago, 20 years ago, and 30 years ago and realize just how far we've come. Software is running more things, bigger things, more important things, more complicated things, and we get better all the time. It's ok to live in an imperfect world and just make things a little better every day. It's ok to try things, find they're suboptimal and make changes, and it's ok to retry things that have been done before but with a new spin.


How far have we come? I think the answer is, disappointingly, at most a small constant factor.

The thing is, most software projects have an inherent complexity that can’t be reduced. This gives an implicit limit on what can be reasonably created by humans.

And most software is insanely complex — simply because otherwise it would not be useful in many cases. So writing software from scratch takes pretty much the same amount of time as it did 10 or 20 years ago. Yeah, performance is much better both in hardware and in software (runtime, compiler), but e.g. desktop app development may have become less productive since. All in all, there is no Silver Bullet, and there hasn't been one since either. The only thing there is, is incorporating already existing software.


This is so true.

There’s also the case where you or someone is hired to offer a fresh perspective on an existing system. If, after looking at it, your conclusion is “it looks great the way it is”, the person who hired you would think “why the hell did I even hire this guy?”

So even though you know the current system is a good system, you try to propose a solution anyway just to show value: that it was worth it to hire you.


Two rebuttals to Leibniz are "that was then, this is now", and "but this is special". Both are fluff until you answer the real question, which is "why are things the way they are"? It takes both humility and courage to engage with something you don't understand, but if you are curious, it is also deep fun.


In a bigger sense it just doesn’t matter. A lot of the criticisms in the Unix Hater’s Handbook are still trenchant decades later. And so what? Everyone knows how to use Unix and they’re used to the deficiencies. Up and down the stack until the advantages of the new thing are so overwhelmingly obvious that people will jump.


It’s an interesting mental model but I would put forward that it should be the best possible world we are currently capable of creating. Due to our own biological shortcomings I don’t think we’ll ever reach the best possible outcome in anything.


Then why do you call it a "possible" outcome?


It’s possible but infinitely unlikely


1 in infinity likelihood is 0.


I disagree. So does quantum theory


This is how I feel about JavaScript. I used to let the HN expert problem-solving crowd show me all the problems and make me think about how badly it sucked. Now I’m just happy it works most of the time.


> the one you encounter on a daily basis, and the one you read about on HN

This is true of any "real world" subset, vs its online outlets; HN may just be more niche than some.


yes, but there are also things done because "they have always been done this way." That's also a dangerous pattern imo.


Doesn't matter if you use Kubernetes. If the system has not been designed for scale-out you will ultimately lose.

K8s is just a container orchestrator. And containers are just virtual machines.


That makes sense.

> containers are just virtual machines

I think this part is false though. Containers are, well, contained processes with contained storage. No simulated hardware like a VM implies.
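One way to make that concrete (just an observation trick, assuming you have Docker and a Python interpreter handy, not a definition): check the kernel from the host and from inside a container. A container reports the host's kernel, because nothing is simulated underneath it.

    import platform

    # Run this on the host and again via `docker run --rm python:3 python -c ...`:
    # both print the *host's* kernel release, because a container is just an
    # isolated process on the host kernel. A VM would report its own guest kernel.
    print(platform.system(), platform.release())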


It's an unnecessary complication when explaining containers to the uninitiated. Just tell them they are essentially VMs and they'll follow along with the other concepts just fine instead of looking at you wide-eyed as they try to decipher how "contained processes with contained storage" interacts with their use case. Once they get experience, they'll learn the details.


VMs are not 'simulated hardware' either honestly.


To varying degrees they actually are. The hypervisor provides simulated devices and the kernel inside has device drivers to speak to them. For efficiency these devices are as "thin" as possible, but they are still simulated devices.

Unless you're using something like PCIe passthrough to give raw hardware access to a VM.


How would you put it, then?


At scale software stops being a technology problem and becomes a people problem. And groups of humans don't scale.

We naturally organize into hierarchies of small subgroups with limited interaction boundaries. Each subgroup will adopt slightly different methodologies, have different tooling preferences, will have integration and communication overhead with others, and so on.

This cannot be prevented, only mitigated. Which is perfectly fine. The important step is recognizing that software development is really not special [1], and that the limiting factor is the human element. Then you can build your organizational structures around these restrictions.

[1] https://news.ycombinator.com/item?id=26713139


This is all outlined by Fred Brooks in 1975. I've beat this dead horse before, but I suppose I'll beat it again.

In the late '90s and early aughts everyone read Mythical Man-Month. There is even a law attributed to him, Brooks' law [1]. It really feels like no one internalized what he was actually saying. I suppose in a world of trends and fads, timeless advice is hard to recognize.

> [...] the increased communication overhead will consume an ever-increasing quantity of the calendar time available. When n people have to communicate among themselves, as n increases, their output decreases and when it becomes negative the project is delayed further with every person added.
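To put rough numbers on that (this is just the usual n(n-1)/2 pairwise-channel arithmetic, not a quote from Brooks):

    # Pairwise communication channels grow quadratically with headcount.
    def channels(n: int) -> int:
        return n * (n - 1) // 2

    for n in (3, 5, 10, 20, 50):
        print(f"{n:>3} people -> {channels(n):>5} possible channels")
    # 3 -> 3, 10 -> 45, 50 -> 1225: coordination cost explodes long before output does.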

Microservices, of course, exacerbate this by actually increasing the communication required to make software.

[1] https://en.wikipedia.org/wiki/The_Mythical_Man-Month


> Microservices, of course, exacerbate this by actually increasing the communication required to make software.

I HATE microservices. I've never seen them used appropriately, although I imagine it can be done. But I think if you increase human communication by using microservices you're defeating the raison d'etre of microservices. That's not the fault of microservices - it's the fault of the humans using them badly or where they shouldn't.

Story time. I once contracted to a fortune 500 doing a $100M total rewrite with literally 100 microservices all in their own git repo. It was such a disaster I contemplated shorting their stock. I don't know how the story ended, but I feel confident it didn't end well.


Microservices can be okay, but they are greatly overused. They are a solution for very large organizations with a ton of teams. For 100 micro-services to make any sense at all, you need to have at least 200-300 developers.

Total rewrites, on the other hand, are never anything other than a complete and unmitigated disaster. The only way that you can possibly make a $100M rewrite successful is to do it Ship of Theseus style, where the rewritten code is incorporated into the working and actively-used production software long before the full rewrite is complete.

Even then, the full rewrite usually won't complete, so you need to make sure that project is still useful even if you only do a partial rewrite.


I've heard of the "Ship of Theseus" rewrites as a recommendation for "evolution, not revolution".

I agreed that the big rewrite (i.e. "revolution") is usually a disaster.


Full rewrites are almost always a disaster. I should know, I've done a few on side projects. This case was a dramatic example of that.

> For 100 micro-services to make any sense at all, you need to have at least 200-300 developers.

Yeah, at minimum. I think these guys were running at least 2 microservices per developer. That made no sense to me.


Has there ever been a large scale successful rewrite? How much of Netscape/Firefox was rewritten from the ground up?


Yet... the economics of software is such that software companies balloon in size.

Software companies/teams/applications of today are many times larger than in 1975. It may be more efficient to have software that's federated and developed by small teams but... Facebook's revenue is $90bn. A social network 10% of FB's "size" will make much less than $9bn in ad revenues^.

That means that product teams grow until the point Fred Brooks warned about plays out... until the marginal team member's contribution is negative.

^For example, Reddit has 10%-20% of Facebook's users, but only about 2% of its revenue.


>Microservices, of course, exacerbate this by actually increasing the communication required to make software.

I'd argue the exact opposite. Microservices with defined interfaces and APIs mean that everyone doesn't need to communicate with everyone. Only the people who work on a given microservice need to communicate amongst themselves. Across those groups the communication can be restricted to simply reading documented APIs. That in turn drastically lowers the amount of communication needed.
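A hedged sketch of what "communicating through the documented API" can look like in practice (the host, endpoint and fields below are entirely made up):

    import requests

    # The documented contract is all the consuming team needs:
    #   GET /v1/invoices/{id} -> {"id": str, "total_cents": int, "status": str}
    resp = requests.get("https://billing.internal.example/v1/invoices/42", timeout=5)
    resp.raise_for_status()
    invoice = resp.json()
    print(invoice["status"], invoice["total_cents"])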


> Microservices with defined interfaces and APIs mean that everyone doesn't need to communicate with everyone.

This would be true if every team handling their own microservice treated documentation as a real product: understandable, concise, with clear examples. It also needs a proper way to accept/debate bug/feature requests.

But in my experience this is not happening. There is documentation but it is usually auto generated and most of the times out of date.

So instead of reading it and trying to figure it out oneself, it is simpler to just contact the developer, thus => more time spent in communication.

The same when I need a new feature from their API: I set up a meeting, as sending an email or creating a ticket usually does not guarantee a response => so more communication.


You can automate the communication by writing good documentation and api reference with examples.


This is part of the reason microservices are becoming reasonably popular. They facilitate scaling people. If you think about it one of the biggest projects we have, the military, scales through assembling near-identical pieces structurally (but not functionally, an artillery unit has a totally different role to an infantry unit).


Microservices may help scaling but tend to come with a fixed amount of overhead, which makes them more effort at very small scale. They are also perfect as long as you got the overall architecture right and do not have to make cross-cutting changes, because that turns a coordinate-people-within-a-team problem into a coordinate-people-across-teams problem.


Yes! Ironically, microservice architectures replicate the same set of advantages and challenges you have scaling teams. The abstractions (team / process boundaries) become much more rigid. But in exchange, within a team / service, change becomes easier because there's less to understand and navigate.

How well the whole system functions depends deeply on how well those abstractions map the solution space. When you're doing something routine (eg most web development), lots of people have figured out reasonable ways of breaking the problem down already. So it works pretty well and you don't carry much risk that the abstractions you pick from the start will hurt you down the road. But when you're doing something more creative and exploratory - like a marketplace for VR, or making a new web browser or making something like Uber - well, then you take on abstraction risk. You will probably make mistakes in how you initially break up your teams and services. Executing well depends on your organizational ability to learn and refactor.


Something that only works for huge sizes isn't scalable by definition. Scalable means it can scale up or down


I disagree to an extent. Very few solutions are scalable across an entire range of a problem space.

Something being scalable, to me, means that it can cover a _broad_ range of the problem space, but doesn't imply it covers 100% of it.

If we imagine that the "problem space" is managing software projects from 1 person up through 1,000,000 people and you told me that there was a solution that worked for the range of 50-1,000,000 people I would describe it as scalable, even if I wouldn't recommend it to a 5 person team.


Great point! Actually "scalable" in any direction is meaningless until you provide the scale, performance, reliability, cost and time requirements you have to consider while scaling. A single server for a typical web workload is just fine if you don't need more than ~99.9% uptime and don't have more than tens of thousands of users. Considering the availability, you could e.g. reboot into a smaller or larger VM. (I know, this is crazy to write on HN, but a single virtual server is plenty reliable for most things if you are able to do a very fast recovery from backup to limit the extent of a disaster.) :-) If you don't consider "serverless", it really doesn't get much better than a single server considering cost. If you want more reliability/availability and more performance, you probably have to take other approaches. E.g. more servers, services like S3/Lambda, maybe a mainframe (cluster) in certain situations etc.
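For a rough sense of what those availability figures mean in practice (plain arithmetic, nothing vendor-specific):

    # Allowed downtime per year for a few availability targets.
    SECONDS_PER_YEAR = 365 * 24 * 3600

    for availability in (0.99, 0.999, 0.9999):
        hours_down = SECONDS_PER_YEAR * (1 - availability) / 3600
        print(f"{availability:.2%} uptime -> ~{hours_down:.1f} h downtime per year")
    # 99% -> ~87.6 h, 99.9% -> ~8.8 h, 99.99% -> ~0.9 h; the first two are within
    # reach of a single server with fast restore from backup.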


Well, with the ideal technical leadership, you start off with a monolith and clearly delineate boundaries inside it.

Then as the project grows chunks just get carved out.

Though real life is much messier.


And predicting cross-cutting concerns is at least as difficult as predicting all of your business changes, which is at least as difficult as predicting the future.


I see it differently. In my experience with 200+ devs, "microservices" is itself a buzzword to a certain extent and allows different teams to avoid talking to each other: they build their "own" services, which are often copies of each other, which leads to bloat.


Yeah that’s not gonna work.

I think you have to align incentives at the contractual level. For example, don’t pay by the hour; pay for an API. Mandate that it’s documented in a machine-readable way. Require it to integrate with your centralized authentication and logging services. Mandate thorough, human-friendly documentation. Programmatically validate as much of this as possible, and add periodic human review (i.e. surveys, interviews).

The goal is to treat each team as a vendor for their service. You hold them accountable for the product and pay them for outcomes.
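A minimal sketch of the "programmatically validate" idea, assuming each team publishes an OpenAPI document at a known URL (the URL and the policy rules here are invented):

    import requests

    SPEC_URL = "https://payments.internal.example/openapi.json"  # hypothetical
    spec = requests.get(SPEC_URL, timeout=5).json()

    HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}
    problems = []
    for path, ops in spec.get("paths", {}).items():
        for method, op in ops.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys like "parameters"
            if not op.get("summary") and not op.get("description"):
                problems.append(f"{method.upper()} {path} is undocumented")

    print("OK" if not problems else "\n".join(problems))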


This, but typically a team of 20 has 1 monolith; afterwards the same team of 20 has 8 microservices. The equivalent of one battle squad now has to learn a bit of tanking, a bit of flying, etc.


One question which is open for me: Should groups be stable?

Scrum and other methodologies assume stable teams. It takes a few sprints for a team to assign somewhat accurate story points to stories. It takes time for people to find their roles in a team. Whenever someone joins or leaves a team, it takes time to find a new equilibrium. So if you aim for peak performance teams, it takes months of stability to reach that peak. You better not change it then or it will take months again.

In contrast, most organizations have a constant need to reorganize to adapt to new business models, customers, product portfolio, etc. For consulting gigs which take less than a few months, it makes no sense to aim for peak performance and stable teams. Instead you should aim for smooth transitions. In such an environment, more formal methodologies make sense because good documentation, unified code styles, and consistent processes make transitions smoother.

Of course, this isn't a black and white decision but more of a scale from stability to flexibility. There are small lifestyle companies which are very close to the stability extreme (the Sqlite team maybe?). Valve with its flat hierarchy might be an example for the flexibility extreme.


The sub groups are going to be somewhat homogeneous internally. So you will end up with a group who thinks leetcode proficiency is important in hiring and another group who thinks QA experience is more important and another group who thinks diversity is the most important. All those groups think they are right and better than the other groups and they don't work well together.


And who manages to do that?


The tooling is simply not there, so every software project keeps pushing the boundary of what is possible in its own unique fragile way.

People don't want solutions to yesterday's problems. These are considered trivial and already solved, such as invoking a shell command (which just hides a lot of complexity under the hood). No one will pay you for invoking existing solutions. They pay you to push the boundary.

By tooling I mean programming languages, frameworks, libraries and operating systems. All of which have been designed for single-machine operation with a random access memory model.

This no longer holds true. In order to scale software today you need multi-machine operation and no such OS exists for it yet. Only immature attempts such as Kubernetes, whose ergonomics are far from the simplicity that you'd expect from a unix-like system.

And random access memory model breaks down completely, because it is just a leaky abstraction. For full performance memory is only accessed linearly. Any random access structured programming language completely breaks down for working with GBs of data, in parallel and on a cluster of machines.
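A toy illustration of the linear-vs-random access point (in CPython the gap is muted by interpreter overhead; in a lower-level language it is far larger, but the shape of the effect is the same):

    import random, time

    N = 10_000_000
    data = list(range(N))
    order = list(range(N))
    random.shuffle(order)

    t0 = time.perf_counter()
    total = sum(data[i] for i in range(N))      # sequential walk, prefetch-friendly
    t1 = time.perf_counter()
    total += sum(data[i] for i in order)        # same work, random order: cache misses
    t2 = time.perf_counter()

    print(f"sequential: {t1 - t0:.2f}s  random: {t2 - t1:.2f}s")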

I don't think we'll push the boundary further if we keep piling crap on top of existing crap. The foundations have to change and for that we'd better go back to the 70s when these ideas were being explored.


> No one will pay you for invoking existing solutions. They pay you to push the boundary.

Most businesses pay developers to solve problems, not to push boundaries. They typically prefer developers using "boring" tried-and-true approaches.

In my experience it is developers who push for "boundary pushing" solutions because it is more exciting. And of course the vendors selling those solutions.


If it were true that the only way to get "paid" is to push boundaries, 99% of everyone working in IT would be out of a job. Kurt Vonnegut once said that "the problem with the world is that everyone wants to build, and no one wants to do maintenance". I'm reminded of that quote daily working in the software industry -- the people "pushing boundaries" are a tiny minority and if they were the only ones working our modern world would come crashing down in an afternoon.


You can hardly expect people to talk about all the maintenance they did in their performance review


...why in the world wouldn’t you talk about the ways you made an application better, more resilient, or heck, just able to continue operating??

No, I don’t want my local government official to build a new damn road. I would clap and love them for fixing the one I drive on everyday! Even better if they do it in a way that’s going to last for 100 years instead of 10, and that’s the way you brag about maintenance work.


Why?

Work should be balanced between creating new code and maintaining existing code. And managers should value both.


Of course they should, but it just doesn't seem to work out like that.

I was basically pushed out of my previous job because I was the bug-fix guy and liked it.

When I was the 'new stuff' guy, I got big raises every year. When I moved to being the bug-fix guy, and other people were the 'new stuff' people, suddenly I stopped getting good raises and barely met inflation. Even my coworkers were heaping praise on someone who I knew was over-complicating things because he was doing new stuff.

As I was leaving, a system that I had rewritten twice was up for another rewrite for more functionality. I would have loved to do it, and I knew I could. They gave it to someone else. I heard, 6 months after I left, that they failed and gave up on the rewrite. I am absolutely certain I could have done it again.

Getting a new job got me a 40% pay raise. It's not that I wasn't still worth a lot more every year. It's that they couldn't see it because it was all maintenance, and not shiny new stuff.

I still prefer bugfixes, and I still rock at them. But I end up doing mostly new stuff and don't say anything because I know how that goes.


You answered yourself.

Maintenance is seen by business as a cost, innovation is seen as potentially bringing more money. The incentives are all wrong and that translates to engineers (and teams, and companies) chasing features and not fixing bugs.

A rewrite is often the only way of getting enough resources to fix up things.


Because the company's internal rubric for performance reviews awards 0 points for this kind of work.


They should value both but just don’t in my experience. Building a system can get you promoted. Performing maintenance to keep it running generally will not. Doesn’t matter if the maintenance project demonstrates a higher level of proficiency than the original development. Doesn’t matter that it’s extending the life of a system providing known value, as opposed to building something with theoretical value.


Why? Maintenance can easily be quantified. Fixed x issues, implemented y feature requests/changes, limited downtime to z by taking the following actions...


Programming is creating a "Hello World" program. The rest is debugging and maintenance. ;)


It's what we do at least. More than 60% of the work is maintaining old code/systems.


Yes, put it in terms of churn mitigation, compliance, security, and data integrity and you can absolutely sell maintenance at your performance review


> People don't want solutions to yesterday's problems. These are considered trivial and already solved, such as invoking a shell command (which just hides a lot of complexity under the hood). No one will pay you for invoking existing solutions.

I am not exactly sure what you mean by this, but taken literally, most companies pay well for this and it's also the most common work programmers do all over the world. But maybe I misunderstand your meaning.


Heck, aren't Google devs complaining about "proto-to-proto" work ("protobuff to protobuff", not "prototype to prototype"), where they're basically just CRUDing together pre-existing services? Google pays a lot and has huge scale plus is considered quite innovative (I think they publish a lot of papers each year), yet most of their work is the typical enterprise middleware development job.


> Any random access structured programming language completely breaks down for working with GBs of data, in parallel and on a cluster of machines.

Why would you need a cluster of machines to work on mere gigabytes? You can get single-processor machines with terabytes of ram. Even laptops come with 64GB now.


Yeah, big data these days should be petabytes of data or at most high hundreds of terabytes coupled with very intensive access patterns.


It's almost like the parent comment is just putting keywords together instead of a coherent argument!


> No one will pay you for invoking existing solutions. They pay you to push the boundary.

I wouldn't put it like that. 90% of DevOps jobs is to invoke existing solutions quickly. Sure, there will be a little bit of pushing the boundaries here and there as needed. But largely I'd describe those jobs as putting the Lego blocks together every day - and there's lots of money in it.


DevOps Engineer here, former Lead (moved from startup to Big Corp). That's literally my job day in day out. Put the Lego bricks together. Everybody around me tells me I'm doing a good job and I'm sitting here thinking, "but all I did was Google these things and insert some variables in some Helm charts?"

The other part of my job is more difficult, which is encouraging teams to de-silo and automate wherever possible but so many of the employees are threatened by this idea at Big Corp that I'm encountering a LOT of resistance.


Same here. It's easy to forget the difference between a master LEGO builder and the average person just piecing together bricks though. Knowing what to build and when/how is a lot of the job too. I try and remind myself that we're also often there to help other people build better things, perhaps analogous to automatically sorting LEGO bricks so other people building can move faster with fewer compromises.


This applies to most high-value knowledge work. Figuring out which problem to solve and how to define success is much more difficult than actually turning the crank to implement whatever solution.


I think your first point answers your second point. It's only human nature for people to worry about their jobs.


Quickly and cheaply. People who only do development tend to look at every problem as some code they need to write.

We saved multiple customers a lot of time and money by just doing minor load balancer or web server configurations, rather than letting their developers write a custom application to handle things like proxying or URL rewrites. Similarly we also frequently have to ask customers why they aren't offloading work to their databases and letting them do sorting, selection, aggregation or search.

Having just a small amount of knowledge about what your existing tools can actually do for you, and knowing how to "Lego" them together solve a large number of everyday problems.
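A tiny example of the "let the database do it" point (sqlite as a stand-in engine; the table and columns are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [("alice", 10.0), ("bob", 25.0), ("alice", 5.0)])

    # Instead of pulling every row and aggregating in application code ...
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0.0) + amount

    # ... hand the sorting and aggregation to the database, which is built for it.
    totals_sql = conn.execute(
        "SELECT customer, SUM(amount) AS total FROM orders "
        "GROUP BY customer ORDER BY total DESC"
    ).fetchall()
    print(totals, totals_sql)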


>People don't want solutions to yesterday's problems. These are considered trivial and already solved, such as invoking a shell command (which just hides a lot of complexity under the hood). No one will pay you for invoking existing solutions. They pay you to push the boundary.

You'd be surprised. Most software production re-builds and re-invokes existing solutions that just aren't systematized yet. It certainly doesn't push any boundaries...


I think the problem is we keep using the same crappy tools because people are scared to be an early adopter.

Meanwhile, there are things sitting on the shelf that solve these problems in a principled way and make things simpler:

- ML languages

- Nix

- Bazel

- Erlang

- Rust

Some tools that are great and have large adoption:

- Git

- Terraform


> people are scared to be an early adopter.

ML was first developed in 1973. Ocaml in 1996 and SML in 1997. Great tools which haven't been popular in 20-40 years probably have something beyond fear of early adoption inhibiting them.


I'd say herd behavior and network effects are the main issues.

Do a quick search on YouTube on "what programming language to learn". You'll find video after video using popularity as a primary or even the primary factor to base the decision on.

Non-technical management tends to do the same, based out of the belief that languages are all pretty much equivalent and a desire to hire easily swappable "resources".


Yaron Minsky joined Jane Street in 2003. The success of Jane Street and the use of OCaml there is well known in finance circles. Finance is full of people who don’t exhibit herd behavior - many firms have a culture of studying fundamental causes and taking contrarian positions. That begs the question - why did OCaml not get widely adopted in finance?

Jet was purchased by Walmart in 2016. Marc Lore from Jet was appointed CEO of Walmart's US e-commerce business to turn its technology operations around (to compete effectively with Amazon). Jet’s tech was mostly developed in F#. Yet, Lore did not push for its adoption widely within Walmart.

IMO explaining away the failure of ML languages to gain market share over multiple decades as “herd behavior and network effects” is lazy.


> why did OCaml not get widely adopted in finance?

I think it has been pretty successful. The bigger success story in finance is F# though, which takes a bunch of ideas from OCaml.


I think it's tooling. But tooling follows from adoption in most cases.


Agreed. The stuff "sitting on the shelf", as your parent comment said, have problems too (eg tooling). They might solve some problems, but are far from the silver bullets we are looking for.


For your typical scaling org, I think data layers are often the main issue. Moving from a single postgres/mysql primary to something that isn't that represents the biggest hurdle.

Some companies are "lucky" and have either natural sharding keys for their primary business line, are an easily cacheable product, or can just scale reads on replicas. Others aren't, and that's where things get complicated.


Tbh, that's why for our new projects we've completely ignored relational databases. They're a pain in the ass to manage and scale poorly.

DynamoDB, on the other hand, trivially scales to thousands (and more!) of TPS and doesn't come with footguns. If it works, then it'll continue to work forever.
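For context, the access pattern that buys that scaling is plain key-value reads and writes, roughly like this (boto3; the table name and key schema are made up and assumed to already exist, along with AWS credentials):

    import boto3

    table = boto3.resource("dynamodb").Table("orders")  # hypothetical table

    # Everything goes through the partition key; no joins, no ad-hoc queries.
    table.put_item(Item={"order_id": "o-123", "customer": "alice", "total": 42})
    item = table.get_item(Key={"order_id": "o-123"}).get("Item")
    print(item)

The trade-off is that anything which doesn't fit that key-value shape (ad-hoc queries, cross-item reporting) takes extra work.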


This is funny to me since modern relational databases can get thousands and more TPS in a single node. My dev machine reports 16k TPS on a table with 100M rows with 100 clients.

    pgbench -c 100 -s 100 -T 20 -n -U postgres
    number of transactions actually processed: 321524
    latency average = 6.253 ms
    tps = 15991.775957


Yep. And 98% of software written today will never need to scale beyond 10k TPS at the database layer. Most software is small. And of the software that does need to go faster than that, most of the time you can get away with read replicas. Or there are obvious sharding keys.

Even when that's not the case, it usually ends up being a minority of the tables and collections that need additional speed.

If you don't believe me, look through the HN hiring thread sometime and notice how few product names you recognise.

Most products will never need to scale like GMail or Facebook.


How do you handle cases where strict synchronization is required?

A banking app is the classic example.


I think Nix will take a long while to increase adoption, because it is hard to learn. The language is new, the concept is new, the mental model is new. But you need to master all of those to use the tool effectively.

Same goes for the other items in your list. Git had enough time and force behind it, and I believe the other tools will succeed as well. But it will take time.


I would add E[1] language to that list.

[1]http://erights.org/elang/index.html


> ML languages

What are those?


I suspect he means Ocaml, sml and bucklescript.


agreed. ML [1] is short for meta language. Because its syntax is not very C-like by default, it can feel exotic and alien to most of today's unix graybeards, 90s enterprise, aughts startup folks, and even the JS crowd.

see also its 'newer incarnations' Scala, F#, Nemerle...(don't slay me)Rust(ish)

[1] https://en.m.wikipedia.org/wiki/ML_(programming_language)


Oh, I see. Thanks!


Building up the tooling is how we advanced over the last decades. It is a very slow process though because one tool has to become stable before tools on top of it can progress instead of adapting to changes.

Large software projects do not have decades. Only few years.

Distributed operating systems are nothing new though. Plan9 for example.


I'd say the problem is way different and it's a people problem. A small percentage of people invest enough resources to be able to have the big picture and write higher quality solutions. There is a huge percentage of people with minimal education learning along the way, and by learning along the way they reinvent things as part of their learning process. This also incentivizes the rising popularity of low-quality but simpler tooling.


The number of developers (and developer managers) who want to only complete tasks where someone (else) has defined the inputs and the outputs is extraordinary.

It is my firm belief that the vast majority of value in Software Development is in understanding the problem and the ways that Software can help. Crucially this includes knowing when it can't.

When asked what my job was once I said, "to think and write code, the hard part is doing it in that order."


This is a pet peeve of mine - I've worked with folks who were/are good developers but wanted to constrain their role to only development tasks. No effort to get exposure to build automation/deployment pipelines or understand how Docker/Kubernetes work at a high level, no desire to understand the bigger business domain or how our customers use the product. Pretty much just "give me a fully groomed ticket and leave me alone until I tell you it's done."

On the one hand, I get it - sometimes it actually is nice to have a ticket that you don't have to think too much about the bigger context for, and can just crank out code.

On the other hand, I don't think it's particularly good for your career security if you're limiting your own role to "minimally communicative code monkey" - if that's all you want to do, there's an awful lot of competition out there.

I've made an effort the past couple of years, across multiple jobs now, to get some real exposure to the ops aspects of my roles (most recently, prototyping and then helping implement CI/CD pipelines for Azure Durable Functions) as well as making an effort to understand the business domain, where the gaps in our product offerings are, and what the competitors in that sector look like. It's really helpful in terms of looking for a more efficient way to solve a business problem, not to mention being able to say things like "hey, the market we're in is moving pretty heavily to Kubernetes, so it's really important that we decide how we're going to support that."

I'm not saying you need to be (or can be) an expert in all of those things, but I think having the high level exposure is really important, at least when you get to a senior IC level. A minor bonus is it helps you develop a Spidey sense for when the business is pursuing strategies that don't seem likely to work, giving you a chance to either try to offer feedback/seek clarification on why, or to pull the plug and find a new gig if it seems bad enough.


It's very frustrating and makes it easy for management to outsource your role. Sometimes that's the right thing, but given how little most "ordinary" companies leverage technology and how little most non-tech people understand about what can (and can't) be done, it's usually the wrong call.


To put it cynically, everyone knows that yesterday's tools don't work. So it's better to try today's tools, which you don't yet know don't work. This allows one poorly thought out idea to replace another - especially when selling new stuff is profitable.

That said, it's implausible that the main problem is scaling, since mainframes scaled to considerable data volumes quite a while back.


Agreed completely. Today there's such a large disconnect between how you think about software (APIs, services, systems, data flows, replication, data storage, access patterns, and so on) and how you actually develop software, with plain text files on disk that don't at all resemble our mental model.

I've been working on a new type of framework that is about making how we develop software match this mental model. Where the higher-level primitives we think in terms of are first-class citizens at the language level.

I think only then is it possible to reason about, and scale, software in a much more productive and effective way.


> Today there's such a large disconnect between how you think about software (APIs, services, systems, data flows, replication, data storage, access patterns, and so on) and how you actually develop software, with plain text files on disk

I view this a little differently. People have tried again and again to move the abstraction layer for coding above text files. This was happening back in the 90's (in the guise of so-called "4GLs" [0]) and still today (now rebranded as "no-code" [1]). I myself spent a good deal of effort trying to code "diagrammatically" through top-down UML (with an amazing product for its time, Together). So the ambition to shift programming up the abstraction food chain has been tried continuously for 30 years and has continued to fail every time. Eventually I changed my view and decided that there are fundamental reasons why higher-level abstractions don't work - software is just too complex, the abstractions are too leaky. The details matter; they are critically important. We are in a long arc of figuring out the right abstractions that may take a century or more, and in the meantime we simply have to have the flexibility of text-based formats to let us express the complexity we need to manage.

[0] https://en.wikipedia.org/wiki/Fourth-generation_programming_...

[1] https://techcrunch.com/2020/10/26/the-no-code-generation-is-...


I don't disagree, but I think there's space for several levels of abstraction between where we are today and no-code tools.


> By tooling I mean programming languages, frameworks, libraries and operating systems. All of whuch have been designed for a single machine operation with random access memory model.

You should learn about Elixir (or anything on Erlang VM).


> People don't want solutions to yesterday's problems. These are considered trivial and already solved, such as invoking a shell command (which just hides a lot of complexity under the hood). Noone will pay you for invoking existing solutions. They pay you to push the boundary.

People are paid to work on products which provide value to the business. There is always a Minimum Viable Product, MVP. Meet that. Exceeding that is the boundary that needs to be pushed, not grabbing the latest, possibly untested, tools off the shelf.

> And random access memory model breaks down completely, because it is just a leaky abstraction. For full performance memory is only accessed linearly. Any random access structured programming language completely breaks down for working with GBs of data, in parallel and on a cluster of machines.

This is why constantly re-engineering (for the purpose of engineering) is not the most useful method. I used to work in FoxPro with databases that were GBs of data. If today those GBs are difficult to handle, which they aren't, then there is a problem with how the stack was put together. GBs are trivial.


Erlang/Elixir would like to have a word.


It is very unfortunate so few are willing to give it a try


I watch Erlang / Elixir. But I really like strong static typing and type safety. So for now, I'm not learning anything beyond basic BEAM theory, lest I have to unlearn it later.

Gleam lang at least is definitely something I watch closely and may (may) be the BEAM's breakout moment.


Software is indeed a lot like cities.

But cities ain't pretty. Dig the ground and nothing looks like what's on the map. To create a new building, you need to cut a lot of plumbing and re-route it somehow.

Stuff gets old and breaks all the time, and maintenance teams are on the ground 24/7 to fix it. The NYC subway is the mainframe of the city. Look at the central steam heating that's still pumping water into high-rises.

Sure, you can sell a shop, and people will be able to repurpose that space to suit their needs very efficiently. But isn't that what Docker does? Provide a cookie cutter space to run an app?

But in cities, there are places where trucks can't go, where you can't park, where unplanned work happens, where the sewer floods. That's when the city 'bugs', and people need to manually intervene to fix it...

Trying to find an ideal way of running software at scale is just as utopian as building the perfect city using 'super structures'. It's a dream of the 60s that's never going to happen.


Maybe a large US city is less of an ideal than say a medium sized Swiss one...


Ideal in what sense? NYC has more than double the GDP of the entire country of Switzerland and approximately the same population.


I mean in the sense of what perceived chaos or deterioration is “allowed” in a city or its infrastructure, e.g. whether a plot can be left unbuilt or half torn down, and the standards which are expected of public infrastructure like roads or train stations.


Cities like Tokyo might be more comparable, and much nicer than NYC in many aspects, especially the public transportation system. For a major city, NY looks pretty shabby and not that nice on average.


My own highly simplified opinion why software development "doesn't scale": because software development is all research and zero production - with the notable exception of asset production in game development, which actually is quite easily scalable once you've figured out a workflow that works for a specific (type of) game. If you think about how to scale software development, don't think about an assembly line, but about a research institute.


I have been working with software development over the last 20 years and agree with that point of view 100%. Software development is the act of learning/researching. The source code we create is not a “product”. I really liked your metaphor of the research institute. Spot-on. Thanks for sharing


Man, I feel like you took the words out of my mouth.

100% agree.


Very often, even though I only ever write very short programs (100 to 1000 lines) in Python or C++ (and I'm learning Rust), I look at my finished program and feel like there are a million things to improve and a billion other ways to do it. Each variable could have a better name, each abstraction could be abstracted more, the whole thing could be rewritten in a different style or language.

Whereas if I design something in CAD, I send it to the printer, try it, improve it once, print again and never ever think of it again as long as it does its job.


I think under market conditions you would be continuously tweaking the model, and manufacturing processes, based on feedback eg wrt robustness, comfort etc and also taking into account changing usage parameters/expectations, changes in material supply costs, regulatory requirements etc

Of course these concerns matter more if you’re making a lot of these “widgets” but even in the case of a one off you’re going to have servicing costs and the occasional redesign/replacement/upgrade ...

It’s not that far out of line with software - widely used products go through tight highly iterative development cycles whereas one off solutions tend to be just “good enough” with bug fixing and the occasional feature request.


When you start drawing diagrams of non-trivial software, for instance Blender (an open source tool for building 3D models and animations), you start to understand how complex and complicated software is.

No other machine built by humans is that complex.


Regarding comments about manufacturing plants and the LHC: sure, both of these examples aren't software directly (but they also wouldn't be possible without software), but they are also always under constant fixes, feature upgrades and optimisations.



Ah yes, the famous Large Hadron Collider, the most complex piece of human engineering that doesn't have any software.


I think most of it is just software


I'd say manufacturing plants are easily as complex as Blender.


How about car software?

I used Blender as an example because it's open source.


I'm working on driver assistance stuff. In terms of raw functionality the software is not that complex. The complexity comes from other aspects: safety-critical requirements, hardware-software co-design, commercial constraints, resource constraints. Then, of course, organizational dysfunction creates accidental complexity as it does everywhere. All of that together, and a software developer barely achieves a thousand lines of code per year.


>How is it that we’ve found ways to organize the work around so many other creative disciplines but writing software is still hard?

I think OP heavily overestimates the organisational praxis in other disciplines. Nearly every creative discipline I was ever in was largely ad hoc, with very few explicitly stated organisational approaches to the craft. Academia, movie and music production, and creative writing, for instance, have far fewer readily available principles than software.

Software is probably the most thought-about creative industry I can think of in modern history.


I think the software industry could learn a lot from the collaborative attitudes and approaches employed in the more typically creative industries. Those industries you mentioned may not have as many formalised principles, yet collaboration is so imbued into the process that it naturally lends itself to large scale creations.


Alan Kay said at one point:

> I have always done my work in the context of a group that is set up to maximize a wide spread of talents and abilities. Just as “science” is “a better scientist than a scientist”, such a group is “a better programmer and systems designer than any individual”. Really learning to program — etc. — is really learning about how to be part of a scientific/engineering/design team. As with a sport like basketball, there are lots of fundamentals that all need to do extremely well, and there are also “special abilities” that not every member will have, but which the whole team will have when it is functioning well as a team.

I fear that very few people really spend the time needed to really learn the fundamentals and very few people learn a scientific mindset that would allow them to cooperate effectively with others without getting their ego in the way.


I thought somebody might mention it, but since nobody did: a best boy grip is the chief assistant to the person in charge of building rigs and frameworks for things like cameras and lights.

Best boy basically means 'chief assistant', second in command kind of thing.

Grips build the structures which cameras and lights are hung on. They don't touch the electrics, gaffers and electricians do that: grips make the towers, stands, tracks: the physical stuff on which you put things.


All these specialized jobs--best boy, gaffer, makeup--really highlight what's different between software and movie production. On a movie set, nearly everyone is doing exactly the same thing they did on the last movie set. In many software jobs you're never repeating your last project.


These are just roles and we have those in software as well. For example, Scrum defines "product owner" and "scrum master". In Hollywood they call the "project manager" a "producer" but the job is very similar.


There is no "scale" so to say. To create MS Windows which has lets say 50 mloc, versus MS Calc which has maybe 1/1000 lines of code compared to Windows you don't "just" write x1000 times more code and done. Writing one Windows is many orders of magnitude harder than writing a thousand Calc sized apps. That's why "scale" don't work, it's not scale, it's difficulty spike. If you'll hire large enough corp, with several hundred expert programmers (no trainees , no juniors), and do what they think they want - don't interrupt them from flow, schedule almost no meetings, communicate strictly async via email, allow then self organize and self manage - I think that such a project will be an epic sized fail.


Yesterday I watched a video about how Netflix scaled its API with GraphQL Federation.[0] The video contains some neat visualizations that help you see how complex data access problems at scale can get. And this is just the services level they talk about.

No mention of the underlying infra with all its complexities needed to achieve the goals of flexibility, reliability, speed and cost cutting.

You don't have to be of Netflix size - when you start getting tens of thousands of users, complexity hits you real fast.

[0] https://youtu.be/QrEOvHdH2Cg


Torvalds once said [1] that the only scalable software development methodology is open source, and I tend to agree for two reasons:

1. The project structure, tooling and documentation that lets new contributors jump in quickly makes software development easy to scale. In Coasian terms, the transaction costs around development are minimized.

2. It enforces hard interfaces and it clearly separates development from operations. Lack of discipline around these issues is a source of much accidental complexity.

[1] I can’t seem to find the quote; I read it in an interview a few years ago and it has stuck with me since.


To go further I think it's not exactly open source but remote-first that made software development scalable.

If you grossly simplify it to its innermost core, making development scalable means that if you have 10 times more people you can have 10 times more features/bugfixes/speed improvements/... The only way to do that is to make sure that an additional developer doesn't depend on other developers to work, and that can only happen if everything is properly documented, the build instructions are up-to-date, the processes are clear, basically anyone can start from scratch and get up to speed without the help of anyone else.

That kind of organization traditionally doesn't happen in on-site companies, where newcomers are guided by senior people, they have to follow some introduction course to familiarize themselves with the processes, they have to ask many questions every day, they need to be synchronized with other people, which brings some inefficiency because everyone works at their own pace, etc... This all disappears when everything is properly documented and every contributor can work in the middle of the night if they wish. I think the GitLab Handbook goes over this quite well and describes a framework to implement that kind of organization, but the rules are retrospectively obvious for people already used to open source (https://about.gitlab.com/company/culture/all-remote/guide/):

- write down everything

- discussion should happen asynchronously. Any synchronous discussion (by text or call) should cover only very small points. Whatever the type, write down the conclusions of those discussions

- Everything is public (to the organization), including decisions taken, issues, processes


Yeah, remote-first is important, but it's not the only factor.

Another very relevant factor is that people can just clone stuff, create new projects, and everything moves independently. So the open source development model has a pile of solutions for dependency management that team based development doesn't adopt.

But the one thing I don't get is why team-based development doesn't adopt those solutions. They are not expensive. Yet even when I was able to dictate everybody's requirements, I wasn't able to get teams to adopt them. Instead they insisted on synchronizing themselves by much more expensive and less reliable means. My guess is that most developers never dug deep into open source and have no idea how it's done.


> the only scalable software development methodology is open source

“open source” isn’t a development methodology, or even a distinct set of methodologies.

> The project structure, tooling and documentation that lets new contributors jump in quickly makes software development easy to scale.

Plenty of open source projects don’t have that, nor is there anything restricting those things to open source projects.

It is true that some open source projects, because they see the value in new developers jumping in quickly, prioritize having structure, documentation, and tooling that supports that. It’s also true that some proprietary software projects do, too, because the project owner sees value in that.


There are books written about open source development. Don’t confuse somebody dumping source files on Github with open source development.


> There are books written about open source development

There are books written about different authors' idealizations of how to do open source development. The fact that many people have written about different methodologies for approaching a particular challenge doesn't make the challenge a methodology.

And, yes, I am aware that the very approach known as “the bazaar” (from the essay “The Cathedral and the Bazaar”) is sometimes referred to erroneously as “open source development”, which is a particularly glaring error since all of the examples of both “cathedral” and “bazaar” development were open source projects.


Yeah, and that does not mean those books describe how open source projects function in reality. But they should - after all, how things are done changes.


There are a gazillion ways to develop open source. It's more like an ethos than a methodology.


> There are a gazillion ways to develop open source.

Yet every large project behaves in a similar way, with only a few larger variations.


That's why one of the seminal books about Open Source is the Cathedral and the Bazaar? Because "there is only one way to do things"? :-)


Hum... The Bazaar in that book is a very specific way to do things. So specific that I don't think anybody really follows it.


> the only scalable software development methodology is open source

Some companies such as Microsoft, Google, Amazon would disagree.


The Linux kernel had 20k contributors since its inception, you’d be hard pressed to find those numbers for a single project at any company.


Consider the bit at the end of this blog post:

https://devblogs.microsoft.com/oldnewthing/20180326-00/?p=98...

Bonus chatter: In a discussion of Windows source control, one person argued that git works great on large projects and gave an example of a large git repo: “For example, the linux kernel repository at roughly eight years old is 800–900MB in size, has about 45,000 files, and is considered to have heavy churn, with 400,000 commits.”

I found that adorable. You have 45,000 files. Yeah, call me when your repo starts to get big. The Windows repo has over three million files.

Four hundred thousand commits over eight years averages to around [130] commits per day. This is heavy churn? That’s so cute.

You know what we call a day with [130] commits? “Catastrophic network outage.”


To get closer to an apples-to-apples comparison, it'd be necessary to know whether the commit counts in each case include all development branches for all development groups. By design, git-based development can be highly distributed.

Also, even if we normalized both cases for 'code files only' and/or 'kernel code only', there could still be architectural, code style, and development process differences that lead to different metrics for each project.


He's really not selling Microsoft as a fun place to work, is he?


Yes, the Linux kernel is a big codebase, but so are a lot of others [1]. And you will find lots of private companies not mentioned there that have long-standing projects with >10 million LOC.

If you took the 100 biggest code bases in the world, I would be surprised if more than 10% of them were open source.

[1] https://www.informationisbeautiful.net/visualizations/millio...


I know a medium-sized French-American multinational. You've probably not even heard of it. A decade ago they had multiple products with several million lines of code each. Their entire platforms probably had 100 million. I can't even imagine how many contributors they had; the oldest project was started in the '80s and probably had 1,000 contributors over time, at least.

And again, that's a medium-sized company you haven't even heard of. FAANG, Microsoft, Oracle, IBM for sure will have stuff dwarfing that.


Someone from Oracle posted here a while ago, describing how Oracle 12.2 has 25 million lines of C: https://news.ycombinator.com/item?id=18442941


And yet it's a less appealing product than Postgres.

An explosion in the number of lines of code is one of the ways development teams fail.


Is it actually less appealing? My understanding is that DBAs consider Oracle very good, just very (very) expensive. This also lines up with my experience tuning queries against Oracle vs Postgres backends. The folk wisdom seems, to me, to be that Postgres is 99% as good in the common cases and 90-95% as good in the difficult cases.


Oracle DBAs consider Oracle clearly superior. SQL Server DBAs who know all the ways to optimize SQL Server consider SQL Server clearly superior. I don't know any PostgreSQL DBAs, just generalists who work with Postgres, so I can't speak for them.

The truth is that Oracle is riddled with coherence bugs and has a much worse performance picture out of the box than Postgres. But while improving Postgres performance requires digging deep into the DBMS itself, Oracle has a lot you can optimize just on the SQL interface. Still, there are plenty of cases where Oracle just cannot become as fast as out-of-the-box Postgres, and many where it could in theory, but a bug prevents you from doing things right.

Overall, there is no definitive ordering on the speed of the "good 3" DBMSs. It always depends on what you are doing.
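To make "optimize just on the SQL interface" concrete, here is a minimal, hedged sketch (the table and index names are invented for the example): Oracle accepts optimizer hints embedded directly in the query text, while core Postgres has no hint syntax, so tuning there usually means per-transaction planner settings or index changes instead.

    # Illustrative only; "orders" and "orders_customer_ix" are hypothetical names.

    # Oracle: steer the planner from the SQL text itself via a hint.
    oracle_query = """
        SELECT /*+ INDEX(orders orders_customer_ix) */ *
        FROM   orders
        WHERE  customer_id = :cust_id
    """

    # Postgres: no hints in core, so you nudge the planner with settings scoped
    # to the transaction (or add/change an index).
    postgres_tuning = """
        BEGIN;
        SET LOCAL enable_seqscan = off;  -- discourage a sequential scan
        SELECT * FROM orders WHERE customer_id = %(cust_id)s;
        COMMIT;
    """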


SAP told me their framework is a billion LoC of Java and 30k tables in the database. That was years ago so it has probably grown further. It is only the framework, so no useful application yet.


How many of those 20k contributors worked on drivers, and how many on the actual core (~150 kloc)? Every driver is like a separate subproject; having 20k people work on hundreds or thousands of drivers (unrelated to each other) wouldn't sound as impressive.


That's the scary thing. How many real, core contributors does even something like the Linux kernel have? People who have actually written more than a couple patches that landed and stayed in the kernel. I'd be astounded if it's much more than a hundred.

Most open-source projects have one to maybe five real contributors, outside the drive-by pull requests to fix some bug.


I guess (and I have no way to verify this, it's really just a guess) that the number of developers that worked on the Windows code base in its 35 year history is in the same order, if not higher.


Since open source software disclaims liability by definition, a huge complexity factor disappears.


This post seems like the author is early in their career and is realizing everything in the world is half broken and not perfect. What stood out to me is when he said “you could build what takes weeks in an enterprise in a weekend.” I don't know about everyone else, but I've become a much, much better engineer since college, and I still have to carefully handle every edge case for production and write a lot of tests before shipping something, and I really don't think code I write in a weekend will be production-ready at a large company. Truth is, by any metric, the massive systems in place at Amazon are pretty amazing examples of scalable software.

At my day job, at least, managers have figured out that they set the high-level goals/milestones and then let the engineers figure out how we want to get to those goals. This system has worked pretty well (of course the deadlines can get tight and we accumulate technical debt), but once we hit a milestone we deliberately take some small amount of time to fix technical debt, as long as we kept track of it as we scrambled, which also requires some discipline.

The way things distill up to management: my VP doesn't know what details you worked on on day x; he wants to know whether you helped implement y, one of our org's high-level goals. This system works to some degree, and I haven't thought of a better one yet.


Yeah, OP is definitely an early engineer, or not very intelligent. Enterprise software takes longer because there are multiple stakeholders, and it's better to take a week to build it properly than it is to hack it and fix bugs for the next 3 months.

The decisions you take to build software have to include the expected scale. If you're pushing something to 10 users, you'll make different decisions than if you push it to 20M users. This is done in the planning phase, and senior engineers are aware of it. I'm not going to spend 2 weeks optimizing the shit out of a system that'll be used by 10 people.


The city metaphor is evocative and new to me. I like it.

> Conversely, if we built our cities the way we build our software, you would need to enter the shop through the special garage, and exit through the roof to walk a wire to get to another custom made building from scrapped containers to do the checkout. And some of the windows are just painted on because they’re an MVP.


That does actually sound like real cities to me. You have to walk 1 km to get to a point 50 m away because they never put an underpass in when they built the railway line. That flagship building stands half finished for 4 years because the people involved got bought out by another company that never got around to finishing it.

And then you have that one section left over from the original release that all the engineers agree desperately needs to be refactored and upgraded, but due to cost and politics they never get to do it. And anyway you have a couple of power users who insist that the backwards and broken way that part is implemented is actually perfect and shout very loudly anytime you suggest changing it.


Large software projects are almost as complex as cities and yet we have almost no one working on them. 4 developers can be enough to build a product used by millions. That’s kind of incredible IMO. 4 builders/architects/engineers don’t go nearly as far.


4 architects are also enough to design a house that can be built millions of times. If you include the builders into the city projects, it would be fair to include the tens of thousands of people working in data centers and network exchanges as well.

Also like a sibling comment states, 4 developers is not even close to a "large" software project.


Software allows incredible leverage by reusing stuff. I would never consider four developers a large software project though.


Thought the same, but I also think we already have software-as-a-city if you take operating systems, cloud platforms or for example message queue systems on top of which you build components (e.g. microservices at larger scale). Message queues are the streets and as such they are probably not the most efficient but at least they are flexible enough to allow you to experiment with the buildings: rebuild, renovate and repurpose them while leaving the streets more or less intact.

The problem with this industry seems to be that once something is solved, i.e. we already have reliable, battle-tested "streets", there is a big pressure to push everything further, build even more complex systems, faster. The pressure is rather natural: you will have a competitive advantage if you can push the limits and build something that can't be done based on the previous architectures, within a limited time frame.

For example, building desktop apps is a solved problem. These are your streets, these are your building blocks. But because it's a solved problem, there's little money to be made out of it. The money lies somewhere on the edges of the map (e.g. SaaS), where there are still no roads and no general urbanization plan, and that's where businesses tend to flock. Hence the chaos, uncertainty and quality problems in most innovative software.


> Conversely, if we built our cities the way we build our software, you would need to enter the shop through the special garage, and exit through the roof to walk a wire to get to another custom made building from scrapped containers to do the checkout.

I live 1.6km from my office. I drive 9.2km to the office. Cities aren't as straightforward as you might think.


I think Stewart Brand’s concept of Pace Layers fits well with the city metaphor of how software changes over time at different rates https://sketchplanations.com/pace-layers


Software crafted carefully for beauty and structure is one thing. Another is software for customers, where delays in shipping functionality have financial impact. Will you make a change at a cost of 1 day, or do it right and spend 5 days? What will you be rewarded for? Short-term thinking is right until it's wrong, because it stalls the whole company/system.

Another challenge in software comes from the added cost of decisions that are tricky to reverse. The worst thing is that they only show up at a certain scale, meaning that in many cases they were the right decisions in the past (in that business context).

If you only operate in a single market, you can process prices in a single currency, so you get away with bad modelling - leaving the currency out. To scale to another market, you can just deploy your application twice. If the company succeeds in new markets, you will replicate the solution. Fast-forward a few years and you have a migration project to add currency to your software and data in order to optimize deployment.
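As a hedged illustration of that modelling gap (all names here are invented for the example): the single-market version stores a bare number, and the later migration amounts to retrofitting the currency everywhere that number flows.

    from dataclasses import dataclass
    from decimal import Decimal

    # Single-market shortcut: a price is "just a number"; the currency is implicit.
    @dataclass
    class PriceV1:
        amount: Decimal  # implicitly EUR, because we only sell in one market

    # What the multi-market migration eventually forces: currency becomes part of
    # the model, and mixing currencies is rejected explicitly.
    @dataclass(frozen=True)
    class Money:
        amount: Decimal
        currency: str  # ISO 4217 code, e.g. "EUR", "USD"

        def __add__(self, other: "Money") -> "Money":
            if self.currency != other.currency:
                raise ValueError("cannot add amounts in different currencies")
            return Money(self.amount + other.amount, self.currency)

    # Money(Decimal("9.99"), "EUR") + Money(Decimal("5.00"), "USD")  # raises ValueError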

Organizational complexity is another domain. Understanding which places to adjust is key. For example: how complex is it to change the tax rate in your software? How do you find out which applications to change? How do you know who needs to perform the change? Do you have to broadcast a question to the whole company and ask them to perform the change by date X? How do you know you adjusted all the places? Do you have to deploy the change manually, or will the change happen automatically based on time?


This is one of the reasons I'm in the Erlang-derived family of languages. You never have to use the distributed tools if you don't want to, but they're really powerful and really elegant when you start to need them.

This is only one part of the problem, though, because, as someone stated below, even though server systems have networking stacks, they are not entirely designed around the concept of seamless computation over multiple nodes. So you wind up with abstractions that solve the problem, but do so while fighting against some occasionally nasty gotchas that make it a lot harder than it needs to be.

This is all to say I'm grateful for those who maintain things like Erlang, Elixir, Kubernetes, Go, etc. They're somewhat swimming against the stream so we have an easier time when we do need to scale.


That's a great essay! I think he describes the issues quite well, and I enjoyed his metaphors (I like the city metaphor).

I also - like him - have no answers for "the big picture." I ran a small cog on a big wheel for a long time. I feel the basic quality (as in "few bugs") of our work was good, but I also feel that it took too long and failed to flex, so it often fell flat with our customers. Our software was developed using a hardware process, and there are vast differences between the two disciplines.

I am fortunate, in that I don't need to play in anyone else's sandbox. It means that my scope/scale is quite limited, but I am pretty happy with what I am able to do. It's actually fairly ambitious for a one-man "team."

Since striking out on my own, I have experimented with what I term "ultra-agile" techniques; with some success. Hard to codify, though, as they depend on a high degree of experience, as well as the fact that I'm a bit "spectrumish."

The closest to success that I have had, is to design all of my software as an extensible infrastructure. I don't know, exactly, how it will be extended, but I write "hooks" into it. I often end up using these "hooks," myself, in future work.


> At least, building a new project within a company should be easier than starting from scratch, but my hunch is that many companies fail that test.

This observation seems both true and important.


I have yet to find an organization that has figured out how to effectively transfer learned lessons to other projects/employees without retaining those specific people for long periods.

Add in high levels of turnover, and unless the specific person who learned the lesson is also on the project, there isn't a clear way to prevent a wrong turn.

I am part of a new team at a startup and everyone on the new team is a relatively new hire. As far as I know, we haven't taken any meaningful learnings. Despite the company existing for years, we have done everything from scratch.


I often help customers understand technical debt and code quality. Most people focus on code metrics, but they tend to forget organizational metrics and how those impact the software. There are several studies on this. I particularly like one from Microsoft [1].

One of the metrics that I use addresses your point:

Number of Ex-Engineers (NOEE): This is the total number of unique engineers who have touched the code and have left the company as of the release date of the software system

Implications: This measure deals with knowledge transfer. If the employee(s) who worked on a piece of code leave the company, then there is a likelihood that the new person taking over might not be familiar with the design rationale, the reasoning behind certain bug fixes, and information about other stakeholders in the code.

A large loss of team members affects the knowledge retention and thus quality.

[1] https://www.microsoft.com/en-us/research/wp-content/uploads/...
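A minimal sketch of how one might compute NOEE from data most teams already have (authors per file from version control plus a list of departed engineers); the names and inputs below are made up for the example.

    # Hypothetical inputs: authors per file (e.g. extracted from `git log`) and
    # the set of engineers who had left the company as of the release date.
    authors_by_file = {
        "billing/invoice.py": {"alice", "bob", "carol"},
        "billing/tax.py": {"bob", "dave"},
    }
    departed = {"bob", "carol"}

    def noee(component_files):
        """Number of Ex-Engineers: unique past contributors no longer at the company."""
        touched = set().union(*(authors_by_file[f] for f in component_files))
        return len(touched & departed)

    print(noee(["billing/invoice.py", "billing/tax.py"]))  # -> 2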


> Number of Ex-Engineers (NOEE): This is the total number of unique engineers who have touched the code and have left the company as of the release date of the software system

I have been on a project where I arrived after the first guy's replacement was replaced. So three people had used my desk before for that project.

So much knowledge was lost.


High levels of turnover typically suggest high dysfunction of some kind.

Also, the way to disseminate knowledge is by talking about problems. As in, if you manage to create a culture where people do that, you will occasionally hear "team x tried that and had problems, let's ask them".


"Ironically, in my experience, just bouncing off ideas of one another is not the way this works well. What you can end up with is people throwing their ideas into the ring just for others to find faults in it. In the worst case, this can turn into a competition who knows most and who is smartest."

This right here is exactly why I learned to say nothing in brainstorming sessions and try to dodge them outright if I can.

My experience is these are typically for someone higher up in the pecking order needing their own ideas validated.

My usual workaround is: if something needs to be done, quietly work on a POC on your own, then solicit feedback or even bring in stakeholders.


I see another person using Fred Brooks' "accidental complexity" in the wrong way.

Software development is hard because "essential complexity" is hard. Whatever people want to do is hard. You can make it easier to write software as much as you want, but you are not making the "world" easier. That is the main point of Fred Brooks' essay.

I see a lot of devs making statements that development itself is now the essential complexity for them. But no, software and code will never be the essential complexity; software only helps us solve the essential things faster.


IMO, technical leadership isn’t allowed to build at scale or the leadership doesn’t exist.

The race to the bottom incentivises profit over principle. Boards don’t care if something is built to scale if the competitor gets to market with a cheaper solution. Scaling is a day two problem.

I’m bearish on my future employment as someone who questions the motivations for profit driven development.

I naively thought the market would reward the best product, instead I see the cheapest product being rewarded. What happened to innovation and doing what’s best for society?


> reward the best product, instead I see the cheapest product being rewarded.

That's because you are looking at best from one angle and the market from another.

To the "market", the best may well be the cheapest thing that mostly does its job, most of the time.

When I was young I used to think users cared about bugs, and they do, but they only care when the bugs are sufficient in number and degree to cross a threshold where they outweigh the utility of the software. The sort of bug that 1 in 100 times loses an hour's work? Well, if the other 99 times it saves me an hour's work, shrug.

What we (as techies, and I'm generalising wildly here) want to build is heirloom-quality carpentry; what we actually get to build is franken-furniture from IKEA packs that we hope has some instructions and won't fall apart next week.


One bug in a shopping cart preventing checkout ruins the entire product. Bugs come in different shapes/sizes.


> bugs are sufficient in number and degree

In this case that would be a high degree bug.


You answered the question yourself. Cost is by far the most important factor, so if you focus on quality you will not be able to sell your product.


As long as our resources are limited we always need to consider the tradeoffs. This is not specific to software.


I think it's fairly common for even established companies to prioritize short-term costs over long-term costs, even when the long-term costs of NOT investing in good tooling are astronomical. So absolute cost isn't really the deciding factor.


> The Team Topologies book suggests to favor teams that are end-to-end, that fully own a problem to be solved, supported by platform teams and teams that manage a very complex piece of technology.

Does anyone have first hand experience with this working really well? It seems sensible but in my experience it does not work. The platform teams, I think, should be collaborations between a small number of members from each end-to-end team. The platform would be allowed to grow only as needed by the use cases. As soon as you put a platform team in charge with a mandate, what they build and what is needed starts diverging in the name of "Long term planning". Instead of fixing the struggles of today, they think they have the capacity and formula for fixing the problems of a year from now. In my experience, they do not. Would love to hear from others


I worked on an automotive project where we switched from a "subsystem teams" to a "feature teams" organization structure in the middle. On the requirements and system test level, it was a clear advantage. For the implementation it was a mess, because where we previously had clear owners, there was now a lot of discussion about who should be responsible for maintenance.

From Jan Bosch, I heard the proposal to have "component owners" in addition to "feature teams". They are senior developers, normal members of feature teams, and explicitly responsible for specific components. This means they should review all code changes from any team but not make any functional changes themselves (as long as they wear the "component owner" hat), just maintenance. This might work. (Unfortunately, he hasn't published this idea anywhere yet. It was an internal talk which I cannot share.)


Figure out which of today's problems have happened before, and will probably happen again. Not just the ones people get upset about; any idiot can see those.

Fix the ones people blame themselves or each other for. That’s more future proofing than most people do.


Computing is a lot like American cities, in that the infrastructure (the yak shaves) that would make everything easier is almost invariably under-invested in.

One of my favorite examples would be https://en.wikipedia.org/wiki/Capsicum_(Unix). It's incremental, which is good. But people wrote the kernel part and declared mission accomplished, like Bush in 2003 - ludicrous! One has to do that and then overhaul the userland and get those patches upstreamed. And these days "userland" isn't a collection of init, shitty scripts and Gtk/Qt, but a bunch of libraries, especially programming language standard libraries.

This would dramatically change the security and ergonomics landscape, because so much of the global state that programmers lean on today could go away.

"Containers" was always the wrong metaphor, because software is about composition, but containers are inert and only their mass and volume compose (very crude). Better to think about plumbing or rail.

Another, perhaps more opinionated, example is getting everyone to use Nix (or fine, something like it). Whether with container-style virtual global state or the nicer Capsicum approach, we need to make it trivial to install and develop the entire commons. All the "builds on my machine, but you can't use it" just leads to a lack of integration, so no one can help the original author smooth over the gaps. It would also allow machines to be demystified, letting people toy with all the software on the system, which would help reduce the programmer alienation that allows so much accidental complexity to occur in the first place.

But yeah, almost nobody is doing these things at the scale they deserve, and even the megacorps drown in their own technical debt like sunbelt cities that are just metastasized suburbs. Everyone is in the "well, I'll eat shit and shut up as long as my competitor is too" mindset. It's disgusting.


Scale up vs. Scale out.

Those require different architectural considerations.

E.g. with scale-out, your cache suddenly has to work in a distributed way.

And as we all know, scale-up is expensive, can quickly become very expensive, and more importantly it has upper limits. In most cases scale-out is actually the only option, even if scale-up would work for a while. Many times I have seen architects go the easy route and scale up. They either get promoted or switch companies until their solution either gets too expensive or can't scale any more.

In the meantime, other systems are interacting with this system and would need to make changes to adjust to a scale-out design.

E.g.: A distributed cache can result in lower write performance, or someone else could have overwritten a certain entry because locking failed (e.g. SELECT FOR UPDATE).
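For the locking example, here is a minimal, hedged sketch using psycopg2 (the table and columns are invented): SELECT FOR UPDATE serializes writers on the row inside the database, but a cache that is read and written outside that transaction can still clobber the result.

    import psycopg2  # assumes a reachable Postgres and a hypothetical "accounts" table

    def withdraw(dsn, account_id, amount):
        with psycopg2.connect(dsn) as conn:      # commits on success, rolls back on error
            with conn.cursor() as cur:
                # Row lock: a concurrent withdrawal on the same account waits here.
                cur.execute(
                    "SELECT balance FROM accounts WHERE id = %s FOR UPDATE",
                    (account_id,),
                )
                (balance,) = cur.fetchone()
                if balance < amount:
                    raise ValueError("insufficient funds")
                cur.execute(
                    "UPDATE accounts SET balance = balance - %s WHERE id = %s",
                    (amount, account_id),
                )
    # The pitfall described above: if another service updates a cached copy of this
    # balance instead of the row, the row lock never sees it, and the stale cached
    # value can silently win when it is written back.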

In many cases those systems join the list of legacy systems.


>Everyone is still terrible at creating software at scale

Terrible is a relative term. Terrible compared to what?

Who said/showed (much less proved) that there's a better way and we just don't follow it to achieve some optimum?

There's also a semantic confusion here. Compared to e.g. the car industry, we're infinitely better at "creating software at scale". We can create a billion copies of a piece of software and distribute it all over the world, with marginal cost close to zero.

But the author doesn't mean "creating software at scale" like when they say "car production at scale". They mean production of "large software".

Well, let's see the car industry make a car with the scope, flux requirements, shifting environments, etc that large scale software has...


Rather meandering and over-verbose, this post never really seems to reach its point.


Just like most large scale software.


I've legitimately never seen so many words used to say absolutely nothing.


Software is predominantly a fashion business, not a technology business.

Just allowing the market winners to drive that fashion means we aren’t able to progress the state of the organisational art scientifically.


The problem is that we stop pursuing answers on this topic, and thus stop making progress. It's basically what Bret Victor described in 'The Future of Programming'. [0]

There were a lot of language zealots at the end of the last century, especially those evangelizing Object-Oriented Programming. Nowadays everybody can easily counter those arguments with 'No Silver Bullet' without further thought; it's arrogance in the disguise of humility. There is still a huge amount of accidental complexity to deal with in most tech stacks. Most businesses would die fast and leave nothing behind anyway, while the progress of the industry would accumulate and benefit the whole industry itself.

Java looks slightly better for creating software at scale than C. C looks slightly better than FORTRAN. FORTRAN looks slightly better than machine code. Say there's a language that looks like Haskell but has tooling and an ecosystem as good as Java's - I believe it would also be slightly better than Java.

[0]: https://youtu.be/8pTEmbeENF4


Two quotes stood out to me:

> What if software were built in the same way? What if the core parts of our business would be like streets, and all that newfangled stuff is something we could build on top, experiment, tear it down if it does not work?

To me, it kind of already is. The foundation we build most software on today is TCP/IP, HTTP, Win32, POSIX APIs, the C runtime library, threads & processes, etc. Much of that is about 50 years old now.

> Somehow, the code in front of you is just the tip of the iceberg of a lot of mental representation of what is happening

I think this is a key point. When we look at code, our comfort with it is relative to how well we understand that abstraction and have a good model for what it is doing "under the hood". I think a lot of the burn-out and "fatigue" some people feel comes from these abstractions being regularly reinvented (I'm looking at you, JavaScript frameworks), and thus you have to spend time learning the details of a new abstraction before you can just read/write a piece of code comfortably.


The problem is that there are perverse incentives for people to enrich themselves as quickly as possible and make problems like performance and scalability a liability for someone else.

I can create a startup, get a massive amount of funding, sell the company and make millions of dollars... all without ever seriously caring about performance and scalability.


I prefer single-duty one liner pieces of code/commands over large and complex programs that tie all these code snippets into one large abstraction.

I would not be hired because I only enjoy executing 'one-thing-well' snippets of code/commands. My minimalism would be a turn-off for most software companies or code shops.


Bad software (therefore all software) is like the recurrent laryngeal nerve in giraffes [0] (a freak of evolution). It is unlikely to be the result of a singular mind's "design". It is more the result of evolution through countless independent decisions, each of which is individually an improvement, but which put together result in weird inefficiencies that are impossible to fix without starting from scratch. And when was the last time you were allowed to "redesign" something from scratch at work?

[0] https://bioone.org/journals/acta-palaeontologica-polonica/vo...


If you think of all of the decisions that are made between having no software/users/"scale" and being something like Netflix, you can imagine how many opportunities there are for things to become "terrible" (or really, for a system to be made a small percentage worse, which obviously accumulates over time). I would challenge the assertion that most sufficiently scaled systems are "terrible"; more likely a person did not like them for one reason or another. If a system accommodates the demands of its users and maintainers, I don't really think you can label it terrible. Scaled software isn't art; it is either good enough or it isn't.

I would argue that it is becoming easier for "terrible" software to scale well. I would say that's a much bigger win.


Some reasons that I don't think have been covered yet:

- experienced programmers are promoted out of the tech track - it causes all sorts of problems: people keep reinventing the wheel, they are forced to relearn through their own mistakes, and the transfer of knowledge and skills is hampered

- (probably as a consequence of the first point) people with 3 years of experience are believed to be senior programmers

- even worse - experienced programmers are promoted to non-coding roles, aka architects. Over time they increasingly disconnect from the tangible artifact (the code) while still hanging onto the false belief that they can function just fine by engaging with it through a metaphor (diagrams, etc.) ("Simulacra and Simulation"?.. but possibly I'm digressing)


Cities are a good metaphor. Now consider we want to move some buildings around. Tear down some buildings. Create some new ones. Move a bunch of roads around. We need to think about water supply. Electricity. Where to put the people. Which roads can be closed. How traffic should be diverted.

Luckily with software this can be done more easily. We only need to design the new architecture. Design how everything should be executed. Finally we hand over the blueprints and let the computers do all the work.

But it's still going to take some time. A process which is largely invisible. The construction yard is there. But everything is code. You can't really take your stakeholders for a tour around all the work that's happening.


> I find myself coming back to the film business,

The difference, I think, is that the film industry has separated the creative parts from the production parts, and the creative part is not really complicated at all. Write the screenplay. That can be innovative, but it is typically a single author who does it. No complications from everybody's screenplays having to work together, because there is only one.

In other words making movies is more like making buildings than making software. Buildings can be huge and very expensive but their design is still quite simple and based on designs the architects have created in the past.


TFA mentions “Notes on the Synthesis of Form” by Christopher Alexander but doesn't really do the book justice: "things get exponentially easier if you take problems one piece at a time".

That's sort of what the book says but it's missing the critical part. The whole book is about how to split the solution space into pieces that can be taken "one piece at a time", as it's obviously not the case that just any split will do.

It goes through some real world examples of design and goes on to build a more general semi-formal theory of design, components and independence.


One thing that might make software fundamentally different from other stuff is that before being able to make a meaningful contribution, a person needs to invest X hours just to understand it.

In most other creative fields, this can be hand-waved away. No other engineering field has anywhere near the complexity.

Therefore, all important people on a project are stakeholders of some sort, and anyone else is not able to make real improvements.

The project will move in whatever direction satisfies the incentives of the important stakeholders.


> No other engineering field has anywhere near the complexity.

I’m a software engineer, so am not qualified to say this, but I highly doubt this is true.


I remembered this quote from Grady Booch, at 13:00 in this video:

https://youtu.be/adiVOdztQ34

"These software artifacts we produce, are perhaps some of the most complex humans have ever created."


Maybe it's time to finally differentiate the tools for software as in functions, algorithms, classes, libraries, etc. and software as in e-shops, databases, SSO services, streaming services, etc.

We have good science and steady (albeit slow) progress in the former, while it feels like the latter is more or less subject to stagnation. For instance, when I need to set up a DB cluster, why is there no tool that takes my requirements and generates deployment scripts, monitoring, migration tools, etc.?


The latter is called industry experience.


I’ve come to think about this more and more in terms of politics. The core problem is that not all people come to work with the intent to build high quality software in a rational and efficient manner. Also, what do you think film production would look like if you just hired a lot of “senior film developers” and let them hash out who should be the director and who should hold the microphone bar?


I agree with the post that software at scale is a big open question.

What gives me hope is that there have been successful mega projects, though not about software. For example, I have read about everything from the Manhattan Project through Project Atlas to the Apollo Project. Those were all mega projects with high innovation and uncertainty, like software development. I have no good theory about what made them successful, though.


> There is something peculiar about software that makes it different from other crafts.

What distinguishes software from other techne is a lack of physics. There is no 'solid ground of reality'. All other forms of making involve discovering and then applying the governing laws.

Hardware architectures, operating systems, and programming languages, to an extent, do furnish a phenomenal context, and where these characteristics are stable and well established (e.g. memory models), science appears [1].

But clearly, this ground of reality is indeed 'soft' and the practitioners usually re-invent not just the wheel, but the ground that it rolls on as well.

The dilemma -- which nearly means facing a choice of a multiplicity of lemmas -- that confronts the designers of hardware, OS, and languages is precisely that tension between generality (and corresponding weak "world" semantics) and specificity (with robust but constrained semantics).

[1]: https://ocw.mit.edu/courses/electrical-engineering-and-compu...


I know this comes with experience, but imagining how to scale, or how to actually implement it, is daunting. Obviously not everyone is going to have the need, but there are just so many tools and so many things to keep in mind to scale. What would you reckon is the best way to learn how to scale? Possibly one step at a time.


Pick the most expensive servers/databases/storage and increase the number of users. Every 1-2 changes in orders of magnitude of the latter will require you to think thoroughly about how to best use resources and not explode your budget.

Also read the highscalability blog; there's a lot of experience recorded there. See for example this transcript of an AWS presentation with a step-by-step guide to scaling: http://highscalability.com/blog/2016/1/11/a-beginners-guide-.... Of course it's done with AWS building blocks, but the ideas are universal


Wow this is a great resource.

I found so many parallels in what I've done while scaling up. Only thing I did differently was starting with MongoDB and changing to SQL later; huh.


Thanks, wasn't aware of the blog. Will take a look.


The way I learnt a lot myself was to run applications with little budget, e.g. running them on anaemic servers with few resources. When you have no option but to improve performance, because the alternative is too expensive, it gives you a good prod to design with it in mind.


This is a competitive advantage for companies who can do this better than others. Different technology stacks and different development mentalities make this easier or harder. If it were easier, then smaller "startups" wouldn't beat companies like FANG.


There is at least one massively scaled system that relies heavily on software, that doesn't often get a mention because it just works. When was the last time you picked up a land-line phone and it didn't work because of a fault on the system?


Reading about Neuralink yesterday makes me wonder if in 50 years it will be possible to connect a bunch of minds together to achieve a task none could do on their own. Could be programming or some other large scale task with a lot of interdependence.


That's called communication.


The whole point of the article is that communication on a very large scale is very hard. There's a point at which adding more people makes things worse. What I mean is, one day, maybe more people will be able to collaborate by becoming one big brain.


This isn't a software specific problem. 90% of work in every business is inefficient.


Maybe this happens because the inefficiency is resilience/anti-fragility? The implication is that highly efficient businesses exist for a short time but they get wiped out quickly when their environment changes.

Sometimes a CEO thinks up a radical crazy idea. While it distributes through middle management it gets twisted. Once it reaches the front line workers nothing really changes. One might consider this inefficient change management but maybe the company protected itself from a stupid CEO decision.


Every system requires sooner or later another system on top. Entropy is increasing. https://en.m.wikipedia.org/wiki/Chaos_theory


I have often thought about the comparison to film, and film has been around for over 100 years. So in like 2080 software will be like film. In the year 4000 software will be like houses. In the year 10,000 software will be like nature.


I mean, I think Amazon does a good job. Google too, and Microsoft gets some points as well.


We have actually degenerated back to CLI. "You don't need UI" lol.


I'm surprised this article didn't mention test driven development for building large pieces of software!

I thought the test-driven development idea was impractical and pedantic when I first heard about it. I gave it a shot, though, and the thing it allowed me to do was to prevent massive systems from becoming impossible to modify. When a system gets so big that it is impossible to fit it in one's mind, most developers get scared to do new releases because there's probably some critical part of the system they forgot. They forgot all the requirements and QA test cases for at least one obscure part of the system. Even if all the QA test cases are documented, the process to release becomes increasingly difficult and time consuming.

With test-driven development, you can just run all the tests, and if they pass, say ship it. The key is not to get too many tests, especially integration tests which take a long time to run and like to break when designers make UI changes. Usually, I start with the happy path integration test and then write tests as I'm developing for things that don't work right. About 40% of the stuff works the first time, and I never have to write a test for it. 45%, I write the test and fix it, and it passes, and I can forget about it. The other 15% is usually some tricky algorithm with many corner cases, and that's where most of the testing goes. I typically write one happy path integration test and then fill in the lower level tests as needed. When the happy path works, most of the system is working. That said, spending time on integration tests is usually a massive waste of time. It either works the first time or something lower down broke, and one should write a test for the lower level part that runs in a shorter amount of time than the integration test.
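A tiny hedged sketch of that workflow with pytest (the pricing function and its corner case are invented for the example): one happy-path test written first, then targeted tests added as corner cases turn up during development.

    # test_pricing.py - run with `pytest`; the pricing rules are hypothetical.

    def quote(subtotal_cents, coupon=None):
        """Apply a 10% discount for the 'SAVE10' coupon, never going below zero."""
        discount = subtotal_cents // 10 if coupon == "SAVE10" else 0
        return max(subtotal_cents - discount, 0)

    def test_happy_path():
        # The first test written: the common case works end to end.
        assert quote(10_000, "SAVE10") == 9_000

    def test_no_coupon():
        assert quote(5_000) == 5_000

    def test_unknown_coupon_is_ignored():
        # Added later, once this corner case actually bit us during development.
        assert quote(10_000, "SAVE99") == 10_000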

I was able to port a large enterprise app from an Oracle backend to Postgres because there were tests for everything. The port just amounted to getting all the tests to run on the new database and the necessary data migration. This migration was by no means a trivial feat, but it was at least possible, and it ended up working and saving the company millions in license fees.

The point being. A system with millions of lines of code is approachable if it has good tests. A new developer can work on it, and if all the tests pass, that developer didn't break anything. I can go back to the code I wrote years ago and still use it if I run the tests, and it works. I can then see how it's supposed to work, add features, and so forth. Without tests, this is very difficult because as systems become bigger, they break more and more if they don't have tests.


A lot of guessing and hunching, and trying to find one answer to a couple of problems where the question isn't clear. So I don't know what point this article wants to make.


Infrastructure-as-code services actually match the city metaphor quite well - with the huge difference that you can rescale and reshape your "city" at will.


Software development is research work.

If you take processes from people who build machines and apply them to software, it will fail. Software means inventing machines.


We are now seeing the limits of everything: USB 2/3, gigabit networking (pushing beyond it increases complexity to a ridiculous degree), Gflops/W (2 is the CPU peak for electronic transistors) and persistence (the most durable drives were made 10 years ago on 50nm).

So now we know that one man can build, host and improve/maintain a product himself with the entire earth's population as customers.

This is a short window before thermodynamics takes charge and ruins the prospects - enjoy it while it lasts!

The first one to do Minecraft without hiring anyone or selling it wins. :P


In a few years, 2.5Gbps networking is likely to be where 1Gbps networking was not that long ago. What was once niche and expensive will be $20/port very soon, then $10, then $7...


We have had well working 100 Gbit Ethernet for about 8 years now. It's been trickling down in recent years: https://www.servethehome.com/mellanox-connectx-5-vpi-100gbe-... (meanwhile the high end is at 200G/400G now for nics and switches respectively)


Sure, I was talking about the home... most people can't afford $20,000 routers or fiber.

I'm prepared to say that unless they can cram 10Gb/s into existing fiber (without increasing fixed or energy costs too much), we're stuck at 1Gb/s... forever!


Anyone with $40 USD can buy a used 10Gb Ethernet card and wire it with Cat6.

I don't know why you think there is some 'limit of thermodynamics' or that someone needs to 'invent new physics'.

Every computer for 15 years has come with a 1Gb Ethernet port.

2.5Gb USB Ethernet dongles are $30 USD and can be hooked into a $35 Raspberry Pi. A 2.5Gb switch is around $120.

I don't know where you got these ideas, but they wouldn't have made sense two decades ago, let alone today.


You again?

Are you stalking me?

Do you know how bad your username is?

How hard would it be for me to get you banned?

You don't want to see that hard worked 2000+ karma go up in smoke!

Because then you cannot downvote.


If we take a step back, you have said that "we will be on 1gb ethernet forever" "because of the physical limits and energy" and that people will need to "invent new physics they are only executable in an environment of energy abundance".

I then point out that 2.5gb pcie cards cost $20, usb dongles cost $30 and a switch costs $120 (with everything using the power of a few light bulbs at most).

Instead of confronting this or saying something that makes sense, you say that I'm stalking you, call my user name 'bad', then say that you are going to 'get me banned'.

Did I miss anything?


Fiber cabling is as cheap as copper. But yep, 10/25/40 GbE gear is cheaper. Though many have sub-1G internet connectivity and no infra at home, and this kind of connectivity is more interesting at a colo.


My whole point is that home hosting of servers is the last revolution and I think I can leverage my MMO partly from two home 1Gb/s fibers! You need two for 100% read uptime if one fiber goes down...

Saving on the most costly parts: CPU, storage and bandwidth... I still need big machines in central Europe, Iowa and Asia, but those would only be for latency-sensitive real-time networking.

Patch data, the async. database and even two world instances can be hosted at home!


See, I don't think so, because of the physical limits and energy. I think both are peaking at the same time, because while you might be able to invent new physics, it is only exploitable in an environment of energy abundance, and peak energy happened 10 years ago.

Try getting 1Gb/s to work without going crazy first! ;)


802.3ab 1000BASE-T works over all of the most common Ethernet cables and structured wiring I've thrown at it, even wiring that probably fails to meet even the loose Cat5 spec.

I've not had any experiences being made crazy by it in the last decade. Some of the very old gear had interoperability issues, wouldn't auto-negotiate, etc, but that's often the case on any new "standard" tech. Nowadays, you can take the random $15 adapter, $2 cable, and $20 switch and expect it's all going to "just work" at gigabit speeds.


What router do you use?


My router is an Edgerouter-X SFP, but that is only the gateway from the local network to the Internet, not the main switch. The main switching is done by a mix of Ubiquiti 8-port PoE switches and a Mikrotik central switch.


And you can push 1Gb/s reliably over that?


I can push 1Gbps everywhere on the local net. I can pull 265Mbps and push 12Mbps from/to the internet, limited by the 250/10Mbps internet plan I bought.


I think it's my ethernet cables... just ordered cat6 we'll see!


Software is more art than science. Show me the F = ma of software. What is the industry-accepted FizzBuzz?


The world is terrible at creating software because the developers are the last to be paid.


I sometimes suspect that this is a consequence of the anti-engineering mindset that a lot of the software industry embraces. Things like YAGNI, programmers don't need to study CS, not hiring testers, etc.

Maybe if software "engineers" were treated more like actual engineers then we would start seeing better results.


Does the old adage "80% of all software projects fail" still hold?


Define “fail”. We go through our lives labeling failures as successes.


Here's why:

First, consider a car. Imagine if somebody told you that a random company would be changing the software that runs your car, every day, as you drive it. Freaked out? You should be. That's what a modern tech company is. They just don't have the same risks, so they don't care so much if they fuck up, so the work isn't organized very well.

Next consider cities. Do you want to build something? Great. I'll ignore all the costs (there are many).

First you're gonna need a design by an accredited expert. That design needs a dozen permits approved by the government before you can even touch a trowel or hammer. Then a team of experts goes to town: changing the grade, pouring the foundation, waiting for it to cure, installing basic plumbing, and then getting an initial inspection. Then comes framing, inspection, plumbing, inspection, electrical, inspection, HVAC, insulation, drywall, fixtures, trim, walkways, driveways, flooring, landscaping, grading, and inspection.

So far we have used specialist teams to build sections of the building and only continue when a strict inspection by authorities says we can continue. No customer has used the building yet. Also consider that this building will not scale. If you reach capacity, go make another building.

Building software at scale is a combination of monkeying around with a car's internals while someone is driving it, and building a building. Only the risks to human life are lower, and we do not have teams of very specific contractors strictly following government approved codes and zoning laws to build one thing that meets one set of criteria.

The people doing the job(s) in tech aren't good enough at it to do the things we're asking of them. And they aren't just building one piece and walking away, they are constantly fiddling. And very often, they have no master plan approved by inspectors according to well known and inviolable codes.

On top of that, they have never sat down and figured out how to do all of this really efficiently and reliably. Scrum/Kanban are general processes; they do not tell you specifically how to build a website in an extremely efficient and error-free way. But we've done that a million times by now, poorly. It's because we haven't yet codified it as an engineering discipline and stripped out the fat. And we haven't done that because there's no requirement to, because nobody's life is on the line, the way it is with cars and buildings.

And it's hard. It's hard because we still hand-make components rather than buy them off the shelf. Every company I've worked at has re-created the same god damn thing 1000 times, by hand, because for some bizarre reason they thought it was a better idea to forge steel than to connect plumbing. Imagine if your plumber hand-crafted her own custom fittings for each job.

Really, we do a marvelous job today considering how completely undisciplined, unregulated, haphazard, and dangerous what we do is. There are many approaches we could take to simplify the actual process of it all, and make it efficient. But the complexity and difficulty will be there for a very long time. We can also move to pre-fab, but you'd have to convince people that writing new software just to build a tech product is a bad idea. Good luck with that.


Conversely, writing software for a small scale is absurdly inefficient.

I consult for government departments that often have legally mandated requirements for making something available online. That could be a form submission process, some geospatial information presented on a map, or whatever.

The problem is that when you might have 500-10K user transactions annually, it becomes crazy expensive to write bespoke software, even with the most agile process and the lowest overhead tooling.

Take cloud deployments, for instance. Sure, you can just "click through some wizard" pressing next-next-next-finish and be up and running, but the security team won't allow your infrequently-maintained web server on the Internet without a web application firewall. Setting one of those up is days of tweaking to avoid false positives.

Need to send mail? Azure bans outbound port 25 connections, you have to use Sendgrid, or something like it. Time to read up on yet another unique and special API!

Collecting fees and penalties or making payments to citizens? Woah there... there's a massive API surface you have to learn. Security on top of security is needed. Whitelisted IP addresses. Client certificates. That have to be rotated, manually!

You'll forget some essential maintenance, of course, and then you'll have to set up triggers and alarms so you don't get burned the second time. Which entails mailing lists that change dynamically because the team of contractors has a turnover rate faster than the typical certificate expiry time. Send too many alarms and all recipients will configure an Outlook rule to ignore them. Not enough alarms and you'll miss issues. Just setting this up semi-reliably is an exercise in itself.

Really basic stuff becomes difficult, when you realise that 99% of the alerting and monitoring features in Azure and AWS are designed for systems at their scale. It's all about analysing beautifully smooth curves of graphs aggregating millions of points of data, where deviations are glaringly obvious. These tools are utterly useless when you get one real transaction per day, swamped by a thousand bots. The load balancer health check is 99.99% of the traffic for some of these sites!

Then there's the human element:

Have you tried justifying the time to set alarms for a system where a week-long outage might only affect a dozen customers?

How about the budget to upgrade to a newer operating system for something that is not technically broken -- merely hard to support now?

Or have you tried doing any sort of maintenance on a system that was built by contractors hired for a fixed term, all of whom are now gone?

Meanwhile, departments are renamed every couple of years to suit the whims of the latest batch of politicians, so everything has to be rebranded. Even tiny little sites used by practically nobody.

I've seen sites up for 10 years where I estimated that they cost $2,000 per citizen who actually used the site! Madness.


It is easier than ever to deploy software on a small scale. It sounds like your security team is failing to provide a suitable platform and you chose the wrong cloud provider for sending email.


If you're only getting one real transaction per day, maybe it makes sense to just train someone to Flintstone the processing.


Sooner or later, every system requires another system on top. https://en.m.wikipedia.org/wiki/Chaos_theory


TLDR: Complex systems are complicated to build


Fred Brooks was right:

"Brooks insists that there is no one silver bullet -- "there is no single development, in either technology or management technique, which by itself promises even one order of magnitude [tenfold] improvement within a decade in productivity, in reliability, in simplicity."

The argument relies on the distinction between accidental complexity and essential complexity, similar to the way Amdahl's law relies on the distinction between "strictly serial" and "parallelizable"."

https://en.m.wikipedia.org/wiki/The_Mythical_Man-Month


To add/elaborate: systems get complex because the underlying reality is not simple, as much as we’d like it to be. No matter what paradigm (OO, strong/weak typing, etc.) we choose or what methodology (waterfall, agile, etc.) we adopt, the underlying reality can’t be changed or moulded to fit the software; it is always the other way around.

Just a few days ago there was a thread here about date calculation. One would think: how complex can it be? But as you dig deeper you keep discovering layers and layers of special cases.
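
Two of those layers show up in the standard library alone (the dates below are chosen purely for illustration):

    # Two classic special cases, dates chosen purely for illustration.
    from datetime import datetime, timedelta
    from zoneinfo import ZoneInfo  # Python 3.9+

    tz = ZoneInfo("America/New_York")
    before = datetime(2021, 3, 13, 12, 0, tzinfo=tz)  # noon, the day before DST starts
    after = before + timedelta(days=1)                # "same time tomorrow"

    print(after)           # 2021-03-14 12:00 local, as you'd expect...
    print(after - before)  # ...but only 23:00:00 of elapsed time: DST skipped an hour

    # Month arithmetic is ambiguous: what is "one month after" January 31?
    # datetime(2021, 2, 31) simply raises ValueError; the library refuses to guess.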

Take Uber as another example. "Push a button, get a car" was their aim. We see people commenting here that they could build it over a weekend. But the real world of cab hailing has thousands of unwritten, implicit rules that live in people’s heads. All of them have to be codified and made to work seamlessly, and on top of that you add the constraints of distributed systems, physics, etc.

Or take tax law: it’s a labyrinth of rules large enough that people have built DSLs just to express it (a toy example below).
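
Even the "simple" core of it, a progressive bracket calculation, is already rules-as-data, and the real DSLs exist because everything around that core (residency, deductions, phase-outs, transitional provisions) changes every year. A toy sketch with entirely made-up brackets:

    # Toy progressive-tax calculation with entirely made-up brackets.
    # Real systems push rules like these into data or a DSL because they change yearly.
    BRACKETS = [          # (upper bound of bracket, marginal rate); None = no upper bound
        (10_000, 0.00),
        (40_000, 0.20),
        (None,   0.40),
    ]

    def tax_owed(income):
        owed, lower = 0.0, 0
        for upper, rate in BRACKETS:
            taxable = min(income, upper) - lower if upper is not None else income - lower
            if taxable <= 0:
                break
            owed += taxable * rate
            lower = upper
        return owed

    print(tax_owed(55_000))  # 0*10k + 0.20*30k + 0.40*15k = 12000.0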

So, any software that solves a real-world need will be complex. We just have to bite the bullet and deal with that reality.


This is the template for a clickbait headline on HN.


I enjoyed this... fun sort of intentionally meandering style.

I propose that scaling (as defined here) is a general problem, not just a software problem. I realize this is a hard case to make. Technical debt, refactoring hells and such don't plague film sets or factories like they do code shops. But.... Consider this:

- Consider factories. A factory is just a huge pile of physical capital organised to efficiently do a specific, defined task. Everything is optimised to reduce marginal costs. Capital efficiency is strictly enforced by capital markets. Marginal cost efficiency is strictly enforced by real markets.

... Either of these two constraints (marginal costs too high or not enough capital) usually comes into play before engineers have had a chance to redesign, refactor, repurpose, and abstract a factory to the point where those become a problem.

- Everything that is in a movie is in a shot. Movies have a finite number of shots and a director can direct each one. A film may have a whole team replacing Toby Philpott with Jabba the Hutt, an army of costume designers, set designers, actor psychologists, etc. But, everything that gets into the movie gets there in a shot, and the director can work shot-by-shot and wield god-like control of lots of labour that way.

Software is unique. Software is free from most physical limitations. There are no material costs. There are no capital costs. The only economic resource a software enterprise wields is the engineers/engineering itself.

When a company like Ford scales, it raises money. It uses that money to build factories, buy materials, etc. These are all scarce/finite resources needed to start/continue making additional cars.

Google, MSFT, FB, post-AWS Amazon... When they scale up, they just scale up. They hire more engineers. They produce more software. The only "resource" being scaled up is the people making the software... Something has to be the limiting factor.

In any case, the "scaling creative work" problem does exist in factories too. The difficulty 50s-era auto manufacturers faced competing with Toyota is sort of evidence. They struggled with "technical debt" in the form of car models, factory designs, and a company culture that couldn't adapt flexibly.

A lot of Tesla's wins (besides marketing, fundraising, and software) have come from their recent blank-slate start. Auto manufacturing today is highly caught up in the "Toyota Way" of doing things. It's been that way for decades, and parts of it are explicit assumptions of international trade deals. It's very compartmentalized: flexible within compartments, rigid across them, whether the boundary is between departments or between companies. It's a lot like the city metaphor this article mentions.

The problem is... sometimes you need to design the factory and the car simultaneously. At times that has screwed Tesla; the quality-control benefits of the Toyota Way are not to be taken lightly. At other times, though, it works. The factory designer has been banned from designing the car (and vice versa) for 40 years, and I think their version of technical debt is the reason.


I really hate titles like this.


I'm not.


+1 for Design Thinking



