Read this submission and thought: this is the kind of thing Dan Luu would post on Twitter. Then I saw the submitter!
It's honestly incredible how companies will have issues with the most fundamental way people are attempting to use their product, and no one notices. Being unable to check out is basically throwing money into a fire, and I've had it happen all the time. I take my business elsewhere. People just don't give a shit about their job or what they're actually doing.
Many years ago I worked for a big ecommerce "platform" with about 200 stores on it. They broke their checkout for non-IE users for 9 months because a shit security banner overlapped the "make payment" button.
Why 9 months? Well because no one gave a shit, not because it couldn't be fixed. I was handed the defect and saw there were over 150 customer reports attached to it. Total fix and test time? 4 minutes.
Asked around. Well, the CEO was on fire about it: they had lost several customers and several million dollars in sales. Was anyone made responsible? No. And that's where the problem lies. Ownership and responsibility. That has to come from the top down, because as hard as you can try to own something from the bottom up, some asshole will no doubt screw it up somewhere in the chain of command.
Same thing at the last 4 companies I've worked at.
I worked for a Japanese corporation, and a really major default posture was to make sure that someone was always on the hook for $THING at every step of the development/maintenance process.
We never ended a meeting without making sure that everyone had specific, well-documented marching orders, and we had an insane JIRA workflow, designed especially to ensure that someone owned the issue at every point. GitHub issues (my preferred method these days) would never have been acceptable, as they do not force ownership.
They had huge Excel spreadsheets that tracked issues, and there was always a “responsible person” column.
It could be a massive pain, but things seldom “fell through the cracks.”
I've seen so many problems caused by not having clear direction from leaders. I once pointed out to the president of a small/mid-sized company that he was the acting CTO and it was hurting the company. All department heads reported to him, but he was non-technical, so every decision came down to who could convince him what the best thing to do was.
What ended up happening is political shenanigans from these department heads (2 of the departments were technical) because there COULD NOT BE clear direction due to the non-technical nature of the CTO. I recommended he hire an actual technical CTO who could call the technical department heads on their bullshit and actually get a meaningful direction going between the two.
I've seen variations of the headless leadership too many times over the years.
This is one of the more frustrating kinds of bugs in my personal view. I am trying to give you money, why is this hard? Bonus points for letting me get to the checkout screen before erroring out. On the other hand, I do take some cold comfort in the fact that it's self-punishing behavior; usually I react to companies misbehaving by not giving them money, and they nicely help with that, so...
Believe me, it is even more frustrating for the company.
Accepting payments is an incredibly hard problem, regardless of which payment processor you use. No matter how many tests you do, there is always a bank somewhere trying to include some second-factor authorization (3DS, etc.) in an iframe that completely breaks, but only in production. And when it breaks, it is often impossible to reproduce.
Over my career, I have lost so many customers due to payment system failures... Words cannot describe the terrible punch in the gut feeling of getting a support email from a customer trying to give you money, but they can't.
Another kind of bug that costs money: processes using 100% CPU for abnormal amounts of time. Back in the day when we deployed Windows Small Business Server, the WSUS process would sometimes get stuck and use 100% CPU. In one particular instance this went on for about a week before anyone noticed.
I wondered how this affected power consumption. For this one particular server we had power consumption metrics, and sure enough, consumption was something like 30 watts over normal for that week.
So who pays for that? In those days electricity was cheap so for the customer it didn't really make a difference, but think about it globally, how much electricity and thus also money has been wasted on stuck processes?
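For a rough sense of scale, here is a back-of-the-envelope sketch of that one incident. The 30 watts and the one week come from the numbers above; the electricity price is a made-up placeholder (an assumption, not a figure from that customer), so swap in your own rate.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def extra_energy_kwh(extra_watts: float, hours: float) -> float:
    """Energy used above baseline, in kilowatt-hours."""
    return extra_watts * hours / 1000.0


# Figures from the comment: roughly 30 W above normal for about one week.
extra_kwh = extra_energy_kwh(30, HOURS_PER_WEEK)

# ASSUMPTION: hypothetical price per kWh, purely for illustration.
PRICE_PER_KWH = 0.15

extra_cost = extra_kwh * PRICE_PER_KWH

print(f"Extra energy: {extra_kwh:.2f} kWh")
print(f"Extra cost at the assumed rate: {extra_cost:.2f}")
```

That works out to about 5 kWh for one server for one week, which is pocket change on its own. The waste only becomes interesting when you multiply it by however many stuck processes are spinning worldwide, and nobody has that number.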
What the article doesn't mention is that the people making the money aren't the people writing the code (critical path or not). The people trying to make money are usually responsible for workers feeling overworked, exploited and unappreciated. Sabotaging the critical path is an easy and obvious way to “take revenge” on the oppressor. Even when such intentional sabotage is not at play, the exploited workers probably just don't really care about the oppressor’s balance sheet.
The most effective way to mitigate this is to give workers appropriate time off, healthcare coverage, and a fair share of the profits. But no profit-seeking company will ever do that.
Consider the flip side of your suggestion: If you make money when the program works, that also means that you don't make money, or you make less money, when it doesn't. The last thing I need as a programmer is for a poor management decision to put me in a situation where a bug can be blamed on me and I can get my salary taken away.
It's also a bizarre idea that your programs won't have so many bugs if your workers have healthcare. (I'm also skeptical that many programmers who are employed don't have healthcare in the first place.)
Early this century, I was a reader of a rather intense pair of guys that wrote Web Pages That Suck[0].
A pretty common refrain was design that prevented people from easily giving you their money. NNG[1] takes a similar posture these days.
They tend to hold up Amazon as the king of “Make It Easy to Give Us Your Money” design.
This is a very astute observation and it's something I'll bring into our org. Simply going through the exercise to identify the critical paths, with both Product and Engineering involvement, would be valuable.
Once identified, the test coverage, monitoring, PR process, and regular reviews would follow ... but much of the value would come from the identification process because it highlights that not all code in a system has the same value/risk.
Nobody tests end to end any more. They write a patch, run some unit tests, deploy to production, and stuff like this breaks and stays broken without anyone noticing. There's a web shop I like that now literally renders as a blank screen in my Firefox setup. I have to use my phone to access the site. I emailed the shop owner and he said omg, he'd get after the site builder (it's a Shopify site). Nothing has happened, at least so far.
Also there is the one where they make you identify pictures of street lamps for N iterations before believing you are human. Hey I want to give you money. Robot money folds in the middle just like human money, so shut up and take my money ;).
I have been frustrated at times with developers not looking ahead or testing their software, PMs not caring that software is tested properly, etc. It feels sometimes like the only metric that counts is how many story points are finished after every sprint, and everybody just tries to cross off their todos, but nobody steps back and actually tries to use the product.