> When civil engineers make mistakes like this there is a postmortem and we learn what the mistake was and how not to do it again.
We do that in software too. It’s called an agile retrospective. Somehow it leads to more of the same crap.
And that's because management usually doesn't allow for proper engineering. They call that getting lost in the details.
So then we end up with things failing miserably again and again. But agile books get sold, McKinsey and other bastardisation factories generate profits, and so on.
Yeah. We pretend to do it in software. If the liability isn’t placed legally on anyone’s shoulders, then it’s not actually being done.
That’s indeed what makes it easy for management to consciously or subconsciously allow outcomes that do real harm like people not getting paid. There is no incentive not to be a %#^*up. At worst a manager looks bad. But they can always blame the engineers since there’s a system in place, so it couldn’t have possibly been the manager being a %#^*up.
> We do that in software too. It’s called an agile retrospective.
A retrospective is not an incident postmortem; they are very different things.
A retrospective tries to look at how the agile process went wrong again this past sprint, but it is not focused specifically on one production incident.
Companies should be doing incident postmortems on any incident of significance to identify action items that address what went wrong in that specific case.
A retrospective’s documentation is essentially the smallest opaque piece of what an investigation would draw upon working backward. It happens fairly quickly after the causal factor is introduced and is likely to be completed before the software actually causes harm in production at a scale like this.
The results of a retrospective could make for a good source for recommendations, but only if it covered the causal factors.
In civil engineering there is a responsible engineer. Nobody cares what management says - business takes a back seat to public safety, the engineer is in charge. Designs and products are required by law to be backed by an engineer who is personally responsible so the business excuse of “management told us” no longer carries sway: management cannot release a product without a responsible engineer by law so management MUST ultimately be subservient to the math.
Software “engineers” cling to their management driven paradigm and resist true accountability and responsibility for engineers because, ultimately, it’s easier and more personally profitable to hide behind the excuse that “management” made the decision.
> Software “engineers” cling to their management driven paradigm and resist true accountability
Yeah just gonna stop you there...none of us were given the choice to be accountable and hide behind "I will lose my license if I commit/deploy this." I'm also an actual engineer in addition to a software engineer, and being able to say "I'm not willing to go to jail for this" works when you have a P.E. license in an actual engineering project, but definitely does not work in the software industry. It doesn't even work in engineering if you don't have a P.E. outside of "No this will literally hurt/maim/kill someone"
Yes, either way (software or P.E.) you might be fired for saying "I'm not willing to go to jail for this" but it's much less likely you'd be fired for it in "hard" engineering, because everyone else on the team understands that you're not being entirely hyperbolic when you say "I'm not signing this because I could go to jail." If you said that in a software project people would rightfully think you're fucking nuts and don't understand how the world works, because there's an infinitesimal chance a software engineer would go to jail.
Even the criminals who deployed the Uber Greyball thing didn't face any consequences, which truly shocked me, as that was blatant obstruction of justice.
> Software “engineers” cling to their management driven paradigm and resist true accountability and responsibility for engineers because, ultimately, it’s easier and more personally profitable to hide behind the excuse that “management” made the decision.
I wish it was that easy.
In almost all software companies, with the exception of one where we built software security tooling, management asked for cutting corners, to which engineers objected.
Guess who got the blame when things went wrong exactly as the engineers had warned?
The engineers.
In one case it was so pathetic that the company lost tens of millions of pounds in revenue. There was a code freeze.
Turns out a marketing director _told_ an engineer to apply a change. That change tipped everything over. Fortunately the marketing director was “freed” later on.
Similarly, in other companies people's payments were processed twice or not at all, accounts were lost, user login details were mixed up or leaked, and so on.
Every darn time technical debt was raised, someone would say that we should not over-engineer things.
Who said the "computer programmers [..] let people down"? "A computer failure" is the rallying cry of middling bureaucrats all over the world because to them it translates into "no one is to blame", used most prominently of course when something is their responsibility.
If you can't make payroll with the payroll software, start hand signing checks, but don't blame "the software" and "the computer programmers".
Okay this is probably something that needs a very high level of stability. But many apps/websites aren't about life and death. If hackernews goes down no one dies. If a bridge goes down people might die. That is why civil engineering is more regulated (and work takes more time than if it wasn't).
I think programmers can actually misjudge when their bugs are negatively impacting people's lives, especially when they are in the mindset to justify not fixing a bug or improving on a broken design.
I've had some experiences working on some popular software, and have definitely seen people trivialize the impact of a bug in ways that do not appreciate that end users are putting the software in the middle of their life, and hence not understanding what it means to the end user when stuff is broken.
So a programmer saying "dude relax, it's just a website" doesn't really hold weight to me. To you, a programmer who doesn't want responsibility or hasn't prioritized a bug, it's just a website. To the end user, maybe they are using it in a way you don't expect, they want it to work, and it actually is meaningfully harmful to them when it fails in some way. You can say it's on them for "misusing" your app but ... Again, that's blaming the user for your bugs.
A real world example I recall was a colleague who trivialized data loss and reasoned that nobody stores important stuff on that app. The customer complaint was that the "trivial" data that was lost was a recording of his deceased father, or something of the sort.
> A real world example I recall was a colleague who trivialized data loss and reasoned that nobody stores important stuff on that app. The customer complaint was that the "trivial" data that was lost was a recording of his deceased father, or something of the sort.
Wow. I would be beside myself. Ecamm PhoneView is something I used regularly. They discontinued it and suggested moving to iMazing, but it doesn’t contemplate the same use case or workflows, and is a poor replacement albeit very useful and powerful for other uses.
This is why I strive to place myself in the customer’s shoes when building and improving something. Don’t take from them, listen to their feedback, let them guide the product within reason, and don’t lose sight of the fact that they’re paying you to work for their interests.
Some software work requires the same level of engineering as public works like roads, bridges, electrical grids, pacemakers, etc.
Most software probably doesn't require that level of rigor. IMO, those who work on critical software ought to be formally licensed like "normal" engineers who design roads, bridges, and so on.
I think about chiropractors when I think on this topic. They don't have the same training, the same level of rigor or skill (IMO). But they can still help people, hell some folks swear by it. But the point is, they aren't formally licensed in the same way and still make their living.
It isn't a perfect analogy, admittedly. Nor is it a welcome topic on this forum! Regardless, I believe some industries should require licensed software engineers.
Industry standards are overwhelmingly better than government regulations (laws).
> Okay this is probably something that needs a very high level of stability. But many apps/websites aren't about life and death. If hackernews goes down no one dies. If a bridge goes down people might die. That is why civil engineering is more regulated (and work takes more time than if it wasn't).
Yeah, sometimes folks are working on something that doesn't matter at all, but that's the exception rather than the norm.
As a general rule, what we do can cost someone their life or their livelihood: sure my foul-up might not kill any users/customers, but the lost ad revenue might cost some junior Nobody their job as part of budget cuts. That person loses their income for rent and for food, they lose their health insurance (maybe for their family), and I don't have a care in the world because my foul-up didn't impact me in any meaningful way.
Everything involves risk and we shouldn't punish individuals for taking a risk we told them to take -- but we often take risks without considering the consequences, and without _bearing_ any of the consequences. It's called a "moral hazard".
It requires tech to move beyond the standard refrain that this is the manager’s fault. It’s hard to get somebody to accept liability when it can directly hit their pocketbook.
But it is the manager's fault: that's the person who holds all the cards in the decision-making process. If you want it to be someone else's fault, then start advocating a different chain of decision making, rather than complain about it being a default refrain.
Other engineers don't take direct responsibility either. I've never heard of an individual engineer getting sued because a rocket they worked on exploded, or a bridge they built collapsed.
Plenty of tech companies do post mortems. This is almost certainly a failure of management. If a single engineer caused this, the process is broken.
What the tech industry needs is professional indemnity insurance and competent management.
I don’t understand how this story is being framed as the union letting the workers down, when it is the employer who failed to pay the workers properly? And the Union has forced them to implement a workaround?
I think the union failure they are referencing is the weak accountability that the union is holding USPS to. The union signed off on this money order compromise even though it has significant disadvantages for the workers and gives full absolution to USPS.
The root failure here is USPS no question, but the union response is weak.
Yes, I understand that the workers are unhappy with the way their union has negotiated for them, but I don’t see how that’s relevant to the employer not paying the negotiated amount. Why is that a failure of the union like the article states?
As far as I understand it, the decertification effort seems to be about a previous unpopular collective bargaining agreement. The issue that the article highlights is that the union agreed to a workaround that didn't work for everybody. I'm not convinced that enough people would be so inconvenienced by this workaround to move the needle on the (already weak-looking) decertification effort, personally.
A union is more important when you’re working for the Federal government or even some state or local government as they make and enforce the rules.
For example, if you work for a state labor department… Public sector worker safety is regulated by… the state labor department. There’s an organizational bias to not rock the boat.
Any Federal or quasi-federal agency is a dictatorship. Some rural mail carrier in Montana or New York is 50 layers away from someone who has the power to effect change. You need the union to represent your interests and short circuit the org. Outside of the federal space, if a company fucked up payroll, the state labor department would unleash significant pain on the company. USPS is tasked with policing USPS!
If the union is failing some constituents, that’s going to drive a desire to get more effective representation. The benefit of focused representation may outweigh a smaller bargaining unit. (In general, it’s better to have 500,000 members than 50,000 members.)
I doubt the union “forced” them; cutting checks manually is the standard practice for payroll screwups like this. I’ve seen it happen in a handful of companies, all but one outside their control. Two of them offered to cover any resulting overdraft fees.
The Union stuff is backstory and fluff to fill out the article.
We should expect the resignation of the CIO, at the very least.
My classes in finance said that rule #1 is to make your payroll (followed closely by pay your taxes). Failure to do this - even when the problem isn't cash flow, but IT failure - is inexcusable. It's the CIO's job to ensure that any risks to this are mitigated.
Firing the CIO only helps if you give the next one enough power to solve the issues, even if it means serious disruption in the organization, with unhappy unions, middle managers etc., which may lead to resignations, short-term cost increases and even operational downtime (though hopefully in less critical services than payroll).
In a lot of organizations, the company culture becomes so entrenched that almost no amount of C-level meddling will make a big difference. And this is especially common in government organizations (and in companies with very strong unions).
Unfortunately, suits and other manager types rarely take responsibility for their failure of leadership. They'd rather blame a contractor or a junior IC than admit they don't know what they're doing.
Why shouldn’t the contractor and junior IC also lose their jobs for this level of fuck up? This is literally someone’s kids’ food on their table. Why does anyone in the chain get a pass?
Because firing people at random isn't likely to prevent future errors.
Errors of this magnitude are either a symptom of some deep-running problem with an organization, or (much less likely) freak accidents.
In the former case, random firings aren't very likely to improve the organization; in the latter case, you'd be firing the people best qualified to do a post-mortem and making sure the same or a similar failure never happens again.
Have you ever worked with large-scale, historically grown systems? More likely than not, the people that have actively made the wrong decisions (or have, through inaction, allowed an initially-working system to degrade) are long gone.
Dilution of responsibility is a real problem, but harsh punishment of everybody who, under some (your?) metric, made an error is definitely not the solution.
If you're truly interested in this (and not just angry and looking for somebody to take it out on, even though the root cause here is probably systemic – which is not the same thing as "inevitable!"):
The aviation industry and its culture of failure analysis and prevention is a great place to start, in my view. We have a lot to learn from that in our industry.
"Large scale, historically grown systems" get like that because there wasn't sufficient accountability in the first place. Things like this don't happen in the aviation industry because there are regulatory bodies holding people accountable, something almost entirely missing from the software industry. That was the original point made in this thread.
> Things like this don't happen in the aviation industry because there are regulatory bodies holding people accountable
Important to note that this has been largely diluted in the US via regulatory capture, leading to things like the 737 Max disasters - so looking at the aviation industry a few decades ago is probably the way to go.
The reason we have hierarchy in business is to ensure the burden of error is shared. Bosses and managers are there to ensure their employees are providing a reliable service (among other duties). Was a process of sign-offs implemented to prevent errors like this? Was there a way for the employee to circumvent those processes, and did that lead to the disruption of the service? Or did the boss/manager not keep tabs on what was happening within their purview?
exactly yes - because in a large bureaucracy, "stall and blame" are the daily routine, and your skill at competitively doing this in a crowd, is how you got your promotion.. not everyone has seen this in action, but once you see it .. and it often takes years to see it.. you will know the truth of it..
I believe that this USPS situation is a degeneration of internal fights that are lasting more than a decade now.. it is about money and authority, indeed
As far as I know, no one was fired. The Phoenix pay system for the government of Canada is a good example of so many failures. People really need to stop thinking that building systems with tens of thousands of business rules is « easy ».
Why do US companies pay out every week instead of every month? Is there some sort of benefit to it? I would think the overhead is higher and makes things more complex especially if you have an outage.
In the US some organizations pay weekly, some bi-weekly (26 times a year), some semi-monthly (24 times a year) and some even monthly. If any one of those is your organization's standard, your payroll systems are set up that way and the "overhead" is built into your standard procedures.
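To put rough numbers on those schedules, here's a quick sketch in Python. The $52,000 salary and the function name are made up for illustration (nothing here comes from the article), and it assumes a salaried worker whose pay is split evenly across periods, which is not how hourly or overtime pay works.

```python
# Pay periods per year for the schedules mentioned above.
PERIODS_PER_YEAR = {
    "weekly": 52,
    "bi-weekly": 26,
    "semi-monthly": 24,
    "monthly": 12,
}


def gross_per_paycheck(annual_salary: float, schedule: str) -> float:
    """Split an annual salary evenly across the schedule's pay periods."""
    return round(annual_salary / PERIODS_PER_YEAR[schedule], 2)


# Hypothetical salary, chosen only because it divides into round numbers.
ANNUAL_SALARY = 52_000.00

for schedule in PERIODS_PER_YEAR:
    amount = gross_per_paycheck(ANNUAL_SALARY, schedule)
    print(f"{schedule:>12}: {amount:,.2f}")
```

The annual total is the same either way; only the size and timing of each check changes, which is why a single missed bi-weekly run hurts more than a single missed weekly one.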
The article emphasizes "this week" not because the USPS pays weekly, but because Friday of this past week was the standard bi-weekly payday.
Unless the money is in your hands or in your account, it’s not yours.
In fact, every day in between the day you worked and the day you were paid could be considered theft - as the value of your cash has gone down slightly in that period due to official government policies to devalue cash at a certain percentage every year, and the company is earning interest (or investment) until it’s in your account.
The US isn't what I would call a "worker friendly" place, so IMO the pay period is entirely up to companies. Yet most choose to pay every week or every other week, while in Switzerland, for example, almost every company pays monthly.
I was curious if there was some benefit for a company to pay more frequently when it appears to be more of an overhead.
You’re right, and I’m surprised the companies don’t have longer pay periods in the US actually. I also wonder if there’s an economic benefit to shorter pay periods? Because the faster that money is in the worker’s hands, then the faster it’s distributed again into the economy.
Monthly seems too long for me from a worker’s point of view though, you’re effectively giving the company a short-term, interest-free loan at that point. And when you factor in the fact that your purchasing power is reduced by the time you get paid, and any opportunity costs, the real cost to the worker can be quite high.
If I had to guess, it's an artifact from when people had to pick up their paper checks from their boss. Now with computerized direct deposits, I don't see why it can't be every week (or even daily).
Agreed. And if you move apartments/flats, you will need several monthly rents to pay your deposit, so a two-week paycheck is not helpful, especially when you start a new job at the same time.
That sounds like the most ridiculous thing I’ve heard based on the article.
USPS had a “catastrophical” payroll error and has been unable to resolve it.
This is USPS’s job and they haven’t been able to do it. The Union has stepped in and implemented a workaround that will be very helpful to the vast majority of USPS’s employees until USPS fixes its fuckup.
> The Union has stepped in and implemented a workaround
The USPS is, itself, an issuer of money orders. I assumed that's what they were referring to. I'm not sure at all how the union "implemented" anything here.
From the article, it sounds like the union may have suggested or even demanded this as a solution to the problem. Under DeJoy, I wouldn't be surprised if the carriers otherwise would have been forced to wait another two weeks for a double paycheck.
There's still confusion over this; from what I hear most workers are saying they'll wait for the post office to work it out rather than have to pay back money.
I gather the problem is that while the regular scheduled time was accounted for properly and paid, the unscheduled time, substitute routes, Sunday/holiday Amazon package deliveries and that kind of thing were completely left out. Many carriers do a lot of that class of work in addition to a scheduled route.
Many folks who do a lot of that are going to be demanding verification of their pay for previous periods now too. Who knows how long they've been screwing that up? It gets insanely complicated with postmasters pretty much estimating hours and pay rates differing from person to person based on arcane rules no one really seems to know.
DeJoy should be in prison for subverting the election to maintain Trump in office. These aren't technical problems and they aren't accidents. This is willful, malicious infrastructure decay.
And there is a large amount of advocacy for the USPS to take over consumer banking. No thanks. One thing about the government is that you can't sue them for their mistakes.
> And there is a large amount of advocacy for the USPS to take over consumer banking. No thanks.
Where are you hearing this? Only thing I've heard in the last few years is that the USPS used to handle simple banking for people, and that it would benefit rural and underserved folks. I've not heard anything about the USPS "taking over consumer banking".
> One thing about the government is that you can't sue them for their mistakes.
Right, because for-profit banks have treated us so well in recent years...
There are three sentences in this post and two of them make some pretty wild claims without any evidence. I've never even heard of this "postal bank takeover" idea before, so I'd really appreciate some links where I can learn more.
And of course you can sue the government; people do it all the time. If you're harmed by a federal official in the execution of their duty, you can sue. You'll have to prove damages and negligence/malice, of course. Both the federal government and the state governments have explicit tort law laying out the circumstances within which a private citizen may sue the government.
Yeah, that's not a thing. It's been suggested that the USPS revive postal banking, which would merely make it another player in the banking system, not the sole banking entity in the U.S.
We have been doing payroll for over seventy years. To fail to implement a payroll is a dreadful mistake, and a person is responsible.
When civil engineers make mistakes like this there is a postmortem and we learn what the mistake was and how not to do it again.
Until we start taking responsibility, like real engineers do, we computer programmers will continue to let people down like this.