> We had built a system that generated thousands of lines of logs for every test, with lots of “failures” recorded in them. Things like “tried to initialize FOOBAR_CONTROLLER: FAILED!!!,” but we just ran that code on all machines, even ones without the FOOBAR_CONTROLLER hardware. So no one noticed when another 5 lines of errors popped up in a 2000-line log file.
This right there is a big red flag. The whole bendy business is bad enough, but here you're actively training people to ignore the wolf cries.
Don't allow false failures in tests. The entire test suite needs to be binary: either everything works, or it fails.
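For the hardware-dependent case in the quoted example, one way to keep the suite binary is to skip, rather than "fail", tests whose prerequisites are absent. A minimal sketch of that principle, using PHPUnit for illustration (FoobarController and the hardware probe are hypothetical):

```php
<?php
// Minimal sketch (assumes PHPUnit; FoobarController and the hardware probe
// are hypothetical). On machines without the FOOBAR_CONTROLLER hardware the
// test is explicitly skipped instead of logging an ignorable "FAILED!!!".
use PHPUnit\Framework\TestCase;

final class FoobarControllerTest extends TestCase
{
    protected function setUp(): void
    {
        if (!self::hasFoobarController()) {
            $this->markTestSkipped('No FOOBAR_CONTROLLER hardware on this machine.');
        }
    }

    public function testControllerInitializes(): void
    {
        $controller = new FoobarController();          // hypothetical class under test
        $this->assertTrue($controller->initialize());  // a real failure here fails the suite
    }

    private static function hasFoobarController(): bool
    {
        return file_exists('/dev/foobar0');            // hypothetical detection
    }
}
```

Skips are reported separately from failures, so the suite stays binary: green means everything that could run passed.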
Likewise, this is why it's important not to let compiler warnings slide. If your build process spews out a bazillion lines of cruft, then the one new warning that indicates a serious error will get lost in the noise. I don't go so far as to enable warnings-as-errors or anything, but I do build with -Wall and generally won't check in code that builds with warnings.
As a company you can decide where to fall on this scale. I applaud you for -Wall'ing, but for most teams it's too much. The important thing is to make sure everyone has agreed on which warnings are acceptable, and that you refuse code that violates that agreement.
We're using PHP (so runtime warnings instead of compiler issues), but we have a zero-error policy: if anything raises a warning or error, we refuse to merge (or ticket the fix, since a bunch of the code is legacy and we're still paying it down).
The standard in PHP frameworks like Laravel is to automatically throw Exceptions when a Notice happens, which at first I thought was overly aggressive, but I would never go back to writing PHP where a Notice is raised and ignored.
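For anyone curious what that looks like at the language level, here is a minimal sketch of the standard PHP pattern for promoting notices and warnings to exceptions (roughly what Laravel's error bootstrapping does); the handler itself is stock PHP, the example array is mine:

```php
<?php
// Promote every reportable notice/warning to an ErrorException so it can't
// be silently ignored.
set_error_handler(function (int $severity, string $message, string $file, int $line) {
    if (!(error_reporting() & $severity)) {
        return false; // respect intentional suppression (the @ operator / error_reporting config)
    }
    throw new ErrorException($message, 0, $severity, $file, $line);
});

$data = [];
echo $data['missing_key']; // now throws ErrorException instead of emitting a quiet notice
```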
It depends on the era of PHP, to be honest. Pre-PHP 5.2 (IMO), notices were mostly for pretty small stuff that was accepted as necessary for sanity; e.g., reading a value out of $arr[$dem1][$dem2][$dem3] safely would require three separate isset checks. I always leaned toward building a user-space recursive isset function, but it came with a performance penalty.
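A minimal sketch of the kind of user-space helper being described (the name and signature here are illustrative, not from the comment); it walks the nested array one key at a time and returns a default instead of triggering an undefined-index notice:

```php
<?php
// Hypothetical "recursive isset"-style helper: safe deep read with a default.
function array_get_deep(array $arr, array $keys, $default = null)
{
    $current = $arr;
    foreach ($keys as $key) {
        if (!is_array($current) || !array_key_exists($key, $current)) {
            return $default; // no notice, just a safe fallback
        }
        $current = $current[$key];
    }
    return $current;
}

$arr = ['a' => ['b' => ['c' => 42]]];
var_dump(array_get_deep($arr, ['a', 'b', 'c']));   // int(42)
var_dump(array_get_deep($arr, ['a', 'x', 'c']));   // NULL, and no notice
```

The per-call function overhead is the performance penalty mentioned above; modern PHP's ?? operator covers most of these cases natively.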
Starting in 5.4, and far more prominently in 5.6, PHP began breaking backwards compatibility for that kind of sloppy behaviour, and failing to respect notices could easily lead to errors when bumping the version up to 7.
As I mentioned in the sister comment, by ignoring notices you're setting yourself up for garbage values being passed through your code and into user-facing functionality, including handling of money. That doesn't depend on the PHP version.
There's an extra point with notices: they are likely to lead to logical errors instead of technical ones. The algorithm works, but it receives a garbage value (null, i.e. zero, usually), so it gives some garbage as a result. In nine cases out of ten, the resulting garbage flows on into various barely meaningful code, but in the tenth it will go into some user-facing functionality, including the handling of money.
Isn't that pretty much the point of the article? You easily get used to ignorable errors, which makes it harder to catch the non-ignorable ones, unless you actively pay attention to not doing that.
Software has the luxury of being discrete most of the time, so we can actually do discrete, automated checks on whether something works as we think it should. Humans only read the end result, and even that isn't always necessary.
In fact, maybe NASA should have similarly ‘unbendable’ (cough) software checklists. The temperature is 39ºF? The red light says no launch today.
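A toy sketch of the kind of ‘unbendable’ automated checklist being described; the checks and thresholds are made up for illustration, with the 40°F minimum chosen so that the 39ºF day above is a hard no:

```php
<?php
// Toy "unbendable" launch checklist: every criterion is a discrete check and
// the result is binary, with no override path. All values are illustrative.
function launchDecision(array $conditions): string
{
    $checks = [
        'ambient temperature >= 40F' => $conditions['ambient_temp_f'] >= 40,
        'no ice on the pad'          => $conditions['ice_on_pad'] === false,
        'crosswind <= 15 knots'      => $conditions['crosswind_knots'] <= 15,
    ];

    foreach ($checks as $name => $passed) {
        if (!$passed) {
            return "NO-GO: failed check '{$name}'"; // red means no launch today
        }
    }
    return 'GO';
}

echo launchDecision(['ambient_temp_f' => 39, 'ice_on_pad' => false, 'crosswind_knots' => 8]);
// NO-GO: failed check 'ambient temperature >= 40F'
```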
The turn in the middle of the article about taking care of yourself is interesting. If you only skimmed the first part of the article, you might miss that positive message.
I was very surprised to discover this part. It also led me to discover a rabbit hole (spurred by the 'Spoon Theory' example) for which I did not have a name previously. It is always amazing how certain articles suddenly give a name to a phenomenon that I had to use so many words to describe.
I've just learned about this spoon analogy and absolutely loved it.
What's better, it's not just about disabilities. You've got kids to take care of? Here goes one spoon. You are not a native speaker and a bit slow in getting what others say? Here goes another one. A sick spouse at home? The third and the last one is gone and that's when things become interesting!
I don't really know if I'm comfortable applying spoon theory to contexts outside of disabilities (or even outside of chronic illness). In the context of chronic illness, running out of one's spoons has serious, long-term, immediate, and potentially debilitating repercussions for the person's management of their condition. The original context was that someone has, say, 10 spoons that day; showering is 2 spoons, and clothing oneself is 2 more spoons, unless a piece of clothing has buttons, which makes it 3 spoons due to joint pain in the hands. This was meant to show how incredibly resource-intensive chronic illness management is and how it limits a lifestyle.
It is a very different thing from having to take care of a child, not being a native speaker, or having a sick spouse, which are additional difficulties layered onto an otherwise functional human being. Spoon theory explicitly applies to the context in which fundamentally functional actions like self-care take significantly more resources than expected.
I have some concern that this is actually a dilution of the original meaning, such that it removes its original usefulness to the chronically ill community, which has to deal with the stigma of being potentially severely limited by an "invisible" illness; spoon theory was an attempt to describe an "invisible" illness's effect on a person's life (thereby aiding in destigmatization). By diluting the original meaning, we may be contributing to the continued stigmatization of "invisible" illnesses, as spoon theory comes to mean something almost opposite to its original meaning.
I hear what you say and indeed, on one hand the dilution may be seen as a problem.
On the other hand, do we really want to build this wall by saying there are chronically ill people, there are healthy people and these two crowds are governed by completely different rules, have absolutely orthogonal problems and no parallels can be drawn between them to avoid the "dilution"?
Or would we rather say that disabled people's problems are the same as healthy people's problems, just worse and affecting more aspects of life? Wouldn't that remove the stigma and help us understand each other better?
The problem is that this example applies spoon theory to an almost completely different situation, due to a misunderstanding of the theory that results from the continued dilution of its meaning to fit non-chronic-illness situations. In this way, it portrays spoon theory as a general way to describe an otherwise functional person overburdening themselves with responsibilities, when spoon theory was originally meant to describe the significant effort required for fundamental functioning. The original meaning, the meaning that applies to chronically ill people, is arguably entirely lost.
Disabled people's problems are not the same as healthy people's problems. Spoon theory was a way to describe exactly how it's not the same.
[x] people's problems are not the same as [y] people's problems
Maybe this helps to build understanding across x and y initially, but maintaining a hard and impassable empathic barrier between groups will eventually lead people to say, "Why should I care? Every time I try to relate, they say I'm not one of them."
In this specific case, why is it such a bad thing to expand a metaphor for costs to build that shared empathy? Then we can all relate to each other better.
The metaphor isn't being expanded. It's being actively changed to no longer apply to the original situation. No shared empathy is being built, because the situation being empathized with is not the one that spoon theory was originally propagated to describe.
Building empathy is the skill of identifying with situations that are not one's personal situations. Arguably, "expanding" a metaphor to suit one's personal situation is the opposite of fostering empathy.
EDIT: Regarding "Why should I care, every time I try to relate they say I'm not one of them?" -> The spoon theory was not meant to be something to relate to. It was a way to abstract a very real problem, one that has been broadly minimized, into a finite situation in order to explain a phenomenon that a healthy person simply does not have to deal with and cannot relate to. It would be similar to me explaining a technical problem with a metaphor; I'm not asking you to relate to the problem. I'm asking you to understand the problem.
As for why you should care: because this person is in a lot of pain and dealing with a lot of difficulties that I'm not dealing with, and in fact upwards of 40% (maybe more) of people in my country (USA) have a chronic/incurable condition. The scale of the problem means that developing a compassionate, understanding view will greatly benefit a significant portion of the people one may meet.
Gatekeeping this intuitively universal spoon metaphor isn't going to make disabled people's lives better. A world where only disabled people get to use the spoon metaphor isn't going to have any more people feeling compassionate and empathic toward disabled people. The extension of an idea or metaphor to new situations doesn't devalue it; rather it shows that it is a powerful and useful idea. And before you make baseless claims, I also suffer from a chronic and incurable condition.
My main concern is that spoon theory might help to create and reinforce self-limiting beliefs. Most people are stronger and more resilient than they think, even if they have a chronic illness or a disability. We all have limits, but it's dangerous to think about those limits as hard and fixed.
I'm reminded of Ludwig Guttmann, who founded the Paralympic Games and revolutionised the care of people with spinal cord injuries. Prior to Guttmann, the received wisdom was that paraplegics would inevitably die within a few years from a variety of complications; the standard treatment was simply palliative. Guttmann realised that paraplegics could live long, active and meaningful lives with the right programme of physical and psychological rehabilitation. His approach was tough bordering on brutal, but it saved and transformed thousands of lives.
I worry that spoon theory nudges us back towards old attitudes to chronic illness and disability that define people in terms of their limits rather than their abilities. We can't pretend that everyone has the same opportunities and capabilities, but it doesn't help anyone to think of ability as a finite resource that is continually depleted. We all have the opportunity to develop and grow.
Glad you've discovered "spoon theory" - it's become a common shorthand in some internet communities for all kinds of chronic illness / mental health impairments and the need to explain why a seemingly trivial thing seems overwhelming.
Spoon theory is an attractive concept, but it appears to be largely based on the ego depletion hypothesis. And that one hasn't held up very well now that scientists have attempted to reproduce the original experiments. So I'm skeptical whether it's of much practical use.
Ego depletion isn't really the same thing. Ego depletion is about doing things you don't want to do or not doing something you want to do. It implies we have limited bits of self-control.
Spoon theory is more of a metaphor, trying to describe, to people who have plenty of both to handle the mundane, the penalty a disability imposes on your day-to-day time, energy, and capacity.
One problem I have is that spoons theory says, "You have 20 spoons per day, I have 5." While the end result is the same, it's more that getting dressed in the morning requires 1 spoon for me and 4 spoons for someone with a disability. Similarly, other mundane activities require four times the amount of time and energy.
Also you can borrow spoons from the next day, either by skipping sleep or just outright pushing through an activity well beyond your disability's limit. However, then you have less time and energy the next day.
It's definitely in line with normalizing deviance. Most people who can hold down a job and an independent life with a disability are operating at their limits. They have no spare spoons for emergencies or unexpected events. The people around them normalize that and then become surprised when "one small thing" blows everything out of whack for two days. That person had no more spoons to spare. They borrowed from the next day, and now their Fibro has them bedridden, the stress triggered a manic episode, etc.
I've seen people increase their number of "spoons" day after day by just grinding it out and refusing to quit. Some folks with severe, crippling disabilities have accomplished amazing feats largely through determination and force of will.
Spoon theory is not based on ego depletion except when applied to mental illness; its original creation was based on the experience of someone with lupus and the very real physical ramifications of being limited in lifestyle because of the disease.
How speed limits are enforced in America always bothers me, because there is this great disconnect between planners and everybody else.
Planners think of a road's speed as intrinsic to its design; thus, if people are going too fast on a road, you need to change the road by narrowing it, putting in bumps, or something similar.
The other side is to think of a road's speed as based on what's around it: if there are a lot of houses along a road, people should go slower so they don't hit anyone, so you put in speed limits.
But the problem with limits is that since the road feels faster than the speed limit, people just go faster than the limit; and since changing the road is a lot more expensive than putting up speed limit signs, that tool is used far less frequently.
It is bizarre that we are discussing making cars entirely autonomous as a way of improving safety, but refuse to consider the much more feasible option of technologically restricting speed on public roads.
It might be technically feasible, but not politically feasible.
While it is a no brainer to say every vehicle should be traveling sufficiently far behind another such that they have enough time to stop before they hit the one in front, this spacing out of vehicles will result in longer travel times as you are effectively reducing road capacity by having each vehicle take up more space on the road. Instead, it's (politically) easier to have each person take individual risks of getting into a collision.
However, with autonomous vehicles it should be politically easier to force this constraint, because liability will presumably shift from individuals to the manufacturer, and the manufacturer isn't going to stick its neck out just so individuals can save time on their commute.
Slowing down traffic can, in fact, increase the throughput of the road, especially when it is busy.
This is for two reasons:
1) slower traffic requires a smaller safety gap between each vehicle (a toy calculation below makes this concrete).
2) the dominating factor in traffic jams is often people responding to the brake lights of the people in front of them. People overcompensate, and therefore a "braking bubble" (a soliton: https://en.wikipedia.org/wiki/Soliton ) forms in the traffic, which causes it to get even more spaced out.
Keeping traffic flowing is more important to throughput than keeping it flowing fast. Constant braking and speeding up on a busy road means that the flow rate is constantly disrupted, and this causes the traffic to tend towards clumping and congestion.
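To make point 1) concrete, here's a toy calculation of my own (a simplification, not from the thread): assume car length L, driver reaction time t_r, braking deceleration a, and that each driver keeps a full stopping-distance gap. Then the road space each vehicle occupies, and the flow past a fixed point, are:

```latex
s(v) = L + t_r\,v + \frac{v^2}{2a},
\qquad
q(v) = \frac{v}{s(v)} = \frac{v}{L + t_r\,v + \frac{v^2}{2a}},
\qquad
\frac{dq}{dv} = 0 \;\Longrightarrow\; v^{*} = \sqrt{2aL}.
```

Under these deliberately conservative gap assumptions the throughput-maximising speed v* sits well below free-flow highway speed; laxer gap rules push the optimum up, but on a busy road where drivers are forced to keep realistic stopping distances, slowing the traffic can raise the flow rate rather than lower it.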
It's inevitable that people will over- or under-compensate, and differences in acceleration between vehicles mean the rubber-banding is unavoidable, especially if you add any elevation climbs to the equation. Add in lane changes, merges, and exits, and there's no situation where I would expect a slow and steady flow of traffic.
Maximum throughput at a fast speed with minimal spacing between vehicles is higher than maximum throughput at a slower speed with minimal spacing between vehicles. I see this in every urban area: people sacrifice safety for speed.
There's a version of this in SRE where the performance your system delivers becomes the performance people expect. And then they build their systems to depend on that performance, regardless of what your actual SLA is. Paradoxically, delivering better than the performance you're actually capable of sustaining can set things up to break very badly when something fails "within SLA".
There was a post here a little while ago about a Google service (I think it was Chubby, the lock service) that the SREs would deliberately disable if they had over-met their SLA.
This forced the developers of dependent systems to handle outages correctly.
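A toy sketch of the underlying error-budget idea (the numbers and names are illustrative, not Google's actual tooling):

```php
<?php
// If measured availability runs well above the agreed SLO, deliberately burn
// the surplus so dependent teams can't come to rely on better-than-promised
// behaviour. Thresholds and names are illustrative.
function shouldSchedulePlannedOutage(float $measuredAvailability, float $slo): bool
{
    $surplus = $measuredAvailability - $slo;   // e.g. 0.9999 - 0.999 = 0.0009
    return $surplus > 0.0005;                  // illustrative trigger
}

if (shouldSchedulePlannedOutage(0.9999, 0.999)) {
    echo "Over-delivering on the SLO: schedule a planned outage so clients exercise their failure handling.\n";
}
```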
The article draws on the definitive text in this area by Diane Vaughan[0]. Read her work on the Challenger Launch Decision - it goes into the details of why the deviance was normalised, even down to the level of how important decision-making conference calls marginalised technical input.
> They put their passwords in their wallet and in their phone.
The author is underplaying the problem here. There were tests that showed burns through the O-rings, and the reports rationalized the danger, not by normalization of deviance but through deceptive language.
It's a lot more like having an audit that shows that no users were observed writing a password on a sheet and putting it in their wallet. And since extant password sheets stored in wallets don't match an idiosyncratic definition of "written down", they pass the audit.
That's not to say that normalization of deviance didn't happen. Obviously both it and a more direct type of corruption happened. But I get the sense the author here is trying to cram everything into the former to make a tractable problem out of a messy political situation.
> There's the company where I worked on a four person effort with a multi-hundred million dollar budget and a billion dollar a year impact, where requests for things that cost hundreds of dollars routinely took months or were denied.
I’m so used to seeing this, but I still don’t get it. Seriously, we spend $X/week on some nice-but-unnecessary luxury, but won’t spend $X/16 once for some really useful thing.
I suspect that it has to do with how corporate budgets are designed, but … maybe they could be designed better?
It's rarely quite so concrete, but because money is so countable and so explicitly subject to control, it is watched most keenly. Most companies will have some sort of "financial controller" role, and it all flows downhill from there.
This is where the false positive / false negative phenomenon comes in, too. The person doing the controlling has an incentive to reject as many requests as possible, because they get criticised for anything retroactively identified as "waste", but they don't get to see, and can't count, all the time wasted dealing with the process and the opportunities lost as a result.
I once worked for a small company that would let you buy anything under £100 on the company card so long as you sent in an explanation by the end of the month, preferably identifying a client it could be billed to. This worked very well, because when you put "£100k engineer time, £10k custom electronics, £10 misc stationery" on the same invoice, no sane person is going to question the stationery.
This unintelligible screed doesn't even get the Challenger disaster correct. The Challenger disaster has almost nothing to do with engineering and everything to do with management and politics.
For a project bigger than one person, such as a space programme, you cannot separate the management and politics of engineering from the engineering itself.
"But you’ve launched at 40F and it was fine, and then one day you had to launch at 35F and it was fine, and then on a particularly bad day you had to launch at 30F and you’re fine. So you normalize this deviance. You can launch down to 30F, if you really have to. But then one day you’ve missed a bunch of launch windows and it’s 28F and the overnight temperatures were 18F but you did a quick check of the designs and specs and you probably have enough safety margin to launch, so you say GO."
Now contrast this with the wiki entry "NASA managers also disregarded warnings from engineers about the dangers of launching posed by the low temperatures of that morning, and failed to adequately report these technical concerns to their superiors."
From what I recall of the inquiry's findings, that statement is a reasonable (if simplified) synopsis of the managers' (specifically, the managers being referred to in your wiki quote) reasons for dismissing the engineers' concerns.
Here's another quote, from the beginning of the article: "... I think it’s too easy to think of it as just a random-chance disaster or just space/materials engineering problem that only has lessons relevant to that field. And that’s not really the most important lesson to learn from the Challenger disaster!" [my emphasis.]
You're not reading that line correctly. What that line is talking about is the physical nature of the device, which I agree is not the focus of the article. The article is talking about the human side of engineering.
Now this is suppose to sound like an engineer "But then one day you’ve missed a bunch of launch windows and it’s 28F and the overnight temperatures were 18F but you did a quick check of the designs and specs and you probably have enough safety margin to launch, so you say GO."
But in reality the engineers never said that. The managers made the call in opposition to engineering.
If you want to have an example of Normalization of Deviance in engineering, you need to have the engineers actually say "GO" and for there to be a disaster. You can't have the managers "dismissing the engineers' concerns" and then turn around and suggest that engineering normalized deviance. That's simply the wrong lesson here.
"Now this is suppose [sic] to sound like an engineer..."
You just made that up. No-one else is reading it that way, and for a good reason: it isn't written that way.
Also (following on from what pjc50 wrote above), most (if not all) of the managers referred to in your wiki quote were also engineers - so, not only did the author not make the claim you made up, it could arguably be justified if he had done so.
"check of the designs and specs and you probably have enough safety margin to launch" I hope to god this is an engineer (or scientist), otherwise you have non technical people making technical calls.
Read the second paragraph of the post you are replying to - and, while you are about it, maybe you could finally respond to detaro's point, above.
Your criticism of the article fails on at least four grounds: 1) the managers in question were engineers, as is often the case in large, highly technical projects; 2) engineers can make mistakes, and did so here; 3) engineers can disagree (especially when some of them make a mistake), and did so here; 4) the article does not actually make the claim you say it does.
The primary issue here is the choice of example (Challenger disaster) to explain "Normalization of Deviance".
It's like if you were an SSE working on a product that had some big security issues.
You tell your manager not to launch the product because there is a high risk of a very bad security violation.
The manager under pressure to get it done decides to ignore your warnings and launch the product.
There is a huge security violation just after the product launch and it causes a scene.
Why reach for some pet theory (Normalization of Deviance) to explain this?
This is just the ongoing tension between the desires of higher management and the reality that those on the ground know.
I think the reason why I am so insistent on this is because I am worried about the wrong lesson being learned.
Politics and Power play far more of a role than we like to admit.
This hierarchical structure is also very good at deflecting and distributing blame.
Is Normalization of Deviance just another excuse to explain the commonality of bad management?
It is something of an insult to the accident investigators to suggest that someone just "reached for some pet theory" to explain the Challenger crash (and that of Columbia, for that matter.) There was a very thorough investigation, that clearly established that there had been a normalization of deviance leading up to the final showdown over whether to launch. The commission would have come to a shallow and unsatisfactory conclusion if it had looked at that showdown alone and not recognized the normalization of deviance that had set the stage for it.
I encourage you to read both reports, available from NASA:
The term 'normalization of deviance' was coined precisely because accident investigators have found a recurring pattern. The recognition of a recurring pattern is more helpful in accident prevention than simply attributing each incident to "just the ongoing tension between the desires of higher management and the reality that those on the ground know."
Here's another example of it, not involving engineers, and where explicit management pressure was not an issue:
Ha! You are not, of course, agreeing with me (or the accident investigation board) so long as you dismiss normalization of deviance as a key issue in the Challenger crash.