1) Reversion to the mean. This is what the above person believes: that this likely won't happen again.
2) Indication of a trend. The pilot is actually incompetent, and this will happen more frequently with this pilot than an average pilot.
Regression to the mean simply says that any given data point is most likely to be near the mean. This means that any exceptional data point, up or down, can be expected to be followed by one closer to the mean.
The example of this being misinterpreted that I am familiar with is that of a flight instructor's belief about training. When a pilot performed well, they wouldn't comment. If a pilot performed poorly, the pilot would be punished. They believed this was better because when they praised a pilot, the pilot would usually do worse the following run, and when they punished one, the pilot would do better. This isn't technically wrong; they are just ascribing causation where there is none. I think I saw this example in Signal vs. Noise, but I'm not sure.
Basically, regression to the mean isn't a reason to pick someone who did poorly; it's a reason to expect that person to do no better or worse than they normally do.
Flip 100 coins. Take the ones that 'failed' (landed tails) and scold them. Flip them again. Half improved! Praise the ones that got heads the first time. Flip them again. Half got worse :(
Clearly, scolding is more effective than praising.
While, of course, human performance is not always equal in the same way our coins are, the fact that performance (or whatever it is you are measuring) will still regress towards the mean still holds true. The coin example just gives an extremely obvious demonstration about what is happening when things regress towards the mean.
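The coin demonstration can be sketched as a quick simulation (a minimal sketch in Python; the "praise"/"scold" labels are just the outcome of the first flip):

```python
import random

random.seed(0)
n = 100_000  # many "coins" so the rates are stable

first = [random.random() < 0.5 for _ in range(n)]    # True = heads ("praised"), False = tails ("scolded")
second = [random.random() < 0.5 for _ in range(n)]   # the follow-up flip

praised = [s for f, s in zip(first, second) if f]
scolded = [s for f, s in zip(first, second) if not f]

# Regardless of what happened on the first flip, roughly half the second
# flips come up heads: the "scolded" group "improves" about half the time
# and the "praised" group "gets worse" about half the time.
print(sum(praised) / len(praised))   # ~0.5
print(sum(scolded) / len(scolded))   # ~0.5
```

The first flip carries no information about the second, which is exactly why the instructor's "scolding works" conclusion is spurious.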
Humans aren't perfect improvement machines, but they surely beat a random variable?!
Neither the GP's quote, nor the OP, are making the statement "I understand that this mistake was an outlier, you'll probably never make this mistake again if you remain unchanged".
The claim being made is that acknowledging the mistake and learning from it can dramatically reduce the base probability of such mistakes happening again.
The question is: is the day-to-day random variation much bigger than the day-to-day average increase? If so, regression towards the mean makes total sense.
This is psychological insight, not statistical insight.
3) Out of remorse, pride, being shocked out of complacency or some combination of these, the man would do his absolute best the next day, especially for the very same pilot.
If labour is so abundant that there's little problem in replacing any given worker, people can and will be fired for trivial offences. If there's a shortage, or there's a considerable on-boarding process, less so.
The threat of a firing-at-any-moment also makes for a more tractable workforce, at least from a management perspective.
Not everything is a financial metaphor.
I think learning is more appropriate in this context, though. And here I would suggest multi-armed bandits (with every arm representing a pilot).
There is a tradeoff between exploration and exploitation that has to be made.
The goal in this case corresponds to a particular type of bandit: to postpone death for as long as possible by pulling the right arm. I actually haven't found this type yet (a mortal multi-armed bandit, where the arms themselves undergo a birth-death process).
Edit: This is only about the learning process on the non-pilot side, as one of the other commentators already articulated.
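The exploration/exploitation tradeoff can be sketched with a minimal epsilon-greedy bandit (the arm payoffs below are invented for illustration, and this is the standard stationary bandit, not the mortal variant):

```python
import random

random.seed(1)

# Hypothetical success probability of each arm (unknown to the learner).
true_p = [0.3, 0.5, 0.8]
counts = [0] * len(true_p)    # pulls per arm
values = [0.0] * len(true_p)  # running mean reward per arm
epsilon = 0.1                 # fraction of pulls spent exploring

for _ in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(len(true_p))                     # explore
    else:
        arm = max(range(len(true_p)), key=lambda a: values[a])  # exploit
    reward = 1.0 if random.random() < true_p[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]         # incremental mean

print(counts)  # the best arm ends up with the bulk of the pulls
```

With a small epsilon the learner keeps sampling every arm occasionally, so its estimates eventually converge and the best arm dominates, which is the whole point of the tradeoff.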
It would seem to me (somebody who knows squat about stock trading) that a warning screen would be much more beneficial for stock trading software than it is for an online video game.
I'd imagine in the case of the stock trader, where quick reaction is valued, he'd have quickly entered whatever key combo was necessary to dismiss the warning and made the same mistake.
Alarm fatigue is an important design consideration, but not a reason to omit warnings entirely (given that this isn't amenable to an "Undo" button, which is preferable when possible).
You shouldn't fix an issue like this with UI validation alone - it needs to go between the component that creates orders and the stock exchange, so that it also protects against software bugs. For instance, a possible bug is a developer multiplying the order size by the lot size in the backend, when it has already been multiplied in the frontend, causing huge orders to be sent. Sanity checks can catch this.
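The kind of pre-exchange sanity check described can be sketched like this (the limits, field names, and lot-size numbers are all made up for illustration):

```python
MAX_QUANTITY = 10_000         # hypothetical per-order share cap
MAX_ORDER_VALUE = 1_000_000   # hypothetical per-order notional cap

def check_order(quantity, price):
    """Reject obviously malformed orders before they reach the exchange."""
    errors = []
    if quantity <= 0:
        errors.append("quantity must be positive")
    if quantity > MAX_QUANTITY:
        errors.append(f"quantity {quantity} exceeds cap {MAX_QUANTITY}")
    if quantity * price > MAX_ORDER_VALUE:
        errors.append(f"notional {quantity * price} exceeds cap {MAX_ORDER_VALUE}")
    return errors

# A frontend/backend double-multiplication bug (lot size of 100 applied
# twice) inflates the quantity, and the notional check catches it:
print(check_order(quantity=100 * 100, price=250))  # rejected: notional too large
print(check_order(quantity=100, price=250))        # [] -> passes
```

Because the check sits between order creation and the exchange, it catches both fat-fingered input and software bugs, which UI validation alone cannot.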
Automated trading systems have traditionally been under a lot of scrutiny, and nobody in their right mind would run one without sanity checks and a kill switch. That incident taught Mizuho that manual trading can, in fact, also be quite dangerous :)
"A perhaps-undesired recognition is the Hoover Nozzle used on jet fuel pumps. The Hoover Nozzle is designed with a flattened bell shape. The Hoover Nozzle cannot be inserted in the filler neck of a plane with the Hoover Ring installed, thus preventing the tank from accidentally being filled with jet fuel.
This system was given this name following an accident in which Hoover was seriously injured, when both engines on his Shrike Commander failed during takeoff. Investigators found that the plane had just been fueled by line personnel who mistook the piston-engine Shrike for a similar turboprop model, filling the tanks with jet fuel instead of avgas (aviation gasoline). There was enough avgas in the fuel system to taxi to the runway and take off, but then the jet fuel was drawn into the engines, causing them to stop.
Once Hoover recovered, he widely promoted the use of the new type of nozzle with the support and funding of the National Air Transportation Association, General Aviation Manufacturers Association and various other aviation groups (the nozzle is now required by Federal regulation on jet fuel pumps)." 
i am a huge fan of the turn of the century dancer isadora duncan and her lover who she first had a child with, the theatre set design theorist, edward gordon craig who she affectionately called endymion
a complete aside, but if you have, or anyone reading this has, yet to read duncan's autobiography 'my life' i highly recommend it to anyone and everyone
she is a brilliant writer who lived an eccentric life, and both she and her writing are imbued with a mad passion for expression and for life's hardships and joys alike
So I'd say it depends on your environment combined with the individual. Someone who is apathetic and/or lacks critical thinking skills will probably learn very little beyond avoiding that specific mistake again.
Shows the other meanings of the homophones - English huh!
A pattern of making mistakes can tell you somebody is careless, or sloppy, or in over their head. But a single mistake? That's just inevitable.
Or that you need to fix the process.
When that happens, I hope the people around you are more forgiving than you are.
Or under external pressures they have no control over.
I think it's especially true in tech jobs where there is a lot learned on the job. I would be deeply skeptical of any tech company that takes a fire first attitude. That just tells me they 1) treat devs as disposable and 2) they get rid of everyone that has a chance to learn from their mistakes
More generally, it is very difficult to build highly reliable systems of any kind without a very open culture where you focus on exposing problems and fixing them rather than blaming the messenger.
Everyone was chipping in with their theories about why the U.S and Silicon Valley were so good at what they do when Jobs lost his patience and butted in in true Jobsian fashion.
"Listen! Take a look around this table. Everyone here has a massive failure in their past. Big, epic failures. In the US, we think that's a good thing. In the UK you think it's a bad thing. That's it."
Or something like that; you get the idea.
Anyway, I've always liked that outlook, regardless of whether it's true or not.
People lying about their progress however, that is hard to deal with.
Having someone rip into you for lying when the spec was terrible to begin with is massively discouraging, and the OP's lesson applies just the same there.
Thankfully that situation is resolved now, though it lasted for too long, and miraculously didn't fall off the cliff it teetered on.
My opinion is that trusting people and giving them responsibility is one of the best ways to let them learn and grow. The bigger the mistake, the bigger the lesson. As long as it is possible to recover, it's a calculated risk which may well be worth it.
As beginners, we're over-confident in our ability, even if we actually suck and make lots of mistakes: https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
The opposite seems to become true: experienced engineers (who have learned from their mistakes) seem to be extra paranoid. I've also seen older engineers who still seem confident and talk a big game, but they just never learned. It seems paranoia is a great indicator of experience, and over-confidence/arrogance is an excellent indicator of a lack of learning. Not to say those are hard rules, as there are certainly arrogant engineers who are excellent, but I'd rather work with the softer one personally.
I know as I have grown I have become softer, not harder, as I realize my humanness.
The trick - when getting to higher levels of technical management - is learning how to context-switch between being overconfident for the benefit of non-technical stakeholders and being paranoid with the technology so that the things that could go wrong don't actually go wrong.
Now of course donations in the past few months have been roughly double our expenses -- information which is also public. But it's very hard to project confidence during a problem solving session, because it gets in the way of actual problem solving.
I wish I knew where the balance point for secretiveness and openness was for public organizations. But given that all of the candidates for U.S. President seem to be at a bad place, I'm starting to wonder if a perfect balance point even exists.
A manager that can't tell who on their team is good will (in the long run) take an average team and turn it into a bad team.
Exactly. My feeling is that while many of us learned this the hard way (I certainly did, from experience), in hindsight it seems obvious that our emotional impulses and intuitions, while valuable, are not fine-tuned for success in technical careers.
It's strangely difficult to find this important piece of the methodology. Wikipedia's article doesn't even mention it, and this context is fairly critical to understanding the effect.
I think you're right that some amount of inexperience and youth makes the kids overconfident, but what has been demonstrated is that task difficulty is a large determinant of how good people are at estimating their own abilities, and Dunning-Kruger only applies to easy tasks. For very difficult tasks, the Dunning-Kruger effect actually reverses and becomes impostor syndrome. Software engineering may fall into the latter category: something difficult enough that, statistically, on average, beginners are actually pretty good at knowing they can't do it.
"Our studies replicate, eliminate, or reverse the association between task performance and judgment accuracy reported by Kruger and Dunning (1999) as a function of task difficulty. On easy tasks, where there is a positive bias, the best performers are also the most accurate in estimating their standing, but on difficult tasks, where there is a negative bias, the worst performers are the most accurate. This pattern is consistent with a combination of noisy estimates and overall bias, with no need to invoke differences in metacognitive abilities. In this regard, our findings support Krueger and Mueller’s (2002) reinterpretation of Kruger and Dunning’s (1999) findings. An association between task-related skills and metacognitive insight may indeed exist, and later we offer some suggestions for ways to test for it. However, our analyses indicate that the primary drivers of errors in judging relative standing are general inaccuracy and overall biases tied to task difficulty. Thus, it is important to know more about those sources of error in order to better understand and ameliorate them."
Burson, K. A., Larrick, R. P., & Klayman, J. (2006). Skilled or unskilled, but still unaware of it: How perceptions of difficulty drive miscalibration in relative comparisons. Journal of Personality and Social Psychology, 90(1), 60-77. PMID: 16448310
Many people reading the article think of a nice person that humbly tries their best but makes a bad mistake. But what if they're a hugely over-confident asshole?
I mean, not everyone likes everyone, so when someone you don't like to begin with is arrogant, you might easily label them as an asshole.
I think the deeper lesson of the article is not to consider other people assholes, and to be kind to them even if you don't like them at all.
Then they have absolutely no business working on a team effort.
They never should have been hired in the first place.
I was also one of those developers and have since changed my ways :)
Absolutely, though I do have to push back a bit: if a young developer doesn't integrate into a team role immediately, that's no reason not to hire him.
A lot of great developers are so great because they've spent years alone with their computers just cranking out code. So the best ones won't have the greatest social skills.
But if the kid's still having trouble integrating into their team role after their 6 month probationary period, for example, I personally wouldn't vouch for them any longer.
The worst problems I've ever seen on teams spring from people that are arrogant and standoffish, followed closely by people that are too dogmatic about technical minutiae and best practices in the face of situations that require flexibility (my classic example is the prototyping team that spent two weeks setting up an automated testing and deployment infrastructure, even though they only had six weeks to do their game prototype).
I'd take a pleasant junior that's really green but is happy to learn how to code better over a more technically proficient asshole that needs to have manners jammed down their throat any day - tech skills can be taught to a newbie, personality cannot. People have had a couple decades to cement personality habits as junior programmers, so those are mostly fixed, whereas tech skills are super fresh and malleable.
But that's a risk; depending on that is ill-advised.
Of course, there are genuinely poisonous people, and I agree on your points with regard to them.
With respect, this doesn't tell us what to do if we run into a hugely over-confident asshole.
Fire them, if you can. Before they poison your entire talent pool with negativity.
A few years later, in the same company, I was managing the middleware servers that handled all communications for the sales people in Europe. Here, when there was a failure, the sales people would have to fax in their copies of the data, so there was even more pressure. This is where I really learned to appreciate the criticality of my mistakes, since one could make several hundred people redo their work. However, the most important thing I learned was to slow down and pause for a few minutes so that I could explain what was going on to the non-technical managers and team leads. This was sometimes more important than actually solving the problem.
Both of these instances, and a lot of others that happened over 11 years in that company, taught me to really double, triple, and quadruple check any change. It created a healthy paranoia, but it also taught me that mistakes are OK as long as you learn from them, and to learn to communicate with higher-ups who couldn't follow the technical jargon. I think the communication part is something that a lot of very confident people have problems with. However, it's very hard to teach this to younger techs without letting them fail, so as a senior tech, I have seen problems coming from new juniors, but I have let them actually make the mistake and feel the stress and pressure of having messed up. I have found this to be the quickest way to teach responsibility, while standing by them and their mistakes.
Softer in the sense that I want to foster an environment of respect. As a lot of the senior leadership that I've worked with has retired, I've noticed that the 30-40 year old crowd now is more brutal.
This is certainly true, and it isn't limited to the field of engineering.
A good way to detect these environments is when you observe those who have been there longer not asking obvious questions. That usually means the trust issue is there. When you start asking those obvious questions, at first you get that feedback loop of slight condescension. But then others start asking questions and you often get a fruitful conversation.
Having worked outside of the startup world and in the startup world, I think this is a little more prevalent in the startup world because there is another axis besides experience involved in these conversations: how long the person has been with the startup. It's common to have an official or unofficial hierarchy based on experience but in the startup world, there is another hierarchy based on how long you have been at the startup. That additional axis means it comes up a little more in the startup world (in my experience so far).
Extends far further than the workplace. A few minutes in some of the more popular IRC channels on Freenode, you'll get the same experience. It's extremely frustrating, especially seeing it happen as an outsider to communities you like.
I feel like, in the same way that power generally corrupts people, "feeling smart" can corrupt the same way, coming down to the same attitude of feeling like you have an advantage over other people. Regardless of whether you actually are smart or not.
But if the snark is heavy without being warranted at all, it more than pisses me off. It reflects poorly on whatever you're representing, and it's just bad communication. If all you have to offer is a lmgtfy.com link, don't bother: those immune to that train of thought won't follow it, and the others already have.
Anyone who treats someone else on the team in that way, regardless of seniority, is demonstrating that they're not really interested in the project as a whole. It takes a group of people to build something complicated, and someone who is antagonistic towards other people holds back the effort of the group. You're better off without them.
I will die on this hill.
What I don't find OK is to blame a person for an error going to production. Especially when the error is caused by a lack of supervision of the new hire's code (code reviews) and probably of QA.
Simple fact is, if you worked for me, I would have fired you for that. Junior devs are supposed to do bad things - that's why they aren't senior devs.
Amen. I don't know about firing, but I'd definitely be having a conversation about proper mentorship. We hire juniors because they show promise, but need to learn. Being snarky and shutting them down doesn't just hurt the individual dev – it hurts the entire company.
My wife used to work as a copy editor at a major newspaper, with one of those "laugh at and make rude comments about other people's work" kind of people. The copy editing checks had to be reviewed by another copy editor before a story was allowed to be forwarded to the layout team for placement in the newspaper. The way the story check-in and check-out system worked, your submitted checks were anonymous to other team members. So, there was this one guy on the copy editing team who openly mocked and called other people's work "stupid" and "idiotic". He could never say the name of the person he was talking about, but since every copy editor was in the same room, you knew he might be talking about your work. Everyone hated the guy and everyone complained about him. My wife would say how much everyone just hated working with him; he was one of those people who made the job unpleasant.
Then layoffs came around (newspaper in the Internet age). When this guy was laid off, champagne bottles were opened and people celebrated. You know someone is bad when there is a "sorry to see you go" bar get-together and they are not invited. A strange thing happened, though: productivity went up, because people felt better about submitting their work for final approval. Fewer copy editing mistakes, less stress, more learning, more openness, and more engagement with other team members. Barriers between teams fell, and even with the layoffs, people felt better.
It turns out that a person who mocks other people's work makes for a crappy work environment. You can be tolerant of failures on a technical level, but failure on a personal level should not be tolerated. Be pleasant to work with, or you should be fired.
Maybe an analogy would do better. In the Army, Soldiers go through basic training and are taught just that - the basics. Once they show up at a unit, they will have lots of questions and lots to learn. But if they show up with their name patch velcroed upside-down on their uniform, I guarantee their Sergeant is going to chuckle before they tell the Soldier to turn it right-side up.
I've seen people write things like: if (true == true)
I won't be mean to you about it, but I can't promise I won't chuckle a little before explaining why that's unnecessary. If you've been working in a professional setting for 3 years and are writing code like that, then yeah, I might struggle to be empathetic.
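For what it's worth, the redundancy is that comparing a boolean to true just yields the same boolean. A quick sketch (in Python purely for illustration; the original snippet was presumably Java):

```python
def ready(flag):
    # Redundant: a boolean compared to True is the boolean itself.
    if flag == True:  # noqa: E712 (shown only to illustrate the antipattern)
        return "go"
    return "wait"

def ready_idiomatic(flag):
    # The boolean already works as a condition on its own.
    if flag:
        return "go"
    return "wait"

# Both behave identically for every input:
assert ready(True) == ready_idiomatic(True) == "go"
assert ready(False) == ready_idiomatic(False) == "wait"
```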
I'm referencing a time when I was in school, where most everything was taught in Java, so that construct doesn't make sense. It was written by someone who had almost finished their degree and had already started at their professional job. Definitely junior, but I still feel like that's stretching the acceptable level of noobishness. That said, I have an EE degree, so maybe I had to solve/reduce a lot more boolean algebra expressions than a straight CS major would?
His statement could mean anything from cruelly laughing in the face of the junior dev during a face to face code review to having a quick heh under his breath from the privacy of his office at the developer's use of "if (foo === true)". Or anything in between.
There are such judgements in every thread.
It's even better when it is paired with the famous "I would not hire you" (or its variation "I would fire you"), although the conversation doesn't relate to this at all: the OP has shown no intent to be hired by any company, let alone the answerer's company, and, icing on the cake, the answerer is not in a position to hire anyone.
That is my defense and the only time I've laughed at someone's code. I guess the combination of him talking himself up big about all the amazing things he did at Amazon and the code vomit that he actually produced was too much for me.
I regret it, but shouldn't he have been less junior after 3 years?
Though I guess I still write pretty shitty code now too on occasion.
Alright, I'll take the karma L in this case.
Well, no. It is the product of someone's work. Someone who is trying to learn and should be supported and taught how to make it better and why it is not as good as it should be.
Hell, every now and then I stumble across some idiocy in the code base and think "LOL, whoever did it this way is an idiot". git blame. "OH, hahaha, it was me, what an idiot past me was, amirite guys?" Everyone on the team takes levity about mistakes very seriously ;) It's healthy.
Mocking condescension on the other hand is a different thing entirely.
Harsh, withering treatment is quite common not just with developers but in all technical fields. I wish things were different, but that's the reality as I've seen it.
(I had a colleague at my last company who worked at Amazon before. His code was just fine.)
That said, I think the interpretation that "he had 3 years experience at a big company, he should have been better" is probably the correct one.
They are so bad that laughing is permissible.
Maybe you just don't spend enough time around arrogant people to interpret this sentence this way :)
Laughing in this situation is merely inappropriate. Firing someone for being inappropriate once or twice is incredibly toxic behaviour.
Keep in mind that I don't categorize laughing at someone's work as a "simple mistake." Bugs can be simple mistakes. Offensive jokes can be simple mistakes. Laughing at someone's work however is deeply troubling behavior that actively undermines trust and discourages cooperation in the workplace. That's why you have to kill it with fire.
Firing somebody based on your own subjective opinions is toxic for culture.
I've worked on amazing teams with plenty of good-natured ribbing, and I've worked on great high-performing teams where you could say "this code is rubbish, you can do better". I've also worked on teams where saying that would really hurt people's feelings and impact morale.
Put aside your own pre-conceptions and look at how your team responds to an event/situation. That's the only way to build a high performing team.
Sometimes that will mean laughing at someone's code is toxic and needs to be addressed, other times it will be a non-issue and addressing it creates an issue, and other times it can even be a good bonding exercise.
I worked in a team where if you broke the build you had to put on a clown nose for the rest of the day. It worked well and was a bit of fun. We still catch up a couple of times a year even though I left that team more than 5 years ago.
I've worked in other teams where pressuring a team member to wear a clown nose would be harassment and deeply unsettling.
"It is a good question why I caused this problem. It is weird to think that a competent company that has been around for so many years has no procedures in place to stop this from happening. You would think that any change that could cause downtime on a client's website would go through an automatic test suite, and only after passing all tests would it be tested by human QA and finally made live. In this particular case I am the one who has made a mistake, and being a sensible person who is eager to constantly better myself, I will try to learn from this error. However, I work alongside dozens of other people on my team who, like me, have every chance of making a mistake at some point. It seems like a bad policy for us as a company to expect every person on the team to break a client's site and then rely on overtime from other people to fix it. So of course I will try to do better, but this is not fixing the root of the problem; the root of the problem is something that needs to be fixed at a much higher level. Have you, as my boss, not considered this problem already? What has the company learned from this? I would be happy to be part of the team that solves this problem by creating unit tests and creating policy to avoid this."
Of course I purposely made this slightly standoffish, over the top, and written from a very specific perspective, to illustrate a point. But the reason I do this is that I absolutely agree a single developer shouldn't feel bad about making a mistake (they should try not to, but these things happen); the company should put measures in place to minimise the impact of mistakes. When the company fails to do this, the company has screwed up a lot more than the developer.
On a more personal level, if someone shows you kindness in the face of a mistake, the last thing you want to do is throw it back in their face and go on a tirade against them. That's a very quick way to make sure nobody ever wants to work with you again.
Would you be comfortable moving something fragile by hand that is worth a lot of money? Maybe.
Would you prefer if the fragile item being moved was done with an automated process that was shown to best protect fragile items? I would.
"I'm really sorry I caused that car accident."
"That's okay, next time take Driver's Ed and don't drive on sidewalks."
"Thanks, I'll make sure to do that... by the way, why can I even drive on the sidewalk in the first place? Why wasn't it required to take Driver's Ed?"
I'm very obviously using hyperbole here, but questioning a process, to me, often shows a level of maturity in engineering. It's not blame-shifting - you clearly still made the mistake - but it's fixing a root cause.
I do think it has to be brought up carefully and with the right tone at the right time though.
Why does the VP have that kind of access by default? I understand having a separate account if the need is there, or having some sort of privilege escalation.
In my new job as a network engineer, all the Red Hat boxes have SELinux off. I don't get it, nor can anyone give a cogent answer. And four floors up, we have an SELinux kernel dev.
I'd very much like to limit even root's potential damage to users' data (in this case, DBs).
Two common reasons. First, because the VP or someone in a similar position resented not having such access; it's sometimes hard for people to accept that people who work "for" them hierarchically have permissions they don't. And second, because either they or the person who previously held their position had a legitimate reason for such access in the past, and didn't drop it when they no longer had such a reason.
But yes, even if they have a legitimate reason for such access, they shouldn't have that permission all the time, only when they intentionally don that hat.
I hate making mistakes, but if more robust procedures are the result, well, it's an overall win!
His reasoning (later revealed) was that you can't give people fish and expect them to become expert fishermen. They have to experience hunger and ask to be taught to fish. If people don't have the soul-crushing experience that teaches them why something is really important, they never really internalize why it's important. That the reason why we appreciate that this stuff is so important is BECAUSE we suffered through something that taught us that it was important, and that if we don't give junior folks that same kind of experience, then good practice just becomes something on the list of priorities and not a moral imperative.
There's something to that, but at the same time, it's basically a defense of institutional hazing on the investor's dime. So take it for what it is.
I think you can, it's just not nearly as efficient. College and trade schools are essentially about mixing a very small amount of actual necessity (deadlines, tests, etc) with copious amounts of being told what's best and how to accomplish something. This works well because it's both more palatable to a wider audience of people, and it accounts for people's different interpretation of events. There may be multiple things to learn from any particular failure, and without some guidance you may come away having learned few or none of them the first time. Being told what to expect before hand, or what you could have done to mitigate problems afterward, goes a long way towards making sure you consider all the useful aspects of the problem.
Will you learn any one single lesson as thoroughly or as fast as you would in the real world when it affects you so much? Likely not, but then again, that assumes you actually saw a solution to the problem, and that still might just be one aspect.
A single mistake shouldn't lead unconditionally to a mandatory multi-step process that involves multiple people and may take several months to finish. The examples of the "cover your ass" decision making overpowering the common sense engineering are abundant.
It's always best to do an after action analysis to see where things broke down, I think. Changing procedures or adding protections might be warranted, as long as it's not the "get my permission before ever putting a Y in that field again" kind of enhancement.
I agree with the other replies that while you may be the 'technically right dev' kind of person, it still comes off as being a dick and you will likely not get a positive response from something like this, especially if you caused it.
What makes it more reasonable to say it's the company's responsibility to protect you from fucking up the site instead of you just being a good developer, and not fucking up the site?
I don't expect the folks who built my house to detach and reattach the door perfectly every day, I expect them to build a house that works.
Ideally you would have either test gates, or a staged rollout with automated rollback.
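A staged rollout with automated rollback can be sketched roughly like this. The `route` and `healthy` callbacks are hypothetical stand-ins for whatever your load balancer and monitoring actually expose:

```python
import time

def staged_rollout(route, healthy, stages=(1, 10, 50, 100), soak=0):
    """Shift traffic to the new version in stages, rolling back on failure.

    route(percent) points that share of traffic at the new version;
    healthy() is a post-deploy probe. Both are placeholders for whatever
    your infrastructure provides.
    """
    for percent in stages:
        route(percent)
        time.sleep(soak)  # soak time: let errors surface under real traffic
        if not healthy():
            route(0)      # automated rollback to the old version
            return ("rolled_back", percent)
    return ("done", 100)
```

A test gate would sit before the first stage; the soak time between stages is what turns monitoring into an automatic gate rather than something a human has to watch over the weekend.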
I used to work for this company where every single Friday we would go out for lunch and get tipsy (wine with the food, some digestive liquor shots just at the end, then maybe a mixed drink or a beer at another bar) because we were the only team not allowed to go home early on Friday... for no reason. Mind you, we were some kind of internal IT team and there would be no one to request anything from us, so we never had urgent stuff to do.
Back in the office we would enable the "fire extinguisher mode" which meant "only move if there's a fire" and watch silly videos on YouTube, have some coffee with Baileys because why not...
Saying that nobody should ever do it isn't realistic, sometimes things go wrong on a Friday, sometimes you can't afford to wait 3 days to watch them become even worse.
It's good to instill in your team that Friday deployments should be done only in special circumstances, or they become a habit (and people inevitably end up working Saturday). But yes, sometimes it's necessary.
"Don't deploy when you won't be around to support it" is a better description, but it doesn't roll off the tongue as easily.
Really, it will come down to a cost/benefit analysis. Are your chances of the site going down due to being hacked over the weekend higher than the chances of a last minute update taking down the site?
The answer is almost always no (much to a sysadmin's chagrin). If the answer is yes (i.e. another heartbleed), then you are probably going to be working through the weekend anyways.
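The cost/benefit call described above can be made explicit as a small expected-value comparison. All the numbers here are made up purely for illustration:

```python
def expected_cost(p_incident, hours_down, cost_per_hour):
    """Expected cost of an option: probability of an incident times its impact."""
    return p_incident * hours_down * cost_per_hour

# Hypothetical figures: deploying the patch on Friday risks a bad release
# over the weekend; waiting risks the vulnerability being exploited.
deploy_now = expected_cost(p_incident=0.05, hours_down=8, cost_per_hour=1000)  # ~400
wait = expected_cost(p_incident=0.01, hours_down=24, cost_per_hour=1000)       # ~240
should_deploy = wait > deploy_now  # deploy only if waiting is the costlier gamble
```

With these particular made-up numbers, waiting is the cheaper gamble; something like another Heartbleed pushes `p_incident` for waiting high enough to flip the answer.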
"Only deploy on Friday if you like working weekends"
On weekdays you have the added pressure of other work.
(Of course, it depends on whether your business is as heavily loaded on weekends as on weekdays. In most businesses I've known, weekends are usually the period of lower load/customer visits.)
You should not be pulling people in from their time off to fix shit.
This also creates a kind of "brain latency" for developers, I think. I'm coming at this from the operations side of things, so maybe my observations are a little biased here. I have observed that if people deploy changes and something breaks immediately, the correlation is very clear and they can generally fix the bug pretty quickly. If they deploy and then it breaks 72 hours later, any number of things could be the culprit, especially in a fast-moving environment (times 1 billion percent if it's a microservices architecture without a strong devops culture, which most of 'em are). Debugging then takes much, much longer. This is made worse if the person who deployed the change is not quickly available when their thing breaks, and it makes being on call for someone else's unproven feature very stressful.
So instead I think it's better to make sure deployment and build systems are rock solid, and deploys are as accessible and as idempotent as possible. Chatops type systems are good here. Then you can roll out big changes during peak traffic and be confident that you can quickly revert if it goes bad, and that the changes were reliable under load if it goes good. I also think it's critically important that big changes are behind rollout flags, such that you can dial up or dial down traffic at will. This is also useful when introducing new cache systems or something like CDN if you need to warm up slowly.
This is a better approach I think than trying to use the time of day to modulate user traffic. I would rather developers can control traffic to their feature themselves and have the person deploying the change with their hands on the wheel until they are confident they can take them off. That way people can do stuff independently, and everyone can trust everyone to deploy and yet still feel safe.
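One common way to get that "dial traffic up or down" behavior is a percentage rollout flag keyed on a stable hash of the user, so a given user's answer doesn't flip between requests. A minimal sketch (the flag and user names are illustrative):

```python
import hashlib

def flag_enabled(user_id: str, flag: str, percent: int) -> bool:
    """True if this user falls inside the rollout percentage for the flag.

    Hashing the flag and user together gives a stable bucket in 0..99, so
    the same user always gets the same answer, and raising percent only
    ever turns the feature on for more users, never off for existing ones.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return bucket < percent
```

Dialing the flag from 1% to 100% is then a config change, not a deploy, and dialing it back down is the quick revert with your hands still on the wheel.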
[Client, Friday 4PM] "We are having a big sale this weekend we told nobody about. Could you quickly deploy a fix where all the product's prices are red and bold? That shouldn't take you long, right?"
[Project manager, Friday 4:45PM] "Hey team, X just released an important security fix for Y platform. I need you to deploy it right now or the client could get hacked."
The changes are pushed and working in staging, the tests are passing, QA is done. Why hold off? So that you get your Good Practices™ badge? I'd say it's better that you don't have to deal with last week's work on a Monday if at all possible.
I think being down to earth, and keeping your good judgement is key. I don't recommend making world-shattering changes on a Friday, but even then, well, it really depends on the circumstances.
* It's the holiday weekend and your new website with its curated comparison feature needs to go live
* It's the end of the quarter and having this is the only way you can sign a deal now (enough of your partners will be bound to quarters that this is possible)
* You're in the business of live sentiment analysis from TV video and a critical bug needs to be fixed before this weekend's Presidential Debate or your news channel partner will be pissed
Reduce the tempo? Some of these can kill.
1) If you try to hide the mistake I'll be mad. As soon as you know there's a problem, we can mitigate the damage by addressing it immediately.
2) I want to see that you're learning from it. A pattern of repeated mistakes may take some explaining.
I did, however, once work at a company that was very metric-focused. It turns out that the metric they used for our team was closed tickets, and they felt that given my salary I didn't close enough of them. No understanding whatsoever that others on the team were able to close more tickets because of my help (let alone that the tickets that made their way to me tended to be the complicated ones that others couldn't solve). After the first talking-to, I stopped helping my team members (I explained why) and jumped on the first opportunity to get out of there.
I miss construction. Back then, I could just tell someone "Hey! You! Don't fuck that up, I'm pouring concrete around it tomorrow!" And we would still be cool at lunch break.
I have one simple rule for myself - which is to never deploy anything at night before I go to bed. I've made several critical mistakes which I deployed to production and then went to bed only to wake up and find out that users couldn't use my products.
These days, I do all deployments in the morning. That way, even if there is a critical bug, I am awake to catch it and fix it quickly.
I was in Seattle and I took a public bus. The fare was $1. It was my second day in the US and I didn't have a dollar bill in my wallet (nor a transit card). I assumed I could pay with a larger bill, just like in Japan. Big mistake. The driver yelled at me, but nonetheless she let me ride for one time. Then the guy sitting next to me suddenly offered a dollar bill and said, "Take this, so that you can pay for it by yourself." I probably looked at his face with amazement. "Thank you very much, sir." The guy got off the bus a few stops later and I never had a chance to say this, but I know it's my turn now. I want to thank the nameless person who slightly changed me. Kindness is contagious.
So sure enough I got into a taxi cab and asked him to drive me to the airport, but make a stop at an ATM. When I went to withdraw the money for the cab, I noticed my payment didn't go through, it was still pending. I was desperate, didn't know what to do.
I kid you not, the guy withdrawing money from the ATM next to me somehow managed to notice my despair and got a chunk of money bills out of his pocket and said: "how much do you need?". I think I just stared back at him for a minute or so. "What?", I replied.
He told me that one day I'd have the opportunity to do the same for another person. I always remember this and always help strangers any time I can. And I realized that we are the ones that gain the most when we help.
It's now 4 years since I was fired from that job and I'm still in development, despite that incident nearly causing me to decide it wasn't for me.
I was extremely happy to read this post, it gives me hope. I try and take a similar approach when working with newer developers, or those that are inexperienced in areas.
- With a system outage, don't lose your cool. Rationality is what's needed at the time.
- Elevating "perfection" at the expense of potential breakage will ensure nothing ever moves forward and your team doesn't grow.
Instead, this manager leaves his engineer with thoughts about how to avoid that problem in the future. Always remember -- an outage is eventually fixed, but a damaged relationship continues forward.
The level of stress matters a lot. If a team is being run in a standard “everyday crisis” mode, they quickly reach the point where there is very little they can take. Every tiny mistake wears people down, reminding them of how much more there is to do, and turning them sour. Managers seem to panic and cut into their team’s time even more by starting to have long, daily meetings to “fix” things.
If you want “nice” people, you have to set them up for success. Reward completing the whole chain, not just hacking away (e.g. for software, not just coding but also testing, documentation, and seeking peer review). Keep meetings to a minimum. No overtime. When short-cuts were taken to meet hard deadlines, open up your schedule and scribble in the exact window after the deadline where you will stop everything and clean up the mess that the short-cut created. Give your people the best equipment that money can buy. And so on.
Though, in my opinion, it's kind of insulting to be asked something like "What did you learn?". The question isn't really necessary. You know that you screwed up.
That kind of dynamic between an engineer and a team lead is off-putting to me.
I think the proper way for a team lead to handle it is to instead work with the engineer to help find ways to eliminate the human error by implementing tooling or processes. The conversation should go something like this:
Lead: "I wonder how we can make sure none of us break XYZ widget again"
Engineer: "We can build out ABC and run that, also generally just test better before pushing to prod".
Lead: "Cool, do you want to go ahead and take care of that?"
“Great. It sounds like you get it. I know that you can do better.”
Giving folks a chance to communicate what they learned, and then encouraging them to "do better," is the best way to lead.