How Completely Messed Up Practices Become Normal (danluu.com)
631 points by Tenoke on Dec 30, 2015 | 252 comments

I immediately thought of the "5 monkeys and a ladder" psychology study, in which the first monkey attempts to climb the ladder to get a banana and all of the monkeys are sprayed with water. Then one of the monkeys is replaced with a newcomer, who then tries to climb the ladder, but this time, the remaining original monkeys attack it. And so the replacing of the monkeys continues, until none of the monkeys knows what the deal with the ladder is and yet none attempt to climb it.

This is a popular story in dev/product culture but after a little more Googling, apparently it's apocryphal: https://www.psychologytoday.com/blog/games-primates-play/201...

It's a bit like when there are two cash machines, with a long queue at the first but no-one at the second.

You can see people looking at it, wondering whether it's worth risking the humiliation of finding it's broken and having to go to the back of the queue for the joy of finding it's working and getting one over on everyone else.

Often nobody cracks.

I actually come across this scenario quite often in retail outlets with poorly designed queue layouts (e.g. an H&M with 2-4 registers behind a long counter and a single ad hoc queue lined up in front of one of them).

My reasoning is a little different (and a little more charitable, I suppose, in its generalization about human nature): I don't want to step up to the register with no one waiting at it because I don't want to look like a sociopathic jerk. I also assume that is why most people haven't done so.

At the same time, I know as soon as someone does step up and do it, people will grumble but the single ad hoc queue will redistribute itself into 3 more balanced queues. I'm usually waiting for someone else to be that sociopathic jerk.

What I wish is that the manager of the store gave a little more thought to this issue in the first place.

Actually, scientifically, one queue for all registers is the superior solution. It reduces wait times for customers in general compared with one queue per register (where one of the lines might get stuck for a long time). That's why banks and similar places are set up with one queue served by all the clerks. The only real problem at H&M is that they haven't put anything in the way of people forming multiple lines, when keeping them in one line is the best practice.

It doesn't reduce average wait time, but the variance in wait times.

Well, it might bring down the average as well, just not as much as the variance: the work is now load-balanced more or less evenly across multiple counters, so throughput might be a bit higher?
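To make the mean-versus-variance question concrete, here is a small simulation sketch. The model is an assumption, not something from the thread: Poisson arrivals, exponentially distributed service times, three registers kept about 80% busy and, for the separate-lines case, customers who pick a register at random and never switch. `simulate` and its parameters are hypothetical names chosen for illustration. It feeds the same customers through both layouts and reports the mean and variance of the time spent in line.

```python
import random
import statistics


def simulate(num_customers=20_000, registers=3, arrival_rate=2.4,
             service_rate=1.0, seed=1):
    """Compare time spent waiting in line under two checkout layouts.

    Assumed model (not from the thread): Poisson arrivals, exponential
    service times, first-come-first-served at every register.
      pooled: one line feeds all registers; whoever is at the front takes
              the first register that frees up.
      split:  each customer picks a register at random on arrival and never
              switches lines.
    Both layouts see exactly the same customers, so only the layout differs.
    """
    rng = random.Random(seed)
    arrivals, services, picks = [], [], []
    t = 0.0
    for _ in range(num_customers):
        t += rng.expovariate(arrival_rate)
        arrivals.append(t)
        services.append(rng.expovariate(service_rate))
        picks.append(rng.randrange(registers))

    # Pooled line: customers leave the line in arrival order, each going to
    # the register that becomes free earliest.
    free_at = [0.0] * registers
    pooled = []
    for arrive, service in zip(arrivals, services):
        reg = min(range(registers), key=lambda r: free_at[r])
        start = max(arrive, free_at[reg])
        pooled.append(start - arrive)
        free_at[reg] = start + service

    # Separate lines: every register is its own independent FIFO queue.
    free_at = [0.0] * registers
    split = []
    for arrive, service, reg in zip(arrivals, services, picks):
        start = max(arrive, free_at[reg])
        split.append(start - arrive)
        free_at[reg] = start + service

    return pooled, split


pooled, split = simulate()
for name, waits in (("one shared line", pooled), ("a line per register", split)):
    print(f"{name:>20}: mean wait {statistics.mean(waits):.2f}, "
          f"variance {statistics.pvariance(waits):.2f}")
```

Under these particular assumptions the shared line tends to come out ahead on both numbers. Shoppers who watch the lines and hop to an idle register would narrow the gap on the mean, which may be closer to what the parent comment has in mind.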

That works only if all registers are being utilized to pull people off the front of the queue. The grandparent poster described a situation where everybody queues in front of one register while the other registers sit idle, which isn't optimal for anyone.

Couldn't average wait time increase, though? In a single-queue layout the front of the queue is farther from the checkouts, between 2 and 5 meters for a 4-checkout setup, and walking those 2-5 meters adds to the total checkout time.

I presume that one of the problems with one queue for all registers is that people will perceive it as a longer line. A five person line at a particular register doesn't seem as bad as a fifty person line for all ten registers.

People quickly learn that it's better.

I'm not sure I understand how that continues for very long. Once the clerks no longer have customers, they can just call people over and the queue will redistribute. If they're not doing that, presumably it's because they're doing something else that takes priority, like replacing the till or handling a problem with a previous transaction.

It only seems like it would be a problem at businesses with bad staff.

In my experience, clerks tend to circumvent the issue by simply yelling out, to no one in particular, something like "Next in line!"

I've also noticed an acquired habit in more experienced retail staff: when confronted with a queue of indeterminate order, they say, "I can help the next customer in line" and leave the implementation of the next method as an exercise for the customers themselves.

FWIW, I find the best arrangement is a Fry's style mega-queue with boundaries clearly defined by racks of candy, magazines, or other impulse-purchase crap that feeds into multiple registers.

Here, clerks usually say something like "This register is also open," and a new line will form.

Single-queue, multiple-register setups work really well, especially for high volume (see: Whole Foods).

Teaching customers how to use them, on the other hand.....

And in some situations, like a movie theater box office, a single long line actually drives people away (on nights that aren't blockbuster openings) because it looks like a bigger crowd than two short lines, even if they'd get through it faster. At least, that was true before everyone started buying tickets on Fandango.

tl;dr Thank goodness for sociopathic jerks

I've long held the theory that humans have differing degrees of rudeness because it breaks deadlocks like "after you", subdivide-or-finish the divisible food item, and who takes the last piece. (In places that were too polite, I have seen the last piece of a cake go stale uneaten.)

My friend and I used to have a serious case of "after you" deadlocks during our studies: we would get stuck in front of a door, each insisting the other go through first. We eventually figured out a simple solution. If we found ourselves about to get stuck in another "after you" loop, we'd play rock-paper-scissors, and the loser had to go first. It quickly became almost second nature for us, and it looked pretty confusing/hilarious to the people stuck behind us in the queue.
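The rock-paper-scissors trick is an instance of randomized symmetry breaking: two people running the identical rule ("defer to the other person") can never get out of the loop, but a random draw makes them different. The sketch assumes standard rock-paper-scissors rules, replays ties, and sends the loser through first; `who_goes_first` is a hypothetical name.

```python
import random
from collections import Counter

# Which move each move beats, in standard rock-paper-scissors.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def who_goes_first(person_a, person_b, rng=random):
    """Resolve an "after you" standoff the way the commenters did: play
    rock-paper-scissors, replay on a tie, and the LOSER goes through first."""
    moves = list(BEATS)
    while True:
        move_a, move_b = rng.choice(moves), rng.choice(moves)
        if move_a == move_b:
            continue  # tie: nobody wins, throw again
        a_won = BEATS[move_a] == move_b
        return person_b if a_won else person_a


rng = random.Random(42)
tally = Counter(who_goes_first("me", "friend", rng) for _ in range(1000))
print(tally)  # roughly an even split; exact counts depend on the seed
```

Since a round ties with probability 1/3, the pair needs 1.5 throws on average before someone walks through. A fixed asymmetric rule, such as refusing to answer "after you" with "no, after you", breaks the tie deterministically instead.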

I live by the rule that I never follow an "after you" with a "no, after you" without good reason. If they want to demonstrate their politeness, why compete?

The obvious reason is because you get to play rock-paper-scissors.

The less obvious reason is that after a certain number of back-and-forths, you can interrupt the other person saying "aft-", cut them in line sideways so you both sort of swoop through the doorway, which then leads to a warp zone.

You clearly don't understand the meaning of "sociopathic"

This comment breaks the HN guidelines. Please comment civilly and substantively when posting here.



Progress depends on the unreasonable man.

The other tl;dr is that managers need to know and buy into concepts of group psychology.

Queueing theory shows that one queue with multiple servers is better than one queue per server.
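The textbook form of that claim compares M/M/c (one line, c registers) with c separate M/M/1 queues. A sketch of both mean waits, assuming Poisson arrivals, exponential service times and, for separate lines, customers who split evenly at random and never switch. The function names and the store's numbers are hypothetical:

```python
import math


def erlang_c(registers: int, load: float) -> float:
    """Erlang C: probability that an arriving customer has to wait at all.

    `load` is arrival_rate / service_rate (the average number of busy
    registers); the queue only settles down if load < registers.
    """
    if not 0 <= load < registers:
        raise ValueError("unstable: load must be below the number of registers")
    waiting_term = (load ** registers / math.factorial(registers)
                    * registers / (registers - load))
    idle_terms = sum(load ** k / math.factorial(k) for k in range(registers))
    return waiting_term / (idle_terms + waiting_term)


def mean_wait_shared_line(registers, arrival_rate, service_rate):
    """Average time in line when one queue feeds every register (M/M/c)."""
    load = arrival_rate / service_rate
    return erlang_c(registers, load) / (registers * service_rate - arrival_rate)


def mean_wait_line_per_register(registers, arrival_rate, service_rate):
    """Average time in line when customers split evenly and at random across
    independent single-register queues (each one an M/M/1 queue)."""
    per_line_rate = arrival_rate / registers
    utilization = per_line_rate / service_rate
    return utilization / (service_rate - per_line_rate)


# Hypothetical store: 3 registers, 2.4 customers per minute, each register
# serves 1 customer per minute on average (80% busy).
print(mean_wait_shared_line(3, 2.4, 1.0))        # about 1.08 minutes
print(mean_wait_line_per_register(3, 2.4, 1.0))  # about 4.0 minutes
```

With a single register both formulas give the same answer, as they should; the gap only appears once there are several registers to share.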

There's also a great story I heard about a queueing theory conference where a hundred or so academics turned up at once and insisted on forming a single line at the hotel registration desk, because that was the proper way to do it.

I had this exact experience at the movie theater yesterday. There were two cash registers at the concession stand, but it was unclear whether the second one was staffed. So at first, like everyone else, I queued up at the first register. I asked around and nobody seemed to know whether the second register was open.

However, I noticed that the line on this register was moving rather slowly, and not wanting to be late for my film, I decided to take a risk and walked up to the second register. Lo and behold, it was actually open, and I had my popcorn in about a minute. If I hadn't been late for my movie, I don't think I would have gone to the second register.

This is why I have pretty much given up on burger chains.

Their places have X cash machines, yet only one of them is in use even when there's a massive queue.

And if you wander up to one of the unused ones you get yelled at...

Is this an American weirdness? That sounds completely unreasonable.

I can say it was an American franchise, but this happened somewhere in Europe.

Why would you have to go to the back of the queue after scouting out the other queue? What kind of jerks do you queue with?

Depends on the queue, I guess. For cash machines one could feasibly rejoin the queue at the original position; in a mall, however, reserving yourself a spot in a queue and continuing to shop is considered rude.

My experience is more like the OP's: leave the line, lose your spot.

>apparently it's apocryphal

It sounds like a fable. I know that primates are smarter than other animals but I doubt that without some strong negative association they would be able to keep up that convention when there's an obvious prize right there, especially if they got hungry.

This appears related to learned helplessness except on a sociological basis rather than individual basis.

And as a corollary, in many sports, the moment someone breaks an "unbreakable" record, that record often gets broken again soon after.

Competitive eating is a great example of this. Never really thought this carried over in general to other forms of sport / competition.

The best example I can think of is in middle distance running, when Roger Bannister broke the 4 minute mile.

I think the best and most relevant example of this for the HN crowd might be Donkey Kong :-) Ever since the documentary "King of Kong" came out, and that douche Billy Mitchell's record was broken, there's been a stream of new records.

BTW if you haven't seen "King of Kong" you owe it to yourself to watch, regardless of whether or not you're a video game person. It's hilarious.

> apparently it's apocryphal

This shouldn't affect its value as a story, though.

Most of what the author is complaining about boils down to business needs being more important than the needs of the engineering team. To be blunt, they're paying you to do a job, not to make the organization better. That's what they pay the leadership for. If you want to be part of the leadership, work your way up through the ranks or start your own business.

My life got much, much easier once I learned to stop straining so hard to fix things that are bigger than me. If you don't like your managers, or the culture, or the business, find another place to work, it's that simple. If you can't do that, you're simply going to have to learn how to compromise.

This is a perfect example of WTF-worthy behavior being treated as totally normal: it's encoded that "leadership" and "not leadership" are binary job distinctions with distinct responsibilities.

I've been in both types of culture, and by far the healthier one was where "making the organization better" was treated as everyone's responsibility. I've seen very junior engineers drive quite significant culture changes simply by leading by example. They did it not by whining or complaining, but by taking baby steps: trying out small experiments with their immediate team and then broadcasting their success to incrementally larger circles within the company until it became the new normal.

If you seriously believe that all (or most) companies disallow anyone not in leadership to improve things, I'd consider getting out of your current situation and seeing things again with clear eyes.

> If you seriously believe that all (or most) companies disallow anyone not in leadership to improve things, I'd consider getting out of your current situation and seeing things again with clear eyes.

I've been around a bunch, including the US Air Force, and I can confidently tell you that a heavily hierarchical organizational culture is the norm; individual contributors who want to have effects beyond their job scope do so at the risk of offending other stakeholders. They may tolerate you and even give some limited help, but they're not going to shout your name from the rooftops just because you want to be a boy scout.

I'm not saying you can't buck the system and get away with it (one of my favorite books was about one of my heroes, John Boyd), but if you want to be a hero, you need to go into it with a clear-eyed assessment of what you're up against so you can tailor your objectives appropriately.

The DoD (!) is hardly representative of the cultures being discussed in the linked article, which is about tech startups (with a bunch of examples from medicine too). Yes, there are worse examples, but those may be beyond help.

As an enterprise grows, you either retain centralized control and have hierarchy, or distribute control by treating parts of the org like independent, federated entities (think Berkshire Hathaway).

The military's size, scope, and mission are unique. You need top-down control, because otherwise most folks won't decide to run up a hill into a machine gun on their own. That mission focus (do stuff with force), cascades into the supporting services.

Tech startups are a rather small part of the overall IT industry. I don't really have the numbers to back it up, so I'm not going to speculate about the actual figures, but any discussion about workplace IT that ignores everything that isn't a tech startup is virtually useless.

And the IT industry is a rather small part of the overall job market. So therefore any discussion about the workplace that ignores everything that isn't IT is virtually useless?

I fail to see how governmental organizations should fall outside the scope of the discussion.

And the Air Force is not the DoD. While, technically, the individual branches of the US military fall under the DoD, they all have very different organizational cultures, and the DoD has staff and culture of its own.

> I fail to see how governmental organizations should fall outside the scope of the discussion.

I won't tell you what falls within the scope of this discussion or not, but I can tell you why they don't add a lot of value to it either.

It's already quite hard to discuss somewhat objectively whether a company is dysfunctional, partly or not at all, and what the reasons for that are.

It's almost entirely useless to attempt that discussion about government. Because politics. Because people root for their home team. Because people are inclined to dismiss the other team's ideas even when they're really good. If I told you the US Government is dysfunctional, someone would counter that I'm not American and should look at my own government. If we got past that and into the details of why things are not working properly, Americans would invariably start quoting bits of the Constitution, the Bill of Rights and the Founding Fathers at each other (and from there on it just becomes religion to me).

Discussing whether a military organisation functions properly is even more difficult. In addition to the above problems, there's also hazing and indoctrination, which are incredibly strong psychological forces (without them, as was said, people won't run to their deaths without question). And it pretty much divides the crowd into two groups: those "outside", who have no idea how it really works, and those who have been "inside", who are unable to separate the indoctrination from an objective judgement of how well the organisation functions and why.

Certainly, discussing businesses has similar problems, but this just amps up the personal emotion factor to eleven. That's also one of the reasons the businesses were kept anonymous in the featured article.

Not sure why you are getting so many replies disagreeing with you. Narrowly-defined roles are indeed the norm, even in the tech world. When I was an "individual contributor" I'd get frustrated with Worst Practices all the time, but was told that I was hired to write code not to change our infrastructure or suggest different product features or improve our testing practices--MANAGEMENT makes those decisions. But even when you get into management, there are always managers above you to limit what you can achieve.

There may be awesome companies out there where anyone is empowered to just go fix some practice that's not productive, no matter where they are in the org chart, but those companies are few and far between. I'd love to see one.

> There may be awesome companies out there where anyone is empowered to just go fix some practice that's not productive, no matter where they are in the org chart, but those companies are few and far between. I'd love to see one.

One thing I would consider doing, if I ever start my own company or rise high enough up in someone else's company to implement this, is to give every single employee some sort of discretionary budget to spend on things that make the workplace better. The more experienced and trusted you are, the bigger of a budget you get. I think this would go a long way in fighting the learned helplessness dynamic.

This is a very interesting idea. Have you, or anyone else on HN, ever seen this in action? I would LOVE to hear stories about this, either successes or failures.

It's called 20% time and it works OK. I always used it when I was at Google, although most people didn't.

20% is not specifically for "improving the organisation" of course. It's most famous on the outside for the new products it led to, like Gmail. But actually most 20% projects were small internal things intended to smooth the rough edges off a particular tool or process, or to improve the company in some way. If the particular bee in your bonnet was a type of bug that cropped up frequently in other people's software, making a linter to spot it and driving adoption through the organisation would be a good 20% project, for instance.
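As a concrete picture of the kind of linter described here: the comment doesn't say which bug class, so this sketch assumes a hypothetical one, Python's mutable default arguments, and checks for it with the standard-library `ast` module. `find_mutable_defaults` is an illustrative name, not an existing tool.

```python
import ast

# Literal node types that create a mutable object.
MUTABLE_LITERALS = (ast.List, ast.Dict, ast.Set)


def find_mutable_defaults(source: str, filename: str = "<string>"):
    """Return (line, function name) for every default argument that is a
    list, dict or set literal. Such a default is created once, when the
    function is defined, and shared by every call that relies on it."""
    problems = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
            # kw_defaults holds None for keyword-only args without a default.
            defaults = node.args.defaults + [
                d for d in node.args.kw_defaults if d is not None
            ]
            for default in defaults:
                if isinstance(default, MUTABLE_LITERALS):
                    name = getattr(node, "name", "<lambda>")
                    problems.append((default.lineno, name))
    return problems


sample = '''
def add_item(item, bucket=[]):
    bucket.append(item)
    return bucket

def fine(item, bucket=None):
    return [item] if bucket is None else bucket + [item]
'''
for line, func in find_mutable_defaults(sample):
    print(f"line {line}: {func}() uses a mutable default argument")
```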

Part of it is that vinceguidry draws something of a false distinction: either fight the system and fix problem behavior X, or just accept it and don't think about it. A third option is simply to investigate classes of behavior like this and learn something from them.

I really disagree, strongly. It takes time and perseverance, but you can spread better ways of working.

The job would really need to be worth it to work against pushback from above.

That part's true. It's probably almost always easier to find a better environment.

That you chose to mention the USAF as an example suggests that your experiences are really on the far side of what is being suggested here. If there's any place where these kinds of things don't work, it's government or the military.

I recall reading somewhere that while military organizations are very much synonymous with hierarchy, they have historically experienced a lot of pressure to flatten and decentralize, at least as far as field command goes. Napoleon's success was attributed to smaller units that responded to feedback more quickly. Unfortunately I can't find that article.

However, this is not to say that in regular administration there isn't still a lot of de facto trust in the hierarchy. I think unbounded hierarchy is generally one of those memes that people grow up taking for granted as something that just works with any layer count. In some ways, it's a self-reinforcing thing - if you climbed ranks for 10 years, you don't want to see the system change and your "sunk cost" go away.

[edit: clarified sentence]

The US Marines are often cited in management literature as an example of an organization where the leadership communicates high-level intent and purpose to achieve strategic alignment, and then totally decentralizes tactical decisionmaking to the people doing the actual work in the field. The stereotype that they are just all about top-down decisionmaking and following orders, is false. They recognize that individuals need to be able to make decisions on the fly in order to react to volatility, uncertainty, complexity and ambiguity.

But it will always remain an inherently misleading comparison, because they get to train people to an extent and in ways that a business in modern civilized society could never get away with. Training and drilling are such an ingrained component of that whole system. Without having that first, copying it seems to me a bit like prematurely spot-optimizing the wrong loop.

Was it the Robert Coram book (https://militaryprofessionalreadinglists.com/books/665-boyd-...)?

That book made it into four different reading lists from various branches of the military (USMC, USAF, Army, and a chair of the House Armed Svcs Committee). That's encouraging in that at least they're aware that his ideas/approaches were productive.

Yep. There's a whole philosophy based on the idea that improvement of the organization is everyone's job. It's called Lean, or more simply, continuous improvement or Kaizen.

I hesitated to reply to this because what I'm about to say could be taken as pedantic. However, it is my experience that many people experience a good way of working, apply a label to it and then use that label thinking that it will communicate all of the good things that they intended. This is rarely the case. At some point you see many articles entitled "Why X doesn't work" where they describe a broken version of X and why it doesn't work. The broken version ends up getting more popular than the unbroken version for some reason and there is a general consensus that "X doesn't work and anyone that uses X is an idiot", even though X works perfectly fine if you use it correctly.

With that in mind, "lean" as a principle is organized around the idea that it is better to have people request functionality (pull) than for you to imagine what people want and then try to sell it (push). Many "lean" organizations happen to be very good at continuous improvement, but you can be "lean" without any capability for improvement.

"Kaizen" is the Japanese word for continuous improvement. It was introduced to Western eyes primarily through the "Toyota Way", which is a lean manufacturing system that concentrates on continuous improvement. It is important to understand, though, that while workers have a responsibility for identifying problems, in the Toyota Way it is the managers who are responsible for working with the workers to implement improvements. So there are clear demarcations of responsibility, unlike in many of the IT processes that borrow the word "kaizen".

Even "continuous improvement" can be a loaded term with respect to responsibilities. The first time I ran across the term was dealing with CMM (the Capability Maturity Model). Once you are up to level 4 or 5 you are using metrics to improve your process in order to obtain continuous improvement. In every organization I've seen that attempted to achieve CMM level 4 or above, the primary driver was management (and often workers were not even consulted about improvements or why they were necessary -- they were only told to do things a certain way). I'm not saying it was successful, but it was very common ;-)

I guess my point is that if you are lucky enough to be working on a team that does continuous improvement well, be aware that what you call "Lean" or "Kaizen" is unlikely to be what other people call it. You cannot communicate what you want to communicate using those words alone. If you have been reading literature that uses these words and believe that if you simply follow a "Lean" system and embrace "Kaizen" you will transform your organization into one that does continuous improvement well, I'm sorry to burst your bubble. If only it were that easy. Unfortunately, all the difficult problems are people problems, and those problems almost always require unique solutions. It can be quite tricky no matter what you are doing.

I think the lesson to learn from this is to be wary of anybody who says they can look at a system for organising people that works well and write down a recipe for how to reproduce the essential elements of that system. Even if they were partially responsible for creating it. Most likely, nobody understands why it works in enough detail to write down precisely what it would take to reproduce it.

W. Edwards Deming got the closest I think?

The system of an organization comprises the relationships between the systems within it, how those systems change and vary, the methods of learning and transferring knowledge, and the psychology and behavior of the people within it. And importantly, it's how all those pieces interact with each other that makes the most difference.

It's a system. There are some common elements. Above that base level, there is much truth in what you say, but the most important thing you can learn is a mental model for organizations in the most general way possible.

One of the hardest parts of this is that organizations are not separate from their problem domain. A process that works for manufacturing cars (high value) may not work so well for manufacturing radios (low value), let alone for banking software. Pay, education, and company size are all huge factors that are often ignored.

I found myself imagining someone writing down a recipe for yogurt like so:

1) Heat up milk
2) Find something with the consistency of yogurt to put into the milk
3) Wait

Having a living thing there that knows what the final product should be and can stand as an example is largely how we transfer culture.

Part of the extreme difficulty in defining a mental model for an organization is that organizations are necessarily complex systems. You're exactly right.

Regarding people problems, my one note is that we are all more alike than we seem, and you eventually start to see the same classes of problems everywhere. This is why an understanding of psychology and behavior is so critical.

Except the big catch is that employees are compensated for the improvements they bring about. That basically never happens in a typical software company. Maybe if you keep bringing in important changes someone will notice, and maybe you'll get stock options or be promoted to management.

Yeah, there are a gazillion companies that claim to be Lean, Kaizen or whatever else, but guess what: in practice only a small fraction actually are :)

That seems like something that is worth (tactfully) screening for in interviews when given the opportunity to respond.

So instead of asking "are you Kaizen?" you would ask about practices performed that imply Kaizen, including the spirit of the law and not the letter.

Yep - this is something that I'm thinking about actively. How do you start the conversation in an interview (as either interviewer or interviewee) that gives you insight into how people treat problems? Is their instinct more individual or systemic? Do they naturally look for improvement, or point fingers?

I've had some success, but they're always very deep interviews, which can be either drastically good or quite startling to the other party depending on which direction it goes. So your note to do it 'tactfully' is very prudent.

I have often found that two things work well for this:

* have your antennae out for anything that seems weird (weak signals, indeed) in your conversations about the work

* check references (as an interviewee). That's a fancy way of saying, look to see if anyone you know knows someone who works there, who can give you the straight scoop.

At more than one company, "Agile development" means "the CEO gives his cell number to every customer, and Waterfall development happens very fast."

This is half true. People will go out of their way to make the organization better if it makes their jobs easier (e.g. switching to more modern, easier-to-work-with technology). But if, for instance, the project is going in the wrong direction, most of the time you have no good reason to fight to change things (and yes, it's a fight, b/c it's often about changing opinions and demonstrating repeatedly that things need to be done differently). So sure, do your duty and voice your concerns, but at the end of the day just go with what the boss decides. This is called delegation of responsibility, and it's something a lot of people don't grasp. You need to cover your ass to stay sane.

Leadership leads. This is a tautology. It is unhealthy to pretend otherwise.

Let's say, for example, that Leadership decides to incur technical debt to allow a faster entry into the market, and the Development Team Lead decides this is unacceptable, so instead of taking on the technical debt he does things 'The Right Way'. This leads to competitors entering the market first, snagging customers and mindshare, which eventually leads to the company going under.

Was the Development Team 'Right'? Even if, let's say, he was 'Right', is it his decision to make? Should he intentionally sabotage or ignore direction from his superiors?

Most true WTF processes tend to come from decisions made that are out of the control of the parties that they directly impact.

> Was the Development Team 'Right'? Even if lets say he was 'Right', is it his decision to make, should he intentionally sabotage or ignore direction from his superiors ?

That's a bit of a loaded question.

You assume there is a right and a wrong that can be determined before knowing the outcome (a.k.a. deontology).

It's easy to see this is not at all clear-cut if you consider the other possible outcomes. What if it turns out that the dev team's decision was what saved the company (and maybe their competitors are now even buckling under tech debt)?

Would that have made them right? Maybe yes. But in others' eyes maybe not, because they still disobeyed their "superiors". But maybe the company would have fallen if they had listened.

Can you categorically say, beforehand, that one decision is right and the other is wrong?

You could, but others might disagree (and rightfully so, IMHO).

Say you claim it's always right to do what management tells you, and wrong to go against it. You can argue this because a large business needs that kind of structural dependability, otherwise it would fall apart in chaos; and because specialisation is a good thing, so management specialises in determining a long-term vision that a dev team leader is not as knowledgeable about. Sounds like a pretty tight justification, no?

Well, except when you're the dev team, your livelihood (maybe your family's too) depends on this job, and following management's wishes will crush the company. So you decide to disobey management, and it turns out the tech debt, not the delayed market entry, would indeed have crushed the company. Then the dev team gets to claim they were right. Management may say "you saved the company this time, but in general you ought to always listen to us even if you think it's a bad idea", except that (obviously) protecting their livelihood ranks a lot higher on the dev team's ladder of oughts than following orders to facilitate a smooth, tightly run company.

GP is not pretending anything. Further, taking a position on technical tradeoffs (your example) is an orthogonal concern to defining leadership structure, which is also distinct from the possibility of making bottom-up changes. No one is proposing anarchy here, just people fixing what they can.

OK, I think there are some unspoken givens that I need to spell out to make my point clearer.

Typically people resolve the issues that directly cause them pain pretty quickly if it is in their control. Further "control" requires understanding how the levers a party influences impact the obstacles that cause the pain.

WTF-level issues typically occur when the above conditions aren't met; usually that means the 'pain' doesn't impact the party that has the control to end it. The next most common case is that the party with control doesn't understand how its levers affect its obstacles.

Just fixing what they can is not going to resolve WTF-level issues...

One should always keep in mind that a WTF-level issue is very possibly a critical oversight on the issue-holder's side. Lots of things in the world appear very fucked up, but are in actuality slight improvements over even more fucked up states of affairs. One should endeavor to find out the history of the situation before jumping to judgment.

If you want everybody to fix things, everybody should know things and share the vision.

If management is planning to do things quickly and badly, the dev lead should know that and plan accordingly, trying to do it in the best way possible.

If he's kept in the dark, of course he might blunder. But that's not his fault, assuming the company encourages initiative and a fix-it-yourself attitude.

The best leadership leads by empowering people, not by enslaving them.

"Lets take for Leadership decides to incur technical debt to allow a faster entry into market, Development Team Lead decides this is unacceptable, and does not take on the Technical Debt, but instead does things 'The Right Way'. This leads to competitors entering the market first, snagging customers and mindshare, which eventually leads to the company going under."

To be honest, how often does this actually happen? I doubt it happens often enough to worry about. Most startups aren't doing anything that special, and so this whole, "We have to move at twice the speed of light or we're all dead!" attitude doesn't belong anywhere.

> Most startups aren't doing anything that special, and so this whole, "We have to move at twice the speed of light or we're all dead!" attitude doesn't belong anywhere.

Burn rate/runway is a critical survival factor for startups.
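For anyone unfamiliar with the terms: runway is commonly estimated as cash on hand divided by net monthly burn, which is why a delayed market entry can be fatal. A minimal Python sketch, assuming a constant burn rate; the function name and the figures are hypothetical:

```python
def runway_months(cash_on_hand: float, monthly_net_burn: float) -> float:
    """Months until the cash runs out, assuming a constant net burn (spend minus revenue)."""
    if monthly_net_burn <= 0:
        return float("inf")  # break-even or profitable: no fixed end date
    return cash_on_hand / monthly_net_burn

# Hypothetical figures: $1.2M in the bank, $100k/month net burn.
months = runway_months(1_200_000, 100_000)
print(f"{months:.1f} months of runway")  # 12.0 months of runway
```

Under these assumptions, a three-month slip at the same burn uses up a quarter of that runway before any revenue arrives.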

You're conflating two things. One is the principle that engineering needs are only really important as far as they serve some business need. As a developer, I don't love this, because I'd like to be able to work on whatever I found useful, but it's true.

The second thing is people ignoring opportunities to make engineering changes that would be beneficial for both engineering and business reasons. They do this because "it's not how it's done" or "it's impossible" or "we don't have time for that". And this is a problem that affects both engineers and leadership. Some changes can't be done because leadership doesn't allocate the resources, some can't be done because people on the ground don't give a shit.

Since there are problems that can't be solved without leadership buy-in, you're definitely right that you can't fix these problems without having a role where you have a real ability to change the culture, and that you should be realistic about that.

There is some nuance here, though. A lot of leadership that might seem to default to "we don't have time for that" can be more reasonable when the problem, and (critically) the monetary ROI of fixing the problem, is explained to them in terms they can use to justify to their bosses. Effective communication of technical best-practices to non-technical people is something that everyone can practice. Sometimes you'll have irrational bosses, and that's just life, but more often than not they're just too busy to understand technical things, and it's still possible to spoon-feed them.

I appreciate your honesty, but I definitely would not want to work in a company where that was the prevailing attitude, nor would I hire someone who took that as their MO. It's a pragmatic policy for each individual, but it also leads to endless passing of the buck.

"What can I do, I'm just a dev, the team lead should figure it out" turns into:

"what can I do, I'm just team lead, the department head should figure it out" turns into:

"what can I do, I'm just the department head, the CTO/CEO should figure it out" which finally ends with:

"I'm just the CEO, these decisions should be made by the department heads and team leads because they have a better understanding of the problem."

Responsibility gets passed from the bottom to the top back down to the bottom, where the process repeats itself. Taken to its logical conclusion, everyone has a perfect rationalization for why all decisions are either too general or too specific for them to make. The net result is that decisions still get made, but no one questions them or defends them or even knows where they came from, because no one perceives them as a choice they had any influence over.

There are many reactions you can have to a lack of interest in your plan for organizational change. Quitting and moving on is definitely a contender. I am moving on, but I am taking the opportunity to accomplish several personal goals before doing so. My company would hate to see me disappear abruptly, and I've already got the go-ahead for a sizable consulting arrangement so they aren't left in the lurch.

Absolutely, there's always a balance between "voice" and "exit" options when it comes to organizations in need of reform. If you've been there a while and feel like your voice just isn't being heard, exit is usually the right move. And if you can exit gracefully, that's even better!

> To be blunt, they're paying you to do a job, not to make the organization better.

Ultimately what "they're paying you" for is not so specific. Generally, leadership doesn't have the skills to know there is a problem. Even in the cases where the leadership is technically exceptional, there is no way they can consider the myriad technical decisions and how they affect the future of the business.

Maybe leadership still doesn't care. If technical issues stunt the growth of the company or cause it to go under, the response could be "Meh, we had a good run." In that case, I, as an individual contributor, want to know that's the attitude up front. That lets me know to move on before everything hits the fan, instead of being one of thousands laid off at the same time.

> Ultimately what "they're paying you" for is not so specific.

I think you'd find in most organizations, roles are well-defined, but not articulated to those assuming those roles, because it helps morale to let employees define themselves, and also keeps them from getting too complacent. A manager's job is not to lead but to manage, i.e. get the greatest possible output from the employee.

If you want to know what your defined role is, a simple way to do so is to simply stop doing your job and seeing what people complain about first.

You're right that leadership cannot know the intricacies of your domain, but you're wrong in assuming there's something special about that state of affairs as it concerns technology. Leadership does not know the intricacies of, say, human resources; that's why they hire a specialist. Even in areas where the leadership does know how to do the grunt worker's job, their attention isn't focused on that area, so they're not going to know about problems until someone, typically a manager, brings them up.

A company's leadership is typically engaged outward, towards the broader market. Our CEO handles big deals with other service providers and retail outlets. If you think about medieval Europe, the petri dish that modern organizational methods evolved in, it makes sense. Someone needs to keep tabs on what the neighboring states are doing, and that someone needs all the resources of the nation at his command so he can deal with regional situations.

Individual contributors, when they try to "rock the boat", are perceived as unnecessarily taking time away from the much more important job of keeping tabs on the broader world / market. It's not directly making the company more competitive, so it's just noise, is the attitude.

The attitude / culture I described above is the norm; everything else is an exception. You can take this paradigm and use it to carve out a little fiefdom in any traditional hierarchical organization. Essentially, you figure out your defined role, do only the minimum required to not get fired, and devote the rest of your time to company politics. As you come to understand the organization and its needs, you'll be able to position yourself as someone who can meet those needs. If the company needs something, that means it can't get it with its current resources, otherwise it would have it already. So you will need a budget and staff.

> If you want to know what your defined role is, a simple way to do so is to simply stop doing your job and seeing what people complain about first.

Whenever I've done something like that in the past, the complaint is always some variant of "I noticed you goofing off".

Historically this was a source of incredible frustration because it basically acknowledged I was undertasked; the complaint would never be some variant of "Why isn't X done yet?"

I suppose that means I had no defined role in the company?

> the complaint is always some variant of "I noticed you goofing off".

OK, there's a political status-quo you have to learn how to internalize. If you work at the front desk, you can fuck around on the computer, but you can't take a book out and read. One looks like work, the other looks like fucking around. You can't make your organization look bad. The perceptions are more important than the substance. If you can't get the perceptions right then you were never going to make it in corporate America, and should probably stick to contracting.

So the way to actually do this is to look like you're working while actually producing nothing. When they complain, you can say you were busy on X, where X is some obviously unimportant, meaningless detail. That forces your boss to clarify what he expects you to be working on.

That your boss never asks you about specific tasks means that your defined role is to be a repository of knowledge and not a cog in a machine. This is a good thing, knowledge workers can bullshit their way to perks that the rank and file could only dream of. My current job is just such a thing.

The key to this is understanding that nobody actually knows what you do. Management can only guilt you into being productive, they have no way of actually knowing if you are being productive or not. Perception is reality and you control the face you put out to the organization.

Pretty much. And there definitely are jobs like that out there. I once had a job where literally no one would have complained if I didn't do any work except for sitting at my desk, replying to e-mails, and attending meetings (which I don't consider "real" work). Because I was part of a management training program, I was not expected to produce any output, I was just "there to learn." In reality, that meant I was a professional internet surfer and no one cared. I was miserable and left that job as soon as I could afford to.

No, I think Dan has a pretty good handle on what the respective needs of companies and their engineering teams are. I think a more accurate gloss is that people's personal self-interest and habits are more important (to them) than the needs of the company. When doctors don't wash their hands, it's not because "the company" needs to kill more patients. It's because they're in a hurry.

I would strongly disagree with you on both points -- that you're getting paid to do a job, and that you have to be paid to be a leader.

For one, you don't become a leader by being paid to be a leader. That's how you end up as an incompetent manager. You become a leader by taking responsibility and doing what you can to ensure those responsibilities are taken care of. One of those responsibilities is to make your organization better.

And you shouldn't think you're getting paid to do a job. I'm going to do a job whether I get paid by my employer or not. What I'm getting paid for is to take my employer's priorities and goals into consideration, and the strength of that consideration is proportional to the pay. If you don't pay me very much, my priorities will be considered first, and one of my priorities from a workplace is to be comfortable and happy within a good organization -- I will work to help and support my friends and coworkers. So that's what I'll focus on. If you want me to 'just do a job' or 'maximize company profits', you need to pay me extra, because I won't care about those things for cheap.

You must be miserable at work if you think you're getting paid to just do labor. That's a waste of an education, assuming you have one.

> that you're getting paid to do a job, and that you have to be paid to be a leader.

I did not mean these things the way you seem to think I meant them.

> What I'm getting paid for is to take my employer's priorities and goals into consideration, and the strength of that consideration is proportional to the pay. If you don't pay me very much, my priorities will be considered first, and one of my priorities from a workplace is to be comfortable and happy within a good organization -- I will work to help and support my friends and coworkers.

I use a similar rubric to decide how to prioritize my time. My defined role comes first over everything else. Because I am accomplishing my role extremely efficiently my company sees me as a very good employee.

What I do with the rest of my time I consider to be my sole discretion. If I have an idea for something I'd like to build for the company, I'll go over it with my manager to gauge interest. Sometimes he's interested, sometimes he's not. I do not sweat lack of interest, I am an idea machine, I can come up with new ones.

I look at this surplus time as the primary benefit I receive from getting better over time at my job. I spend maybe a couple hours per week on defined roles, the surplus time I mostly re-invest back into my own capabilities. This helps both the company and me, but mostly me. The company simply isn't set up to be able to utilize my talents effectively.

This is my thinking also... it came to me over a long time, and it typically has a lot to do with why I switch jobs. At a certain point I give up struggling against the 'WTF' and just start going along with the flow, then one day I look at the stuff I am doing, realize working at the place is making me a worse professional, and move along...

However, I do want to note that in general these things do have real, concrete negative impacts on the bottom line of the organization.

> However, I do want to note that in general these things do have real, concrete negative impacts on the bottom line of the organization.

Oh absolutely. But your duty to that organization is to raise these issues to the person best-equipped to see the bigger picture, and to do your best to convince him it's a real problem. Once you've done that, your work is done, you have just done more to help your organization than 100 blog posts would have accomplished, and more than 90% of the other employees would have ever done.

You can comfortably use this approach once a month and be vastly more effective than everybody else at your company at bringing about change. Simply raise an issue, have a conversation about it, then drop it if you get no support.

Exactly. It also works as a meta principle. If you raise issues without them going anywhere, that in itself is an issue worth raising. Sometimes it means going to the manager's manager, but it does work. And if it doesn't work that's an excellent exit cue.

I completely disagree with this.

To expand a little, I don't think the point of having a positive effect on the WTF parts of an organization has anything to do with my effect relative to other employees.

If I see a WTF security or ops practice, I don't sleep better at night by telling myself, "Welp, I noticed it and said something to someone, and that's a lot more than most people do."

Maybe I think of my job too broadly, but my job as a developer is, at least in part, to protect the business, to find the WTFs and get them sorted out.

The idea that IT is just there to solve the "business problems" and they should shut up about anything that isn't directly related to making money is absurd to begin with. But even if I'm being completely naive about that, it still implies a really shallow understanding of "business problem."

Security is a business problem. Stupid ops habits that result in downtime are a business problem. Groupthink that results in acceptance of worst practices is a business problem. All the whatthefuckery the article talks about boils down to business problems. Many of these are problems that the people primarily concerned with running the business are not in a position to recognize.

It absolutely is my job and every developer's job to find these things and take care of them.

If I ever found myself in a situation where I saw some WTFs going on, went to my boss or a coworker, and got no explanation other than 'This is how we've always done it', I would be concerned. I would go back to my desk and finish what I was supposed to be working on. Then I would go home that evening and write out the clearest and most concise explanation of why this is a serious business problem, with citations, and take it back to the coworker or boss.

In my experience, the WTFs don't usually come from "this is how we've always done it". There was a reason at some point in the past: someone really needed it to be that way, usually temporarily.

Here's an example from several jobs ago, one of my first in the industry, actually:

Why does this set of boxes still have password auth enabled? Why isn't it locked down to ssh key only? Why is port 22 accessible outside of the VPN?

Oh, it's because our CEO likes to get his hands dirty with code every once in a while, but he didn't want to mess with pub/private keys, so we just left it open because these boxes weren't that important. They were 'just' dev machines pointed at test DBs with no real data in them.

But it got written into a setup script or Wiki somewhere, and when those dev boxes got repurposed for production, people followed the scripts or rules or whatever that were specific to those machines. So now you have prod machines running with password access with SSH exposed to the public. Ones that are pointed at live DB servers with creds for them.

That's a serious WTF.

I'm not even close to a security expert of any kind. I wouldn't claim to be in a million years. I work with databases and Python, almost exclusively. Though I dabble in DevOps when I need/want to. Even I know that the above is a serious WTF.

What did it take to get that fixed after my coworkers said, "Yeah, that's how it is."? Going straight to my boss, who also said, "Yeah, that's how it is." Then spending maybe an hour of my time writing up how much of a business problem this is and going back to my boss, who said, "Yeah, I know it's a problem, but I don't even know why it's like that. It just is."

So then I ask him who he can think of who might know why that is. Oh, quelle surprise, his boss might know.

And indeed, his boss did know. But he had long since stopped trying to get the point across to the CEO who liked to dabble, so he just went with it. Here comes my very brief paper explaining why this is--you guessed it--a business problem. The CEO responded within a couple of hours, requesting that someone come set up SSH keys on his computer, shut down password auth, and close port 22 on all prod machines immediately.

That's a lot more work than mentioning it to someone. But I think that it's my job to do that, even though I've never worked in DevOps or Security teams.

My work was absolutely not done when I asked a coworker about it once, and then asked my boss.
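For the password-auth part of that story, even a crude audit of each host's sshd_config would have surfaced it. A rough Python sketch, assuming the plain `Keyword value` line format (no quoted arguments, `=` separators, or Include files) and OpenSSH's rule that the first value seen for a keyword wins. `audit_sshd_config` is a hypothetical helper, not a real tool, and it says nothing about port 22 being reachable from outside the VPN, which is a firewall question:

```python
def audit_sshd_config(text: str) -> list[str]:
    """Flag risky global settings in the text of an OpenSSH sshd_config (simplified parser)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # ignore malformed lines in this sketch
        key, value = parts[0].lower(), parts[1].strip().lower()
        if key == "match":
            break  # everything after the first Match block is conditional, not global
        settings.setdefault(key, value)  # first obtained value for a keyword wins
    findings = []
    # OpenSSH defaults PasswordAuthentication to yes, so a missing line is also a finding.
    if settings.get("passwordauthentication", "yes") != "no":
        findings.append("PasswordAuthentication is not disabled")
    if settings.get("permitrootlogin") == "yes":
        findings.append("PermitRootLogin is yes")
    return findings

sample = """
# dev box defaults, later copied to prod
Port 22
PasswordAuthentication yes
PermitRootLogin no
"""
print(audit_sshd_config(sample))  # ['PasswordAuthentication is not disabled']
```

On a real host, `sshd -T` (run as root) prints the effective configuration and is more trustworthy than hand-parsing the file; the sketch just shows how small the check is.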

----------------------------------------------- Edited to add:

Astute readers will note that there was a lot more WTF-ery going on than just auth and port exposure on prod. Like, for example, that the non-technical CEO who liked to dabble was doing so on a production box because he was not made aware that the boxes he was logging into to play with had been repurposed.

There was a lot of cascade in tracking down that one particular WTF, and many other WTFs were solved because of it.

At the time for that company, my job description was C# middleware for a web app. I maintain that it was absolutely part of my job to pursue that WTF to its endpoint, and that doing anything less would have been completely irresponsible.

Whether or not it's strictly within my job description, I think dealing with things like security vulnerabilities and poor operational practices definitely falls within my ethical duties as someone who claims to be a professional.

I would not want to hire, nor work with anyone who felt that their job was "done" simply because they made a comment to someone at some time.

> However, I do want to note that in general these things do have real, concrete negative impacts on the bottom line of the organization.

Not true. Sometimes, things that have concrete negative impacts can BENEFIT organizations.

U.S. prisons are just one of many such examples of this. A concrete positive impact would be lower incarceration and re-offending rates. American prisons have the exact opposite effect. This leads to lots of "customers" in the form of prisoners and victims seeking a more abstract "sense of justice".

The American prison system is probably not the only industry with that kind of business model. There are, no doubt, many contexts where a "concrete negative impact" can be very good for the bottom line. It's also an important lesson about making money in general. It's not about creating value. It's about making others pay for your actions regardless of whether or not said actions are "positive".

> Sometimes, things that have concrete negative impacts can BENEFIT organizations.

That's a very good point too. If you wander, unaware, into a quest to change something with negative externalities on workers or the public at large but with a positive effect on the business, you'll find yourself suddenly in a minefield of opposition with a target painted on your back and no idea who you've just made enemies of. Tread very carefully.

If you are able to positively identify negative externalities that produce benefits for your company--benefits your company depends on--you shouldn't tread carefully. You should quit your job.

Is there no sense of ethics in our profession?

Classically if doing contract software work, bug-ridden software meant a secure maintenance contract for years to come.

Though when I got out of that sort of thing, I recall there being (relatively) new rules requiring a different entity to do the maintenance from the ones that created the software... but my understanding of FAR is basically non-existent, so I don't know.

The customers of prisons aren't prisoners or taxpayers. It's the people who decide to spend money on facilities and salaries.

Another example of this might be some unreliability in a part once it goes out of warranty.

There's a variety of things you can still do as a middle-tier or above engineer to try to change the situation. (Very junior engineers are advised to A: watch how the situation develops with those poor practices and learn and if ambitious B: find a mid-tier or above engineer to ally with for these matters. You lack the social capital on your own to do much.) In my experience, at least with my development style, writing unit tests is either a net time wash or for larger systems, a net time gain [1], so use them. Even if nobody else runs them, you still get a better system out of them, and you get to run them when somebody complains. You can use analysis tools on your own code. You can set up an instance of a build server even if nobody else has one, and have it monitor your unit tests and other tools and fix things as others hack on your code. Etc. etc.

There are two basic outcomes that can happen here. Either you become a leader, and gradually, often quite begrudgingly at first, people start following when they see how well it works. (Unit tests can be a real eye-opener sometimes, especially when they give you reasonable grounds to show that a problem is unlikely to be in your code and more likely lies in the other code or the specifications.) Or you get quashed from above. Contrary to the cynical answer, the latter is not inevitable, but it is certainly a possible outcome. At that point, yeah, it's just time to say you got some valuable experience and move on.

What you MUST NOT do is simply whine... from the point of view of those above, anyhow. Even if it's perfectly sensible complaints from your point of view that are all but objectively correct, it's unlikely to be heard as anything but whining. You need to lead by example. You also very much need to do so with some idea of cost/benefits analysis; you can't go from no discipline at all to a perfectly disciplined project in one step, so consider your steps carefully. Keep them small; stay away from "big rewrites". (Probably the biggest failure case I've seen is someone who thinks that code X is in the wrong paradigm and sets out to entirely rewrite it to make it "better". YMMV but in my personal experience this is usually someone who thinks the code needs to be OO, or a different kind of OO. This is guaranteed failure. Even at surprisingly small scales! You destroy everybody else's knowledge of the code.)

And I'll say it again to underline it... the cynical answer that this is impossible is wrong. It certainly won't be easy, but I can guarantee great growth as a developer if you follow this path, both technically and in dealing with people. Even if you have to change jobs.

As for the bottom line point, arguably you have a duty as a professional developer to be doing this stuff I describe, precisely because it does mean resources are being continuously and avoidably drained on issues that shouldn't exist. If you find yourself unable to discharge it, you should find somewhere you can.

[1]: I like to say that it lets me develop with monotonic forward progress. I even use unit tests during prototype work quite often, after I got sick of the way during prototyping I couldn't count on anything to work, ever, due to changes, and I realized that itself was actually inhibiting my prototyping ability. Sure, sometimes I dump entire subsystems but even then it was usually because the unit tests showed me a fundamental flaw far earlier than the rest of my prototyping would have, and I do so with far more information about the local landscape than I would otherwise have had.
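As a minimal illustration of the kind of small, fast test that gives that monotonic forward progress, here is a Python `unittest` sketch; `parse_port` is a hypothetical function standing in for whatever code you own:

```python
import unittest

def parse_port(value: str) -> int:
    """Hypothetical helper under test: parse a TCP port number and range-check it."""
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port

class ParsePortTests(unittest.TestCase):
    def test_accepts_valid_port(self):
        self.assertEqual(parse_port("22"), 22)

    def test_rejects_out_of_range_port(self):
        with self.assertRaises(ValueError):
            parse_port("70000")

    def test_rejects_non_numeric_input(self):
        # int() raises ValueError on non-numeric strings
        with self.assertRaises(ValueError):
            parse_port("ssh")
```

Run it with `python -m unittest <file>`; a personal build server can run the same command on every commit, whether or not anyone else on the team looks at the results.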

Thanks for this. I have used a somewhat similar approach and ended up writing a bunch of testing tools for our app over time. I can see their effectiveness, as quite a few people in our team and outside started using them, or made their own copies and modified them for their needs. Now I am better off as a developer, but appreciation in terms of monetary compensation / promotion did not come through. However, unlike others, I do not have the option to change jobs due to visa issues etc.

Well, sounds like you're doing your part, it's just the legal system screwing you up.

I wanted to tack on that while the "change jobs for de facto promotion" technique is solid and time tested, doing the sort of stuff I mentioned can help you climb faster. If you're planning on that career path, that's great skill development. You may still advance if you just clock time in before moving on, but you'll find you don't advance as far and that you seem to be getting the same job over and over. (At least, statistically.)

Best of luck with the visa issues. If nothing else, keep an eye on the long term. Development today may still pay off later.

(I mean, don't go crazy. I'm not big into unpaid work. But not all "eight hours" are created equal.)

This is absolutely true. However, I just generally don't see the point in 'fighting' superiors... it's not that it's impossible to fight the system and make things right, it's just not a good investment of an individual contributor's time and effort. It's a much better investment to move to an organization that is either already aligned with positive processes or actively seeking to improve them.

I bristle at the sentiment of "things that are bigger than me." Perhaps my aversion to this notion stems from the fact that I've worked (mostly) in small companies. Folks at my now bigger company say things like: "above my paygrade" or "out of my hands" or "I'm too low on the totem pole."

I admit that by straining to fix these hard things, I'm often going outside of the formal role I hold at the company. I'm fine with that. Others may not be at times. For me, the straining is required. We have a cultural goal to "think like an owner" and that is how I typically justify my behavior.

>>To be blunt, they're paying you to do a job, not to make the organization better.

If this dichotomy exists -- if doing your job does not by definition make the organization better -- then that's a dysfunctional organization.

They all are.

I agree with one caveat, the "leadership" you mention is actually "management". Actual leadership requires you to do a hell of a lot more than just what you're told/serving your own bottom line.

Companies have business needs to not go down and to not lose customer data, both of which are very real consequences of poor engineering practice.

Managers are also acclimated to the business's practices and may find others absurd. The essay is suggesting that they would be better managers were they to listen to the WTFs from newcomers.

Yes, and beyond that, engineers have ethical responsibilities to make sure they follow the golden rule when handling customer data.

I don't think you have to compromise much. If you have any self-worth, you'll strive for improvement. You don't even have to be that careful about rubbing folks the wrong way. You're not going to get fired or anything.

>You're not going to get fired or anything.

You won't get fired, but what might happen is that you get a reputation for being "that guy". The "guy who's always complaining about unit tests". Or, "the guy who emphasizes process of 'agility'". What then happens is that the moment you speak up, management tunes out. They know what you're going to say. They know what you're asking for. And they've already told you no. At that point, you're basically screwed, organizationally. You have a reputation as a troublemaker, which will make transfers difficult. You're not going to get the desirable assignments on the product that you're currently working on. So even if you don't get fired, your life is often made so miserable that you leave.

That depends entirely on how you do it.

If you're upset that there's lousy test coverage, and you respond by complaining, that's not useful. If you respond by making the test suite run faster; creating better mocks; writing docs; making the test results more visible; teaching others how to write (better) tests... that's a different story.

The trouble comes when you think that pointing to a problem is, in and of itself, valuable. We're all surrounded by problems; pointing out one that we likely know about probably isn't useful.
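On the "creating better mocks" point: a common way to make a suite faster is to inject slow dependencies so tests can swap in a `unittest.mock.Mock`. A Python sketch; `fetch_exchange_rate` and `convert` are hypothetical names, not from the thread:

```python
import unittest
from unittest import mock

def fetch_exchange_rate(currency: str) -> float:
    """Hypothetical slow dependency (think: a network call) that unit tests should avoid."""
    raise RuntimeError("no network access in unit tests")

def convert(amount: float, currency: str, rate_lookup=fetch_exchange_rate) -> float:
    """Convert `amount` using an injectable rate lookup, rounded to cents."""
    return round(amount * rate_lookup(currency), 2)

class ConvertTests(unittest.TestCase):
    def test_uses_injected_rate(self):
        fake_lookup = mock.Mock(return_value=1.25)  # stands in for the slow call
        self.assertEqual(convert(10.0, "EUR", rate_lookup=fake_lookup), 12.5)
        fake_lookup.assert_called_once_with("EUR")

    def test_propagates_lookup_failure(self):
        failing_lookup = mock.Mock(side_effect=TimeoutError("rate service down"))
        with self.assertRaises(TimeoutError):
            convert(10.0, "EUR", rate_lookup=failing_lookup)
```

Passing the dependency in as a parameter keeps the tests free of `mock.patch` target strings; patching works too, but it couples the tests to the module layout.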

    If you respond by making the test suite run faster; creating better mocks; 
    writing docs; making the test results more visible; teaching others how to 
    write (better) tests... that's a different story.
The trouble with that is that you only have a limited amount of time. If you're spending time writing tests (even if those tests save time in the long run), it's going to eat up time in the short run, which cements your reputation as a "troublemaker". Namely, your boss is going to wonder why you're taking extra time to write tests and create mocks when your peers are cranking ahead with features.

Yes, in theory, you get to say, "I told you so," when your peers' features are found to be bug-ridden pieces of crap that have to be reworked two or three times before they can be deployed. In practice, a boss whom you can say, "I told you so," to is a boss who'd have listened to you in the first place, making the whole exercise moot.

Being smart enough not to mess with problem X of company Y doesn't mean I don't find it interesting to inquire about how problem X came along.

There seems to be a bit of a false dichotomy underpinning this article that companies value feature growth above all else and this directly results in poor operational performance. However, you can get terrible availability without delivering any features whatsoever for months and months.

Two 9s of availability? Half the customers I've had would be ecstatic to have even ONE 9 of availability. And those guys hardly ever ship any code due to how encumbered developers typically are in those places and release maybe once every 6 months to a year perhaps. In fact, this is basically my typical experience with most enterprise customers I've worked with as a consultant - they're unable to execute almost anything materially important and customers put up with them because nobody else is in that niche enterprise market that's keeping people employed by lack of choice / market consolidation (healthcare.gov is just a visible example - plenty more projects are even worse with perhaps even larger budgets with zero media attention).

> And those guys hardly ever ship any code due to how encumbered developers typically are in those places and release maybe once every 6 months to a year perhaps.

I've worked in places where some teams ship with this three or six month frequency. They consider it completely normal and find ideas like continuous delivery or even weekly deployments not just abnormal but risky and irresponsible! This is the very point OP is trying to make, the _normalization of deviance_.

I agree with you plenty that this shouldn't be acceptable, but most of the people I've seen who are against modern practices of decent software teams are not so much examples of the points in this article as stereotypical examples of one's grandparents or parents trying to tell you how your job doesn't matter because software isn't "real work" and that their principles work just fine today as they did in the 70s. Or they expect that continuous delivery / CI is a product or feature of something else they bought for 9+ figures, something that gets bundled in with services and a license cost, because that's literally the only way they know to make anything happen. Doing software projects with people who have more experience in medieval architecture would probably be more pleasant and productive than dealing with leadership that has decades of experience doing nothing but big company projects with more resources spent on planning software than on engineering talent.

If a company is hell-bent on focusing on development and new features over stability / security, that's something that can be fixed by leadership. I've worked with plenty of companies that turned themselves around and have wise leaders who know when it's time to spend the resources on spring cleaning while keeping the employees who are excited by feature development happy.

This post's style and "quality" of writing is really aggravating. I felt like I was banging my head against the wall after reading so many run-on sentences or paragraphs that start with the same contraction. Other times the writing is so poorly executed I cannot tell what the author is trying to convey. For example, what is going on in this paragraph:

"There’s the company with a reputation for having great engineering practices that had 2 9s of reliability last time I checked, for reasons that are entirely predictable from their engineering practices. This is the second thing in a row that’s basically anonymous because multiple companies find it to be normal. Multiple companies find practices that lead to 2 9s of reliability to be completely and totally normal."

It's a little bit awkward. Still, I find it oddly fascinating that you're confused, because I can't figure out what's confusing.

There is a company Dan Luu knows about.

This company has a reputation for great engineering practices.

This company had 2 9s of reliability when Dan last checked.

The reason it has 2 9s of reliability is a predictable result of its engineering practices.

Although this example is about a specific company, you can't identify the company from the description.

You can't identify the company from the description because it is a description that applies to many companies.

You also can't identify the example from the previous paragraph [of Dan Luu's post] because that paragraph's description also applies to many companies.

Multiple companies have engineering practices that cause such reliability problems and find these engineering practices to be completely and totally normal.

The unspoken implication here is that 99% reliability is considered bad. This may not be clear if coming from a different field where 99% sounds pretty good.

Right. 99% is an A+ in school and roughly 3.65 days of downtime per year.
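To make the "nines" arithmetic concrete, here is a small sketch that converts a number of nines into allowed downtime per year. It assumes a 365-day year (8,760 hours) and ignores leap years and scheduled maintenance windows.

```python
# Allowed downtime for N "nines" of availability.
# Assumes a 365-day year (8,760 hours) and ignores leap years.
HOURS_PER_YEAR = 365 * 24


def availability(nines: int) -> float:
    """Fraction of time the service is up, e.g. 2 nines -> 0.99."""
    return 1 - 10 ** -nines


def downtime_hours_per_year(nines: int) -> float:
    """Hours per year the service may be down at that availability."""
    return HOURS_PER_YEAR * 10 ** -nines


for n in range(1, 6):
    hours = downtime_hours_per_year(n)
    print(f"{n} nine(s) = {availability(n):.3%} up: "
          f"{hours:.2f} h ({hours / 24:.2f} days) down per year")
```

Two nines works out to 87.6 hours (about 3.65 days) a year; three nines is about 8.76 hours, and five nines is roughly 5.3 minutes.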

That's about the reliability of my home server. A set of commodity mid-to-low quality parts that I assembled and programmed to turn on after a power outage.

And I could easily double it (as in, halve the unavailable time) if a decent ISP became available at my place.

Good point. I had assumed the parent was complaining about syntax, but that could be confusing.

When you say 99% is bad, are you making a normative or positive statement?

I'm saying that the author believes 99% reliability to be “Bad (TM)”. Further evidence in the second paragraph of this post: http://danluu.com/broken-builds/

I didn't say that this is a normative viewpoint in engineering or whether I personally agree with it. As you can see from other commenters, many do hold this view. A sometimes opposing philosophy, however, is “release early, release often” which many open source projects adhere to.

I don't know what positive and normative statements are, but think of examples of highly available services: electricity, water, phone service. What do you think when those are down 3 days (or 72 hours) per year?

So, characteristic X is so common that it doesn't reveal the identity of company Y. Yet the author is surprised that many companies think X is normal?

There are two points I'd make:

1) Maybe yes. If the practices are really bad, it could be surprising that they're widespread and that people don't see a problem.

2) In general, when someone is pointing out something bad, saying "you're surprised?!" is counter-productive. You don't have to be surprised to call something out (I'm not surprised anymore how much our government spies on us, but it seems bad).

He writes it as if he is suggesting that 99% reliability is just bad, period. If his point was to say that 99% is bad in this specific use case, he should have made that more clear.

It's convoluted and "hard to read" for a reason. The author wants to make a point by writing all sentences using the same style. If you don't see it...

The author detailed precisely why I left a former Y-Combinator company, Return Path.

"As far as I can tell, what happens at these companies is that they started by concentrating almost totally on product growth. That’s completely and totally reasonable, because companies are worth approximately zero when they’re founded; they don’t bother with things that protect them from losses, like good ops practices or actually having security, because there’s nothing to lose.

The result is a culture where people are hyper-focused on growth and ignore risk. That culture tends to stick even after company has grown to be worth well over a billion dollars, and the companies have something to lose. Anyone who comes into one of these companies from Google, Amazon, or another place with solid ops practices is shocked. Often, they try to fix things, and then leave when they can’t make a dent."

Did you give them a ton of advance warning when you left?


I left it open-ended, actually. I talked to the CDO and explained why I was leaving. He agreed with the decision and thanked me for being brave enough to stand up and say it.

It ended up being three weeks before we agreed that I'd successfully handed off everything.

No bad feelings either way -- I'd joined the company because they had said that they wanted to 'grow up,' but that was a feeling percolating up from below. The lower levels of the company wanted to grow up and stop firefighting all the time. The top levels of the company would fight you every last way.

Wow have I ever been there, done that at a startup you've heard of. Line level employees hated the endless firefighting; the ceo/cto didn't give a shit and stymied any change. My solution was to forfeit any ops duty at all. I told my boss she had two choices: I didn't work on ops or I didn't work there at all.

So what did your boss say to that? Guessing the latter, because it sounds like you don't work there anymore...

That's still pretty good for a company you thought was (somewhat?) dysfunctional!

You didn't have to name the company.

Why not? I worked there, it's publicly verifiable, and it's pertinent to the discussion since we're discussing it on YCombinator's forum.

> It’s sort of funny that this ends up being a problem about incentives. As an industry, we spend a lot of time thinking about how to incentivize consumers into doing what we want. But then we set up incentive systems that are generally agreed upon as incentivizing us to do the wrong things

The longer I live, the more I realize that everything is a market, and incentives control it all. The reason you follow company policy most of the time? You're incentivized to follow the rules so you get the raise, or at least don't get fired. When there are competing incentives for different responses to the same subject, that's when you need to take extra care to realign the incentives. Trying to institute new behavior? You have to fight momentum, familiarity, and sometimes easiness. That often requires more than a few dictates.

Everything is a system—and some systems are markets. A market is also a system.

This is why it's important to know how to think in systems, and about psychology, statistics, variation, knowledge and everything else that influences systems—if you want to work in one.

There's more to most systems than just incentives. They are a small part of what goes on.

Can you recommend some books on this? I have been trying to learn to think in systems, and find that to be the most useful skill to have.

If you want something more academic I recommend Measuring and Managing Performance in Organizations by Robert Austin [1]. There is a sample that is a pretty good introduction as well [2].

This was the only book worth reading when I was researching metrics for our team at work.

TL;DR: Don't use performance metrics for human beings. You almost certainly won't get what you want, and you'll probably get nasty side effects instead.

[1] http://www.amazon.com/Measuring-Managing-Performance-Organiz... [2] http://ptgmedia.pearsoncmg.com/images/9780133492071/samplepa...

I agree!

Donella Meadows is one of the most articulate writers and thinkers on systems. She is the easiest to learn the basics from: http://www.amazon.com/gp/product/1603580557

Same person, shorter, free, and condensed format: http://www.donellameadows.org/systems-thinking-resources/

Same author again, the final chapter from the book above: http://www.donellameadows.org/archives/dancing-with-systems/

And if you only get one book on how to apply it to business, it's this one: http://www.amazon.com/Leaders-Handbook-Making-Things-Getting...

A more recent applied-systems-thinking manual for organizations that I've found hits home with more traditional managers (very useful): http://www.amazon.com/The-High-Velocity-Edge-Operational-Com...

Another good one that's a bit long-winded, but another set of applied examples: http://www.amazon.com/The-Fifth-Discipline-Practice-Organiza...

Senge, in the foreword of that book, gives almost all credit to W. Edwards Deming, who is the originator of many of the ideas of organizational systems thinking and how it integrates with management. So, if you want to go deeper, Deming's book "Out of the Crisis" is a good tome.

Systems Thinking, Third Edition: Managing Chaos and Complexity: A Platform for Designing Business Architecture

nor·mal - adjective 1. conforming to a standard; usual, typical, or expected. "it's quite normal for puppies to bolt their food"

All the things that he writes about are normal: they happen. People (myself included) with an engineering background are surprised when things don't "make sense" or people don't do things the "right way." The trick is to get to the point where these things are not surprising, where you see them as part of the systems you are trying to understand and as consequences of forces that aren't mysterious but are just part of human social dynamics. From that vantage point you can get a better sense of what you can change to influence outcomes, and whether you can or can't in a particular context.


Eat as fast as possible, (with plausible effect of vomiting soon after--not part of the definition, just a common effect).

It's a word in American English, just not super common.

This post seems so good and so self-evidently true that I'm surprised at the amount of pushback it's getting here. Not sure what else to say about it.

Well, I'll say this: the "@flaky" thing is pretty mind-blowing. In my own company I have noticed many engineers have a disturbing level of comfort with deciding something is a "mystery". There are no mysteries in what we do. The test fails because something is fucked up. Flappy tests are annoying, but the right thing to do is to address the situation.

It depends. I recall one project with flaky tests, and shortly after joining I decided to investigate. It turned out to be a problem with floating-point values in the shopping cart logic. Tests were using random item prices, so sometimes it appeared and sometimes not. I'm glad I found that one. :-)

On the other hand I've seen something like @flaky used in the Ruby world to run Capybara tests with a lot of Ajax, where you get failures because of timing issues deep down inside Selenium. It's not a problem with production or your app, but your testing tools.

In the second case I still am not totally comfortable, since it makes it easier to overlook flaky tests that really do need attention, but I can understand it there.
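The commenter doesn't show the shopping-cart code, so this is only a minimal sketch assuming the bug was the classic binary floating-point rounding problem; the function names are hypothetical. It contrasts summing prices as floats with summing them as `decimal.Decimal` values built from strings.

```python
from decimal import Decimal


def cart_total_float(prices):
    # Binary floats can't represent most decimal fractions (0.10, 0.20, ...)
    # exactly, so small rounding errors creep into the sum.
    return sum(prices)


def cart_total_decimal(prices):
    # Decimals built from strings keep exact cents; start the sum at
    # Decimal("0") so the result stays a Decimal.
    return sum((Decimal(p) for p in prices), Decimal("0"))


print(cart_total_float([0.10, 0.20]) == 0.30)                   # False
print(cart_total_decimal(["0.10", "0.20"]) == Decimal("0.30"))  # True
```

With random item prices in the test data, whether the float total matches the expected value depends on which prices happen to be drawn, which is exactly how such a test ends up passing on some runs and failing on others.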

Yeah, I'm with you. Of course things can never be perfect but we should be striving to make them as perfect as we can. I've felt Selenium pain too.

But, and this is crucial-- notice how you understand why the tests were flappy. As opposed to just grumbling like "oh that dumb old thing again. I wish it would shut up, the thing totally works."

I agree, but with a caveat: there's an important time/value tradeoff in going down any particular rabbit hole when you have more tasks than time, particularly if you're in a billable or results-oriented area and the investigation lacks obvious benefit.

There are plenty of times I've seen people think of things as "well, it just does X" because of a lack of depth of knowledge, but you also can't necessarily go down every rabbit hole to the bottom, unfortunately.

I disagree because things like flappy tests are a "smell" that could lead to discovering a deeper problem. We simply don't know ahead of time that the payoff won't be worth the effort in such a situation. If you can't explain what's wrong, that's bad.

Certainly, but given that the value of the test can be nonzero even if unreliable, and you may not have time to run down why it's unreliable, it makes sense to have a mechanism to utilize it.

Not ever having engineering time to investigate why it's unreliable is a (mostly) unrelated problem.

Yeah...I mostly have experience only running test suites for popular libraries and for my own software...but the existence of this library is a massive WTF...such that had I read its official introductory post, I would have thought it to be really good satire:


Can someone describe a real life production scenario in which this flaky behavior is desirable? That is, preferable to these other generally accepted practices:

- flag the test and mark the bug as an issue and at some point, attempt to fix it

- delete the test, if it happens to relate to dead code or was poorly conceived in the first place

- use fixtures and other libraries to mock dependencies, e.g. Webmock and/or vcr to intercept HTTP requests and respond with a pre-recorded fixture.
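Webmock and vcr are Ruby libraries; as a rough Python analogue, the sketch below uses the standard library's `unittest.mock.patch` to replace `urllib.request.urlopen` with a canned response, so the test never touches the network. The endpoint URL, `fetch_rate`, and `price_in` are hypothetical, made up for illustration.

```python
import io
import json
import unittest
import urllib.request
from unittest import mock

RATES_URL = "https://rates.example.invalid/{}"  # hypothetical endpoint


def fetch_rate(currency: str) -> float:
    """Real network call: the slow, occasionally failing dependency."""
    with urllib.request.urlopen(RATES_URL.format(currency), timeout=5) as resp:
        return json.load(resp)["rate"]


def price_in(currency: str, amount_usd: float) -> float:
    return round(amount_usd * fetch_rate(currency), 2)


class PriceTest(unittest.TestCase):
    def test_price_uses_recorded_rate(self):
        # Stand-in for a pre-recorded fixture: urlopen returns a canned body
        # instead of hitting the network, so the result is deterministic.
        canned = io.BytesIO(b'{"rate": 0.5}')
        with mock.patch("urllib.request.urlopen") as fake_urlopen:
            fake_urlopen.return_value.__enter__.return_value = canned
            self.assertEqual(price_in("EUR", 10.0), 5.0)
            fake_urlopen.assert_called_once()
```

Run it with `python -m unittest`. Patching at the HTTP-client level keeps the code under test unchanged, which is roughly the trade-off Webmock makes in Ruby.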

I understand that there are scenarios in which production must go on even when a test fails. But to throw on another testing layer that tells you, "hey, it kind of works, for some unknown reason", instead of just marking the test as a failure to be investigated...what possible value or insight could outweigh the additional noise generated? I guess one possibility is that it lets you know that something is truly fucked up...but that is not at all the tone of the Box blog announcement:

> When testing Sync 4, Box's desktop sync application, we also ran into this issue, but we also didn't want to simply remove our flaky tests. When we noticed that most flaky tests would pass when rerun, we realized we could make doing so automatic. Flaky is a nose plugin that can rerun flaky tests without interrupting your test run. Using it is as easy as decorating your test methods with @flaky

At AWS we had some tests which involved adding stuff to DynamoDB. Since it's eventually consistent, sometimes the tests would fail because some assertions wouldn't be true until some time later. Of course we tried to make them succeed every time with retries, but we didn't want the tests to take forever either.

So, it was expected that the tests would fail once in a while. Maybe there could be a better way to handle this, but what would happen is if that test failed we would just check the data manually, waiting for it to come up.
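One common alternative to "expect it to fail once in a while" is to poll for the expected state with a bounded timeout, so the test tolerates replication lag without taking forever. The sketch below uses a hypothetical in-memory `FakeLaggyStore` in place of a real eventually consistent table; it does not use any DynamoDB API.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll `predicate` until it returns True or `timeout` seconds pass.

    Returns True on success, False on timeout, so the caller decides
    whether a timeout counts as a test failure.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeLaggyStore:
    """Hypothetical stand-in for an eventually consistent table: a write
    becomes visible only after `lag` reads of that key."""

    def __init__(self, lag=3):
        self.lag = lag
        self._pending = {}  # key -> [value, reads remaining]
        self._visible = {}

    def put(self, key, value):
        self._pending[key] = [value, self.lag]

    def get(self, key):
        if key in self._pending:
            entry = self._pending[key]
            entry[1] -= 1
            if entry[1] <= 0:
                self._visible[key] = entry[0]
                del self._pending[key]
        return self._visible.get(key)


store = FakeLaggyStore(lag=3)
store.put("item-1", "widget")
# Asserting on the very first read would fail; polling tolerates the lag.
assert wait_until(lambda: store.get("item-1") == "widget", timeout=2.0, interval=0.01)
```

The timeout is the knob that trades test duration against tolerance for lag; if it expires, the failure is more likely to mean something real.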

The nuance of flakiness (if you work at a very large company with one continuous build system and one big repository for everyone) is that in order to not break dependent teams with your checkins, you run all their tests before checkin. Sometimes this catches major, subtle issues: out of a run with a million+ test targets, one test fails and it reveals a critical security bug in a widely used open source library. But most of the time, large numbers of poorly written (or neglected) tests fail, and it's not because of your change. The important thing is to show a distinction in the UI between a flaky test and a passing test, so that people don't look at a wall of green and see success, and push broken code to production.
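The "don't show flaky as green" rule can be sketched as a three-way classification over repeated runs of the same test: all passes, all failures, or mixed. This is a minimal illustration with hypothetical names, not how any particular build system reports results; the same idea covers rerunning a failure to see whether it reproduces before bothering a human.

```python
from enum import Enum


class Outcome(Enum):
    PASSED = "passed"  # every run passed
    FLAKY = "flaky"    # mixed results: show distinctly, never as plain green
    FAILED = "failed"  # every run failed: likely a real breakage


def classify(runs):
    """runs: list of booleans, one per execution of the same test."""
    if not runs:
        raise ValueError("need at least one run")
    if all(runs):
        return Outcome.PASSED
    if not any(runs):
        return Outcome.FAILED
    return Outcome.FLAKY


def summarize(results):
    """results: dict of test name -> list of run booleans."""
    summary = {outcome: [] for outcome in Outcome}
    for name, runs in sorted(results.items()):
        summary[classify(runs)].append(name)
    return summary
```

For example, `summarize({"a": [True], "b": [False, True], "c": [False, False]})` puts `a` under PASSED, `b` under FLAKY and `c` under FAILED.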

"Can someone describe a real life production scenario in which this flaky behavior is desirable?"

I think so? My company develops a processor that is meant to be compatible with processors from other vendors. To try to ensure compatibility, we have a test suite that compares our "golden model" of intended behavior against the observed behavior of competitors' chips.

We sometimes have run into cases where a competitor's product "randomly" gets wrong results, but when we run the instruction again, it gets the right answer. This happens frequently enough that we've arranged the test suite to automatically try again to see if a failure is reproducible before bothering a human with it.

Yeah I figured these kind of errors fall into the system interoperability/integration category...but I also figured they would fall into their own kind of test suite, one that is much more geared toward the measurements of thresholds and probabilities. Adding a "flaky" plugin to the standard test framework to do this kind of non-standard testing...feels like designing the workflow in a slightly backwards way, like monkeypatching a basic data object for very specific behavior needed in a few niche libraries.

> flag the test and mark the bug as an issue and at some point, attempt to fix it

But what do you do with the test in the meantime? If you disable it you lose coverage.

The best temporary state might be to re-run the test until it passes. If it fails after allowed re-runs, you have a regression.

I think it corresponds to a lot of normal practice - even if you don't use @flaky, if one build fails and you run it again and it passes, what do you do? It's very possible to build tests that will pass sometimes if the code is correct and fail always if the code is wrong (e.g. test where there are race conditions in mock initialization, test that asserts that rows come out of a database in the same order they were put in). IME that comes up much more often than a test that will pass always if the code is correct and fail sometimes if the code is wrong. It's not pretty and it usually indicates underlying poor design but it may be the pragmatic option.
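The database example above has a simple fix: SQL doesn't guarantee row order without `ORDER BY`, so the test should either add one or compare results without caring about order. A minimal sketch using the standard library's in-memory SQLite follows (the table and rows are made up). Comparing with `Counter` ignores order but still catches missing or duplicated rows.

```python
import sqlite3
from collections import Counter


def insert_and_fetch(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    # No ORDER BY: SQL does not guarantee the order rows come back in.
    fetched = conn.execute("SELECT name, qty FROM items").fetchall()
    conn.close()
    return fetched


rows = [("widget", 2), ("gadget", 5), ("widget", 1)]
fetched = insert_and_fetch(rows)

# Fragile: passes only if the engine happens to return insertion order.
# assert fetched == rows

# Robust: compare as multisets, so order is irrelevant but duplicates count.
assert Counter(fetched) == Counter(rows)
```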

> even if you don't use @flaky, if one build fails and you run it again and it passes, what do you do?

One's answer to this question is a look into one's soul, at least in terms of engineering. The question is exactly what do you do. In my opinion, good engineers find out why the first one failed and why the second one passed. Either could be a false result. Sometimes the tests failed because the build box ran out of hard drive space, that's a false result. Sometimes a test passes because its condition for passing is incorrect, that's a false result too.

Tests passing is not necessarily a good thing in itself. That is why we have the notion of "testing the test"- a green icon means nothing if it's a lie.

> if one build fails and you run it again and it passes, what do you do?

I run it a third time, to start.

What do you do differently based on the outcome of that? The only scenario I can think of where that makes sense as a first step is if you're happy to ignore a 1-out-of-3 failure but feel the need to investigate a 2-out-of-3, which seems like an odd position to take.

I just think it's the best diagnostic step to take first.

If it fails again, perhaps you've got a timing issue, or something that's switching back and forth.

Of course you're not happy to ignore the failure.

The paper on "normalization of deviance" that this post links to is also really good. It's written from a medicine perspective, but its observations and conclusions are pretty generalizable.


Sidney Dekker's _The Field Guide to Understanding 'Human Error'_ has a bit about this (which he calls drift). One thing I liked in that section is a sort of anti-Murphy's Law: What can go wrong will usually go right, and then we'll assume that it will go right again and again, even if we borrow more from our safety margin.

That said, it's a trade-off, and if you're a startup, it's usually far better to do it dirty today than perfect next year.

Diane Vaughan wrote about 'normalisation of deviance' in her book The Challenger Launch Decision. The original Space Shuttle SRB design had redundant O-rings for safety, and while a small amount of O-ring erosion (from the hot gases of burning propellant) had been observed on several occasions prior to 1986, it had never resulted in failure, so after a while, they began to accept it as normal. But as Richard Feynman pointed out, those O-rings had never been designed to erode. The first time erosion was seen was a deviance; it indicated that the protective thermal putty was not doing its job, and should have been corrected then.

An interesting (to me) recent piece of writing from the aviation world also points a finger at the "normalization of deviance." Written by Ron Rapp, a charter pilot:


Thanks for this...lots of great examples...the one about the stoned-out music student who, as a museum guard, accidentally enabled a ~$500 million art heist, was a good laugh, with links to more interesting reports and post-mortems.

I went in agreeing with the headline, and the article just didn't follow through on it (also, thank god for readability view: 2560-pixel-wide lines with no margin are no more readable than blogs full of ads).

So it posits that there's messed up practices considered normal, then it talks about companies that are clearly abnormal? Even in one of the very first paragraphs: "the company whose culture is so odd that ...". And since when is marking flaky tests a "completely messed up practice"?

Ok so a lot of this is about coding practices and such but... like the guy said, those problems sort themselves out. Bad security gets broken into, the companies eventually get hit and either die out or fix themselves. Etc...

There are a lot of completely messed-up practices in tech. Oh lord, especially as a European looking in on the SV world. Some of those messed-up practices I can't even mention on HN because people think they are so normal that I get mass downvoted and have to engage five people telling me how normal this is (in fact I might even have to engage in it just by mentioning this).

1. The issues highlighted tend to be problems specific to people. Programmers that don't know how to do some aspect of security properly, that's a problem specific to those guys. You put me next to them, I'll do that bit properly but will be clueless about a different bit. It's all fixable.

2. The actual programming & design is the least problematic, mostly because it's the one that's most easily changed. Trying to change culture gets you fired. Fixing a pipeline with noticeable improvements gets you promoted. The one bit I did agree with was that it is hard to show improvement when you prevent a fire. I also think that's fixable and I also think that's a people problem, except at the manager level.

You want messed up practices? Look at the game dev industry and its mandated crunch and burnout. "Everybody's crunching for the next 6 months because we really want to see the game released on time".

You said "So it posits that there's messed up practices considered normal, then it talks about companies that are clearly abnormal?", but isn't that the point of the article, that clearly abnormal is subjective, and talks about how the people in the companies become blind to that?

Curious, as a European, what do you find wrong with some of the practices in SV? Genuinely interested in hearing your perspective (perhaps if you elaborate and keep it objective as possible you'll avoid some of the mass downvotes).

If you're really curious, you can take a look at my post history - if you go far back enough you might find a few rants on it.

But it has a lot to do with culture. Adopted traits that are only there to serve themselves. And since culture is by definition subjective, I can't really say anything bad about it now, can I?

Sometimes a new person comes in saying "WTF WTF WTF Wtf wtf..." because you don't have the messed up practices that they expect.

Ah yes, writing tests, code reviews, and version control can be such a pain, why would you put up with these productivity killers?

It really depends. When I emerged out of the DoD contracting industry, it was a total WTF that we weren't obsessing over charge codes and timekeeping. Even if we're not billing a customer, how do we even know what we're spending our time on if we're not documenting it with constant timesheets?

It's possible for process to serve a purpose, but still not be worth it. (Not going to argue with your specific examples though)

Yes exactly. Some new people will come in saying things like that because they've internalized bad practices.

An anecdotal observation is that the worst offenders in terms of institutionalized bad practices also have a culture of failing upwards, i.e. incentive structures are set up in a way where you create your mess so fast that you get promoted and someone else has to deal with the aftermath of what you did. In such an environment, slowing down to do the right thing invariably means that you are setting yourself up to inherit a mess created by someone else.

Ooh, that's insidious and has the ring of truth to it. I've never thought about that before.

"There’s the office where I asked one day about the fact that I almost never saw two particular people in the same room together. I was told that they had a feud going back a decade, and that things had actually improved – for years, they literally couldn’t be in the same room because one of the two would get too angry and do something regrettable, but things had now cooled to the point where the two could, occasionally, be found in the same wing of the office or even the same room. These weren’t just random people, either. They were the two managers of the only two teams in the office. Normal!"

I'm 99% certain that I know about whom the author refers here, having worked at an office with somebody of the same name where a drama matching this description took place. It was one profoundly weird situation that should never have been allowed to fester.

Thanks for stoking the fire of my curiosity.

This reminds me that I've acclimatised to one specific test in the PHP interpreter test suite always failing on OS X. I should go and make it either not run on OS X, or modify it so it will actually pass.

That way, when someone eventually actually breaks that function, they'll notice.
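PHP's own `.phpt` tests have a `--SKIPIF--` section for this kind of platform exclusion. As a Python analogue (with a hypothetical function under test), `unittest.skipIf` skips a test only on the platform with the known, tracked failure, so it keeps guarding the code everywhere else instead of being "always red" and ignored. Fixing the test is still the better outcome; a skip is a visible, documented stopgap.

```python
import sys
import unittest


def join_path(parts):
    """Hypothetical function under test."""
    return "/".join(parts)


class JoinPathTest(unittest.TestCase):
    # Skip only where the failure is known and tracked, so the test keeps
    # guarding this function on every other platform.
    @unittest.skipIf(sys.platform == "darwin",
                     "known macOS-only failure; see tracking issue (hypothetical)")
    def test_join(self):
        self.assertEqual(join_path(["usr", "lib"]), "usr/lib")
```

A skipped test shows up as "skipped" (with the reason) in the test report, so it doesn't quietly disappear the way an always-failing test eventually does.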

It's a broken window: if you don't fix it, people have permission to just have failing tests. They get so used to seeing one failure that two failures aren't an issue, and pretty soon real problems that the tests would catch aren't fixed anymore.

A nice thought, but pointless. Failing tests do not matter to the PHP release managers, they ship it anyway.


That's nonsense, PHP releases ship with no failing tests. That's from three years ago, and anyway gcov.php.net is not used for CI these days (IIRC it's an abandoned old box, its results are inaccurate), Travis is. The PHP-7.0 and PHP-5.6 branches are currently green on Travis.

That also reminds me of when I did X.400 mail for a living (at British Telecom): Sprint had an implementation that completely contradicted the ITU standard. Sprint's response was, well, just put in a fix for Sprint's mail systems.

ICL also thought that even if the standard said such-and-such index MUST start at 1, they would start it at 0. Cue some nice divide-by-zero errors. Not surprised we don't have a UK mainframe company any more.

Similarly, back when I used to maintain Stackless Python and do merges in from mainline Python, my benchmark was to only have the same set of failing tests post-merge for the same merged revision.

More often than not I would have failing tests in official Python release version tags, which became par for the course.

Sounds like a good idea to me!

This line perplexed and disturbed me: "...This is the same company where someone recently explained to me how great it is that, instead of using data to make decisions, we use political connections, and that the idea of making decisions based on data is a myth anyway; no one does that...."

One, that is depressing and implies the company seems to have a corrupting influence on society at large.

Two. The girl is right and it is called insider information. Another corrupting aspect of our society.

In other words, corruption is a new viable normal and workers have no problem with it because most workers are all desperate and happy to have a passport to middle or upper middle class.

From the context, I think the author was probably referring to office politics, not corrupt politicians.

I hope so for that is less corrupt. I can be somewhat obtuse. I am going to write the author and ask. Thank you.

so many of these problems have the same root cause:

we don't have an effective data-driven reputation system. we use gameable heuristics to track social capital.

when metrics for evaluation are flawed, people behave in ways that exploit the flaws even if they increase the likelihood of failure.

"we are not rewarded for necessary grunt work as much as shiny advances", for example. That's a failure of the reputation system to account for the value of that work.

My solution to this problem is a mathematical reputation system based on the same concept as page rank. The system is available here:


I'd love your feedback.
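The linked system isn't reproduced here, so this is only a hedged illustration of the underlying PageRank idea: power iteration over a hypothetical endorsement graph, where being endorsed by well-endorsed people counts for more. The damping factor 0.85 is the conventional default, not something taken from the commenter's system.

```python
def pagerank(endorsements, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over an endorsement graph.

    endorsements: dict of person -> list of people they endorse.
    Returns dict of person -> score; scores sum to 1.
    """
    people = set(endorsements)
    for targets in endorsements.values():
        people.update(targets)
    people = sorted(people)
    n = len(people)
    rank = {p: 1.0 / n for p in people}

    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in people}
        dangling = 0.0
        for p in people:
            targets = endorsements.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                dangling += damping * rank[p]
        # People who endorse no one spread their weight evenly.
        for p in people:
            new[p] += dangling / n
        delta = sum(abs(new[p] - rank[p]) for p in people)
        rank = new
        if delta < tol:
            break
    return rank


scores = pagerank({"alice": ["carol"], "bob": ["carol"], "carol": ["alice"]})
```

Here carol (endorsed twice) ranks highest, alice (endorsed by carol) next, and bob (endorsed by no one) lowest. Plain PageRank is also gameable by rings of fake accounts endorsing each other, which is the aliasing problem raised below.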

In the past few years I have been building up a good reputation at various stores, both online and offline. It bothers me that I cannot use that reputation. For example, a major supermarket chain here in the Netherlands rolled out self-scanning in 2006 [1]. They do random checks at the checkout, and for many years now they have known that I never forget to scan something. This resulted in the number of random checks going down for me. That should mean something, and I should be able to use this trust/karma elsewhere. All these companies are building data on me, and I can't use it myself.

[1] https://www.youtube.com/watch?v=orjo0uNZFsk

The challenge is getting the companies to (from their perspective) "share" the trust information with the world, which includes their competitors.

I suspect we might need a "taxonomy of trust", so to speak, that allows the trust data to be anonymized and aggregated into commonly accepted meanings of trust contexts, trust roles, trust relationships, etc. That might let these companies release their trust data in such a format, perhaps through a blockchain, and participate in consuming the aggregated data. I'd need someone well-versed in game theory to figure out whether an advantage is conferred on "leeches": a company in such a scenario that only consumes the aggregated data but never sends what it accumulates on its own customers into the blockchain. I think that's a real danger with such a scheme, but I am not sure how to strongly dissuade that behavior.

Well, watch out. That sounds great if you trust everyone you interact with to give you good reputation. But pretty quickly that can lead to the recent stories about the Chinese social credit (assuming the stories are true). It can become a very subtle and powerful way to control people. Not to mention a legal minefield. Just look at credit ratings.

I like this very much; thank you for bringing it up. Take a look at Fabriq [1; PDF], which tries to accomplish something similar in a blockchain context. I especially like the work you put into explicating the implied respect issue.

What are your thoughts on modeling respect not just as a unitary quality (still highly useful for quick, ad hoc, high-level evaluation), but also along crowd-created and crowd-defined axes? Then people could refine their description of respect and, for example, say they agree with one crowd group's definition of "good manager" for a specific person, while that same person is not respected under another crowd group's definition of a "good leader".

Gaming reputation systems over extended time periods and via aliased entities is a perennial problem. What are your thoughts on random latencies before respect scoring is evaluated on new respect data for an entity, securely tying a hash based upon fully-sequenced DNA to real-person accounts, interaction of entities in a specific context (someone might be respected as a great athlete, considered toxic in one of the companies they own, but respected in a different company), and tracking corporate aliasing (through mergers, acquisitions, spin-offs, name changes, etc.)?

[1] http://www.ourfabriq.com/fabriqwhitepaper_v20150520.pdf


> The Chinese government is building an omnipotent "social credit" system that is meant to rate each citizen's trustworthiness.

Speaking of completely messed up...

I had to add max-width, margin, and font-size styles before I could even attempt to read that page. For all that markup, there sure wasn't any attention paid to readability.

It's just plain HTML. That your browser doesn't display it readably is a good example of a completely messed up practice that people have come to believe is normal.

Please don't write off somebody's lack of attention to typography as a virtue and clever design. It's not. Browsers display a page the way they are told to display it. Maybe your system could use better default fonts or line spacing, that's arguable, but it would definitely be stupid and unreasonable for a browser to enforce less than maximal width on your divs if not told otherwise. If anything, browsers already enforce more than they should (that's why normalize.css exists).

And, by the way, it's not like there's no CSS at all in the source. It's just UX-ignorant, so to speak.

Browser defaults have a legacy, and they're not easy to change without disruption. Just because they're less than ideal isn't a valid excuse to neglect your content presentation.

The author seems to have some concern for UX, because his page isn't just HTML. He put some CSS on the nav header, used some HTML5 semantic elements and some ARIA roles, and added a viewport meta element for mobile scaling. Sadly, that concern was short-lived: a third of his CSS is rendered moot, and his desire for a semantic page was abandoned quickly after implementing his navigation and footer.

His page's UX is improved dramatically by adding 2 simple CSS rules (margin, max-width). If those rules add too many bytes to the page, he could convert his navigation divs to a proper unordered list, get some bytes back, and make his page more semantic to boot.

His page is just a drive by attempt at standards and it's lazy.

Or an example of someone being too damn lazy to put effort into their offerings. I wouldn't be saying anything, except that it takes so little effort to make it 1000x more readable than it is.


I use this Chrome extension: https://chrome.google.com/webstore/detail/read-mode/nagcaaho...

There are others like it if you don't like this formatting.

> This is a problem even when cultures discourage meanness and encourage feedback: cultures of niceness seem to have as many issues around speaking up as cultures of meanness, if not more. In some places, people are afraid to speak up because they’ll get attacked by someone mean. In others, they’re afraid because they’ll be branded as mean. It’s a hard problem.

That's a really good insight, and it's something to keep in mind with all the recent controversy about development cultures.

Getting Things Done When You’re Only a Grunt, Joel Spolsky, 2001:


This is a much better version of the original article. Here are solutions to situations that arise when you have access to information that you feel should be acted upon but that doesn't already have broad organizational support.

Good article. I could think of reasons why people kept leaving in their first year at the company where they had freedom. The author says the company had the best parts of Netflix and Valve. What is the author referring to?

> well, we have some tweaks that didn’t make it into the paper.

Every single time I've tried to implement a newish, reasonably complicated algorithm from a paper and contacted the authors when I've run into trouble, this is the reply I've gotten. How is it not normal? It's research after all, and if you've worked in research you should have a good idea how the paper mill works.

I think your response here is a perfect example of what the article rants about.

It may seem normal to you and to people with a "good idea how the paper mill works", but it is absolutely insane from the point of view of a lot of (hopefully most...) people outside that bubble, who would mostly expect the results in a paper to at least be replicable with the information in the paper.

But is it expected?

"How is it not normal?"

Because how the hell is anyone supposed to validate their findings?

> There’s the company with a reputation for having great engineering practices that had 2 9s of reliability last time I checked, for reasons that are entirely predictable from their engineering practices. This is the second thing in a row that’s basically anonymous because multiple companies find it to be normal. Multiple companies find practices that lead to 2 9s of reliability to be completely and totally normal.

I'd like to know more about these practices that lead to 2 9s of reliability. Can you give specific examples of such practices, albeit not the companies themselves?

Is the implication that 2 9s are really bad, mediocre, or some other negative judgment in this context? I'm familiar with a lot of places with lots of capital but terrible culture that have trouble reaching even one 9 of reliability, despite employing lots of operational controls and some best practices, so even 2 9s sounds like a dream sometimes. The way Dan Luu worded this article makes it hard for me to understand the different cultural failure modes he's trying to express.

How does a place survive being down 37 days per year (one nine)? Cripes!

An SLA that defines unplanned maintenance very, very conservatively. For example, a place I was at did maintenance for about 8 hours weekly during which users couldn't manage any of their stuff, but outbound network access from their resources was still possible, so it was considered "available." Somehow, with hardly anyone actually doing anything with the service, it was a major part of an $800M+ acquisition, mostly for being "enterprise" ready. I will never understand how you can get away with things that demonstrably fail to work beyond the most trivial canned demo cases and sell them as completely done to even more gullible companies with zero technical vetting.

There are a lot of completely inconsistent definitions of what a service is sold as and what is actually delivered, and half the time lawyers only care about regulatory requirements rather than functional ones. Someone will use the ITIL definition of a service and say "it's available!", meaning that it exists in a CMDB or something, while another person defines availability as "I can ping it" and doesn't care if an HTTP 500 error is being thrown repeatedly. But gosh, if something used an insecure MongoDB server, that's reason enough for them to immediately cancel a contract (not hyperbole; I saw something very similar happen).

One 9 could also be 9%. I think we should be thankful for two 9s when we get them.

1 9 is terrible (1 month down!). 2 9s is ok (4 days down) for a lumbering IT dinosaur, terrible for any SV company.
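The downtime figures traded here follow from simple arithmetic: n nines of availability leaves a fraction 10^-n of the time unavailable. A minimal Python sketch, assuming a 365-day year and availability measured as plain wall-clock uptime (real SLAs, as described above, often carve out maintenance windows); the function name is hypothetical:

```python
MINUTES_PER_DAY = 24 * 60


def downtime_days_per_year(nines, days_per_year=365.0):
    """Allowed downtime per year, in days, for `nines` nines of availability."""
    # 1 nine = 90% up, 2 nines = 99% up, ... so 10**-nines of the time is down.
    return days_per_year * 10.0 ** -nines


for n in range(1, 6):
    availability = 100.0 * (1.0 - 10.0 ** -n)
    days = downtime_days_per_year(n)
    print(f"{n} nine(s) = {availability:.3f}% up: "
          f"{days:.2f} days ({days * MINUTES_PER_DAY:,.0f} minutes) down per year")
```

One nine works out to 36.5 days a year and two nines to 3.65 days, which matches the rough "37 days" and "4 days" figures in the thread; three nines is already down to under 9 hours.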

It's pretty clear from the context that the author finds 2 9s of reliability unacceptable.

Somewhat tangential:

"These hidden problems are the true gold standard of entrepreneurism and it’s amazing how little discussion there actually is about how to find them. It’s hardly a surprise though since they; as we can see, can be hard to find and I think there are a couple of reasons why.

Hidden problems aren’t obvious even to those who experience them every single day.

Most people have enough human problems. They are often hired to do a specific job and don’t necessarily think about these problems as something that could be solved. Many just see them as part of the actual process. So even understanding that they are problems requires a certain kind of attention most people simply don’t have. (I have later learned that this is called functional fixedness and is a cognitive bias, which explains why people sometimes say “Why didn’t anyone think of this before?” Most people simply don’t think like that.)

Hidden problems often only reveal themselves over time.

Not all problems are even instantly recognizable. Instead they only reveal themselves over time or through years of experience. This also means that many of these problems require a certain age and experience to even notice let alone understand. Perhaps this is one of the reasons why the average age of a founder is 38 and with 16 years of working experience behind him."


I want to speak to one aspect of the wonderful article:

Listening to weak signals.

I'm about to do a shameless promotion for a book I have nothing to do with, but a book that has been a guiding light for me: Michael Lopp's Managing Humans

When I was reading the article I couldn't help but think of Lopp's advice about regular one-on-one meetings with each of the people on your team.

I think this is one of the things Lopp intends managers to listen for during those meetings.

They aren't so much for feedback from the manager (as they are often treated), but more as opportunities for the manager to listen.

If I understand the book correctly, those one-on-one meetings are exactly the place where the managers are supposed to be listening for the "weak signals."

I am not an expert in every area of development, and yet I have somehow been inserted into a management role.

As Lopp explains very clearly, this happens often, and the single biggest thing you can do when that happens is care about being a manager. It's a different skill set than being an IC.

Recognize that, but don't get totally caught up in that. I don't think Lopp would disagree with anything in this article. I think, in fact, that following Lopp's ideas would lead to far fewer cases of WTF than what we see in the wild.

I read the title and immediately thought of circumcision. The article was not about that, and I enjoyed the insight. Same concept though. Normalization of deviance.

The funny part about the recent obsession with "normalization of deviance" is that it's just one of hundreds of psychological biases that people exhibit and that directly impact work.

This is why it's important to learn about psychology if you intend to work with humans—or even just with yourself.

Way back in WWII, soldiers coined the acronym SNAFU: "Situation Normal: All Fucked Up".

I think there is another thing to take away from some of the case studies: if you design and implement operating procedures and alarms, do so in a way that is simple, effective, and does not draw an undue amount of time and attention to itself. I have dealt with too many systems that sound the "everything's OK alarm" constantly, and procedures that have good intentions but make no effort to streamline the gratuitous amount of time and effort needed to follow them.

It is not constructive to blame employees for failure to heed poor alerts and protocols.

> And I can think of more than one well-regarded unicorn where everyone still has access to basically everything, even after their first or second bad security breach.

Which companies? That's pretty scary.

I used to work for a major financial exchange like this. When I joined, the root password was known by /everyone/. They also used telnet instead of ssh.

Another company I worked for used rot13 for their back-end risk management system's password storage. I found it completely by accident when trying to add the platform I was supporting at the time. I had a setting to the effect of 'resolve data from defined functions' enabled, so every stored password was resolved to plaintext instead of showing its 'hash'. It was batshit scary, the scariest being the production r/w credentials for the credit card and mortgage databases.

When I reported that one to the devs, they responded with, "We know. We needed to push the code out as quickly as possible, so we got lazy". Fuck. That.
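To see why rot13 "hashes" are so alarming: rot13 is a fixed letter substitution, not a hash, so applying it a second time returns the original. The sketch below shows that, next to one standard-library alternative for passwords a system only needs to verify: a salted, deliberately slow PBKDF2 hash via Python's `hashlib.pbkdf2_hmac`. The password, the function names, and the 600,000-iteration count are illustrative assumptions. Credentials a system must itself present to a database, like the ones in the anecdote, can't be hashed at all; they belong in a secrets store, which this sketch doesn't cover.

```python
import codecs
import hashlib
import hmac
import os

password = "hunter2"  # illustrative only

# rot13 shifts each letter by 13 places; doing it twice restores the input,
# so anyone who can read the stored value can recover the password.
stored_rot13 = codecs.encode(password, "rot13")
recovered = codecs.encode(stored_rot13, "rot13")
print(stored_rot13, "->", recovered)


def hash_password(password, salt=None, iterations=600_000):
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt, digest = hash_password(password)
print(verify_password("hunter2", salt, digest))  # correct password
print(verify_password("hunter3", salt, digest))  # wrong password
```

The stored salt and digest can only be checked against a guess; unlike the rot13 value, there is no operation that turns them back into the password.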

Could be Instagram, judging by that recent security writeup...

Key point: "The simplest option is to just do the right thing yourself and ignore what’s going on around you."

This is so, so, so easy to do and generally has no negative repercussions.

Then there is the story of the Emperor's New Clothes.


Sometimes older workers, or those in an exit interview, just don't GAF and call "groupthink" what it is.

You want to fix these problems? Don't hose your monkeys; hire some old deep thinkers and empower them.

The most unsettling thing is how much I love reading anecdotes like that - they make you feel so much better about "messed up" things you do yourself.

My personal hot take is that, as Einstein proved, everything really is relative.

What may seem dysfunctional and WTF from your vantage point may be perfectly logical from someone else's.

If you are going through your work life with the idea that social relationships and business practices are always going to follow the same strict rules as your software or hardware does, well, good luck with that.

Normal is as normal does.

Einstein proved nothing like this.

Next you'll tell me that Descartes didn't prove the existence of Zeus!

Einstein proved that everything is truly relative, and depending on your particular velocity and place in the universe, fundamental things you perceive, and that others perceive about you, are totally and completely different.

Perhaps I was being a bit too abstract, but I stand by my assertion.

You brought up a completely irrelevant point.

No, I didn't. It's only irrelevant to you because you are not trying to understand what I'm saying.

If you just dropped the Einstein part, it would be a perfectly fine comment.

No it wouldn't. The statement "everything is relative" is in itself a contradiction.

I can't read this because my ISP's (I assume) scheme of MITMing all http traffic is buggy, and now I can only load things over https.

Start using encryption, people. There's no reason not to.

This is an odd post to voice that complaint on, since the site supports HTTPS.

As someone trying to get a free SSL cert from LetsEncrypt right now, I'd say that it still has some way to go before it becomes frictionless :) (getting there quite fast though)

Otherwise, the reason becomes $$$.

Well, nowadays it's just $, as you can get one for $9/y ($5/y if you buy 3 years). For one subdomain, mind you.

Are you blaming the writer because this page is delivered through regular http?

That seems reasonable - when you fail to deliver content over HTTPS, you put your readers at risk of monitoring, content injection and all sorts of other nasty things.
