
How Completely Messed Up Practices Become Normal - Tenoke
http://danluu.com/wat/
======
danso
I immediately thought of the "5 monkeys and a ladder" psychology study, in
which the first monkey attempts to climb the ladder to get a banana and all of
the monkeys are sprayed with water. Then one of the monkeys is replaced with a
newcomer, who then tries to climb the ladder, but this time, the remaining
original monkeys attack it. And so the replacing of the monkeys continues,
until none of the monkeys knows what the deal with the ladder is and yet none
attempt to climb it.

This is a popular story in dev/product culture but after a little more
Googling, apparently it's apocryphal:
[https://www.psychologytoday.com/blog/games-primates-
play/201...](https://www.psychologytoday.com/blog/games-primates-
play/201203/what-monkeys-can-teach-us-about-human-behavior-facts-fiction)

~~~
mattkevan
It's a bit like when there's two cash machines, with a long queue at the first
but no-one at the second.

You can see people looking at it, wondering whether it's worth the humiliation
of finding it's broken and having to go to the back of the queue, or the joy
of finding it is working and getting one over everyone else.

Often nobody cracks.

~~~
klenwell
I actually come across this scenario quite often in retail outlets with poorly
designed queue layouts (e.g. an H&M with 2-4 registers behind a long
counter and a single ad-hoc queue lined up in front of one of them).

My reasoning is a little different (and a little more charitable I suppose in
its generalization about human nature): I don't want to go step in front of
the register with no one waiting in it because I don't want to look like a
sociopathic jerk. I also assume that is why most people haven't done so.

At the same time, I know as soon as someone does step up and do it, people
will grumble but the single ad hoc queue will redistribute itself into 3 more
balanced queues. I'm usually waiting for someone else to be that sociopathic
jerk.

What I wish is that the manager of the store gave a little more thought to
this issue in the first place.

~~~
lnanek2
Actually, scientifically, one queue for all registers is the superior
solution. It reduces wait times for customers in general vs. one queue for
each register (where one of the lines might get stuck for a long time). That's
why you find banks and other places are all set up to have one queue used by
all the clerks. The only real problem with H&M is that they haven't put stuff
in the way of people forming multiple lines when keeping them in one line is
the best practice.

~~~
bewuethr
It doesn't reduce average wait time, but the variance in wait times.

~~~
ignoramous
Well, it might bring down the average, but not as much as the variance,
because the work is now load-balanced across multiple counters, more or less,
evenly; and so, the productivity might be a bit on the higher side?
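
The mean-versus-variance question in this subthread can be checked with a quick simulation. This is only a minimal sketch, not something from the thread: it assumes Poisson arrivals, exponentially distributed service times, three registers at roughly 80% load, and, for the separate-lines case, customers who pick a line at random and never switch. The function name and every parameter are made up for illustration.

```python
import heapq
import random
import statistics


def simulate(num_customers=20000, num_registers=3,
             arrival_rate=2.4, service_rate=1.0, seed=1):
    """Return (shared_waits, separate_waits) for the same stream of customers."""
    rng = random.Random(seed)
    # Time at which each register next becomes free, one list per layout.
    shared_free = [0.0] * num_registers    # used as a min-heap
    separate_free = [0.0] * num_registers  # indexed by line number
    shared_waits, separate_waits = [], []
    now = 0.0
    for _ in range(num_customers):
        now += rng.expovariate(arrival_rate)     # next arrival time
        service = rng.expovariate(service_rate)  # this customer's service time

        # One shared line: the customer goes to whichever register frees up first.
        free_at = heapq.heappop(shared_free)
        start = max(now, free_at)
        shared_waits.append(start - now)
        heapq.heappush(shared_free, start + service)

        # Separate lines (assumption: random choice, no switching lines).
        line = rng.randrange(num_registers)
        start = max(now, separate_free[line])
        separate_waits.append(start - now)
        separate_free[line] = start + service
    return shared_waits, separate_waits


if __name__ == "__main__":
    shared, separate = simulate()
    for label, waits in (("shared line", shared), ("separate lines", separate)):
        print(label,
              "mean wait:", round(statistics.mean(waits), 2),
              "variance:", round(statistics.pvariance(waits), 2))
```

Under these assumptions the shared line should come out ahead on both the mean and the variance. Letting customers join the shortest line instead of a random one should narrow the gap in the mean, so the answer depends a lot on how people actually choose a line.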

------
vinceguidry
Most of what the author is complaining about boils down to business needs
being more important than the needs of the engineering team. To be blunt,
they're paying you to do a job, not to make the organization better. That's
what they pay the leadership for. You want to be part of the leadership, work
your way through the ranks or start your own business.

My life got much, much easier once I learned to stop straining so hard to fix
things that are bigger than me. If you don't like your managers, or the
culture, or the business, find another place to work, it's that simple. If you
can't do that, you're simply going to have to learn how to compromise.

~~~
shalmanese
This is exactly a perfect example of WTF worthy behavior being treated as
totally normal; that it's encoded that there's "leadership" and "not
leadership" as binary job distinctions with distinct responsibilities.

I've been in both types of cultures and, by far the healthier culture was the
one in which "making the organization better" was treated as everyone's
responsibility. I've seen very junior engineers drive quite significant
culture changes, simply by leading by example. They did it, not by whining or
complaining, but by taking baby steps of trying out small experiments with
their immediate team and then broadcasting their success to incrementally
greater circles within the company until it became the new normal.

If you seriously believe that all (or most) companies disallow anyone not in
leadership to improve things, I'd consider getting out of your current
situation and seeing things again with clear eyes.

~~~
vinceguidry
> If you seriously believe that all (or most) companies disallow anyone not in
> leadership to improve things, I'd consider getting out of your current
> situation and seeing things again with clear eyes.

I've been around a bunch, including the US Air Force, and I can confidently
tell you that heavily hierarchical organizational culture is the norm;
individual contributors who want to have effects beyond their job scope do so
at risk of offending other stakeholders. They may tolerate you and even give
some limited help, but they're not going to shout your name from the rooftops
just because you want to be a boy scout.

Not saying you can't buck the system and get away with it, one of my favorite
books was about one of my heroes, John Boyd, but if you want to be a hero, you
need to go into it with a clear-eyed assessment of what you're up against so
you can tailor your objectives appropriately.

~~~
ajross
The DoD (!) is hardly representative of the cultures being discussed in the
linked article, which is about tech startups (with a bunch of examples from
medicine too). Yes, there are worse examples, but those may be beyond help.

~~~
apalmer
Tech startups are a rather small part of the overall IT industry. I don't
really have the numbers to back it up, so I am not going to speculate as far
as the actual numbers, but any discussion about workplace IT that ignores
everything that isn't a tech startup is virtually useless.

~~~
sbov
And the IT industry is a rather small part of the overall job market. So
therefore any discussion about the workplace that ignores everything that
isn't IT is virtually useless?

------
devonkim
There seems to be a bit of a false dichotomy underpinning this article that
companies value feature growth above all else and this directly results in
poor operational performance. However, you can get terrible availability
without delivering any features whatsoever for months and months.

Two 9s of availability? Half the customers I've had would be ecstatic to have
even ONE 9 of availability. And those guys hardly ever ship any code due to
how encumbered developers typically are in those places and release maybe once
every 6 months to a year perhaps. In fact, this is basically my typical
experience with most enterprise customers I've worked with as a consultant -
they're unable to execute almost anything materially important and customers
put up with them because nobody else is in that niche enterprise market that's
keeping people employed by lack of choice / market consolidation
(healthcare.gov is just a visible example - plenty more projects are even
worse with perhaps even larger budgets with zero media attention).

~~~
antrix
> And those guys hardly ever ship any code due to how encumbered developers
> typically are in those places and release maybe once every 6 months to a
> year perhaps.

I've worked in places where some teams ship with this three or six month
frequency. They consider it completely normal and find ideas like continuous
delivery or even weekly deployments not just abnormal but risky and
irresponsible! This is the very point OP is trying to make, the _normalization
of deviance_.

~~~
devonkim
I agree with you plenty that this shouldn't be acceptable, but most of the
people I've seen that are against modern practices of decent software teams
are not so much examples of the points in this article as much as
stereotypical examples of one's grandparents or parents trying to tell you how
your job doesn't matter because software isn't "real work" and that their
principles work just fine today as they did in the 70s. Or they expect that
continuous delivery / CI is a _product_ or feature of something else they
bought for 9+ figures and that it's something that you bundle in with services
and a license cost because that's literally all they know as how to make
anything happen. Doing software projects with people that have more experience
in medieval architecture would be probably more pleasant and productive than
dealing with leadership that have decades of experience doing nothing but big
company projects with more resources spent on planning software than on
engineering talent.

If a company is hell-bent on focusing on development and new features over
stability / security, that's something that can be fixed by leadership - I've
worked with plenty of companies that turned themselves around and have wise
leaders that know that it's time to spend the resources to do spring cleaning
while trying to keep existing employees excited by feature development happy.

------
dfc
This post's style and "quality" of writing is really aggravating. I felt like
I was banging my head against the wall after reading so many run-on sentences
or paragraphs that start with the same contraction. Other times the writing is
so poorly executed I cannot tell what the author is trying to convey. For
example what is going on in this paragraph:

"There’s the company with a reputation for having great engineering practices
that had 2 9s of reliability last time I checked, for reasons that are
entirely predictable from their engineering practices. This is the second
thing in a row that’s basically anonymous because multiple companies find it
to be normal. Multiple companies find practices that lead to 2 9s of
reliability to be completely and totally normal."

~~~
hyperpape
It's a little bit awkward. Still, I find it oddly fascinating that you're
confused, because I can't figure out what's confusing.

There is a company Dan Luu knows about.

This company has a reputation for great engineering practices.

This company had 2 9s of reliability when Dan last checked.

The reason it has 2 9s of reliability is a predictable result of its
engineering practices.

Although this example is about a specific company, you can't identify the
company from the description.

You can't identify the company from the description because it is a
description that applies to many companies.

You also can't identify the example from the previous paragraph [of Dan Luu's
post] because that paragraph's description also applies to many companies.

Multiple companies have engineering practices that cause such reliability
problems and find these engineering practices to be completely and totally
normal.

~~~
momzer
The unspoken implication here is that 99% reliability is considered bad. This
may not be clear if coming from a different field where 99% sounds pretty
good.
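
For anyone coming from a field where 99% sounds fine, the arithmetic behind the "nines" helps. A minimal sketch, assuming availability is measured over a 365-day year and that "N nines" means an availability of 1 - 10^-N (the function name is just for illustration):

```python
HOURS_PER_YEAR = 365 * 24  # assumption: ignore leap years


def downtime_hours_per_year(nines):
    """Hours of downtime per year allowed by an availability of `nines` nines."""
    unavailability = 10 ** -nines  # 1 nine -> 0.1, 2 nines -> 0.01, ...
    return HOURS_PER_YEAR * unavailability


if __name__ == "__main__":
    for nines in range(1, 6):
        hours = downtime_hours_per_year(nines)
        print(f"{nines} nine(s): {hours:.2f} hours/year ({hours / 24:.2f} days)")
```

Two nines (99%) allows about 87.6 hours, or roughly 3.65 days, of downtime a year; one nine (90%) allows about 36.5 days.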

~~~
dfc
When you say 99% is bad, are you making a normative or positive statement?

~~~
momzer
I'm saying that the author believes 99% reliability to be “Bad (TM)”. Further
evidence in the second paragraph of this post: [http://danluu.com/broken-
builds/](http://danluu.com/broken-builds/)

I didn't say that this is a normative viewpoint in engineering or whether I
personally agree with it. As you can see from other commenters, many do hold
this view. A sometimes opposing philosophy, however, is “release early,
release often” which many open source projects adhere to.

------
karlkatzke
The author detailed precisely why I left a former Y-Combinator company, Return
Path.

"As far as I can tell, what happens at these companies is that they started by
concentrating almost totally on product growth. That’s completely and totally
reasonable, because companies are worth approximately zero when they’re
founded; they don’t bother with things that protect them from losses, like
good ops practices or actually having security, because there’s nothing to
lose.

The result is a culture where people are hyper-focused on growth and ignore
risk. That culture tends to stick even after company has grown to be worth
well over a billion dollars, and the companies have something to lose. Anyone
who comes into one of these companies from Google, Amazon, or another place
with solid ops practices is shocked. Often, they try to fix things, and then
leave when they can’t make a dent."

~~~
cballard
Did you give them a ton of advance warning when you left?

[http://www.onlyonceblog.com/2013/09/how-to-quit-your-
job](http://www.onlyonceblog.com/2013/09/how-to-quit-your-job)

~~~
karlkatzke
I left it open-ended, actually. I talked to the CDO and explained why I was
leaving. He agreed with the decision and thanked me for being brave enough to
stand up and say it.

It ended up being three weeks before we agreed that I'd successfully handed
off everything.

No bad feelings either way -- I'd joined the company because they had said
that they wanted to 'grow up,' but that was a feeling percolating up from
below. The lower levels of the company wanted to grow up and stop firefighting
all the time. The top levels of the company would fight you every last way.

~~~
x0x0
Wow have I ever been there, done that at a startup you've heard of. Line level
employees hated the endless firefighting; the ceo/cto didn't give a shit and
stymied any change. My solution was to forfeit any ops duty at all. I told my
boss she had two choices: I didn't work on ops or I didn't work there at all.

~~~
kyllo
So what did your boss say to that? Guessing the latter, because it sounds like
you don't work there anymore...

------
kbenson
> It’s sort of funny that this ends up being a problem about incentives. As an
> industry, we spend a lot of time thinking about how to incentivize consumers
> into doing what we want. But then we set up incentive systems that are
> generally agreed upon as incentivizing us to do the wrong things

The longer I live, the more I realize that _everything_ is a market, and
incentives control it all. The reason you follow company policy most of the time?
You're incentivized to follow the rules so you get the raise, or at least
don't get fired. When there are competing incentives for different responses
to the same subject, that's when you need to take extra care to realign the
incentives. Trying to institute new behavior? You have to fight momentum,
familiarity, and sometimes easiness. That often requires more than a few
dictates.

~~~
calinet6
Everything is a system—and some systems are markets. A market is also a
system.

This is why it's important to know how to think in systems, and about
psychology, statistics, variation, knowledge and everything else that
influences systems—if you want to work in one.

There's more to most systems than just incentives. They are a small part of
what goes on.

~~~
harigov
Can you recommend some books on this? I have been trying to learn to think in
systems, and find that to be the most useful skill to have.

~~~
ansy
If you want something more academic I recommend Measuring and Managing
Performance in Organizations by Robert Austin [1]. There is a sample that is a
pretty good introduction as well [2].

This was the only book worth reading when I was researching metrics for our
team at work.

TL;DR: Don't use performance metrics for human beings. You almost certainly
won't get what you want, and you'll probably get nasty side effects instead.

[1] [http://www.amazon.com/Measuring-Managing-Performance-
Organiz...](http://www.amazon.com/Measuring-Managing-Performance-
Organizations-Robert/dp/0932633366) [2]
[http://ptgmedia.pearsoncmg.com/images/9780133492071/samplepa...](http://ptgmedia.pearsoncmg.com/images/9780133492071/samplepages/0133492079.pdf)

------
michaelfeathers
nor·mal - adjective 1. conforming to a standard; usual, typical, or expected.
"it's quite normal for puppies to bolt their food"

All the things that he writes about are normal - they happen. People (myself
included) with an engineering background are surprised when things don't "make
sense" or people don't do things the "right way." The trick is to get to the
point where these things are not surprising, where you see them as part of the
systems you are trying to understand and consequences of forces that aren't
mysterious, they are just part of human social dynamics. From that vantage
point you can get a better sense of what you can change to influence outcomes
and whether you can or can't in a particular context.

------
draw_down
This post seems so good and so self-evidently true that I'm surprised at the
amount of pushback it's getting here. Not sure what else to say about it.

Well, I'll say this- the "@flaky" thing is pretty mind-blowing. In my own
company I have noticed many engineers have a disturbing level of comfort with
deciding something is a "mystery". There are no mysteries in what we do. The
test fails because something is fucked up. Flappy tests are annoying, but the
right thing to do is to address the situation.

~~~
danso
Yeah...I mostly have experience only running test suites for popular libraries
and for my own software...but the existence of this library is a massive
WTF...such that had I read its official introductory post, I would have
thought it to be really good satire:

[https://www.box.com/blog/introducing-flaky-a-nose-test-
plugi...](https://www.box.com/blog/introducing-flaky-a-nose-test-plugin-for-
automatically-rerunning-flaky-tests/)

Can someone describe a real life production scenario in which this flaky
behavior is desirable? That is, preferable to these other generally accepted
practices:

\- flag the test and mark the bug as an issue and at some point, attempt to
fix it

\- delete the test, if it happens to relate to dead code or was poorly
conceived in the first place

\- use fixtures and other libraries to mock dependencies, e.g. Webmock and/or
vcr to intercept http request and respond with a pre-recorded fixture.
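
The third option, mocking dependencies, can be sketched in Python with the standard library's `unittest.mock`. Webmock and vcr are Ruby tools, so this is an analogous illustration rather than their API; `fetch_status` and the URL are hypothetical names invented for the example.

```python
import urllib.request
from unittest import mock


def fetch_status(url):
    """Hypothetical code under test: return the HTTP status of a URL."""
    with urllib.request.urlopen(url) as response:
        return response.status


def test_fetch_status_without_network():
    # Stand-in for the object urlopen() returns; it is used as a context
    # manager, so the canned status goes on what __enter__ hands back.
    fake_response = mock.MagicMock()
    fake_response.__enter__.return_value.status = 200

    # Replace the real network call for the duration of the test.
    with mock.patch("urllib.request.urlopen",
                    return_value=fake_response) as patched:
        assert fetch_status("http://example.invalid/health") == 200
        patched.assert_called_once_with("http://example.invalid/health")


if __name__ == "__main__":
    test_fetch_status_without_network()
    print("ok")
```

Because the network is never touched, this kind of test cannot fail intermittently for network reasons, which is the point of the suggestion.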

I understand that there are scenarios in which production must go on even when
a test fails. But to throw on another testing layer that tells you, "hey, it
kind of works, for some unknown reason", instead of just marking the test as a
failure to be investigated...what possible value or insight could outweigh the
additional noise generated? I guess one possibility is that it lets you know
that something is _truly_ fucked up...but that is not at all the tone of the
Box blog announcement:

> _When testing Sync 4, Box's desktop sync application, we also ran into this
> issue, but we also didn't want to simply remove our flaky tests. When we
> noticed that most flaky tests would pass when rerun, we realized we could
> make doing so automatic. Flaky is a nose plugin that can rerun flaky tests
> without interrupting your test run. Using it is as easy as decorating your
> test methods with @flaky_

~~~
ghettoimp
"Can someone describe a real life production scenario in which this flaky
behavior is desirable?"

I think so? My company develops a processor that is meant to be compatible
with processors from other vendors. To try to ensure compatibility, we have a
test suite that compares our "golden model" of intended behavior against the
observed behavior of competitors' chips.

We sometimes have run into cases where a competitor's product "randomly" gets
wrong results, but when we run the instruction again, it gets the right
answer. This happens frequently enough that we've arranged the test suite to
automatically try again to see if a failure is reproducible before bothering a
human with it.
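
The "try again to see if a failure is reproducible" idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual harness described above and not the API of the flaky plugin; `classify` and `fails_n_times` are made-up names, and a check is assumed to signal failure by raising `AssertionError`.

```python
def classify(check, reruns=2):
    """Run check(); on failure, rerun it to see whether the failure reproduces.

    Returns "passed", "flaky" (failed once, then passed on a rerun), or
    "reproducible" (failed on the first run and on every rerun).
    """
    try:
        check()
        return "passed"
    except AssertionError:
        pass
    for _ in range(reruns):
        try:
            check()
            return "flaky"
        except AssertionError:
            continue
    return "reproducible"


def fails_n_times(n):
    """Build a demo check that fails on its first n calls and passes afterwards."""
    state = {"calls": 0}

    def check():
        state["calls"] += 1
        if state["calls"] <= n:
            raise AssertionError("wrong result")

    return check


if __name__ == "__main__":
    print(classify(fails_n_times(0)))
    print(classify(fails_n_times(1)))
    print(classify(fails_n_times(5)))
```

Keeping "flaky" as its own outcome, rather than folding it into "passed", means the intermittent failures are still counted and can be reviewed later instead of vanishing.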

~~~
danso
Yeah I figured these kind of errors fall into the system
interoperability/integration category...but I also figured they would fall
into their own kind of test suite, one that is much more geared toward the
measurements of thresholds and probabilities. Adding a "flaky" plugin to the
standard test framework to do this kind of non-standard testing...feels like
designing the workflow in a slightly backwards way, like monkeypatching a
basic data object for very specific behavior needed in a few niche libraries.

------
jwmerrill
The paper on "normalization of deviance" that this post links to is also
really good. It's written from a medicine perspective, but its observations
and conclusions are pretty generalizable.

[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2821100/](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2821100/)

~~~
arjie
Sidney Dekker's _The Field Guide to Understanding 'Human Error'_ has a bit
about this (which he calls drift). One thing I liked in that section is a sort
of anti-Murphy's Law: What can go wrong will usually go right, and then we'll
assume that it will go right again and again, even if we borrow more from our
safety margin.

That said, it's a trade-off, and if you're a startup, it's usually far better
to do it dirty today than perfect next year.

------
scrollaway
I went in agreeing with the headline, and the article just didn't follow
through on it (also, thank god for readability view. 2560 pixels wide lines
with no margin are not any more readable than blogs full of ads).

So it posits that there's messed up practices considered normal, then it talks
about companies that are clearly _abnormal_? Even in one of the very first
paragraphs, "the company whose culture is so odd that ...". And since when is
marking flaky tests a "completely messed up practice"?

Ok so a lot of this is about coding practices and such but... like the guy
said, those problems _sort themselves out_. Bad security gets broken into, the
companies eventually get hit and either die out or fix themselves. Etc...

There's a lot of completely messed up practices in tech. Oh lord, _especially_
as a european looking in to the SV world. Some of those messed up practices I
can't even mention on HN because people think they are so normal, I get mass
downvoted and have to engage 5 people telling me how normal this is (in fact I
might even have to engage in it by just mentioning this).

1\. The issues highlighted tend to be problems specific to people. Programmers
that don't know how to do some aspect of security properly, that's a problem
specific to those guys. You put me next to them, I'll do that bit properly but
will be clueless about a different bit. It's all fixable.

2\. The actual programming & design is the least problematic, mostly because
it's the one that's most easily changed. Trying to change culture gets you
fired. Fixing a pipeline with noticeable improvements gets you promoted. The
one bit I did agree with was that it is hard to _show_ improvement when you
prevent a fire. I also think that's fixable and I also think that's a people
problem, except at the manager level.

You want messed up practices? Look at the game dev industry and its mandated
crunches and burnouts. "Everybody's crunching for the next 6 months because we
_really_ want to see the game released on time".

~~~
ra1n85
Curious, as a European, what do you find wrong with some of the practices in
SV? Genuinely interested in hearing your perspective (perhaps if you elaborate
and keep it objective as possible you'll avoid some of the mass downvotes).

~~~
scrollaway
If you're really curious, you can take a look at my post history - if you go
far back enough you might find a few rants on it.

But it has a lot to do with culture. Adopted traits that are only there to
serve themselves. And since culture is by definition subjective, I can't
really say anything bad about it now, can I?

------
johngalt
Sometimes a new person comes in saying "WTF WTF WTF Wtf wtf..." because you
_don't_ have the messed up practices that they expect.

~~~
kazagistar
Ah yes, writing tests, code reviews, and version control can be such a pain,
why would you put up with these productivity killers?

~~~
JimboOmega
It really depends. When I emerged out of the DoD contracting industry, it was
a total WTF that we weren't obsessing over charge codes and timekeeping. Even
if we're not billing a customer, how do we even know what we're spending our
time on if we're not documenting it with constant timesheets?

It's possible for process to serve a purpose, but still not be worth it. (Not
going to argue with your specific examples though)

------
dman
An anecdotal observation is that the worst offenders in terms of
institutionalized bad practices also have a culture of failing upwards, i.e.
incentive structures are set up in a way where you create your mess so fast
that you get promoted and someone else has to deal with the aftermath of what
you did. In such an environment slowing down to do the right thing invariably
means that you are setting yourself up to inherit a mess created by someone
else.

~~~
dmourati
Ooh, that's insidious and has the ring of truth to it. I've never thought
about that before.

------
pklausler
"There’s the office where I asked one day about the fact that I almost never
saw two particular people in the same room together. I was told that they had
a feud going back a decade, and that things had actually improved – for years,
they literally couldn’t be in the same room because one of the two would get
too angry and do something regrettable, but things had now cooled to the point
where the two could, occasionally, be found in the same wing of the office or
even the same room. These weren’t just random people, either. They were the
two managers of the only two teams in the office. Normal!"

I'm 99% certain that I know to whom the author refers here, having worked
at an office with somebody of the same name where a drama matching this
description took place. It was one profoundly weird situation that should
never have been allowed to fester.

~~~
Demiurge
Thanks for stoking the fire of my curiosity.

------
TazeTSchnitzel
This reminds me that I've acclimatised to one specific test in the PHP
interpreter test suite always failing on OS X. I should go and make it either
not run on OS X, or modify it so it will actually pass.

That way, when someone eventually actually breaks that function, they'll
notice.

~~~
bmn_
A nice thought, but pointless. Failing tests do not matter to the PHP release
managers, they ship it anyway.

[http://redd.it/qeq7k#c3z0vva](http://redd.it/qeq7k#c3z0vva)

~~~
TazeTSchnitzel
That's nonsense, PHP releases ship with no failing tests. That's from three
years ago, and anyway gcov.php.net is not used for CI these days (IIRC it's an
abandoned old box, its results are inaccurate), Travis is. The PHP-7.0 and
PHP-5.6 branches are currently green on Travis.

------
SCAQTony
This line perplexed and disturbed me: "...This is the same company where
someone recently explained to me how great it is that, instead of using data
to make decisions, we use political connections, and that the idea of making
decisions based on data is a myth anyway; no one does that...."

One, that is depressing and implies the company seems to have a corrupting
influence on society at large.

Two. The girl is right and it is called insider information. Another
corrupting aspect of our society.

In other words, corruption is a new viable normal and workers have no problem
with it because most workers are all desperate and happy to have a passport to
middle or upper middle class.

~~~
cjcenizal
From the context, I think the author was probably referring to office
politics, not corrupt politicians.

~~~
SCAQTony
I hope so for that is less corrupt. I can be somewhat obtuse. I am going to
write the author and ask. Thank you.

------
MarkPNeyer
so many of these problems have the same root cause:

we don't have an effective data driven reputation system. we use gameable
heuristics to track social capital.

when metrics for evaluation are flawed, people behave in ways that exploit the
flaws even if they increase the likelihood of failure.

"we are not rewarded for necessary grunt work as much as shiny advances", for
example. That's a failure of the reputation system to account for the value of
that work.

My solution to this problem is a mathematical reputation system based on the
same concept as page rank. The system is available here:

github.com/neyer/respect

I'd love your feedback.
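
For readers unfamiliar with the PageRank idea mentioned here, this is a minimal sketch of power iteration over an endorsement graph. It is not taken from the linked repository, whose actual algorithm may differ; the function name, the damping factor, and the example graph are all assumptions for illustration.

```python
def rank(endorsements, damping=0.85, iterations=50):
    """PageRank-style scores for a dict mapping person -> people they endorse."""
    people = sorted(set(endorsements) |
                    {p for targets in endorsements.values() for p in targets})
    n = len(people)
    score = {p: 1.0 / n for p in people}
    for _ in range(iterations):
        # Everyone gets a small baseline regardless of endorsements.
        new = {p: (1 - damping) / n for p in people}
        for giver in people:
            targets = endorsements.get(giver, [])
            if targets:
                # Split the giver's current score among those they endorse.
                share = damping * score[giver] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # Someone who endorses nobody spreads their score evenly.
                share = damping * score[giver] / n
                for p in people:
                    new[p] += share
        score = new
    return score


if __name__ == "__main__":
    graph = {"ann": ["cat"], "bob": ["cat"], "cat": ["ann"], "dan": ["cat", "bob"]}
    for person, value in sorted(rank(graph).items(), key=lambda kv: -kv[1]):
        print(person, round(value, 3))
```

The relevant property is that an endorsement counts for more when it comes from someone who is themselves well endorsed, which makes the score harder to game than a simple tally of endorsements.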

~~~
raarts
In the past few years I have been building up good reputation at various
stores both online and offline. It bothers me I cannot use that reputation.
For example a major supermarket chain here in The Netherlands rolled out self
scanning from 2006[1]. They do random checks at the checkout, and for many
years now they know I never forget to scan something. This resulted in the
amount of random checks going down for me. That should mean something, and I
should be able to use this trust/karma elsewhere. All these companies building
data on me, and I can't use it myself.

[1]
[https://www.youtube.com/watch?v=orjo0uNZFsk](https://www.youtube.com/watch?v=orjo0uNZFsk)

~~~
yourapostasy
The challenge is getting the companies to (from their perspective) "share" the
trust information with the world, which includes their competitors.

I suspect we might need a "taxonomy of trust" so to speak, that allows the
trust data to be anonymized and aggregated into commonly-accepted meanings of
trust contexts, trust roles, trust relationships, etc. That might let these
companies release the trust data into such a format through a blockchain
perhaps, and be able to participate in consuming the aggregated data. I'd need
someone well-versed in game theory to figure out if an advantage is conferred
to "leeches"; a company in such a scenario who only consumes the aggregated
data but never sends into the blockchain what they accumulate on their own
customers. I think that's a real danger with such a scheme, but am not sure
how to strongly dissuade that behavior.

------
cptskippy
Speaking of completely messed up...

I had to add max-width, margin, and font-size styles before I could even
attempt to read that page. For all that markup, there sure wasn't any
attention paid to readability.

~~~
kps
It's just plain HTML. That your browser doesn't display it readably is a good
example of a completely messed up practice that people have come to believe is
normal.

~~~
krick
Please don't write off somebody's lack of attention to typography as a virtue
and clever design. It's not. Browsers display it how they are told to display
it. Maybe your system could use better fonts or line spacing by default, it's
arguable, but it definitely would be stupid and unreasonable for browser to
enforce less than maximal width for _some_ of your divs if not told
otherwise. If anything, they already enforce more than they should (that's why
normalize.css exists).

And, by the way, it's not like there's no css at all in the source. It's just
UX-ignorant, so to say.

------
gherkin0
> This is a problem even when cultures discourage meanness and encourage
> feedback: cultures of niceness seem to have as many issues around speaking
> up as cultures of meanness, if not more. In some places, people are afraid
> to speak up because they’ll get attacked by someone mean. In others, they’re
> afraid because they’ll be branded as mean. It’s a hard problem.

That's a really good insight, and it's something to keep in mind with all the
recent controversy about development cultures.

------
teddyh
_Getting Things Done When You’re Only a Grunt_ , Joel Spolsky, 2001:

[http://www.joelonsoftware.com/articles/fog0000000332.html](http://www.joelonsoftware.com/articles/fog0000000332.html)

~~~
aelaguiz
This is a much better version of the original article. Here's solutions to
situations that arise when you have access to information that you feel should
be acted upon that doesn't have broad organizational support already.

------
turnip1979
Good article. I could think of reasons why people kept leaving in their first
year at the company where they had freedom. The author says the company had
the best parts of Netflix and Valve. What is the author referring to?

------
flatline
> well, we have some tweaks that didn’t make it into the paper.

Every single time I've tried to implement a newish, reasonably complicated
algorithm from a paper and contacted the authors when I've run into trouble,
this is the reply I've gotten. How is it not normal? It's research after all,
and if you've worked in research you should have a good idea how the paper
mill works.

~~~
vidarh
I think your response here is a perfect example of what the article rants
about.

It may seem normal to you and people with a "good idea how the paper mill
works", but it is absolutely insane from the point of view of a lot of
(hopefully most...) people outside of that bubble, who would likely mostly
expect the results in a paper to at least be possible to replicate with the
information in the paper.

------
mwcampbell
> There’s the company with a reputation for having great engineering practices
> that had 2 9s of reliability last time I checked, for reasons that are
> entirely predictable from their engineering practices. This is the second
> thing in a row that’s basically anonymous because multiple companies find it
> to be normal. Multiple companies find practices that lead to 2 9s of
> reliability to be completely and totally normal.

I'd like to know more about these practices that lead to 2 9s of reliability.
Can you give specific examples of such practices, albeit not the companies
themselves?

~~~
devonkim
Is the implication that 2 9s are _really_ bad, mediocre, or some other
negative judgment in this context? I'm familiar with a lot of places with lots
of capital but terrible culture that have trouble reaching even _one_ 9 of
reliability, despite employing lots of operational controls and some best
practices. So even 2 9s sounds like a dream sometimes, and the way that Dan
Luu worded this article makes it hard for me to understand the different
cultural failure modes that he's trying to express.

~~~
koko775
How does a place survive being down 37 days per year (one nine)? Cripes!
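
For reference, a rough sketch of the arithmetic behind the "37 days" figure. It assumes a 365-day year and treats "N nines" as an availability of 1 - 10^-N; the function name is made up for illustration.

```python
# Yearly downtime implied by "N nines" of availability.
# Assumption: a 365-day year, and "N nines" meaning availability = 1 - 10**-N.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(nines: int) -> float:
    """Hours of downtime per year allowed at an availability of N nines."""
    unavailability = 10 ** (-nines)  # 1 nine -> 0.1, 2 nines -> 0.01
    return HOURS_PER_YEAR * unavailability


for n in (1, 2, 3):
    hours = downtime_hours_per_year(n)
    print(f"{n} nine(s): {hours:.1f} hours (~{hours / 24:.2f} days) per year")
```

One nine works out to roughly 36.5 days a year, and two nines to roughly 3.65 days.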

~~~
devonkim
SLA that defines unplanned maintenance very, very conservatively. For example,
a place I was at did maintenance for about 8 hours weekly where users wouldn't
be able to manage any of their stuff, but there was network access outbound
possible from their resources, so it was considered "available." Somehow with
hardly anyone actually doing anything with the service, it was a major part of
an $800M+ acquisition and mostly for being "enterprise" ready. I will not
understand how you can get away with things that demonstrably fail to work
beyond the most trivial of canned demo cases and be sold like it's completely
done to even more gullible companies with zero technical vetting.
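
To put a number on the weekly maintenance window above, here is a small sketch. It assumes the 8 hours are the only downtime in the week and counts them as unavailable from the user's point of view; the function name is hypothetical.

```python
# Availability as users experience it, given a recurring weekly outage.
# Assumption: the maintenance window is the only downtime in the week.
HOURS_PER_WEEK = 7 * 24


def user_facing_availability(downtime_hours_per_week: float) -> float:
    """Fraction of the week during which users can actually use the service."""
    return 1 - downtime_hours_per_week / HOURS_PER_WEEK


availability = user_facing_availability(8)
print(f"User-facing availability: {availability:.2%}")
```

That comes to a bit over 95%, which is one nine by the user's definition no matter what the SLA says.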

There are a lot of completely inconsistent definitions about what a service is
sold as and what is actually delivered, and lawyers half the time only care
about regulatory requirements rather than functional ones. Someone will use
the ITIL definitions of service and say "it's available!" meaning that it
exists in a CMDB or something and another person defines availability as "I
can ping it" and doesn't care if there's an HTTP 500 error being thrown
repeatedly. But gosh, if something used an insecure MongoDB server _that_ is
the reason they immediately cancel a contract (not hyperbole, saw something
very similar happen).

------
ThomPete
Somewhat tangential:

"These hidden problems are the true gold standard of entrepreneurism and it’s
amazing how little discussion there actually is about how to find them. It’s
hardly a surprise though since they; as we can see, can be hard to find and I
think there are a couple of reasons why.

Hidden problems aren’t obvious even to those who experience them every single
day.

Most people have enough human problems. They are often hired to do a specific
job and don’t necessarily think about these problems as something that could
be solved. Many just see them as part of the actual process. So to even
understand they are problems, require a certain kind of attention, most people
simply don’t have. (I have later learned that this is called functional
fixedness and is a cognitive bias. Which explain why people sometimes say “Why
didn’t anyone think of this before?” — most people simply don’t think like
that.)

Hidden problems often only reveal themselves over time.

Not all problems are even instantly recognizable. Instead they only reveal
themselves over time or through years of experience. This also means that many
of these problems require a certain age and experience to even notice let
alone understand. Perhaps this is one of the reasons why the average age of a
founder is 38 and with 16 years of working experience behind him."

[http://000fff.org/the-problem-with-problems](http://000fff.org/the-problem-with-problems)

------
ianamartin
I want to speak to one aspect of the wonderful article:

Listening to weak signals.

I'm about to do a shameless promotion for a book I have nothing to do with,
but a book that has been a guiding light for me: Michael Lopp's _Managing
Humans_

When I was reading the article I couldn't help but think of Lopp's advice
about regular one-on-one meetings with each of the people on your team.

I think this is one of the points that Lopp intends for managers to be
listening to during those meetings.

They aren't so much for feedback from the manager (as they are often treated),
but more as opportunities for the manager to listen.

If I understand the book correctly, those one-on-one meetings are exactly the
place where the managers are supposed to be listening for the "weak signals."

I am not an expert in every area of development, and yet I have somehow been
inserted into a management role.

As Lopp explains very clearly, this happens often, and the single biggest
thing you can do when that happens is care about being a manager. It's a
different skill set than being an IC.

Recognize that, but don't get totally caught up in that. I don't think Lopp
would disagree with anything in this article. I think, in fact, that following
Lopp's ideas would lead to far fewer cases of WTF than what we see in the
wild.

------
insanity55
I read the title and immediately thought of circumcision. The article was not
about that, and I enjoyed the insight. Same concept though. Normalization of
deviance.

------
calinet6
The funny part about the recent obsession with "normalization of deviance" is
that it's just one of hundreds of psychological biases that people exhibit and
that directly impact work.

This is why it's important to learn about psychology if you intend to work
with humans—or even just with yourself.

------
PaulHoule
Way back in WWII soldiers coined the acronym (SNAFU), "Situation Normal All
Fucked Up".

------
protonfish
I think there is another thing to take away from some of the case studies - if
you design and implement operating procedures and alarms, do so in a way that
is simple, effective, and does not draw an undue amount of time and attention
to itself. I have dealt with too many systems that sound the "everything's OK
alarm" constantly, and with procedures that have good intentions but show no
effort to streamline the gratuitous amount of time and effort needed to follow
them.

It is not constructive to blame employees for failure to heed poor alerts and
protocols.

------
acconsta
> And I can think of more than one well-regarded unicorn where _everyone still
> has access to basically everything_ , even after their first or second bad
> security breach.

Which companies? That's pretty scary.

~~~
bpchaps
I used to work for a major financial exchange like this. When I joined, the
root password was known by /everyone/. They also used telnet instead of ssh.

Another company I worked for used rot13 for their back end risk management
system's password storage. Found it completely by accident when trying to add
the platform I was supporting at the time. I had a setting to the effect of
'resolve data from defined functions' enabled, so every password stored would
be resolved to plaintext instead of showing their 'hashes'. It was batshit
scary - scariest being the production r/w credentials for the credit card and
mortgage databases.

When I reported that one to the devs, they responded with, "We know. We needed
to push the code out as quickly as possible, so we got lazy". Fuck. That.
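
To illustrate why rot13 is not password storage, here is a minimal sketch. The password is made up, and the salted PBKDF2 variant is only one common alternative, not a claim about what that system should have used.

```python
import codecs
import hashlib
import os


def rot13(text: str) -> str:
    """Rotate each ASCII letter by 13 places; applying it twice is a no-op."""
    return codecs.encode(text, "rot_13")


# Hypothetical credential, for illustration only.
password = "Hunter2-prod"

stored = rot13(password)    # what a rot13 "hash" would look like at rest
recovered = rot13(stored)   # anyone who can read it gets the plaintext back
print(recovered == password)


def hash_password(candidate: str, salt: bytes) -> bytes:
    """One-way, salted derivation; the iteration count is an assumed value."""
    return hashlib.pbkdf2_hmac("sha256", candidate.encode("utf-8"), salt, 100_000)


salt = os.urandom(16)
digest = hash_password(password, salt)
# Verification re-derives the digest from the candidate instead of decoding it.
print(hash_password(password, salt) == digest)
```

The first is an encoding that reverses itself, so "resolving" it to plaintext is trivial. The second can only be checked by recomputing it from a candidate password.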

------
pbreit
Key point: "The simplest option is to just do the right thing yourself and
ignore what’s going on around you."

This is so, so, so easy to do and generally has no negative repercussions.

------
mgrennan
Then there is the story of the Emperor's new clothes.

[https://en.wikipedia.org/wiki/The_Emperor%27s_New_Clothes](https://en.wikipedia.org/wiki/The_Emperor%27s_New_Clothes)

Sometimes older workers, or those in exit interviews, just don't GAF and call
"group think" what it is.

You want to fix these problems? Don't hose your monkeys; hire some old deep
thinkers and empower them.

------
auganov
The most unsettling thing is how much I love reading anecdotes like that -
they make you feel so much better about "messed up" things you do yourself.

------
cubano
My personal hottake is that, as Einstein proved, everything really is
relative.

What may seem dysfunctional and WTF from your vantage point may be perfectly
logical from someone else's.

If you are going thru your work-life with the idea that social relationships
and business practices are always going to follow the same strict rules as
your software or hardware does, well good luck with that.

Normal is as normal does.

~~~
satai
Einstein proved nothing like this.

~~~
cubano
Einstein proved that everything is truly relative, and depending on your
particular velocity and place in the universe, fundamental things you
perceive, and that others perceive about you, are totally and completely
different.

Perhaps I was being a bit too abstract, but I stand by my assertion.

~~~
satai
You brought up a completely irrelevant point.

~~~
cubano
No I didn't...it's only irrelevant to you because you are not trying to
understand what I'm saying.

~~~
kyllo
If you just dropped the Einstein part, it would be a perfectly fine comment.

~~~
typon
No it wouldn't. The statement "everything is relative" is in itself a
contradiction.

------
umanwizard
I can't read this because my ISP's (I assume) scheme of MITMing all http
traffic is buggy, and now I can only load things over https.

Start using encryption, people. There's no reason not to.

~~~
GFischer
As someone trying to get a free SSL cert from LetsEncrypt right now, I'd say
that it still has some way to go before it becomes frictionless :) (getting
there quite fast though)

Otherwise, the reason becomes $$$ .

~~~
icebraining
Well, nowadays it's just $, as you can get one for $9/y ($5/y if you buy 3
years). For one subdomain, mind you.

