

The other half of "Artists Ship" - tyn
http://www.paulgraham.com/artistsship.html

======
allyt
I think this can be generalized to say that start-ups have a different cost-
benefit analysis - one where losses are capped at the (relatively small) value
of the company. On the other hand, large companies _have_ to be risk-averse
because the worst case is several orders of magnitude worse. When you're
working at
ConEd or AIG, the "tiny probability/worst-case-loss" factors start to matter,
because "worst-case" can include investors losing life savings, federal
investigations, and jail time for your boss's boss.

~~~
lutorm
Funny that some of those large companies you mention didn't seem to get the
memo about being risk-averse... ;-)

~~~
fallentimes
During their executive retreats they decided to be risk seeking instead.

I wonder if the problem at the recently troubled companies (AIG, Merrill,
Lehman etc) was not enough checks in place or not enough power allotted to the
employees who knew something was wrong.

~~~
andr
From memory: the Excel bug that caused Moody's to give high ratings to
mortgage securities (one of the probable causes of the whole mess) was known,
but management decided not to fix it. So greed can override checks and
balances.

~~~
nradov
The bug wasn't in Excel. It was Moody's own fault, not Microsoft's.

~~~
andr
Sorry, I didn't express myself clearly. I meant a bug in an Excel spreadsheet
made by Moody's.

------
jaydub
This past summer I worked as an intern for a big company. On the first day of
orientation the head of the legal compliance department said very flatly to us
that any individual could do far more harm to the firm than good.

This mentality, reinforced by the company's bureaucratic change management
system, really did not sit well with me.

Fortunately for my summer experience, my "buddy"/summer mentor and I found a
loophole which we used to "hack" the change management system so that once we
got our initial approval, we could propagate changes to prod without running
through the whole process again for each change (which otherwise would have
been required).

~~~
byrneseyeview
For large companies with a valuable reputation, that's almost guaranteed to be
true. Most people won't generate a million dollars worth of value in a given
year, but nearly everyone could do that amount of damage to their company's
reputation in just a few minutes.

~~~
pg
Sure, yes, if e.g. the customers' health or safety was at risk. But I don't
think the average bug in a web app would damage a company's reputation
significantly. GMail occasionally shows me a message saying "Oops, an error
occurred." It doesn't make me think any less of Google.

~~~
nostrademons
You aren't a typical non-early-adopter user. I've had people make loud, public
complaints about such trivial matters as a new logo being too tall, a text box
being too wide, or an _optional_ WYSIWYG editor feature ruining the "text-only
flavor of the community" -- and these were on websites far smaller than GMail.

Look at some of the major Web2.0 kerfuffles in recent years. Off the top of my
head, I can think of:

\- The HD/DVD mutiny on Digg

\- Public suicides on both JoelOnSoftware and Justin.TV

\- Reddit storing passwords in cleartext and them getting stolen off a laptop

\- Ariel Waldman and the Twitter harassment fiasco

\- The Flickr censorship debate here:
<http://www.flickr.com/help/forum/40074/page3/>

Yeah, you could argue that those services are all still around, so obviously
they haven't been hurt too much. But how much management time was wasted
dealing with them? How many non-users decided not to become users because of
something they heard third-hand about what a terrible company it is?

(And I don't think the solution is to never innovate. I can understand why a
middle manager at a big company would think so, though - a PR disaster is by
definition public and disastrous. In the short run, it always makes more sense
to not mess with success, it's just that this doesn't lead to success in the
long run.)

~~~
pg
None of these were the kind of bug that comes from releasing code with
insufficient testing. The only one that even involved code was the Reddit
password problem, and that was more a design mistake than a bug.
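(For context on the kind of design mistake meant here: the standard fix for storing cleartext passwords is to store only a salted hash, so a stolen database or laptop doesn't reveal the passwords themselves. A minimal sketch using Python's standard library -- illustrative, not Reddit's actual code:)

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, digest); store these instead of the password itself."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, digest):
    """Recompute the digest and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)
```

(PBKDF2 is a modern choice; the point in 2008 or now is the same -- never store the password itself.)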

~~~
neilk
A culture of testing is a lot more than just verifying functionality. Code
reviews with experienced engineers would have caught that design flaw before
it was released. Or made it a high priority to fix.

I've dealt with the security teams at large companies. They end up
implementing sometimes draconian policies, but it seems that programmers
refuse to learn to write secure code any other way. I have to admit that, if
you consider the large company as being a legitimate entity, they're doing
legitimate work.

------
neilk
Some things that PG didn't make explicit: web applications can have an
extraordinarily tight feedback loop between customer and coder. Multiple
revisions per day are easy, especially if any single user's data is rather
low-value. Many a startup of this kind has zoomed past its corporate
competitors simply by iterating faster. If your product is embedded firmware
for a home security device, this strategy is just not available.

But here's another twist: when you start working for a giant company that
provides myriad services under the same login, suddenly everyone has to move
as slowly as the slowest part of the business. It doesn't matter if you are
just doing Yippee! Backgammon, because who knows -- maybe you could somehow
accidentally expose the data at Yippee! Payment Solutions.

This is a completely rational consequence of being a giant company and
delivering multiple web services under the same login. It is possible to have
slightly different authentication policies for each service. But I wonder if
OpenID providers have thought about this enough.

~~~
gaius
_Some things that PG didn't make explicit: web applications can have an
extraordinarily tight feedback loop between customer and coder._

He has made that point in other essays.

------
Haskell
_"...not only wouldn't these guys have broken anything, they'd have gotten a
lot more done."_

I don't think the claim that they would not have broken anything can be backed
by facts.

If you look at Microsoft, for instance, their programmers used to write very
buggy code, but with the introduction of better processes, like their Secure
Development Lifecycle, they saw a substantial improvement in code quality (as
measured through security and reliability metrics).

The damage done by not following a quality assurance process and writing buggy
code was so big that Microsoft's image will be affected for a long time, even
though they have now improved the code quality.

~~~
pg
I meant these particular guys.

~~~
Haskell
My point is still valid.

Even these particular guys certainly would have written buggy code. A QA
process could have caught it.

~~~
Retric
I worked without testing for 2 years and released a single bug into
production. In that specific case I thought something might be wrong, but they
needed it ASAP, so I said who cares. My father once had a tester say "Oops, I
stopped testing your code over a year ago" when he released a fix to some
buggy code someone else wrote which still had a few bugs. After we added a
formal QA process, I stopped double-checking how buggy my code was so I could
get more done, but I have ended up releasing more bugs.

So I have little respect for formal QA.

------
sah
I think the costs of safety checks can be more pronounced in software than in
other areas of business. It's not just that good hackers are particularly
irritated by them -- they're irritated for a reason.

A two-week release-process lag is so much worse than it might sound. If you
can ship instantly, you can get feedback and make improvements rapidly, over
and over. Sure, without a careful release process you might occasionally break
things, but when you do you can fix them in minutes rather than weeks. When
you insert release-process lag into that cycle, improvements that might have
been made in a rapid series of releases over a day or two can take months.
Sometimes those improvements just never get made, because no one is motivated
to keep working on them for so long.
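The compounding effect described above can be made concrete with some back-of-the-envelope arithmetic; the numbers below are purely illustrative:

```python
# Toy model: each improvement requires one full release cycle, and a
# cycle is development time plus release-process lag (both in days).

def improvements_shipped(days, dev_time, release_lag):
    """How many sequential improvements fit in a given window of days."""
    cycle = dev_time + release_lag
    return days // cycle

# Half a day of actual work per improvement, over a two-month window:
fast = improvements_shipped(60, 0.5, 0)    # ship instantly: 120 iterations
slow = improvements_shipped(60, 0.5, 14)   # two-week lag:   4 iterations
```

The two-week lag doesn't make each improvement slightly slower; it cuts the number of feedback iterations by a factor of 30, which is the point being made here.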

In software, "stable" often means "unchanging", but only rarely means "high-
quality".

------
prakash
Some thoughts:

As companies grow big, they like to _emulate_ larger companies, and hence the
people they like to hire are from larger companies, since they _understand
scale_ \-- this comes with a lot of baggage, of course. This in turn, most
times, brings a culture of people who worry more about not screwing up,
covering their own ass, and getting promoted than about doing what's right for
the company or the business, even if that means screwing up as a byproduct of
making quick decisions.

 _Re Joel Spolsky:_ The cost of the sale is not only the cost of the product,
but what an organization would pay for a software/service, and what they would
value it at. This is why most companies have a sales force, and don't
advertise their prices on their website. As Joel mentions, for a lot of
companies, charging less than $50k is a rounding error and not worth their
time.

The flip side is that buying decisions at big companies are not based entirely
on the _performance_ of the service or product. As long as the service/product
is reliable and OK, and doesn't screw up in any major way, the buyer won't get
fired for the decision -- which matters more to them than
maximizing/optimizing the _performance_ of the product. Yet another reason
committees make buying decisions.

 _Re SOX:_ I like what Founders Fund + Facebook are doing -- letting early
employees cash out some of their equity, thereby increasing the time early
employees will stay with the company.

I also like Fred Wilson's thoughts on a secondary market for startup stock
(similar to what Goog does). This would, in some sense, let you stay a private
company and avoid the challenges associated with SOX compliance, while also
not raising a highly dilutive Series D; in addition, it gives the investors
some sort of liquidation event.

PS: _(std. Buchheit comments about Limited Life Experiences +
Overgeneralization = Advice apply)_

PPS: Say hi to the reddit guys for me ;-)

------
walterk
There's an excellent prototyping policy in effect at Maxis:

 _In terms of time, they have a policy of permission vs. forgiveness. You need
to be prepared to fail early – but that’s okay! If a prototype takes less than
two days, don’t worry: just go ahead and do it. If it takes more... you should
probably have permission._

<http://www.gamasutra.com/php-bin/news_index.php?story=11628>

------
jwheare
We used to do releases whenever we wanted. After this took the site out one
too many times, we moved to at most daily releases with an hour of QA.

This actually made things worse. Aggressive releases were often the result of
a decision further up the chain that something was "critical" and needed to go
out right then and there. We'd got into the pattern of expecting to be able to
do this at the drop of a hat.

So now everyone was rushing to get the "urgent fix" into the daily release.
This is an effective way to ship buggy software. Extra emergency releases were
frowned upon, so every release carried a level of panic: "Oh shit, this better
not screw everything up." Ironically, when anyone could release whenever they
wanted, there was less panic, because you could always make a quick fix and
put it out without too many people noticing or caring.

We felt as a team that we'd prefer a more relaxed schedule that also allowed
testing and accountability.

The overly frequent releases were a result of bad management and unrealistic
stakeholder expectations; we as developers didn't necessarily care that our
code wasn't being put out immediately. Obviously it needs to go out at some
point -- we've been equally demoralised by mammoth projects that have gone on
for months without a release -- but that's the other end of the spectrum. The
important thing is _regular_ releases, not frequent ones.

We've now decided to move to releases every 2 weeks with a 1-week QA period.
We made this decision as a team of developers; it wasn't handed down from
above. It just became a necessary check at the scale of our company.

I personally feel a lot more productive as a result, less panicked by the
quick fix you need to drop everything to fit into the daily release, and less
worried that a release will take out the site.

But sure, it comes at a cost. There's now a lot more going out with each
release and more variables that could combine to cause issues, but these
issues highlight deficiencies in our QA process that can be fixed. Also,
getting our team to work according to this 2 week schedule has meant some
significant costs to adopt better development processes, namely SCRUM, but I
feel our team has benefited enormously from it. It all depends how well your
team can adapt really.

So yes, there are costs associated, but it doesn't necessarily lead to
demotivated developers. You need to be sure that the checks are introduced
properly with their own test suite. This itself has a cost but can be
extremely worthwhile. A half-arsed and arbitrary change to your release
schedule with no process changes can destroy you, but with a little training,
the right team and the right managers, you can make things better for
everyone.

------
tdavis
Having never had a "normal" job per se (okay, I bagged groceries for a year
when I was younger), I still find it hard to believe that it can take two
weeks just to get something deployed. Maybe one day I'll finally _truly_
believe all the stories I've heard -- two-week deployments, meetings about
future meetings, etc. For now, some part of me still believes it's simply
implausible and everyone is engaging in hyperbole...

~~~
trapper
It's not abnormal. In Rails I have deployed to a customer in 12 hours,
starting from scratch (though using my own framework). In GWT, I have done
this in ~24 hours. These weren't simple apps either -- not just scaffolding,
but complex user interfaces.

~~~
tdavis
I wasn't referring to task difficulty, just the extreme time delay between
when something is ready and when it is put into production.

~~~
trapper
I was referring to development & production deployments in both examples.

------
gcheong
I would add that in addition to having to show the cost of any new check, one
should have to show how the check will actually prevent the problem it
proposes to prevent, and the likelihood of that problem occurring or
recurring. A major problem I've seen with checks is that they just become CYA
material rather than anything truly beneficial or preventative, and people
will work around them.

------
alextp
One problem is that the cost of checks is only visible in the aggregate. The
marginal cost of each new check seems to be pretty low.

I wonder if there's a way to limit the total cost without hard-and-fast silly
rules.
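The parent's point -- each check looks cheap in isolation, and the cost only shows up in the aggregate -- can be sketched with made-up numbers:

```python
# Hypothetical figures: hours each individual check adds to every release.
checks = [0.5, 1.0, 0.25, 2.0, 0.25]
releases_per_year = 200

per_release = sum(checks)                    # 4.0 hours: each item looks harmless
per_year = per_release * releases_per_year   # 800 hours, roughly 20 work-weeks
```

No single check would survive a cost-benefit veto on its own, yet together they consume half a person-year.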

~~~
jbert
Is "one in, one out" a silly rule?

e.g. you can only add a new step to a process if a step is removed from
another.

~~~
mechanical_fish
_Is "one in, one out" a silly rule?_

Yes. ;)

When do you institute the rule -- on day one, in which case it is functionally
equivalent to "do not have any checks"? On day 13? Day 7386?

What is the quantum of a "check"? Is Sarbanes-Oxley one "check" or a
collection of hundreds of "checks"?

Does the limit apply department-by-department or company-wide? Your sysadmins
will naturally tend to employ a lot more formal checks (most of which are
hopefully enforced by tiny Perl scripts) than your R&D prototyping team. Do
you force the sysadmins to compete with the accounting department for a quota
of checks? Do you incentivize sysadmins to evade checks by getting tasks
reassigned to the R&D prototyping team, which (alas) can't afford to have any
checks?

The last thing you want to do is encourage teams of lawyerly meta-checkers to
run around enforcing rules about rules. That's costly, _squared_.

~~~
jbert
"When do you institute the rule?"

When you realise you have a (potential) problem. It stops it getting worse.

"What is the quantum of a 'check'?"

More tricky, certainly. That's potentially a big problem.

"Does the limit apply department-by-department or company-wide?"

Whatever makes sense.

The idea of the exercise would basically be to:

\- build an appreciation throughout the organisation that checks have a cost

\- provide an (albeit imperfect) mechanism for exercising some control over it

Something doesn't have to be perfect to be of use.

"The last thing you want to do is encourage teams of lawyerly meta-checkers to
run around enforcing rules about rules. That's costly, squared."

That's just ISO9001, isn't it? :-)

------
snowbird122
This is what I call "playing defense" instead of "playing offense". By nature,
the people at large companies are more concerned with risk aversion than
innovation. This is also how managers like to "hold power" over the real
innovators by controlling the change management process. What makes a manager
more qualified to deploy to production than the programmer?

~~~
jodrellblank
Their higher up position gives them [the potential for] a better view.

------
dpatru
To "make something people want", you need to know your customer's value
system: what's important to him and what's not. This can be hard for startups,
especially ycombinator types which may be very low budget, because their own
value system is so different. Startups start with a company worth about $1 and
want to turn it into a company worth about $1,000,000. Therefore startups
value the ability to change rapidly -- that's why they want good programmers.

Big companies on the other hand, start with a company worth $100,000,000, and
want to turn it into a company worth $150,000,000. They value steady process
improvement that doesn't risk their existing revenue -- that's why they want
good managers.

It's important to know this if you're trying to sell to a big company as well
as if you're trying to compete with a big company.

------
gregggreg
_In fact, the acquirer would have been better off; not only wouldn't these
guys have broken anything, they'd have gotten a lot more done._

That's where you lost me. Sure, there are companies who add checks without
thinking of the consequences, but as a company goes from a startup that is not
making money to a more mature company that is making money, the cost of
breaking things becomes very, very high. That is why most post-startup
companies add more checks, to protect their revenue stream. And any company
making decent money has a revenue stream that far outweighs the amount of
money that the programmers would be willing to pay to have a faster release
cycle. And the risk of the company losing their revenue stream for any
significant amount of time far outweighs the risk in losing decent programmers
because they aren't happy with the length of the release cycle.

You seem to say that these particular programmers would never have broken
anything, but I find that impossible to believe unless they are working on
something that is not at all complex or is inconsequential to the revenue
stream. Every programmer, even the best programmer in the world, writes code
with bugs in it. The more complex a system is, the more likely there are to be
bugs in it and as a company grows and makes money, the more complex their
systems will become.

As an example, let's say that you were responsible for a web application that
brought in a million dollars of revenue a day and you had these supposedly
perfect programmers that never made mistakes who were responsible for the
code. Would you let them just throw out whatever code they wanted onto the
servers because you trusted them to not make mistakes? Or would you be more
reasonable and put some checks in place first to make sure that the
application worked correctly?

Obviously there needs to be a sane balance between the risk of breaking the
application and the opportunity cost of not updating it, and the frustration
of slowing down development is one of the costs that should be weighed. But to
say that no checks should be added as a company matures is almost laughable.

------
cwp
_Whenever someone in an organization proposes to add a new check, they should
have to explain not just the benefit but the cost. No matter how bad a job
they did of analyzing it, this meta-check would at least remind everyone there
had to be a cost, and send them looking for it._

To me that's the most important paragraph here. That takes you from "let's
make sure this never happens again" to a cost-benefit analysis. The issue is
not whether or not to have checks, as much of the discussion here assumes.
It's about realizing that not all checks are equivalent, and using that
knowledge to get the greatest safety at the lowest cost.

------
bootload
_"... Steve Jobs's famous maxim "artists ship" works both ways. Artists aren't
merely capable of shipping. They insist on it. So if you don't let people
ship, you won't have any artists ..."_

Steve Yegge has had the unenviable luck of working on 3 products, none of
which have shipped (for cited business reasons), yet he still works for Google
~ <http://blog.stackoverflow.com/2008/10/podcast-25/> So it seems there is
something else going on that keeps programmers on board. Is it money, or
having the freedom to talk about the process? I don't know.

~~~
tdavis
I spent 3 months at the end of a recent deployment furiously developing an
(arguably) game-changing app for the Army's secret intranet. I knew in advance
it wouldn't actually get deployed, despite everyone up to the Task Force
Commander loving it and touting its importance. The day came when it was
"done" (of course I had become attached and there were a million things I
wanted to add) and ready to be deployed... then the red tape tied it up until
it was time for us to go. As is generally the case, our replacements weren't
planning to do things our way (the country has gotten much worse since we
left, imagine that) so the project was basically sunk.

I worked on it because I love programming and trying new things. Everything I
was doing at the time was new to me so it was fun, engaging, and a learning
experience. Plus, it beat the hell out of anything else I could have been
doing in that wasteland ;)

~~~
bootload
_"... I worked on it because I love programming and trying new things.
Everything I was doing at the time was new to me so it was fun, engaging, and
a learning experience. Plus, it beat the hell out of anything else I could
have been doing in that wasteland ;) ..."_

Good explanation.

I put it as a question as I wasn't conclusively sure why people would sign
on to assignments that have a real probability of failure. But I've since
found that smart organisations allow pursuit of training and learning as a way
to retain staff and motivation. I figure the USF know a thing or 2 about
motivation. Hope you're putting the lessons you learned into some hack you're
working on :)

------
jpclauss
Paul,

Surely if the costs of checks can be "discontinuous," "non-linear," "step-
change," or however else you would like to term it, then the potential cost to
the company of NOT having a check in place could be discontinuous as well.
Larger companies potentially have more to lose from recurring lapses in
supply, or, in the ongoing case of your essay, software service. Of course,
software/coding may also be a special case. Additionally, it's true that your
"committee" example represents a discontinuous change in bureaucracy for that
firm, which would make it much more reasonable to expect an equivalent effect
on cost.

------
garyr55
There is a related concern I have about costs piling up due to corporate or
managerial mandates. I have the impression that people above us only think
about what will make their jobs easier -- what else do they need for us to
supply them with so that they will be able to achieve predictability, manage
or reduce costs, and generally manage the company better. When they come at it
from this angle, it always seems reasonable that they ask us to do more --
provide more reports, track time against projects, report metrics -- when in
fact most of what they ask actually interferes with us being able to get real
work done. Since they/we do not have adequate productivity measurements in
place already, the resulting reduction in productivity is never noticed, hence
an invisible cost has been added without a conscious decision or cost/benefit
analysis. Since I think most individual contributors recognize that this is
taking place, it acts as a disincentive.

In the end I think the solution to both the article's and my concerns are the
same: It is always better to ask the members of the team that experienced the
problem firsthand what they think the solution would be or what they would do
differently next time to avoid the problem. Especially if you phrase it as
"what could you or your team have done differently to improve the outcome" and
urge them to avoid proposing major process initiatives if at all possible, the
improvement will be more practical and lower cost in all senses.

------
rokhayakebe
_Programmers are unlike many types of workers in that the best ones actually
prefer to work hard. This doesn't seem to be the case in most types of work.
When I worked in fast food, we didn't prefer the busy times. And when I used
to mow lawns, I definitely didn't prefer it when the grass was long after a
week of rain._

 _Programmers, though, like it better when they write more code. Or more
precisely, when they release more code. Programmers like to make a difference.
Good ones, anyway._

Salesmen like it best when they can sell more. When I used to sell used cars,
I loved it best when I could keep pushing cars out of the lot.

Writers like it best when they can write more. They like it best when their
imagination can produce thousands of stories.

The difference is that coders usually know what they will tackle and have a
strategy so it is easier for them to get to the end point. Also they are
almost sure they will hit the end point of a certain problem.

With other professions you cannot predict much. Your next customer is not
guaranteed, your next story just does not want to show up in your brain
etc....

That being said, Yes, a lot of programmers I know work hard.

~~~
pg
Writers are like hackers, yes. But the reason most salesmen like to work hard
is that they're paid on commission. That's very different.

~~~
tomsaffell
I suspect that what is really going on here is that _workers who love their
job actually prefer to work hard_ AND _good programmers love their job_. So it
is probably true that _good programmers actually prefer to work hard_ , but
that is not the whole truth...

Also, since _most workers don't love their job_ , it is probably also true
that _good programmers are unlike many workers_. To say that _programmers are
unlike many TYPES OF workers_ we'd have to believe that a higher proportion of
programmers love their job than do workers in other types of work. From what I
have seen on HN (albeit sample biased) I'd say that is true.

~~~
randallsquared
Yes. I've worked in fast food, and as a cashier at a convenience store, and in
both cases, those who were better at the job preferred busier times ("It makes
the day fly by, doncha think?").

~~~
shammah
Summary:

People like to make a difference. Good ones, anyway.

------
mafamba
I found this particular paragraph to be not right somehow:

 _Programmers are unlike many types of workers in that the best ones actually
prefer to work hard. This doesn't seem to be the case in most types of work.
When I worked in fast food, we didn't prefer the busy times._

Who are the people who are "best at" fast food? I think perhaps you mean that
people who build things get enjoyment out of building -- perhaps more
precisely, of seeing their built item in action. I think programmers in
particular are used to having short feedback loops between building and trying
out, and that pattern of building up and testing as you go until a larger
problem is solved is a strong motivation.

I guess I'm claiming that harder-working isn't really the issue. The issue
seems to be who is having more fun. And in that sense, your claims still make
sense. Part of the fun of building is a short feedback cycle. More checks in
the process make the feedback cycle longer, and therefore less fun. That also
explains why teachers like their jobs: you can see it in someone's face when
they learn something new.

------
jfjfjf
I disagree with the claim that harm to the U.S. IPO market was "not the
intention" of the "people who wrote" Sarbanes-Oxley.

I have two pieces of evidence.

First, it was well-known and easy to see that Sarbanes-Oxley would be
extremely expensive. It was well-known when passed that its costs fall
disproportionately on small public companies. Because IPOs by definition are
small public companies, the costs fall disproportionately on them. Thus, it
must have been known when enacted that the bill would harm IPOs.

Second, suppose it is in fact the case, as Graham seems to suggest, that
Congress "inadvertently" harmed IPOs. His argument is that Congress just
accidentally happened to overlook the harm of the bill to IPOs. In that case,
when the harm to IPOs became factually clear, the bill would have been changed
or amended. The fact that the law was not amended even after the harm to IPOs
became clear proves that it was Congress' intention all along to harm small
public companies (the companies most dangerous to the large corporations who
have the most lobbying pull).

------
jcsimpson2
Very nice. I think another reason why companies overpay is that they don't
negotiate, they don't check out the hot companies, and they let their
employees make decisions even though most have never hired a company -- so
they go with the biggest name, just so they can say they hired them. It looks
good on their resume. Companies don't realize that most executives, director
level and above, are doing nothing more than building their resumes, and
their projects go unchecked and unaudited; basically the company gets
screwed, and frankly it is the company's own fault. Once a company loses its
founding partners, the transition team runs wild with all the changes they
want to make, without looking for the things that made the company
successful. Spending is at an all-time high, and the brand suffers because
the new team is too busy spending to keep sight of what is important: the
customer. Gosh, does that sound like AIG?

------
massung
Another interesting read. I'd like to note that (while it wasn't the focus of
the essay), many of the expensive "checks" aren't there for monetary reasons,
but rather human safety and the dire consequences of a poor decision.

Working at Motorola on cell phone code could be a great way to give everyone
in the modern world a cool new feature - or accidentally break 20+% of all
phones out there. A terrible Mickey Mouse movie could destroy a brand. Take it
a step further to NASA, vehicle engine design (and safety), oil pipelining and
refining, etc. and the cost could be human lives.

That said, I'm very interested in the cost of these "checks" on those
industries. Not monetary costs (although important), but rather the cost in
stifled future innovation.

Thoughts?

------
t0pj
I guess so many _checks_ in a big company could be considered the biggest
_mistake_ of all?

~~~
unalone
Well, as companies get larger they can afford less and less to make mistakes.
If I'm running a start-up company, I can get away with changing things
abruptly, because I have a smaller base of users and I have fewer people
relying on my running at production-level. Also, chances are I'll be able to
look at user feedback effectively.

Once you get large, and your company is providing sustenance for all your
employees, people rely on your product, and you've got too many users for
effective feedback-checking, you _have_ to close up, take fewer risks. Because
suddenly, people want you to move slowly. They don't want you constantly
skyrocketing ahead while they're playing backup. Look at any big company - even
Google, which was once famous for moving quickly - and you'll see that part of
what gives a big company a good reputation is its being "solid." They have to
give things up for an advantage.

It's why newspapers are so relied-upon. Of course, now it's what is hurting
newspapers the most. They're being beaten by the flexible Internet. But even
there, we're seeing a trade-off. Look at the quality of stories by the top
writers online and by the top NY Times writers, and the online writers are
much more amateur. They're faster, occasionally they're more interesting, but
the Internet is thus far not retaining a high level of professionalism among
reporting. Similarly, start-ups are much less reliable on the whole than large
companies - look at Twitter and its problems, for instance.

So I think that PG's article is right. You can't restrict people and expect
them to do as well. However, too much freedom leads to less stability, so it
becomes a trade-off. Everything in moderation.

~~~
dcurtis
This thinking has always kind of confused me. Why are customers/users at a big
company more important than those at a startup? Just because there's more of
them, now you can't make mistakes?

If you have the agility to make rapid production changes, you also have the
ability to rapidly rollback. So the argument that larger companies require
more checks and testing than startups isn't really valid, especially when you
consider the costs.

~~~
mechanical_fish
_If you have the agility to make rapid production changes, you also have the
ability to rapidly rollback._

This is just not true. Rollbacks are always more expensive than changes,
because you can't rewind time to undo the consequences of having your software
be broken for minutes, hours, or days. Worse, in the absence of "checks", the
cost of making a production change tends to be roughly constant as the company
grows -- it takes the Amazon sysadmin no more time to type "make deploy" than
it does me -- but the cost of a rollback scales directly with the size of your
company's customer base.

Within a few seconds after Amazon.com breaks S3, thousands of companies begin
to lose money, and they lose money second by second until the rollback
happens. Even if Amazon is only down for a minute, that's one minute of
downtime multiplied by its number of customers. The larger the customer base,
the larger the stakes.

And, unfortunately, the cost of downtime is nonlinear. If Amazon goes down for
a mere two minutes, hundreds of peacefully sleeping system administrators will
get emergency pages from their uptime-monitoring systems. They will get out of
bed. They will check their logs and their failover mechanisms. They will lose
a lot of sleep, and soak up a bunch of overtime pay, and a lot of their good
will towards Amazon will dissipate like the morning dew. Once you lose your
reputation for quality it takes a lot of work to get it back.

This is why larger companies have more controls. The controls are in place to
try and pass the ever-increasing cost of a rollback back to the team that
causes the rollbacks. The reason it seems so gosh-darned expensive to add a
trivial feature to your flagship app is that it _is_ expensive: If the average
rollback costs $1m in revenue and every new feature is only 95% reliable,
every new feature costs the company $50k to deploy.

The secret here is: If you want to deploy changes rapidly, don't work on a
product that has a lot of uptime-sensitive customers! Start a different
product line, or start a beta program, or found a smaller company.
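
The back-of-the-envelope math here can be sketched directly (the $1m rollback
cost and 95% reliability are the comment's hypothetical figures, not real
data):

```python
def expected_deploy_cost(rollback_cost, reliability):
    """Expected loss per deploy: the probability that a feature triggers
    a rollback, times the revenue lost when one happens."""
    return (1.0 - reliability) * rollback_cost

# The comment's hypothetical: $1m per rollback, features 95% reliable.
print(round(expected_deploy_cost(1_000_000, 0.95)))  # 50000
```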

~~~
dcurtis
S3 is a really bad example because they provide infrastructure. Their
customers actually see their entire site go down. Those kinds of companies are
the exception. I hope Heroku has rigorous testing and scrutinizes every
change, even though they are a startup.

Let's say I own a video site and I want to add threaded comments. If I have 5
users and the site goes down for 5 minutes, those 5 users will get 5 minutes
each of annoyance. If I have a million users, each of those users will get 5
minutes of annoyance each also. There is no difference to the user there. So,
by adding more checks to make sure the site doesn't go down for 5 minutes when
you have more users, you're saying that the more users you have, the more
important each user becomes. I think that's a strange way of thinking.

(The same is true here of an infrastructure service -- if S3 had 5 users and
were more cavalier about their release schedule and broke something, those 5
users would experience the same net effect of downtime as if S3 had 5 million
users.)

The awesome benefit of getting threaded comments developed, tested briefly,
and pushed in one evening is worth the risk of 5 minutes of downtime compared
to the 2 weeks of rigorous testing and approval-by-committee. No matter how
many users you have.
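
The aggregate arithmetic behind this disagreement is worth making explicit:
per-user annoyance stays constant, but total user-minutes of annoyance scale
linearly with the user base (a minimal sketch using the comment's
hypothetical numbers):

```python
def total_annoyance_minutes(users, downtime_minutes):
    """Each user is annoyed for the same few minutes, but the aggregate
    cost grows with the number of users affected."""
    return users * downtime_minutes

print(total_annoyance_minutes(5, 5))          # 25
print(total_annoyance_minutes(1_000_000, 5))  # 5000000
```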

~~~
mechanical_fish
I used an infrastructure site as an example because the value proposition is
easy to understand when you use a site that has a clear and simple
monetization strategy. Video sharing sites are arguably an even worse example
than S3, because the value of uptime is so hard to perceive or compute. It's
likely that even Twitter doesn't understand the true value of a customer-hour
of Twitter uptime, because the site isn't monetized and so much of the value
is concentrated in the brand. Measuring that is like voodoo, only less
empirical. ;)

 _If I have 5 users and the site goes down for 5 minutes, those 5 users will
get 5 minutes each of annoyance. If I have a million users, each of those
users will get 5 minutes of annoyance each also. There is no difference to the
user there._

No, but there is a big difference for you! If a user is worth a dollar per
year, the five-user site is worth five bucks per year, but the million-user
site is worth a million bucks. If each patch to your code causes 0.1% of users
to abandon your product (a number which depends on the odds that a patch will
cause a rollback, and on the odds that a rollback will annoy a user enough to
make them leave), patching a 5-user site costs you half a cent per year on
average (most likely it has no perceptible cost, since odds are no users will
leave) but each patch to a million-user site costs you $1000 per year in
revenue. And that's just the _linear_ cost. There are nonlinear consequences:
one or zero annoyed users is nothing to worry about -- unless that user is
Michael Arrington -- but a clique of 1000 annoyed users is potentially a
_movement_ : a critical mass of people who will all start complaining about
your company on Twitter on the same day, potentially costing you your next
10,000 or 100,000 or 1 million users while simultaneously empowering your
competitors, who may begin building the site that will take you down by
poaching those dissatisfied users.

This is just the flip side of scalability. As a programmer you enjoy mighty
economies of scale: Running a site with a million users is more expensive than
running a single-user site, but it is _much less_ than a million times as
expensive. But this leverage also applies to your mistakes: a mistake that
costs you a dollar when your site is small might cost you $1,000,000 when your
site is big. And it's the same mistake! Typos are just as easy to make on big
sites as on small ones.

Obviously, this doesn't mean that you shouldn't ever change the site.
Presumably each and every one of your patches is valuable, and will _bring in
revenue_ to pay for its own insurance premiums. Right? :) But you do need to
think about that calculation, because you do occasionally make mistakes. As
your userbase grows, you may wish to test each patch on a subset of users to
be sure they will _really_ like it, and that the additional revenue is really
going to be there. You may wish to institute tests and internal audits that
lower the risk of rollbacks, or failover mechanisms to lower the cost of
rollbacks. And before long, lo, you will be that which you deplore: A company
with a bunch of annoying internal controls! But at least you'll have revenue
to console yourself with.
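
The revenue arithmetic above can be written down directly (the $1/user/year
value and 0.1% churn-per-patch figures are the comment's illustrations, not
measurements):

```python
def patch_cost_per_year(users, churn_per_patch, value_per_user):
    """Expected annual revenue lost from users who abandon the product
    after a bad patch."""
    return users * churn_per_patch * value_per_user

# Five users: about half a cent per patch.
print(round(patch_cost_per_year(5, 0.001, 1.0), 4))          # 0.005
# A million users: $1000 per patch, for the very same mistake.
print(round(patch_cost_per_year(1_000_000, 0.001, 1.0), 2))  # 1000.0
```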

~~~
fallentimes
But I think what Dustin is saying (correct me if I'm wrong) is that the
multiplier applies both ways. And that the total cost of making a 5 minute
downtime mistake, even to a million users, could easily be outweighed by the
benefits of releasing a product/feature/site 2 weeks early. In most cases, I
think large companies are risk averse instead of risk neutral in situations
like this.

I agree with both of you that it varies considerably based on what the site
does (infrastructure, videos, games, etc).

~~~
mechanical_fish
_In most cases, I think large companies are risk averse instead of risk
neutral in situations like this._

I'm not going to argue with that. Just because a certain increase of caution
is rational doesn't mean that caution isn't being overapplied in many cases,
just as PG suggests in his original post.

------
jaytee_clone
How about a check immunity system?

For example, competent people who know their own weaknesses well and often
check their decisions with others should be immune to checks.

Good programmers often let other good programmers review their code. Good
writers often let other good writers proofread. Good managers often discuss
their decisions with their employees before deploying them. They should be
check-immune.

You can even pass out check-immune badges to encourage non-check-immune people
to be more critical of themselves :)

Of course, this system itself is a check. Who decides who will get check-
immunity? What's the successful-decision-ratio threshold? I think it's
possible to implement a lightweight system at little cost.

------
vladimir
Yes, startups do their job better than big companies. But I think that it is
part of a more general rule: people work productively in OPEN systems, and
open systems themselves encourage working hard. Open source projects are
developed more rapidly, because it is hard to imagine a system which is more
open. Both users and developers control the system, and every user can become
a developer. In startups we have a limited number of people who can control
the process of development, but they still control it. In other words, the
system is open to founders and employees. Big companies work inefficiently
because they are too closed, and a great number of checks is only a part of
it.

------
Cellar
Nice writeup on half an idea. Now, time for some release engineering. How do
you give good coders free (enough) rein while keeping tabs on the stupidities
of the lesser ones?

In a startup, it's acceptable for service to be accidentally down for short
periods while the geniuses running the show fix their latest cockup. In large
companies, it isn't. How do you get the best of both worlds?

It is not by removing all safety interlocks. Sysadmins, especially good ones,
can tell you in excruciating detail why not. Most programmers are not very
good sysadmins, or at least not as good as they might like to believe.
Coincidence?

------
netcan
This would be interesting as a concept to develop past the circumstance of
programmers at work.

I would call it part of institutionalisation. You have a need to be effective
via policies, checks, procedures & such. This replaces the shoot-from-the-hip
style of small groups.

I think you can apply this to schools that need to teach approved courses with
approved grading. Then it gets worse when they need examinations that are to
be applied across an entire country.

I'm sure there are principles that can be extracted & applied to lots of
places. The costs can be very serious.

------
phex
My experience is that extensive procedure comes about because the small-
company startup approach doesn't scale to large companies. Every one I have
worked at that transitioned from 10s to 1000s of people was failing because it
tried to continue as if it were still 10 wizards at it. The wizardry actually
hindered others because it was opaque to them.

Interestingly, the biggest company I worked for actually ran the best,
perhaps because its roots, going back 80+ years before any of the others',
allowed the significant difference to be tracked.

------
netman21
When I was an automotive engineer the industry instituted a Japanese "check"
called Design Failure Mode and Effects Analysis. For every component the
engineers would have to predict the ways the design could fail in testing and
what steps would have to be taken to fix them. Time spent doing DFMEAs added
significantly to the cost of producing a new part and worked counter to weight
optimization. Do you think, twenty years of DFMEAs later, that automotive
design is more efficient?

------
randallsquared
"The purpose of the committee is presumably to ensure that the company doesn't
waste money. And yet the result is that the company pays 10 times as much."

Well, only if they still buy every product, which would defeat the purpose of
having a gatekeeper committee. If, instead, the committee approves only 1 in
25 products that would otherwise have been bought, the committee saves them
money (leaving aside the cost of the time of the committee-members).
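
That trade-off is easy to check with the thread's own numbers. Assuming, as
the comment does, that the committee approves 1 in 25 requests but process
overhead makes each approved purchase cost 10 times as much (all hypothetical
figures):

```python
def spend_without_committee(requests, unit_price):
    """No gatekeeper: every requested product gets bought at list price."""
    return requests * unit_price

def spend_with_committee(requests, unit_price, approval_rate, markup):
    """Gatekeeper: only a fraction of requests are approved, but each
    approved purchase costs `markup` times as much."""
    return requests * approval_rate * unit_price * markup

# 25 requests at $100 each; 1-in-25 approved; 10x markup per approval.
print(spend_without_committee(25, 100))               # 2500
print(round(spend_with_committee(25, 100, 1/25, 10)))  # 1000
```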

------
ebvigmo
" And you know what? It would have been perfectly safe to let them. In fact,
the acquirer would have been better off; not only wouldn't these guys have
broken anything, they'd have gotten a lot more done."

My experience as a manager of a team of programmers for many years is that,
while they would certainly love to be able to write and release without a QA
process, they do in fact take the site down when this is allowed!

------
cunard3
Checks have a cost, for sure. What about Eric Ries's split-testing axiom? If
in a startup the developer/coder is basically the customer of the enterprise,
because the cost of the check is paid in the rising frustration of the coder,
then should split-testing be looked upon as a check? Even if it is, the idea
of an available metric to back up claims of efficacy is really compelling.

------
MikeGale
For me this is all about "Mental Poisons".

Mental Poisons are ideas that harm people, life and the universe. They are
remarkably common.

Paul expresses this one very well. There's a lot more out there too.

Fossilised hierarchies and thought-free, rule-driven organisations are great
for the student of poisons.

------
nerd004
Thanks for this article Paul. Now I really know the why behind what I
observed working for a big company.

[http://computinglife.wordpress.com/2008/11/14/hands-free-
or-...](http://computinglife.wordpress.com/2008/11/14/hands-free-or-hands-
bound-mouth-loose-3-mouth-loose/)

------
joshv
Two weeks seems a bit much, but once you have a significant client base of
paying customers, you simply must have some form of production control and QA.
Even good developers make mistakes, and letting everyone just push their
changes to prod over lunch is a recipe for pissed off clients.

------
jdhawk
As a previous startup coder who was acquired by a much larger company, I can
truly say you've hit the nail on the head. I never minded working 70 hour
weeks before, but now I have a hard time punching the clock 35 hours a week. I
hate my new higher paying job.

------
ajclose
Oh, wow. Yes. I've often thought that a lot of this process is just to keep
the lowest common denominator from messing things up. I want to spend my life
surrounded with people that don't need this, because then we could do so many
more worthwhile things.

------
ddelony
It seems that checks affect journalism too. "Citizen journalism" allows people
to report much more quickly than traditional media. Bloggers don't have to
submit their posts to editors and fact-checkers, the way broadcast and print
journalists do.

------
terrycojones
Hi Paul

I'm surprised you didn't tie this in to due diligence checks done by VCs, and
its impact on startups, etc. The parallels are strong, and you've commented on
that impact before (at FOWA in 2007, I think).

Regards, Terry Jones

------
levbor
I'd say that the reason for bureaucracy in large organizations is that central
management doesn't trust middle managers. Whereas in a startup there are a
total of three bosses, and all trust each other.

------
hxa7241
The cost of the checks should balance the cost of the risks. Those are not
always determinate or exactly estimable, but, statistically, that would be a
simple basis of a rational approach. Wouldn't it?

------
guruz
Small side note:

I think it is very cool to have the comments here instead of his website. It
is nice to have them consolidated and not some comments here, some there.

------
dalelllarson
The movie "Meet the Robinsons" makes a point to children that we make the most
progress when we are free to fail. Thanks for making the same point for
grownups.

------
swdesignguy
This is exactly why we don't respond to RFPs any more.

------
pchristensen
pg worked in fast food?

~~~
pg
Baskin Robbins, in HS.

~~~
gcheong
What was your favorite flavor?

~~~
fallentimes
When PG worked there they only had 7 flavors :).

------
goodspeed
I guess the best check (and benefit) for startups is to commit mistakes
early, commit mistakes often.

------
RomanZolotarev
in Russian [http://spring.jumpidea.com/2008/12/paul-graham-
artistship.ht...](http://spring.jumpidea.com/2008/12/paul-graham-
artistship.html)

------
fredwilson
i love the last paragraph of that post Paul. so true.

------
WildWildEast
Wonderful. Well thought out, well researched, and valuable for large and
small companies alike. <http://wildwildeastdailies.blogspot.com>

