
Continuous Deployment at IMVU: Doing the impossible fifty times a day. - TimothyFitz
http://timothyfitz.wordpress.com/2009/02/10/continuous-deployment-at-imvu-doing-the-impossible-fifty-times-a-day/
======
mechanical_fish
Don't be too disappointed if a single submission gets a lukewarm or confused
response on HN. The upmods and comments on here are a lot less consistent than
what you're used to. ;) Just keep writing. It's really valuable.

Also, it's clear to me why your daily routine might sound like science fiction
to the median HN reader: A lot of programmers have never seen a system like
this. As those of us who were online during a specific half-hour period a
couple weeks ago can attest, even _Google_ doesn't have a system that's
remotely as reliable as this: It appears to be possible to break all of Google
search, worldwide, in ten minutes by misplacing a single character in a text
file.

~~~
jeremyw
Hmm, on your Google point, we know that they use partial-cluster deployments
extensively, and several presentations point to sophisticated testing of these
momentary guinea pig users. I wouldn't hold a one-time lack of a sanity check
against their total uptime history. Tests ain't perfect.

~~~
mechanical_fish
I agree that we shouldn't extrapolate too much from this one incident. But
it's not like Google's super-secrecy policy gives us much choice. If anyone
from Google wants to tell us about their deployment infrastructure and explain
why this one incident really _was_ a nigh-impossible black-swan one-in-one-
billion-hour freak of nature -- or why Google has sensibly traded away a
certain amount of uptime in exchange for a more flexible architecture (or,
perhaps, more cash to spend on tasty gourmet pizzas) -- I'm sure we'll all
listen with rapt attention. Until then, we get to tease them mercilessly. ;)

Meanwhile, I'm sure that the original submitter would agree that _tests ain't
perfect_. If you read the link at the top of this blog post:

<http://timothyfitz.wordpress.com/2009/02/08/continuous-deployment/>

...you'll find that this isn't merely an article about automated testing.
Automated testing is just a _part_ of the mighty continuous-deployment
ecosystem being described here. It isn't even the real heart of that system:
The heart is a planned, well-designed, _semi-automated_ routine for rolling
back changes in production. They roll out a change to a subset of their
servers, monitor for statistical anomalies _in the usage patterns of real,
live users_ , and only continue the rollout if there are no anomalies. If they
run into trouble, back they go.
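
Roughly, the loop looks something like this sketch in Python (the helper
functions, metrics, and thresholds here are made up for illustration; this
isn't IMVU's actual tooling):

    import time

    # Hypothetical stand-ins for real deploy/monitoring infrastructure.
    def deploy_to(servers, revision):
        print("deploying %s to %s" % (revision, servers))

    def roll_back(servers):
        print("rolling back %s" % servers)

    def error_rate(servers):
        return 0.0  # fraction of failing requests observed on these servers

    def staged_rollout(all_servers, revision, baseline, batch_size=5,
                       soak_seconds=300, tolerance=1.5):
        """Deploy in batches, watching real-user metrics between batches."""
        deployed = []
        for i in range(0, len(all_servers), batch_size):
            batch = all_servers[i:i + batch_size]
            deploy_to(batch, revision)
            deployed.extend(batch)
            time.sleep(soak_seconds)  # let real users exercise the new code
            # Statistical anomaly check: live error rate vs. the baseline.
            if error_rate(deployed) > baseline * tolerance:
                roll_back(deployed)  # anomaly detected: back out the change
                return False
        return True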

~~~
jeremyw
I don't want to defend Google per se, but their uptime results speak for
themselves. I don't see how a rare bug necessitates mocking.

And I agree about resiliency of the deploy -- it's what I meant by
_sophisticated testing of these momentary guinea pig users_. Google's
presentations on this stuff are about analysis and data gathering around
changes, both for immediate functional snafus and for user preference;
probably state of the art in this regard.

------
patio11
One of the greatest lines I have ever read on a blog:

 _It may be hard to imagine writing rock solid one-in-a-million-or-better
tests that drive Internet Explorer to click ajax frontend buttons executing
backend apache, php, memcache, mysql, java and solr. I am writing this blog
post to tell you that not only is it possible, it’s just one part of my day
job._

~~~
plinkplonk
"writing rock solid one-in-a-million-or-better tests that drive Internet
Explorer"

I find this unparseable. (English is not my native language). As far as I know
"one in a million" means something like "very rare". Help?

~~~
briansmith
"rock solid" is "very reliable" "one-in-a-million-or-better tests" is "tests
which fail less than one in a million times".

Our Internet-Explorer-based tests are very reliable; they fail less than once
per million executions.

~~~
sbraford
While the "one in a million" better is a cool blurb, what does it really mean?

Let's say your team makes 25 commits per day.

25 * ~300 working days = 7,500 commits per year

At that rate it would take 133+ years to reach a million commits.

The more interesting metric to me is how often the build gets broken.

~~~
nuclear_eclipse
The part you're missing is the 15,000 tests, multiplied by a new commit every
9 minutes, which over 8 working hours is roughly 50 commit-test cycles; that's
750,000 test runs in a day's timespan...

Edit: of course that assumes a peak commit rate matching or exceeding the
commit-test cycle time. The point is that even a very low failure rate in the
testing mechanism could manifest itself as a blocked commit-test-deploy cycle
at least once a day, hence the importance placed on rock-solid testing systems
that should only ever fail when the tested code itself fails.
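
For a rough sense of scale (using the article's 15,000 tests and roughly 50
builds a day; the per-test flake rates below are purely illustrative):

    tests_per_build = 15000
    builds_per_day = 50
    executions_per_day = tests_per_build * builds_per_day  # 750,000

    # Expected spurious failures per day at a few per-test flake rates:
    for flake_rate in (1e-5, 1e-6, 1e-7):
        print(flake_rate, executions_per_day * flake_rate)
    # roughly 7.5, 0.75, and 0.075 spurious failures per day, respectively

So even a one-in-a-million flake rate per test execution still costs you most
of one falsely blocked build per day at that volume.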

~~~
TimothyFitz
We empirically have on average 70 builds a day. The number is higher than your
calculation because we don't all work 9-5; we're committing frequently from
around 8am to 9pm. We also run builds repeatedly overnight to flush out any
intermittently failing tests we may have recently introduced. We'll run the
builds as fast as they can go from 2am-4am.

~~~
sbraford
So how often does a commit get checked in that causes a test (or tests) to
fail?

It just seemed to me like you were bragging that tests get run over and over
again. They only need to get run if any new code is committed, of course.

And what kind of commit is being checked in every 9 minutes? How big is the
dev team? Seems like an awful lot of commits. Is each one a full-fledged
feature / bug fix for the site, or are many of them 1-line changes to the code?

------
DannoHung
I tell people that we should aim for this sort of automation and they pat me
on the head and say, "No, no, that will never do."

I think there's an idea that if something goes wrong because you let an
automated system do it, it's somehow much worse than if something goes wrong
because there was human error. I don't really understand the reasoning.

~~~
TimothyFitz
Exactly. Drew Perttula put it better than I'll be able to: "IMHO, manual
testing has only two advantages: it’s the easiest thing to [try to] do; and it
has a lovely accountability chain. You can always blame the developer, and
non-technical people will easily accept that this is the “inevitable cost of
software engineering”."

~~~
mst
The thing I'd be really interested in is how you deal with UI changes - I've
never found a satisfactory way to test "is this ugly/confusing" other than
letting a few users bang on it on a staging server.

~~~
TimothyFitz
UI is an interesting problem.

The ultimate solution is to have business metrics drive your UI changes,
usually in the form of an A/B test. Then you have a clear winner. This A/B
test would be run separately from the rollout structure (and indeed, we do
LOTS of A/B tests).

Sometimes that's not possible: for a new feature, or for content without a
clear business metric to evaluate against. Either way, we often have someone
manually test the new UI, so that we're not exposing users to something
fundamentally broken. We usually do this by using the existing deploy system,
but turning the frontend on only for QA users.
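
As a sketch of that kind of gating (the user IDs and helper names here are
made up for illustration, not our actual code):

    # Hypothetical QA-only gate: the new frontend ships with the normal
    # deploy but only renders for QA accounts until it has been vetted.
    QA_USER_IDS = set([101, 102, 103])  # placeholder QA accounts

    def new_ui_enabled(user_id, ab_bucket=None):
        if user_id in QA_USER_IDS:    # QA always sees the new frontend
            return True
        if ab_bucket == "new_ui":     # later, an A/B test picks the winner
            return True
        return False

    def render_profile_page(user_id, ab_bucket=None):
        if new_ui_enabled(user_id, ab_bucket):
            return "<new profile page markup>"
        return "<existing profile page markup>"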

In the end, you do what works and is cheap, and that's usually something
slightly different for every project.

------
amix
From what I have read, Facebook uses a similar method: commit and deploy often
and roll back if something messes up. We also use this method on Plurk.com and
have done so for about a year. Though IMVU's case is pretty extreme :)

The major problem is rolling back client-side changes (those located in
scripts or CSS). These are pretty costly to roll back because of browser
caching - we solve this by having real versioning of the static files so we
can force a refresh of the browser cache (real versioning =
script_{timestamp}.js and not script.js?v={timestamp}).
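
A minimal sketch of that kind of versioning (the paths and helper name are
made up for illustration):

    import os
    import shutil
    import time

    def publish_versioned(src_path, static_dir):
        """Copy e.g. script.js to script_{timestamp}.js; the new filename
        bypasses browser and proxy caches, which don't always respect a
        ?v= query string."""
        base, ext = os.path.splitext(os.path.basename(src_path))
        versioned = "%s_%d%s" % (base, int(time.time()), ext)
        shutil.copy(src_path, os.path.join(static_dir, versioned))
        return versioned  # templates then reference this filename

    # publish_versioned("build/script.js", "static/") -> "script_1234567890.js"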

~~~
abstractbill
_commit and deploy often and rollback if something messes up_

This describes pretty well what we do at Justin.TV too. These days I push new
code about 5 times a day.

~~~
TimothyFitz
I've read your post on unit tests, and I didn't understand what you were
trying to say.

Were you saying don't write automated tests that test your code, instead focus
on monitoring the actual production environment?

Or were you saying that specifically the "unit test" class of automated tests
are not worth their time?

I can imagine a system that monitors the business metrics well enough to
prevent defects from slipping into production (it's a stretch, metrics are
soft and squishy moving targets), but I can't imagine using _only_ those
metrics to find every bug you ever slip into production. Metrics are so
distant from the bug that caused their downturn; you'd waste so many cycles
debugging. The gap between writing the code and finding the problem would be
much larger than if unit tests found them; that has to slow things down as
well.

~~~
abstractbill
Here's where we are putting our effort:

\- Monitoring the production environment, _tons_ of effort. We record and
analyze an incredible amount of data about everything that happens on the
site, and have more and more automated processes looking for anomalies (though
still nowhere near as many as I would like).

\- Automated testing not including unit tests, some effort. I wouldn't be
opposed to us doing more of this, but it's not incredibly high-priority and
there always seems to be something else that's more important.

\- Unit testing, yeah, not worth our time as far as I'm concerned.

------
inerte
I think the original article misled some people. It all looked very simple:
update the code and put it in production. That _is a horrible idea_, as some
have noted.

What's not horrible is having thousands of tests, on dozens of machines, nine
minutes from commit to live, with selective updating of users, and rollbacks,
as this article has explained.

The original post was too light on details, I guess. Its intention was not to
be comprehensive anyway; the focus was on why recently changed code should be
put in production ASAP. But it looked like the author was simply FTPing after
commit. And the whole "SOMEONE IS WRONG ON THE INTERNET" thing kicked in.

~~~
TimothyFitz
Honestly I think it's a gradient.

I'm also one of the developers on a hobby project called <http://TIGdb.com>
(Jeff Lindsay is the other, and has written the majority of the website). We
don't have a big Continuous Deploy infrastructure, but we also don't have the
users and business requirements of IMVU.

We started with the usual, completely manual deploys and hard-to-setup
sandboxes, and have been iterating towards a fully automated setup ever since.
The entire time we've been doing this, we've been committing and deploying
often. Our users _are_ patient, because we're giving them something they can't
get elsewhere and we're giving it to them for free. As we do introduce
regressions, we'll post-mortem them (probably using the Five Whys technique) and
we'll slowly evolve a system to prevent regressions. If the site is a success,
we'll have evolved a world class deploy system. If the site never makes it
that big then we won't have wasted time on infrastructure. It's classic lean
startup thinking (even though TIGdb is really just a hobby project).

~~~
sbraford
Just curious - who maintains the Selenium tests, and how big is the development
/ "QA" team?

I've never worked in a team big enough that it could devote resources to
maintaining all of the following kinds of tests:

\- unit

\- functional

\- AND acceptance

\- plus writing the actual code

IMHO, a neutral third-party group like QA should be responsible for writing &
maintaining acceptance tests.

~~~
eries
Just a thought: perhaps these two facts are related?

------
lgriffith
Interesting idea but....

Looks like meeting that goal would constrain you to write code to be used by a
robot and not by a human. There may be many cases where this is both doable
and acceptable to the end user. So no problem with that.

I am greatly challenged to see how this could be done for a highly
interactive, visually oriented application that generates subtle patterns in
response to user input. Computers are still not as bright as earthworms when
it comes to generalized pattern recognition. Which means we programmers are
about as bright as earthworms when it comes to writing such code.

How then could computers automatically test all the software reactions to the
wonderful and totally unpredictable behavior of mere humans as they interact
with your software? The test cases would expand to consume all the resources
available for development. All you would get done is writing all but
impossible test cases. At least you wouldn't ship bugs.

This doesn't even consider the explosion of combinations and permutations of
inputs, which prohibits exhaustive testing no matter how many systems you run
tests on.

It would be much easier and cheaper to go out of business. Your certainty of
being free of shipped bugs would be much better than one in a million.

~~~
mechanical_fish
As I noted elsewhere in this thread
(<http://news.ycombinator.com/item?id=475391>), this article is not merely
about automated tests. The author says that his company is using continuous
deployment because it lets _live, human end users_ bang on the code, as
quickly as possible, in bite-sized chunks that can more easily be rolled back
and fixed.

~~~
lgriffith
Then why do such an exhaustive automated test?

Why not have your local tests, automated or not, cover the common cases and
error conditions to catch programmer stupidities? Then let the actual humans
do the strange corner cases.

If your design is even close to correct, testing repeatedly tested code is
pointless. If your design is corrupt and your implementation is sloppy, no
amount of testing is going to save your ass.

I do very rapid turns and I am a one-man team. I can turn my system around in
less than 30 minutes and have the user testing it in a live situation on the
other coast. If I want 10 turns a day, I can easily do it. Low coupling, high
cohesion, clean correct design, and disciplined implementation make it
possible.

I agree that doing things in small chunks is a great way to do it, but doing
the equivalent of a week's worth of global automated testing for each small
change seems like a silly exercise. That is, except for the server hardware
salesmen and system admin people.

The sales commissions and payroll look rather good. The production of real
value is questionable. Bang for the buck is as important for testing as it is
in any other part of product development.

~~~
jmathes
I think the source of your confusion is that you're a one man team. You don't
have to solve problems that 20-man teams have to face. At least half of all
the code I depend on is code I do not understand, so I have to depend on its
tests, and I have to make the same promise to consumers of my code. If my
change breaks someone else's code in a way I didn't foresee, I am depending on
their tests to tell me what I screwed up.

~~~
lgriffith
Maybe the problem is that you have the 20-man team. There is no coherence in
the code. The design is wrong, coupling is too high, and the module cohesion
is too low. The large team makes certain that is the case, no matter how
"tight" (aka heavy) your quality control process is.

I have found from working in large teams, there is a core four who get things
done. The rest are simply dead weight dedicated to shuffling paper and
attending meetings. At best, they do nothing. At worst they create more work
than they do.

Use the right four and dump the other sixteen. You will get at least ten times
more productivity and ten times higher quality without even breaking a sweat.
If you don't have the right four, you are hosed from the start.

~~~
mudetroit
This works fine if you are tackling a problem that can be sufficiently
addressed by 4 developers. Depending on the size and scale of the problem you
are trying to solve and the timeline required for delivery, you may need a
larger team.

When you begin to take that into account you realize you have to find ways for
the larger team to work together and still produce a quality product. Hence
the techniques being used by the author and other companies out there trying
to address similar problems.

~~~
lgriffith
"...you have to find ways for the larger team to work together and still
produce a quality product. "

I am not sure it's possible. The communication overhead of so many linkages
forces incoherence. The resultant incoherence forces still more additions to
process and body count. That adds still more communication overhead. The
result is still more incoherence - not less. If something is "finished", it's
simply because time, money, resources, and tolerance ran out. The end result
was simply called "done".

Maybe that is the best we can do, but I am hard pressed to call products
produced that way quality products. See Vista et al. for instructive detail.

------
s3graham
Whoa, I love the sound of this as far as development process... But what
really blew me away is: 3D Chat makes $1M/month? Really? Or did I find the
wrong IMVU?

~~~
TimothyFitz
Yep, that IMVU. Here are a few more staggering statistics that we've
published: <http://www.vator.tv/news/show/2009-01-22-recession-not-affecting-imvus-virtual-world>

------
pskomoroch
I was thinking of doing a basic Django + Selenium + Hudson continuous
integration how-to blog post, but this blows me out of the water :)

~~~
TimothyFitz
I would love to see that post. The first question most people ask me is "How
do I get there?" and I don't have a great place to point and say "start here"

A well written concise introduction to continuous integration / constant
testing would be a boon to this community.

------
mhartl
I don't recall the last time an article linked from Hacker News so quickly and
dramatically expanded my notion of what is possible in software development.
Bravo!

------
pj
Continuous deployment is good, but the comments are valid.

There is a certain non-zero probability for errors to occur during deployment.
Binaries have to be reloaded, database connections have to be reconnected,
sessions have to be restored, etc., so the more you deploy, the larger the
coefficient on this probability in the "will something go wrong" equation.

So, what we do is break up our system into deployment groups, where a handful
of users sometimes gets updated a few times an hour. We test the deployment on
this small set of users; usually they know the change is coming and are ready
to test it in real time.

Sometimes we repeat this process using different deployment groups: test in
this one, then test in that one, until we get a final small error-free
deployment. If it is successful, we roll it out to the masses.

Your site doesn't have to be /all/ beta or /all/ production. You can have
batches of users in different groups.
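
A minimal sketch of that kind of grouping (the group names, percentages, and
hashing scheme are made up for illustration):

    import hashlib

    def deploy_group(user_id, canary_pct=1, early_pct=10):
        """Hash users into stable groups so the same accounts act as the
        guinea pigs for every deploy."""
        digest = hashlib.md5(str(user_id).encode()).hexdigest()
        bucket = int(digest, 16) % 100
        if bucket < canary_pct:
            return "canary"
        if bucket < canary_pct + early_pct:
            return "early_adopters"
        return "everyone"

    def revision_for(user_id, revisions):
        # revisions maps group -> deployed revision, e.g.
        # {"canary": "r105", "early_adopters": "r103", "everyone": "r101"}
        return revisions[deploy_group(user_id)]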

------
jacquesm
Any chance of more detail than you are giving in your posting? This is
extremely interesting stuff; I'd really like to know a lot more about what
goes into achieving this.

~~~
eries
what do you want to know?

~~~
jacquesm
Everything :)

No, seriously, I'd be much obliged if you could tell us what tools go into
your setup, how much of it is created in-house - and thus unavailable - and
how much of it is off the shelf, preferably open source. I'd very much like to
spend time on recreating what you've done there.

~~~
eries
I've written in light detail about this in a few places; I'd be glad to share
more. Here's an assortment off the top of my head. Feel free to ask anything
else you'd like to know.

<http://startuplessonslearned.blogspot.com/2009/02/continuous-deployment-and-continuous.html>

<http://startuplessonslearned.blogspot.com/2008/11/five-whys.html>

<http://startuplessonslearned.blogspot.com/2008/09/new-version-of-joel-test-draft.html>

<http://startuplessonslearned.blogspot.com/2008/12/continuous-integration-step-by-step.html>

~~~
jacquesm
Thank you, I'll be reading all of that later today.

------
gfodor
This is great stuff, thanks.

------
shiranaihito
Why do all changes _have_ to end up in production immediately?

~~~
teej
Because having thousands of real users running your code gives you insight
that automated tests simply cannot match.

~~~
shiranaihito
Yes, but they might give you a hard time too if you put out something silly
before thinking things through.

~~~
BeefingJection
The obvious solution, of course, is to think things through.

~~~
shiranaihito
Right, but doing that 50 times per day is more challenging than once, for
example.

~~~
jmathes
Whether you commit your code once in one batch at the end of the day or 50
times in 50 smaller chunks, you have the same amount of complexity about which
to be careful. In fact it's more complex in the former case, because in the
latter, for each push, you know that all the previous pushes are working.

