
Why we don't schedule deployments during off-hours - joshuacc
http://blog.scoutapp.com/articles/2010/10/21/why-we-dont-schedule-deployments-during-off-hours
======
Murkin
Our deployment process was always at night (10,000+ user org).

Even if your process is great and there is no downtime during upgrades, any
problem might cause the whole organization to lose thousands of work hours
(instead of just your team working during the night).

If you're deploying a "nice-to-have" app, it's all fine.

Otherwise, get to work late, order a pizza and prepare for a night of
adventure.

~~~
InclinedPlane
The better approach is to figure out where the risk in pushing updates is
coming from and develop tools to mitigate that. More testing, pseudo-
production deployment infrastructure, better change management, more frequent
deployments, better understanding of the changes going out with big
deployments, etc.

The biggest companies in the world (Amazon, Google, etc.) who have literally
millions of dollars an hour/minute/second at risk if they screw up deployments
put in place all of these tools so they _can_ push out deployments at peak
times and be confident that they aren't going to destroy their business by
accident.

~~~
scottyallen
When I was at Google working on web search, we didn't push search
infrastructure during peak hours. I think this was mostly due to wanting to
make sure we had spare serving capacity to fail over to, in case the push was
broken/had performance problems/crashed/etc. Note that "peak hours" isn't the
same as "working hours". Just a few hours when we saw our highest traffic.

However, we also had a strict policy of no pushing in the middle of the night,
Fridays, or weekends, for all the reasons the article listed. Most problems on
a relatively mature system occur when you're changing stuff. When stuff
breaks, you want to have the people involved be fresh and thinking clearly.
You also want everyone who might be needed to fix something available, in the
office and in front of their computers if something goes wrong. Lastly, it's
not sustainable to have your engineers/operations folks regularly giving up
their nights and weekends for routine operations. You should save that karma
for when everything goes to hell.

This is all predicated on having pushes not negatively affect production
traffic when everything goes according to plan, and minimize the impact as
much as possible when something does break. One of the best ways I know to do
that is to increase the amount of traffic proportional to increases in your
confidence that the change is good. The idea is that any push starts off by
only affecting a small part of your infrastructure/traffic, and the longer you
run it without problems, the more traffic/infrastructure you push it to.

At Shopkick, we do this by pushing first to a single machine, watching it for
any errors for a few minutes, then gradually pushing it to the rest of our
machines. If anything goes wrong, only a portion of traffic was affected, and
the portion is well correlated to the probability of having a problem.
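
A minimal sketch of that kind of staged push, not Shopkick's actual tooling. The `deploy` and `healthy` callables, the host names, and the stage fractions are all assumptions for illustration; a real `healthy` check would watch the host for a few minutes before answering.

```python
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    stages: Sequence[float] = (0.0, 0.25, 1.0),  # cumulative fractions (assumed)
) -> Tuple[List[str], bool]:
    """Push to a growing share of hosts, halting at the first unhealthy one.

    Returns the hosts that were updated and whether the rollout completed.
    The first stage always covers at least one host (the canary).
    """
    updated: List[str] = []
    for fraction in stages:
        target = max(1, int(len(hosts) * fraction))
        for host in hosts[len(updated):target]:
            deploy(host)
            updated.append(host)
            if not healthy(host):
                # Stop here; only the hosts in `updated` saw the bad build.
                return updated, False
    return updated, True
```

If the canary fails, exactly one host has the new build; if a later stage fails, the damage is bounded by how far the push had progressed.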

~~~
InclinedPlane
Conditional deployment with the ability to control the level of deployment
incrementally is a hugely powerful tool. Twitter and Amazon both use them, at
least for higher risk changes. The ability to "roll back" a deployment
completely at the press of a button, through a configuration change alone and
without changing any of the running bits, is very potent.
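
One common way to build this is a percentage flag keyed on a stable hash of the user. The sketch below is only an illustration, not how Twitter or Amazon do it; the flag name `new_checkout` and the in-memory `ROLLOUT_PERCENT` config are hypothetical.

```python
import hashlib
from typing import Mapping

# Hypothetical config; in practice this would live in a config service or file.
# Setting a flag to 0 is the "roll back" button: no code is redeployed.
ROLLOUT_PERCENT = {"new_checkout": 5}


def bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(
    flag: str, user_id: str, config: Mapping[str, int] = ROLLOUT_PERCENT
) -> bool:
    """True if this user falls inside the flag's current rollout percentage."""
    return bucket(flag, user_id) < config.get(flag, 0)
```

Because the bucket is derived from a hash rather than a random draw, a given user stays on the same side of the flag as the percentage is raised.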

------
briandoll
It's been many years since I worked in a shop that didn't deploy during the
day (large financial institution aside), and for all the reasons the post
mentions I completely agree.

What enables a mid-day deploy, however, is a well thought-out deploy process
that aims for a zero downtime deployment. This can be especially complicated
when you have relational database changes baked into a deploy.

It's interesting to see some folks suggesting that they may not be able to do
this during business hours. Here's the thing: when bad things happen, you need
to deploy ASAP. Developing a great deploy process that allows your system to
remain standing during that deployment has numerous advantages, including the
ability to deploy at will, and have no fear in doing so.

~~~
sokoloff
Relational DB changes don't have to be particular show-stoppers, provided
you're willing to accept a few constraints:

1\. No select * in SQL; name your columns in selects and inserts (prevents the
select result set from changing or the insert from breaking when you add a
column later)

2\. All tables need table aliases in join queries. (Otherwise, adding a column
can cause a naming collision and introduce ambiguity into a presently non-
ambiguous select.)

3\. Transactional columns can be added, provided they have defaults or are
nullable

4\. Columns can only be dropped across two releases: one to remove all
references to them and a second to actually drop them. (You may want to rename
the column first during the second release and drop it a few minutes/hours
later once you're sure no one is relying on it.)

5\. Modifying a column requires some care (to ensure that you don't lock the
table too long during the modification). Here again, you can probably add a
column of a different name, gradually populate it, then rename into place
later, add a new table and left outer join to that, or wait for a future
downtime event (rarely required in practice).

It's not necessarily "trivial" to accomplish, but it's also not deep, black
magic.
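
To make rules 1 and 3 concrete, here is a small sketch using SQLite from Python. The `orders` table and its columns are made up for the example, and SQLite is only a stand-in; locking behaviour during `ALTER TABLE` differs between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER NOT NULL)")

# Queries as the already-deployed code issues them: columns named explicitly,
# table aliased (rules 1 and 2).
OLD_INSERT = "INSERT INTO orders (id, total) VALUES (?, ?)"
OLD_SELECT = "SELECT o.id, o.total FROM orders AS o ORDER BY o.id"

conn.execute(OLD_INSERT, (1, 500))

# Migration applied while the old code is still running: the new column has a
# default, so existing statements do not need to know about it (rule 3).
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT DEFAULT 'USD'")

conn.execute(OLD_INSERT, (2, 750))
rows_old = conn.execute(OLD_SELECT).fetchall()
rows_new = conn.execute(
    "SELECT o.id, o.total, o.currency FROM orders AS o ORDER BY o.id"
).fetchall()

# An insert that does not name its columns breaks as soon as the column exists.
try:
    conn.execute("INSERT INTO orders VALUES (?, ?)", (3, 100))
    positional_insert_ok = True
except sqlite3.Error:
    positional_insert_ok = False
```

The old statements keep returning the same two-column rows before and after the migration, while the insert without a column list fails once the table has three columns.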

~~~
nl
This is a good guide, but note that he said _zero downtime deployment_

That imposes an additional constraint - the new code needs to be backwards
compatible (while any data migration scripts are running), and - in a
clustered environment - it needs to be able to deal with the case where old
and new code bases and databases are available and running simultaneously.

~~~
sokoloff
If you release the new code only after the DB changes are complete, the new
code doesn't need to be backwards compatible.

The above guidelines are designed so that the queries the old code is running
keep working unchanged, so that should be possible to accomplish for 95+% of
your migrations.

For truly long-running migrations (longer than you're willing to wait to push
code), you're correct that the new code must handle both cases. In our case,
that's extremely rarely a factor, but we also don't push on every commit (nor
even every day) like some shops do.
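
For that long-running case, one way for the new code to handle both states is to read the new column and fall back to the old one while the backfill is still in progress. A rough sketch, assuming a hypothetical `users` table where `full_name` is gradually replacing `name`:

```python
import sqlite3
from typing import Optional


def display_name(conn: sqlite3.Connection, user_id: int) -> Optional[str]:
    """Prefer the new column; fall back to the legacy one until backfilled."""
    row = conn.execute(
        "SELECT u.full_name, u.name FROM users AS u WHERE u.id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    full_name, legacy_name = row
    return full_name if full_name is not None else legacy_name


# Demo data: row 1 has been migrated, row 2 has not been backfilled yet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, full_name TEXT)")
conn.execute("INSERT INTO users (id, name, full_name) VALUES (1, 'ann', 'Ann Lee')")
conn.execute("INSERT INTO users (id, name) VALUES (2, 'bob')")
```

Once the backfill finishes, the fallback branch is dead code and can be removed in a later release, along with the old column.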

------
barclay
As others have mentioned, this really only works if you have a simple (and
solid) deploy and rollback practice. Most people don't.

Given that, all it takes is one terrible deploy: shit goes way south and
you're down for most of the day. Now you have the CEO standing around your
desk nervously looking at you trying to re-federate the databases when you
realize a 7PM deploy is not all that bad.

>Deploying a major update when I’m not in work-mode is awkward as well.

I'm sorry, no. I love my family and time, too... But logging in from home for
an hour or two in the evening to do a deploy and validation isn't that much to
ask, especially considering most of our salaries. If you're that sensitive
about it, come in late the next morning.

~~~
catshirt
sounds like the issue here actually lies in the suggestion that most people
don't have a sufficient deployment mechanism.

~~~
fleitz
Yup that is usually the problem, when deploys are the exception rather than
the rule you are looking for problems.

------
thirdstation
I do the same, with one minor exception.

If the busiest time is during working hours you maybe want to avoid it if you
are taking in money during that time.

Another good reason for doing maintenance during working hours is that's when
your network admins are also working - or anyone else outside your group whose
help you might need.

------
davidu
What are off-hours? This is the Internet.

We used to have off-peak times, but even those days are mostly gone now.

Anyways, I agree with the spirit of this article. Best to do pushes when
everyone is around or paying attention and you are refreshed, in case you just
unleashed a disaster. :-)

~~~
bmj
While our web applications get around-the-clock use during the business week,
there is plenty of downtime on the weekends. Really disruptive stuff
(generally hardware or OS-level changes) is done on the weekends, but normal
deployments and updates are usually scheduled in the morning, for the same
reasons outlined in the original post.

------
plnewman
This article is a little disappointing. The writer brings up the obvious
advantages to it, but doesn't say how he manages the technical risk of the
activity (unplanned downtime).

------
brown9-2
The best approach is to design and build a system so that you can make updates
to the live system with minimal impact: no need to stop and start servers,
risk unaffected components being down, etc.

Of course a lot of this depends on your framework, tools, etc., but a little
forethought can go a long way.

------
keltex
My addendum to this rule is I never deploy new features a week before I go on
vacation.

~~~
sophacles
I like to have a web app launch the upgrade script. That way I can log in from
my phone and start the upgrade right before I get on the plane. What's the
point in having minions if they can't finish the details for you?

------
ojbyrne
I think launching on weekends (if your usage is lower then) is a viable
alternative. You can be fresh compared to doing stuff in the middle of the
night and you can give everyone a day off during the week to make up for it.

------
mikedanko
My body and mind have taken a beating from ten years of useless night work,
swinging shifts sometimes three times a week, and I don't think I can do it
much longer. That guy is too happy.

That's a great list of personal reasons, but business can come first for the
same reasons. My last big deployment was to half a million set top boxes. In
the end, I had to do it at night, but I had oh so many reasons for doing it
during the day:

* What if a percentage of the boxes bricked halfway through the deployment? A statistical insignificance in the lab doesn't help me help 15k people with bricks on their TVs. When they start playing with the boxes, I'm going to have to roll trucks and that's expensive. Instead, man up the call center at a time when we don't have to pay shift diff.

* From an end-user standpoint, there's a different set of eyes on the products at different times of day. Think about it this way, if you sell porn, how much are you selling at 2AM vs. 9AM do you think? They're not going to call and complain about their porno not working, but it doesn't mean there's not a huge level of dissatisfaction. There's a lot of general economics and trade offs involved.

* Top notch help has the luxury of sleeping at this hour because they've probably earned the tenure. I'm not going to get the vendor of a vendor to help me with this stuff at 3am, and _they're_ definitely not going to be fresh and chipper, and getting anyone beyond support on the phone takes hours -- by that time, they're awake anyway.

Change the above as you see fit to adapt to your type of work, it's all the
same.

------
twymer
I think it really depends on the market too. If you're deploying something
that businesses use every day, taking it down during business hours is a bit
of a problem. If it's something I pay for and use personally, I would probably
prefer it be down while I'm at work and not 5 minutes after I get home.

~~~
nopal
Are more businesses moving towards deployments that have zero downtime?

The company I work for does not, but it seems like a lot of others are using
various strategies to bring up a new version of an application and then
transition users over to that, rather than bringing down the old one and then
bringing up a new one.
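
The strategy described above (bring the new version up alongside the old one, then move users over) can be sketched as a router that swaps which handler is live. This is a toy, in-process illustration; `BlueGreenRouter` and its method names are made up, and in practice the switch usually happens at a load balancer or proxy.

```python
from typing import Callable, Optional

Handler = Callable[[str], str]


class BlueGreenRouter:
    """Sends every request to the live handler; switching is one assignment."""

    def __init__(self, live: Handler) -> None:
        self.live = live
        self.staged: Optional[Handler] = None
        self.previous: Optional[Handler] = None

    def stage(self, candidate: Handler) -> None:
        # Bring the new version up next to the old one; it gets no traffic yet.
        self.staged = candidate

    def promote(self) -> None:
        # Move users over; keep the old version around for a quick rollback.
        if self.staged is None:
            raise RuntimeError("nothing staged")
        self.previous, self.live, self.staged = self.live, self.staged, None

    def rollback(self) -> None:
        if self.previous is None:
            raise RuntimeError("nothing to roll back to")
        self.live, self.previous = self.previous, None

    def handle(self, request: str) -> str:
        return self.live(request)
```

The old version is never taken down before the new one is serving, which is what removes the downtime window.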

------
dclaysmith
I work in Ireland for a company whose primary market is the US which, as far
as builds go, is the best you could ask for!

I do builds around 9-10am GMT after a good night's sleep when I'm nice and
alert (and have had a cup or two of coffee). All the while, (most of) our
target market is fast asleep in the US.

------
rdl
I believe in doing deployments for enterprise apps on Saturday or Sunday AM;
everyone is still fresh vs. sleep deprived, and if something goes wrong, you
have a long time to fix it. Plus, minimal impact on the customers.

Another option is using the more-bogus holidays (Presidents' day?), or
religious holidays you might not celebrate (Easter isn't that secular, so non-
Christians can give it up).

The trade for staff is that you get comp time to use later, and usually at
some multiple. I love working on Christmas for 4 comp days later.

Working a weekend and getting 2-3 days off mid-week is awesome; not only do
you get more done while working uninterrupted on a weekend, taking a mini-
vacation during the off days is very cost effective. $49 suites in Vegas, woo!

------
semipermeable
A good approach I've used before is:

* no Friday deployments
* deployments run primarily by a single person should be done early in their work day
* avoid Monday deployments ... people are usually fuzzier on Monday than they are on Tuesday, after having had a day to go through the weekend's email and get their heads back into work.
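
Rules like these are easy to encode as a guard in a deploy script. A rough sketch: the 9:00 to 12:00 window is an assumption standing in for "early in their work day", and excluding weekends is also an assumption, since the list above doesn't mention them.

```python
from datetime import datetime

# datetime.weekday() numbers Monday as 0, so this is Tuesday through Thursday.
ALLOWED_WEEKDAYS = {1, 2, 3}
WORKDAY_START_HOUR = 9   # assumed start of the work day
LATEST_START_HOUR = 12   # assumed cutoff for "early in the work day"


def deploy_allowed(when: datetime) -> bool:
    """True if `when` (local time) falls inside the assumed deploy window."""
    return (
        when.weekday() in ALLOWED_WEEKDAYS
        and WORKDAY_START_HOUR <= when.hour < LATEST_START_HOUR
    )
```

A deploy script could call `deploy_allowed(datetime.now())` and refuse to continue without an explicit override.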

------
SebMortelmans
I'm a big advocate of gradual deployment, both in terms of features (limiting
your changelog) and in terms of userbase (not deploying it for everybody at
once if it's a sensitive update). I'm not really too concerned anymore about
what time of the day; I think OP has a few valid points there.

------
protomyth
I would prefer this, but I currently have an edu gig and doing anything while
students are in class is a serious recipe for disaster.

------
code_duck
I schedule deployments during off hours.

The key is to wake up at either 4 pm or 3 am.

~~~
code_duck
Wait, did someone think I wasn't serious?

------
gcb
launching at off-hours or friday night is a way for the boss to abuse young
kids

