
Sometimes, it's cheaper to live with 10 minutes of scheduled maintenance at a low-traffic hour than to re-engineer your application to have 99.999% uptime.

Even if you have unit tests, continuous integration, a staging environment, and fast rollbacks, deploying to production can still be tricky:

1) Do you want to run MySQL, PostgreSQL, Oracle or SQL Server? Watch out for ALTER TABLE; it may sometimes hold table locks or invalidate open cursors. Of course, you could always rewrite your application to use a schema-free NoSQL database and migrate your records at read time. But that has a lot of consequences. Maybe 10 minutes of scheduled maintenance is an acceptable tradeoff?

2) Does your staging environment exactly match your production environment? Can you generate a full-sized, real-world load on your staging servers? Alternatively, have you built all the tools required for a phased deployment of your back end?

Now, I'm all in favor of deploying to production 5 times a day. That's pretty easy for any group with good unit test coverage and a fast rollback mechanism. But it's expensive to eliminate the occasional 10-minute maintenance window. And if you're going to take the site down for maintenance, it makes sense to do it at an off-peak hour.

I don't think the author's point is that everyone should be doing middle-of-the-day seamless deployments from day one. I think the point is that if you continue to do deployments at 3am indefinitely, you are hurting yourself.

There are lots of things that make sense at the appropriate scale. Not having any tests. Completely manual processes for builds and deployments. Having only one server. Making backups every day or maybe even just every week. Etc. However, as your business grows you need to recognize that it's important to change these things and move to better processes. Because if you don't then those things could become a very serious drag on development velocity and even business capability. You could find yourself spending all your time drowning in process when you should be spending your efforts more efficiently.

You stay up until 3am to perform a risky manual deployment, and sometimes you spend time fixing problems afterward. As a consequence you either spend a day in zombie mode with limited productivity or you start working much later. In either case you've taken away a fairly decent chunk of time that could have been used for productive work. Similarly, perhaps you spend too much of every day fighting your build system, or recovering from your build always being broken, or doing excessive operational support because your platform doesn't have sufficient redundancy at every tier. Etc, etc, etc.

middle-of-the-day seamless deployments from day one

From my experience, it's easier to start there from day one than to try to get there after years of dysfunctional deployment processes.

Some things are generally better and easier to do a bit before you really "need" them. Deployment, build, branching, and backup procedures almost certainly fall into that category. Performance as well, as long as it's done smartly.

Interestingly, scaling is probably the least valuable thing to "overbuild" early and yet that's by far the most common thing teams tend to do.

I would just add "test automation" and "restore from backups" to that list.

Indeed! Also, pretty much everything from the "Joel test" goes without saying.

Is there any way to do fast database schema migrations? This is not a nitpick, I've posted a question on SO about this (http://stackoverflow.com/questions/6740856/how-do-big-compan...).

As was suggested above, you could use NoSQL. But NoSQL still has schemas, in a sense - if you decide to restructure anything you may still need to knock the db out for a while (or write really hairy code to work around two different versions of your data, but be unable to restructure stuff because you're too scared to change anything on the db).

I'm being serious, but also snarky: Don't do schema migrations.

I saw a talk by Brett Durrett, VP Dev & Ops at IMVU last week. They don't make backwards incompatible schema changes. Ever. Also, on big tables, they don't use alter statements ever, because MySQL sometimes takes an hour to ALTER.

Instead, they create a new table that is the "updated" version of the old table, and then their code does copy-on-read. i.e. they look for the row in the new table, if it's not there, copy it out of the old table and insert into the new table. Later, a background job will come through and migrate all old records into the new table. Eventually, the old table will be deleted and the copy-on-read code will be removed.

It's a lot of extra work, but they think it's worth the effort.
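The copy-on-read pattern described above can be sketched roughly like this. This is a toy illustration using SQLite and an invented "users" table, not IMVU's actual implementation (which isn't public); the idea is just that the read path migrates rows lazily while a background job sweeps up the rest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_old (id INTEGER PRIMARY KEY, name TEXT);
    -- The "altered" schema is a new table, instead of an ALTER TABLE
    -- that could lock the big old table for an hour.
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT,
                            email TEXT DEFAULT '');
    INSERT INTO users_old VALUES (1, 'alice'), (2, 'bob');
""")

def get_user(user_id):
    # 1) Look in the new table first.
    row = conn.execute(
        "SELECT id, name, email FROM users_new WHERE id = ?", (user_id,)
    ).fetchone()
    if row:
        return row
    # 2) Fall back to the old table and copy the row forward on read.
    old = conn.execute(
        "SELECT id, name FROM users_old WHERE id = ?", (user_id,)
    ).fetchone()
    if old is None:
        return None
    conn.execute("INSERT INTO users_new (id, name) VALUES (?, ?)", old)
    return get_user(user_id)

def migrate_stragglers():
    # Background job: move any rows the read path hasn't touched yet.
    # Once this finishes, the old table and this code can be deleted.
    conn.execute("""
        INSERT INTO users_new (id, name)
        SELECT id, name FROM users_old
        WHERE id NOT IN (SELECT id FROM users_new)
    """)

print(get_user(1))    # copied on read -> (1, 'alice', '')
migrate_stragglers()  # bob is moved by the background job
print(get_user(2))    # -> (2, 'bob', '')
```

In production you'd batch the background copy and handle concurrent writes, but the shape is the same: no table-wide lock at any point.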

I need to finish my blog post on the rest of the talk...

This x infinity. Any time you have a thought to do something that will lock an entire table that is active in production, you should think about another way to do it. The read-through "cache" with rolling back-compatibility is a great way to make such breaking changes without causing significant downtime.

No matter what you end up doing, it'll be extra work. It's best to find a way of working that provides enough flexibility to continue developing at a good pace without being an operational nightmare. Generally that means you'll have to take things a bit slower, but that'd be true almost regardless of the technology you're using; schema changes rarely have a trivial impact.

Would love to see that blog post since a lot of people struggle with schema changes, particularly 'alter table'.

I tend to think in terms of "augmentations" instead of migrations. Backwards-compatible data changes are the way to go. Adding tables/columns is better than removing tables/columns.

My app, on startup, ensures that it has all of the tables (with all of the columns) and any seed data that it may need (by issuing CREATE TABLE and/or ALTER TABLE). This allows me to simply roll out new (tiny, incremental) database changes and code and be sure that it works.

This also makes testing easier, in that my integration tests can start with a new database from scratch, build what needs to be built, run the tests that talk to the database, and then remove the database once it has finished.
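A minimal sketch of that kind of idempotent startup migration, using SQLite; the table, column, and seed-data names here are invented for illustration. Every statement is safe to re-run, so the app can always bring a fresh or stale database up to date by itself:

```python
import sqlite3

def ensure_schema(conn):
    # Additive table change: a no-op if the table already exists.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS settings (
            key   TEXT PRIMARY KEY,
            value TEXT
        )
    """)
    # Additive column change: only issue ALTER TABLE if the column
    # is actually missing (ALTER TABLE ... ADD COLUMN isn't idempotent).
    cols = {row[1] for row in conn.execute("PRAGMA table_info(settings)")}
    if "updated_at" not in cols:
        conn.execute("ALTER TABLE settings ADD COLUMN updated_at TEXT")
    # Seed data, written so a re-run doesn't duplicate it.
    conn.execute(
        "INSERT OR IGNORE INTO settings (key, value) VALUES ('version', '1')"
    )

conn = sqlite3.connect(":memory:")
ensure_schema(conn)
ensure_schema(conn)  # idempotent: running twice changes nothing
```

This is also what makes the from-scratch integration-test setup cheap: the tests call the same `ensure_schema` against an empty database.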

If you must make non-backwards-compatible changes (renames, whatever), I would suggest doing them one at a time.

Here is a paper on that topic: http://pmg.csail.mit.edu/~ajmani/papers/ecoop06-upgrades.pdf

Their examples show that it is not possible to do all application upgrades without downtime, and they show how to keep downtime to a minimum (e.g. by only making parts of your application inaccessible).

There's an easy way to do 3am US deployments: do them from Europe. I used to work at a consultancy in Ireland that did this, and it worked quite well. Deploy first thing in the morning, and you have up until some time in the afternoon to fix things if it goes pear-shaped.

"You stay up until 3am... spend a day in zombie mode with limited productivity or you start working much later."

Another option is to just wake up much later: my office is dead until 2pm, and the "key players" often don't wake up until 6pm.

I guess it makes some sense if you're trying to deploy in the late hours, but it doesn't sound sustainable to keep a whole team so far out of sync with everyone else. I'd imagine those with families, or who do things like sports in the evenings, would eventually quit.
