

Ask HN: how to push to a live site? - iphpdonthitme

So I've read a fair number of books and links on the internet describing how to scale stuff. What I haven't come across is general strategies for pushing to a live CRUD site without taking it down. Can anyone point me to resources I've missed, or perhaps just tell me?
======
tlrobinson
Deploying a long-running JavaScript application, like 280 Slides (and most
Cappuccino apps), can on one hand be trivial (just copy the client resources
to your webserver!), but it can also be an interesting challenge: namely,
making sure clients still running an old version of the client-side code keep
talking to the matching version of the server-side app. When you add in
something like Gears for offline access, it gets even tougher. There was a
good presentation at Google I/O that covered all these issues:
[http://sites.google.com/site/io/taking-large-scale-applicati...](http://sites.google.com/site/io/taking-large-scale-applications-offline---lessons-learned-from-google-docs)

We wrote a little custom code (we call it "bake") for deploying Cappuccino
applications:

"bake":
[http://github.com/280north/cappuccino/tree/master/Tools/bake...](http://github.com/280north/cappuccino/tree/master/Tools/bake/bake.j)

sample "bakefile":
[http://github.com/280north/cappuccino/tree/master/Tools/bake...](http://github.com/280north/cappuccino/tree/master/Tools/bake/example.bakefile)

It pulls your code from git (but could easily do local files, scp, rsync, svn,
etc.), runs an optional build command (like "ant"), copies source paths to
destination paths in a deployment directory, gzips the deployment directory,
scp's the code to your server(s), ungzips it, and does a little magic...
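The staging part of that pipeline can be approximated in a few lines. This is a minimal Python sketch, not bake itself: it skips the git checkout, the build command and the scp step, and the function name `stage_release` and the `path_map` format are hypothetical. It copies selected source paths into a release directory named by a unix timestamp and packs that directory into a .tar.gz.

```python
import shutil
import tarfile
import time
from pathlib import Path


def stage_release(src_root, path_map, out_dir, timestamp=None):
    """Copy selected paths into out_dir/<timestamp>/ and pack it as
    out_dir/<timestamp>.tar.gz, ready to copy to each server.

    path_map maps paths relative to src_root onto paths relative to the
    release directory (a hypothetical stand-in for a bakefile's entries).
    """
    stamp = str(timestamp if timestamp is not None else int(time.time()))
    release = Path(out_dir) / stamp
    release.mkdir(parents=True)  # fails if this timestamp already exists
    for src_rel, dst_rel in path_map.items():
        src = Path(src_root) / src_rel
        dst = release / dst_rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)
    archive = Path(out_dir) / (stamp + ".tar.gz")
    with tarfile.open(str(archive), "w:gz") as tar:
        # Keep the timestamp directory as the top-level entry, so unpacking
        # on the server recreates <docroot>/<timestamp>/...
        tar.add(str(release), arcname=stamp)
    return release, archive
```

On the server, the archive would be unpacked next to the existing releases, so the version currently being served is left untouched.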

Each version is placed in its entirety in a uniquely named (unix timestamp)
subdirectory. We could just redirect from "/" to "/1221268756/" (for example),
but that's incredibly ugly, so we use the little-known HTML <base> tag instead.
The index.html file in "/" is identical to the one in "/1221268756/" except that
it has a <base> tag, which tells the browser to resolve all relative URLs against
"/1221268756/" instead of the default containing directory ("/").

And it actually seems to work really well. The big advantage of this is you
can set your cache expire date arbitrarily far in the future, and your entire
app will be cached until you change index.html to point to a new <base>. 280
Slides, which is ~2.6MB uncompressed, loads on my computer in about 1.5
seconds if it's cached. The only problem with this approach is that when you
deploy, all clients have to re-download every resource, even ones that
didn't change. A more granular system would be ideal, but significantly more
complex.
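One way to get the caching behaviour described above is to choose the Cache-Control header by path: far-future for anything under a timestamped release directory, revalidate-every-time for the root index.html. A minimal Python sketch, assuming that layout; `cache_headers` is hypothetical and the one-year max-age is an illustrative value.

```python
def cache_headers(path: str) -> dict:
    """Choose a Cache-Control header for a request path, assuming the
    layout above: index.html at "/" and each release under /<timestamp>/."""
    first = path.strip("/").split("/")[0]
    if first.isdigit():
        # Files inside a release directory never change after deploy (a new
        # deploy gets a new directory), so browsers may keep them for a year.
        return {"Cache-Control": "public, max-age=31536000"}
    # The root index.html must be revalidated on every load; otherwise
    # clients would never see the <base> tag pointing at a new release.
    return {"Cache-Control": "no-cache"}
```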

I looked at Capistrano briefly but decided against it for some reason I can't
remember. Perhaps that would have been better, but c'est la vie...

------
iigs
Approach 1)

A long time ago I worked (in a very small testing capacity) at a very large
video/streaming media serving company.

As I recall, the scripted procedure was to tag a release in CVS, and a script
would pull that tag down into a new directory and swing the live symlink over to it.
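Swinging a symlink is usually done so there is no moment when the live path is missing: create the new link under a temporary name, then rename it over the old one. A minimal Python sketch for POSIX systems; the function name and the `.tmp` suffix are hypothetical. `os.replace` maps to rename(2), which replaces the destination in one step.

```python
import os


def swing_symlink(link_path: str, new_target: str) -> None:
    """Atomically re-point link_path at new_target (POSIX only)."""
    tmp = link_path + ".tmp"
    # Clear a temp link left behind by an interrupted earlier run.
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(new_target, tmp)
    # rename(2) replaces link_path in one step, so a request sees either
    # the old release or the new one, never a missing path.
    os.replace(tmp, link_path)
```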

This site had a very mature and infrequently changing billing system code
path, which I believe was not modified in this process. If that doesn't
describe your workload, you would probably have to extend this to support
concurrent execution of multiple versions for people who have a session
established.
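One way to support several versions at once is to pin each session to the release that was current when the session started, and only retire an old release once no session is pinned to it. A minimal in-memory Python sketch; `VersionRouter` and its methods are hypothetical, and a real setup would keep the pins in the session store rather than in process memory.

```python
class VersionRouter:
    """New sessions get the current release; established sessions stay on
    the release they started with until they end."""

    def __init__(self, current: str):
        self.current = current
        self.pinned = {}  # session id -> release name

    def deploy(self, release: str) -> None:
        # Only affects sessions created from now on.
        self.current = release

    def release_for(self, session_id: str) -> str:
        # First request of a session pins it to the current release.
        return self.pinned.setdefault(session_id, self.current)

    def end_session(self, session_id: str) -> None:
        self.pinned.pop(session_id, None)

    def retirable(self, release: str) -> bool:
        # An old release can be removed once no live session uses it.
        return release != self.current and release not in self.pinned.values()
```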

Approach 2)

Using any mainstream hardware load balancer (or presumably a similarly
featureful free/open software LB), configure it to point at N+1 machines in a
cluster. Administratively remove machines from the new session pool one at a
time (virtual machines can make this flexible and easy to roll back). Once the
established sessions have expired or been forced out, upgrade the software and
roll them back in.

One neat aspect of this approach is that if you have an "oh shit" hockey stick
scaling issue, you can watch it happen on one machine before deploying it to
every machine in your cluster. It's also good for A/B testing, as mentioned in
recent articles here.
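Watching it happen on one machine can be turned into an explicit gate before the rollout continues. A minimal Python sketch that assumes median request latency is the signal you care about; the `canary_ok` name and the 1.5x threshold are arbitrary illustrative choices.

```python
from statistics import median


def canary_ok(canary_ms, baseline_ms, max_ratio=1.5):
    """Return True if the upgraded machine's median latency (ms) is within
    max_ratio of the median across machines still on the old release."""
    return median(canary_ms) <= max_ratio * median(baseline_ms)
```

In a rollout loop you would upgrade one machine, let it take real traffic for a while, and only continue to the rest if the check holds.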

------
alex_c
If you happen to be using Ruby, then Capistrano (<http://www.capify.org/>) is
awesome.

I imagine there are similar solutions for other languages.

Worst case scenario, you can roll your own. A very basic trick is to use
symbolic links on your server - deploy the site to a new folder, and simply
point the symbolic link to the new folder when you're done.

~~~
lux
Capistrano works great for non-Ruby sites too! Our site is in PHP, and it was
easy to get cap working for us. We also use github.com, so the site essentially
pulls from there, which means we can also roll back in case we ever need to.
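The folder-plus-symlink trick described above makes rollback cheap: keep old releases around and point the link back at the previous one. A minimal Python sketch, assuming a `releases/<unix-timestamp>/` directory per deploy and a `current` symlink (roughly the layout Capistrano uses); `activate` and `rollback` are hypothetical names, and POSIX symlink semantics are assumed.

```python
import os


def activate(deploy_root: str, release: str) -> None:
    """Atomically point deploy_root/current at releases/<release> (POSIX)."""
    target = os.path.join(deploy_root, "releases", release)
    link = os.path.join(deploy_root, "current")
    tmp = link + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link)  # rename(2): no moment where "current" is missing


def rollback(deploy_root: str) -> str:
    """Re-point current at the release deployed just before the active one."""
    # Release names are assumed to be unix timestamps, so sort numerically.
    releases = sorted(os.listdir(os.path.join(deploy_root, "releases")), key=int)
    active = os.path.basename(os.readlink(os.path.join(deploy_root, "current")))
    idx = releases.index(active)
    if idx == 0:
        raise RuntimeError("no earlier release to roll back to")
    previous = releases[idx - 1]
    activate(deploy_root, previous)
    return previous
```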

~~~
njharman
Concur. I looked a loooong time for something better than home-grown
solutions.

Capistrano is great. I've used it with great success deploying Django (Python)
apps to complex environments.

------
run4yourlives
Could you define "push"? If it's what I think you mean, you may want to check
out the deployment chapters in the documentation of Rails, Django and
other frameworks. They're usually not too bad at explaining how to get from
dev to live.

SVN also has some documentation on its site, and I imagine Git would have
something similar.

------
zitterbewegung
If you're using Lisp and have a network-based REPL, you can modify the running
code live without taking the site down.

~~~
gibsonf1
We push code changes with Git to the cloud, then log into the server and git
pull. Then we use the REPL on the cloud server to reload the code, causing a
recompile of changed code including dependencies. Strangely enough, the site
continues to function during the recompile, which usually completes in less
than a minute depending on how many macros are affected by the change.
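Python has no direct equivalent of a Lisp image recompiling itself, but `importlib.reload` is a rough analogue: it re-executes a module's source inside the running process. A minimal sketch; `hot_reload` is a hypothetical wrapper. Unlike the Lisp workflow described above, reload only rebinds names in that one module object, so code that did `from mod import func` keeps the old function.

```python
import importlib
import sys


def hot_reload(module_name: str):
    """Re-run an already-imported module's source in this process.

    The module object is reused, so callers that look functions up through
    it (mod.func()) get the new code on their next call; names bound
    earlier with `from mod import func` still refer to the old definitions.
    """
    return importlib.reload(sys.modules[module_name])
```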

------
cosmo7
Deploy by svn. It's amazingly simple, and simple is usually best.

If you're smart, you can orchestrate DB changes non-destructively. If you're
less smart, use Capistrano.

~~~
iphpdonthitme
What's an example of a db change that "smart" people can do where Capistrano
allows you to be less smart?

~~~
cosmo7
The Capistrano approach is to collect a set of SQL sequences that will alter a
sample server from one configuration to another, so that you can repeat them
on other servers such as your production server.

This is fine if you are paid by the hour.

The smarter way is to make changes that span codebase versions. You want to
normalize a table so entities can have more than one address? Build an
addresses table crosswalked to entity id and let the new code use it. Once
you're happy with it, you can drop some columns, but if you need to roll back,
you haven't thrown anything away.
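The address example is often called an expand/contract migration. A minimal Python sketch using the standard `sqlite3` module; the `entities` and `addresses` table layouts and the function names are hypothetical. `expand` only adds, so it is safe while old code is still live; `contract` is the later, destructive step.

```python
import sqlite3


def expand(conn: sqlite3.Connection) -> None:
    """Additive step: safe while old code still reads entities.street/city.
    Creates the addresses table and copies existing addresses into it;
    re-running it does not duplicate rows."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS addresses (
            id INTEGER PRIMARY KEY,
            entity_id INTEGER NOT NULL REFERENCES entities(id),
            street TEXT,
            city TEXT
        );
        INSERT INTO addresses (entity_id, street, city)
            SELECT id, street, city FROM entities
            WHERE street IS NOT NULL
              AND id NOT IN (SELECT entity_id FROM addresses);
        """
    )


def addresses_for(conn: sqlite3.Connection, entity_id: int) -> list:
    """New code path: an entity may now have any number of addresses."""
    return conn.execute(
        "SELECT street, city FROM addresses WHERE entity_id = ? ORDER BY id",
        (entity_id,),
    ).fetchall()


def contract(conn: sqlite3.Connection) -> None:
    """Destructive step: run only once no deployed code reads the old
    columns. ALTER TABLE ... DROP COLUMN needs SQLite 3.35 or newer."""
    conn.execute("ALTER TABLE entities DROP COLUMN street")
    conn.execute("ALTER TABLE entities DROP COLUMN city")
    conn.commit()
```

Between `expand` and `contract`, rolling back the code is just redeploying the old release, because the columns it reads are still there.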

To me the difference is like that between hiring a hooker and charming a
cheerleader. Either way, you should make backups. YMMV.

~~~
iphpdonthitme
Hmm, I don't think I understand the difference. Aren't you adding a table in
both the Capistrano case and the 'span codebase' example? Wouldn't both cases
require repeating SQL sequences?

Or maybe what you are saying is that you can make 'dumb' SQL changes that
break compatibility with past code, versus 'smart' SQL and code changes that
are backwards compatible?

------
amrithk
Hi, would you be able to share some of the books and links on the internet you
have been reading that describe how to scale web services?

~~~
iphpdonthitme
- Building Scalable Web Sites: Building, scaling, and optimizing the next
generation of web applications by Cal Henderson

- Scalable Internet Architectures (Developer's Library) by Theo Schlossnagle

<http://philip.greenspun.com/seia/>

... and a whole buncha links on Google with "scaling", although "scaling
mysql" seems especially popular.

