Edit: I should also mention that if you're stuck on something like restricted hosting with cPanel, which severely limits your deployment options (some of my clients are in this boat), then http://ftploy.com/ is a really cool solution. But you should really get your ass off cPanel ASAP.
Double edit: Some of the replies below have made some good points that I had not considered which weaken my argument. So while I'm now more ambivalent than dismissive towards the idea of using git to deploy, there are several modifications that should be made to this particular system to make it production-ready. See avar's and mark_l_watson's comments below and mikegirouard's comment elsewhere for some ideas.
I generally see four stages in the devops maturation of a programmer:
1. I rsync my code using pre-built commands in Fabric when I'm ready to push.
2. I write code and have hooks on the server to pull the repo when I tag a release in my VCS.
3. I use my language's package management system to build a source distribution that includes all of the necessary static assets, the web application, and any database migration code; I also use a sane versioning scheme to keep track of releases. When I want to push, a build system hooked into my continuous integration server builds a distribution whenever the senior programmer tags a release. It is then made available to the production server in a deb or rpm repository, where the senior programmer can just run an update command that installs the new distribution and runs any necessary database migration or post-upgrade hook scripts (sketched just after this list).
4. You are so big that you've got a custom deployment system built on top of BitTorrent (à la Facebook) or something similar.
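To make stage 3 concrete, here's a minimal sketch of the upgrade step, assuming the app is packaged as a hypothetical "myapp" deb in an internal apt repository:

    # refresh the package index, then upgrade just the app; the package's
    # postinst script runs migrations and post-upgrade hooks
    apt-get update
    apt-get install --only-upgrade myapp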
It should be obvious where I'm at: I progressed from being an adherent of VCS deployment, to rsync only, to a proper source distribution release system. I haven't managed the devops for a team/application the size of Facebook yet, but I'm sure I'll get there soon.
Benefits of versioned archives over VCS for deployments: they are easy to checksum and cryptographically sign; they integrate easily with existing distribution-specific package databases; you can deploy without requiring a VCS (and all its dependencies, including maintained and accessible VCS repo-hosting infrastructure); and the resulting minimalist approach has probable security and speed benefits, both at the level of the host and the network.
Personally I use a combination of versioned archives and named and versioned target environments, each of which can be tested both individually and in combination (including regression tests). This works well for me.
I suppose then that you mean "rpm" or "deb" or the like, not the "language package management system" the previous poster mentioned, because I've yet to see one that truly supports more than tar xzf <list of deps>.
Even when they have signing support, none of the packages are signed anyway.
Just a small nitpick: your VCS should support this, too. (Git does, for example.)
The reason I've started to like git is deletes. You can handle them with rsync:
The problem is that some projects have content uploaded in the same file tree (simple CMS installs). This might not be an issue if it was structured differently (symlink to another directory), but sometimes it's what I have. Using "rsync --delete" would remove newly uploaded user content. Yeah, I could use the "--exclude" option as well.
With git, I can just "git rm ..." and the file will be removed on deploy. Content can be mixed in the same tree and hidden with a .gitignore file. File content can be managed separately with rsync, if that's the best way. Just not FTP. Please.
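A minimal sketch of that workflow (file and directory names hypothetical):

    # deleting a tracked file propagates to the server on the next deploy
    git rm old-page.html
    git commit -m "Remove old page"
    # keep user-uploaded content untracked so deploys never touch it
    echo "/uploads/" >> .gitignore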
Note that rsync also allows fairly powerful in-tree tweaking of details: if you give it the "-F" option, it will look for ".rsync-filter" files (see man page for details).
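For example, a minimal sketch (paths hypothetical): a .rsync-filter file at the tree root keeps user uploads out of the sync, and since excluded files aren't deleted without --delete-excluded, "--delete" can't touch them:

    # contents of .rsync-filter at the tree root
    - /uploads/

    # -F tells rsync to pick up .rsync-filter files in the tree
    rsync -az --delete -F ./site/ user@host:/srv/site/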
I'm not saying there aren't uses for rsync, but your dismissal of git as not being suitable for a "true production deployment system" isn't supported in any way. And stating that rsync was "specifically made for this kind of thing" without comparing any of the trade-offs involved is just appealing to authority.
Some things you may not have considered:
* rsync is meant to sync up *arbitrary filesystem trees*, whereas with Git you're snapshotting trees over time. When you transfer content between two Git repositories, the two ends can pretty much go "my tree is at X, you have Y, give me X..Y please". You get that as a pack, then just unpack it in the receiving repository. Whereas with rsync, even if you don't checksum the files you still have to recursively walk the full depth of the tree at both ends (if you're doing updates), send that over the wire etc. before you even get to transferring files.
* Since syncing commits and actually checking them out are two different steps, you can push out commits (without checking them out!) to your production machines as they're pushed to your central repository. Then deploying is just sending a message saying "please check out such-and-such SHA1" and the content will already be there! (See the sketch after this list.)
* You mentioned in another post here that rsync has --delay-updates; this is just like "git reset --hard" (but I'll bet Git's is more efficient). With Git you can do the transfer of the objects and the checking out of the objects as separate steps.
* It's way easier for compliance/validation reasons to not get the data out of Git, since you can validate with absolute certainty that what you have at a given commit is what you have deployed (just run "git show"). If you check the files out and then sync them with some out-of-band mechanism you're back to comparing files.
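A minimal sketch of that two-step rollout (remote name and the $SHA1 variable hypothetical):

    # step 1, run on every push: fetch objects, touch nothing in the working tree
    git fetch origin
    # step 2, at rollout time: flip to the already-present commit
    git reset --hard "$SHA1"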
Trying to rsync to 1000 servers at once from one box (the naïve implementation with rsync) would take forever and overload that one box, especially if you wanted to take advantage of pre-syncing things on every commit so the commit will already be there when you want to roll out (constant polling and/or pushing).
You can mitigate this by having intermediate servers you push to, but then you've just partitioned the problem: what if you need to swap out those boxes, or they go down, etc.?
With Git you can just configure each of the 1000 boxes to have 3 other boxes in the pool as remotes. Then you seed one of them with the commit you want to roll out. The content will trickle through the graph of machines, any one machine going down will be handled gracefully, and if you want to roll out you can just block on something that asks "do you have this SHA1 yet" returning true for all live machines before you "git reset --hard" to that SHA1 everywhere.
As for your comment about being "back to comparing files", that's all Git is doing internally anyway. You can do the same with other deployment tools and sha1 hashes etc.
> it can all be accomplished with other tools and without needing the entire deployment history stored on each production machine.
> As for your comment about being "back to comparing files", that's all Git is doing internally anyway. You can do the same with other deployment tools and sha1 hashes etc.
You'd be pleasantly surprised how much of the checking/validation/syncing logic you have to write around e.g. rsync when syncing from a Git repo just disappears entirely if you use Git itself to sync the files.
Maybe you start out on Heroku. Then you switch to your own machines and use this simple hack, or Dokku or something. Then something home grown. The complexity of deploy scripts can grow while the interface stays the same.
But for an argument based in pragmatism, rsync has tools such as the --delay-updates flag, which allows your entire deployment procedure to become a pass-or-fail atomic operation. This kind of assurance slows my hair loss as a systems administrator. AFAIK git has no such tools, but I'm certainly open to being corrected.
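For instance, a minimal sketch of the rsync side (host and paths hypothetical):

    # stage every updated file in a temporary file, then move them all
    # into place together at the end of the transfer
    rsync -az --delete --delay-updates ./build/ deploy@web1:/srv/app/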
git fetch <remote> && git reset --hard <remote/branch|tag|hash>
Are you concerned about being wasteful with disk space? Or is there some other concern here? Some security issue perhaps?
IMHO, version control can be used for deployment, but only if you use the release branch of your project.
And of course NEVER put your config in version control ;)
I'm not sure I'd make that blanket statement. Version control seems like a great place for configuration. It allows you to centrally manage configuration details and provides an audit trail for debugging. You just want to make sure it is in a separate, secure repository and not mixed in with your app development.
You may want to look into Salt, Chef or Puppet, which let you separate your configuration from your security credentials.
Why wouldn't you want to version your configuration in general? Maybe I'm not sure what you mean by "config", but in general, version-controlled configuration is always good. I even set up git in my /etc sometimes to track changes I make to it manually (not in production, on my home machine).
Mostly it's because it seems people are using Git to deploy without a good reason. At least I haven't heard of an advantage enjoyed by those using Git for deployment.
There are some obvious disadvantages, so what is the compensation? It seems the only reason is that it's easy to type "git push". But of course any deployment method can be wrapped in an equally easy script command.
Okay, I have to admit to knowing of one advantage: only the delta of your changes is transmitted over the wire, rather than a complete checkout. It's just that in practice the savings aren't usually enough to warrant the potential downsides. For my money I'd prefer rsync or any number of other solutions.
Once the repo has been fetched, we just check out the right tag/revision and do a local copy from the git repo into the app directory. At this step you can exclude .git if you want.
This process has an advantage over direct git checkout in that if you (heaven forbid) ssh onto the server and directly modify anything, you won't end up with conflicts.
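A minimal sketch of that flow (paths and tag hypothetical):

    # fetch into a working clone that is *not* the app directory
    cd /srv/repo && git fetch --tags origin
    git checkout -f v1.2.3    # -f discards any stray server-side edits
    # local copy into the app directory, leaving .git behind
    rsync -a --delete --exclude='.git' /srv/repo/ /srv/app/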
Are you 100% sure that there is nothing you are exposing via your git repo that you want to keep away from the person who manages to hack your server or discover some means to reach the repo externally?
Getting hacked is not inevitable, but if you treat your systems as if it were, you'll be a lot safer if it ever does occur.
* It may be hidden in some old commit (e.g. some password)
* You'd need to rewrite all history from that point
* Force push doesn't necessarily clean the data from the remote
1) You can go through the trouble of identifying and resolving all the edge cases that you encounter when using Git as a deployment tool, keeping in mind that one of those may result in an embarrassing security disclosure. Whoops.
2) You can use a deployment tool that was developed for that purpose, has existed for years, and has had many sets of eyes on it; many of which are inevitably more experienced than you. And you still may end up with an embarrassing security disclosure, but the chances are better that you'll hear about it through responsible disclosure channels first, rather than waking up at 3 AM to the voice of your boss/client asking why the site is redirecting users to buy Viagra at a discount.
A bonus third choice:
3) You look at existing deployment tools and ask yourself "I wonder why they do that?" Then, maybe ask around a bit. Once you've got a good idea of all idiosyncrasies involved with deploying software, then you embark upon building your own tool. I think you'll find that simply `git pull`ing from your httpd document root and `rm -R`ing the .git/ directory won't be your final solution.
Also, minimizing your exposure in case of a security issue is probably a good idea, so the convenience of deploying with git may or may not be worth this extra exposure.
The configuration can be handled in the post-receive hook too.
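For instance, a minimal post-receive sketch (paths and branch hypothetical):

    #!/bin/sh
    # check out the pushed branch into the app directory...
    GIT_WORK_TREE=/srv/app git checkout -f master
    # ...then drop in config that is kept outside version control
    cp /etc/myapp/production.conf /srv/app/config/production.conf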
I advise against using Git as a deployment tool for serious development (you should use Puppet instead), but for quick hacking and personal projects it's perfectly fine.
As is suggested in the SO article you link, it's more appropriate to locally export the version you want to deploy and use ssh (or whatever) to transfer it to the live server. Nobody ever seems to try to justify _why_ you would want to use git-push; they just go about explaining how you do it.
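For example, a minimal sketch (tag, host and path hypothetical):

    # export the release locally and stream it straight to the live server
    git archive --format=tar v1.2.3 | ssh deploy@web1 'tar -x -C /srv/app'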
Honestly, we are just talking about transferring files here. However you automate it, as long as it gets the files from point A to point B, it's fine. I happen to find it most convenient to use git, since that is how I send and receive code changes everywhere else, and it seems foolhardy to introduce another file-transfer tool without a really good reason. Moreover, it lets me very easily tell exactly what revision is sitting on the server, and it causes me to pause before I push. It is also really easy to integrate git, through hooks, with a continuous integration setup.
Many of us like to have some recent past releases sitting on the production servers in order to make instant rollbacks if we discover a bug after the code has been deployed. Git provides that for free.
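For instance, with timestamped release tags like the ones described later in this thread, a rollback on the server is one command (tag name hypothetical):

    git reset --hard deployment-20130101120000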
* Capistrano can deploy via git (it uses export)
* Capistrano keeps a configurable number of releases around in case you need to rollback
* Capistrano provides an ordered task system with before/after hooks at every one of its pre-defined tasks
* Capistrano can be just as lightweight as using git to deploy.
* You may not need all the stuff that Capistrano provides today, but as your project grows, you will need it. Why waste your time with a compromised deployment hack when better tools are available and easy to use?
As I said you shouldn't use this for critical services, but it works great for quick hacking. It's also very convenient for non-production (i.e. testing/staging) machines, to automate continuous integration.
Build on a build server
Scp to live server along with generated config
Install with native package tool and hook into native service manager
Use Salt / Puppet / Chef to do everything after the initial build on your target servers.
I'm sad that so few people seem to build native OS packages for deployments. My build system creates a release package and sticks it in an apt repo, then Puppet installs the latest version of the package when it runs.
After a while, if they use Python, Node or Ruby, they may start generating language-specific packages (pip, virtualenv, etc.).
The next phase is when the need for upgrade/downgrade transactions comes about, along with handling transitive dependencies (my package needs another package, which in turn requires a third package to be upgraded, which is a base system package). Now 'make && make install' looks silly and messes up the file system with left-over files, deployment ssh scripts become a tangled mess, and so on. Then slowly they think: "It would be cool if there was a system that could transitively handle package versions, and maybe provide transactions with pre- and post-install scripts."
If they are lucky, someone will point them to apt or rpms or they'll write a broken version of those things from scratch.
(I'm especially interested in whether the restriction most of them impose on having multiple versions of a package is something you are dealing with).
It's nice, but has one major drawback which may be a showstopper, depending on your use case. It can only keep one version of a package at a time in each distribution.
For my post-receive hook, I always add a tag to mark a deployment:

    # tag each deployment with a timestamp
    git tag deployment-`date +'%Y%m%d%H%M%S'`

Then, locally, I can see what's deployed with:

    git log prod/master --oneline --decorate

My deploy script boils down to:

    git push $1 +HEAD:master   # $1 is the remote name, e.g. "prod"
    git fetch $1
The extra `git fetch` will pull down the auto-generated tags so you can see them locally with a simple `git tag`.
Edit: Forgot to mention that since git ships with a bash shell for Windows, most of this should work for Windows-based dev setups as well.
It's basically a more advanced version of what you're doing.
git remote add origin <repo-url>   # <repo-url>: wherever the bare repo lives
git push origin master
Instead you should have a build server which builds a package (rpm, deb, tarball?) which is then used to deploy across the production environments.
You should also not compile JS/CSS etc. on the production system; that is what the build server is for.
Anything installed on a production system should be 'required' for the app to actually run.
That said, you can use Capistrano (and other tools like it) to update 'demo' environments and dev environments (with git); however, the actual TEST and STAGING environments should mirror the PROD environment (packaging).
On the other end, you have hipster developers who need Git on production systems because that's the only tool they know.
Don't get me wrong, I love Capistrano, git deploy hooks, and ruby gems that do deploys (Heroku), but most cloud hosts are only offering this mechanism to deploy apps. FTP became popular because of its ease of use for designers and webmasters. You don't always need to deploy your entire application for simple changes. Another big one is the AJAX file editor in the browser.
For trivial changes a simple file change would suffice. When you do an entire deploy for an app like this, depending on your dependencies and payload, it could take a long time. What if you had the wrong price and needed to make a change immediately? Of course, maybe now there are multiple environments which play a factor too.
I do realize that was before we had multiple web servers running the app and that is part of the reason, but there are still ways to make it work (file mounts).
I'm hoping for more deployment options in the future, and that cloud hosts realize the need is still there from traditional hosting.
Doesn't scale past one developer or one box.
I'd say 80% of the sites online are managed by one or two people. They may need the scalability of the cloud for traffic bursts, but we can't say the cloud is the future if all of our existing tools and workflows are completely broken.
For the past year I was building a cloud competitor to Heroku. We had a traditional host (like HostGator), and we talked to those customers about moving to the new cloud infrastructure and all the reasons why. Most people said it was too complicated and were stuck in their workflows (FTP and file managers). Which is why I wanted to chime in: FTP is not dead.
Worth the read if you're interested.
If you want to deploy using Git, then the smart thing to do is to use one of the many continuous integration tools out there that were built specifically for this kind of workflow. I use TeamCity to run my tests and to build/deploy my website whenever I push to my default branch. This works really well for some of my sites, and although I'm looking for a way to refine it so I can also deploy database changes between local/staging/web servers, I can't think of a better way of doing this.
When I deploy my website, I use a different approach: My webroot is just a symlink. My deployment script exports the repository to a directory with a unique name for every commit. When the export succeeds, the symlink is updated to point to the new directory.
The advantage: Changing to the new version is instantaneous. If something should go wrong, I can immediately revert by changing the symlink back to the old dir.
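A minimal sketch of that scheme (paths hypothetical):

    # one exported tree per commit; the webroot symlink picks the live one
    SHA=$(git rev-parse HEAD)
    mkdir -p "/srv/releases/$SHA"
    git archive "$SHA" | tar -x -C "/srv/releases/$SHA"
    # build the new symlink, then rename it over the old one:
    # a single rename() call, so the cutover is instant
    ln -s "/srv/releases/$SHA" /srv/webroot.new
    mv -Tf /srv/webroot.new /srv/webroot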
No, they are not. During the push there will be a short window in which some parts of the website are operating on new code while others are operating on old code. If many components are in play (i.e. using libraries), you may end up breaking things if a new request comes in at the wrong time.
I work on a closed source system, so we will never deploy our code via git and then build on the server. So, in this case, build locally (or on a build server) and rsync from there using deploy scripts.
And yes, I have deployed with git, so I'm not speaking out of complete ignorance. I can still see a use for both.
It doesn't work on "all of the servers" either. All of my servers have ssh/scp available and will never have ftp.
Not a problem if you're only occasionally updating one site at a time, though I'd agree it doesn't scale up the way git would.
NAT issues, no standard.
Valid points. I've never had issues with either, but I don't work on the kind of projects a lot of HN users do, so I really tend to only care whether it stripped the line breaks or not.
> It doesn't work on "all of the servers" either. All of my servers have ssh/scp available and will never have ftp.
I stand corrected.
It is good to actually see the arguments against FTP at least.
It's hard to find out from Google, because it assumes I want to search for "ftp" rather than "ftps", as if they're the same thing :/
I tend to dismiss this kind of article and suggestion even when they're OK, only because they promote by appeal to fashion.
Thanks for the reminder and for the clear explanation of how git deploy might work!
On the other hand, if your site is just static (HTML/JS) files, I think it makes great sense to use Git to deploy, as there is no configuration to worry about.
Use a heredoc: http://tldp.org/LDP/abs/html/here-docs.html
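For example, a minimal sketch (host and paths hypothetical) of running a multi-step deploy over a single ssh connection:

    ssh deploy@web1 <<'EOF'
    cd /srv/app
    git fetch origin
    git reset --hard origin/master
    EOF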
git init --bare