Every time someone comes up with a "cool" way to deploy via Git, it turns out their definition of "deployment" is nothing more than dumping something on a server. It completely ignores every aspect of deployment other than putting the files somewhere else, which is 5% of the work and 1% of the complexity of deployment.
That's not even a sufficient professional scenario for anything but the simplest of non-essential websites. Hell, this won't even do for the simplest of WordPress sites.
I use Transmit to FTP updated files to my web server. Then I send smoke signals to my colleagues to inform them of the changes, and document the updates on a papyrus using quill and ink. Should I be doing it differently?
I actually assumed "smoke signals" was some sort of Mac app that I didn't know about (it actually sounds like it could be a 37 Signals product). I even wondered if "papyrus" was a team-management wiki or the like. Then I got to "quill and ink" and realized you were joking.
Parchment only.
Papyrus has a standing export restriction from the issuing superpower (Egypt circa 200 BC), so if it becomes a superpower again you could be extradited for disobeying the law of a foreign country that attempted to force an unfair business advantage over the rest of the world.
The commonalities between SFTP and FTP end with the three letters in the acronyms. SFTP, unlike FTP with its passive mode and whatnot, is actually a decent protocol.
rickmb did not come even remotely close to specifying on which grounds he was asking his question. He might be a proponent of "if it's not automated [via capistrano/fabric/etc...] it's not deployed" for all I know.
More complex deployment processes can be kicked off in the post-receive hook.
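For example, a post-receive hook on the deployment server can run whatever a deploy script would; a rough sketch (every path and command below is illustrative, not prescribed):

#!/bin/sh
# hooks/post-receive in the server-side repository
GIT_WORK_TREE=/var/www/app git checkout -f master
cd /var/www/app || exit 1
./scripts/migrate_db.sh          # run database migrations, or whatever your stack needs
sudo service app-server restart  # restart whatever serves the code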
I think it's useful to have deployment integrated with your SCM. You have a complete history of deployments. You can easily roll back to any version. You can deploy from any machine which has your SCM tool installed.
You can deploy Tomcat apps using a post-commit hook as well. Add your server as a non-origin remote, and it's a quick way to update a Clojure/ring site. For added predictability, you can deploy a specific branch.
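In practice that's just something like this (the server path and branch name are placeholders):

git remote add production ssh://user@server/var/local/git/site.git
git push production mybranch:master   # push only the branch you want to go live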
This bit has completely lost me. Why on earth would you want to do non-trivial munging in a post-commit hook instead of just writing and running a "deploy" script like everyone else does? You can run that over ssh, even, preserving the "fire once" behavior of doing a git push from the build system.
Copying a .war to /var/lib/tomcat/webapps and restarting tomcat is non-trivial? Perhaps your deploy script is more complicated than it should be. I don't understand your distaste with deploying via git over ssh instead of bash over ssh.
You haven't needed to restart Tomcat since at least Tomcat 5.5, if not earlier. Tomcat automatically polls for changes to the WAR file and updates accordingly.
That's the theory, but I've found so many reasons to restart that I just do it automatically these days. The reasons include, but are not limited to, stupid webapps, JNI, and log4j stuff that outlives a deployment and ends up consuming all of PermGen.
An interesting point. I get the part about "stupid webapps" (e.g. loading a configuration file once and never checking it again), but I find the PermGen part surprising. Would you care to provide more detail? I thought that PermGen was garbage collected in modern JVMs; furthermore, wouldn't the new version of the application still use (roughly) the same classes/methods and thus (roughly) the same space?
This happens because Tomcat is unable to release some classes from the classloader. Let's say I am using Spring: every time I load a new WAR, Tomcat doesn't remove the old Spring JAR from the classloader; it just loads a second one for my updated WAR. After doing this a few times (depending on the size of your app, the PermGen size, and how many third-party libraries you have), Tomcat will throw a PermGen error. There is nothing Tomcat can do about this; it is the classloader's problem. Also, it is not recommended to "hot deploy" to a production environment.
I've done some more reading on this issue, and I don't think what you are saying is correct.
First, class loading is more granular than JAR files. From what I can tell, the JARs are just repositories (if you will) for the actual class data. Once a class has been loaded from a JAR, it is distinct from it, and there is no need to keep the whole JAR in memory.
Second, there are many class loaders at play, not just one. At the least, there will be one class loader for Tomcat and one class loader for each web app. This is necessary to keep web apps from breaking each other (and Tomcat!) by loading incompatible versions of classes.
Third, once a class is no longer used (i.e. there exist no objects that are instances of it) it can be garbage collected, and will be if more PermGen space is needed.
Fourth, once a web app is unloaded, its class loader is no longer needed, and it too can be garbage collected (along with any memory it was using to store the aforementioned JAR files, although presumably that memory would be outside of PermGen).
Taking all of this into account, one might (like me) naively presume that once you unload a web app, all of its objects will no longer be referenced, and this will propagate up the tree of references (servlet -> object -> ... -> object -> class) until nothing from that app remains.
HOWEVER, this assumes that references exist in a tree. A singleton class would cause problems: it stores a reference to an object of its own type. This creates a circular reference pattern (class X -> object -> class X) and keeps both class and object in memory.
A possible solution might be to associate each class with the web app that loaded it, then when you unload a web app, you can walk through all of the classes it loaded and null out all of their static fields. This will allow the referenced objects to be garbage collected, and in turn, the classes of those objects.
But this is probably much easier said than done (it may even be impossible due to limitations on the JVM), and there are probably other issues I'm not even aware of and thus not considering. Never mind JNI, which just adds even more fun to the mix.
So ultimately I must agree with the recommendation to restart Tomcat when you're deploying web apps. From a developer perspective, I would also recommend eliminating static data in your libraries and web apps as much as possible.
A year and a half ago I worked at a place that had a Deployment Spreadsheet.
Each file was carefully FTP'd from the shared development area to the live site, and then marked off on the sheet with the date that it had been deployed, and with what feature.
Each new deployment meant either adding a new set of rows to the spreadsheet for a new feature, or for updates to existing features going back and finding an earlier instance of the file in the list and bumping the date on that.
Because of the shared nature of the CMS across different websites, we couldn't svn up in the CMS directory without destroying the current known state of the site. We also couldn't svn up outside of the subdirectory we were working on, as everyone else working on that site was working off the same network share. Everything was done very gingerly.
The day one of the developers came up with a method to take a list of files from the spreadsheet, tar them up and FTP them to the server for later extraction, he was overjoyed and very, very proud.
I got out of there, swore off PHP and its community entirely. I got tired of checking for a baseline of automated deployment, automated testing, version control, and individual dev systems at every single job I applied for; Ruby people don't look at you funny when you mention these things.
Addendum: I personally know a developer currently working for a famous local radio brand who directly works on the live website network via an FTP plugin. He has to contend with other developers dumping the contents of archives directly into his public folder.
FWIW, this is essentially how Oracle ships updates for their enterprise business systems: you get a tar file and a README telling you which files get copied to which spots on which servers.
That shitty deployment experience has nothing to do with PHP. There are plenty of valid reasons to move to Python or Ruby, but better deployment stories aren't one of them. Objectivity seems to be in short supply here on HN.
Isn't the ecosystem always trotted out while defending PHP, as well as its 'accessibility' to amateurs? Well, those amateurs are part of the PHP ecosystem, and you'll have to put up with that... or not.
heretohelp made the point better than I did, but I explained my thoughts on that with this sentence:
I got tired of checking for a baseline of automated deployment, automated testing, version control, and individual dev systems at every single job I applied for; Ruby people don't look at you funny when you mention these things.
It's about expectations. I tried really hard to find a place that had all these in the PHP world. In the Ruby world (for example), it's a given.
(I have only ever encountered one PHP shop during my rounds of interviews that did automated testing... But they used CVS for version control. The very first Ruby shop I interviewed with had all of the above, and treated them like they were no big deal.)
* Track changes to the repository as a whole, instead of per file. I care most about stepping through revisions of a project; inspection of the history of a particular file is a secondary concern for me, but is all that CVS can do. A "project" for me is usually a set of files that are interdependent to greater and lesser degrees.
The best you get with CVS for tracking the versions of all the files in a repository is manually creating tags all the time. With Git, Subversion, Mercurial, Bazaar, etc., you get this for free; it's how these tools work.
(As an aside, DOS is also a proven operating system. For varying values of "proven".)
To be honest, most of them do so because it's the only option present on most shared hosting platforms. (S)FTP is for most of us the only option when deploying a simple site to the client's hosting.
Of course it would be cool if all of our clients hosted on a PaaS platform like Orchestra or PHPFog, but that's not the case by a long shot, I'm afraid.
I am currently trying to figure out a more clever way to deploy my code. I would love it if you would share some thoughts on a general strategy and possible tools. I am using Jenkins and Git.
I use Capistrano to deploy PHP web apps and I love it. It basically checks out stuff from the git repo, minifies and bundles the JS files, uploads the assets to the CDN, and tags the repo with the version string. Though Capistrano has traditionally been a Ruby deployment tool, it works well for any type of deployment; you just need to override or skip some of the Ruby-specific stuff. Deploying to multiple servers is ridiculously easy with Capistrano.
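None of that is Capistrano-specific, by the way; the rough shape of what those tasks automate, in plain shell (the script names and paths here are purely illustrative, not what I actually use):

git clone --depth 1 git@example.com:myapp.git /tmp/release && cd /tmp/release
./scripts/minify_js.sh                              # minify and bundle the JS files
./scripts/push_assets_to_cdn.sh                     # upload the built assets to the CDN
git tag "release-1.2.3" && git push origin --tags   # tag the repo with the version string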
I couldn't agree more. Deployment via SCM tool is a terrible idea. The first thing to do when starting a project, no matter how small, is to set up a continuous integration server with a clear build and deploy process.
I deploy via git with a more complex post-update hook that runs tests before pushing into the production repo and all the other commands needed for my Django setup.
Err, what about rsync? My blog is versioned with git, but compiled with USSM¹. I don't want to version the generated html code, so I deploy it with a one line shell script that just calls rsync through ssh. Simple, and fast.
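Something along these lines, with the host and paths made up:

rsync -avz --delete public/ me@myhost.example:/var/www/blog/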
So in my case, rsync looks better than git at deployment. I can't imagine a scenario where the reverse is true. Does someone have a counterexample?
This tutorial is missing a lot of steps. It therefore expects the user to know how to use Git in order to consistently commit and push their changes under such a scheme. If the reader is a self-sufficient Git user (i.e. doesn't need a full step-by-step walkthrough), they wouldn't need this tutorial.
Anyhow, this scheme is sort of old hat for Git users. Most of us require a build to deploy code, which generally leads to SCM-polling continuous integration solutions like Jenkins that fully automate the preparations required to run new code.
This blog post sort of suggests a WordPress crowd's version of continuous integration. Go ahead and use it for a blog theme, or a set of assets. If you're doing this for a web application, you're probably doing something wrong.
Hello, I am learning git. Even if this tutorial does not comply with the rules of the art, or omits basic git knowledge, it fits my current learning level exactly. I have read the most basic tutorials, and this kind of "almost second level" tutorial is perfect for getting a better understanding of how the pieces fit together.
The very first sentence is why Git is not the new FTP:
> First, create a directory on your server and initialize an empty git repository.
The number of people who deploy sites via FTP who also have SSH access and the ability to install Git on their hosting is small. Most people deploying to the web do not control their own servers.
I almost got kicked off of my old shared host for installing Java on my machine after I faxed them a copy of my driver's license to get SSH access. Got myself a shiny new VPS a few months ago and haven't looked back. I can finally do things I could only dream about with the shared host.
Unfortunately, this doesn't take into account the fact that the stuff you keep in your development repository is almost never identical to the stuff you want on your deployment server. There are files you keep in the former that you don't necessarily want/need in the latter (unit tests, documentation, etc.). Similarly, there is stuff to be put on the deployment server that you don't want cluttering your dev repository, like compiled CSS/JS assets.
As others have said, deployment should really be done by a continuous integration server, and preferably using a protocol that's designed for transmitting files, not version control.
Wow, my coderwall post got posted to HN? That's pretty cool! Interesting title choice, though... FTP will still be useful for a long time.
I didn't really write this as a tutorial as much as an easy way for me to look back and find the commands when I have a forgetful moment. That's why it doesn't go into explaining how or why things work, explaining how to set up git, explaining how to set up your web server to serve files from a directory, or other preliminary information.
Just a bit of feedback on the blog: the background image was very distracting for me. My eyes kept wandering away from the text. Not sure if that's just me or if others had that problem too.
Coderwall forces you to upload an image, then it "hipsterizes" it (their words, not mine). From what I can tell, this involves significantly reducing the size of the image, increasing the saturation and local contrast, and then stretching and blurring it.
As a photographer, this is quite annoying, because it's hard to predict what something will look like. I'll try to find a photograph of mine that'll stand up to being "hipsterized" better.
Had a similar problem with a site called TVCatchup. They stream free TV in the UK. Their player sits in the middle of the page, surrounded by an advertising border; nothing else on the page is of consequence. I found it very hard to focus on the player in the middle, and often I would get a headache watching. Told the admins, who didn't want to know. Basically I got an angry "Foxtrot Oscar". Ended up using an ad blocker until they found a way to defeat that. Now I don't use the site any more.
Completely off topic, but TVCatchup provide an XBMC plugin which completely avoids the advertising they add. No idea why, since it presumably destroys their business model, but it's great.
I just found out recently that SCP is horribly slow when transferring lots of small files. I found a blog post showing how to send compressed tarballs over SSH instead, and it worked really well.
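The trick boils down to a pipeline like this (paths are illustrative):

tar czf - site/ | ssh user@server 'tar xzf - -C /var/www'   # one compressed stream instead of one round-trip per file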
I know it's not ftp over ssh. I meant what I said in the context of the original comment; if ssh is the ftp replacement, then there is an application which transfers files over ssh with commands similar to those of ftp, which is sort of a compatibility mode to help ftp users migrate to ssh. (Although it is a new protocol from the ground up, it is usually used as a subsystem of ssh, thus using the same passwords and authentications and stuff.)
This is clever, but the idea of serving a website directly from a repository makes me very uncomfortable. If you already have git and ssh access on a server, then you probably have rsync access, as well. I simply include a Makefile in my repo with a publish target that runs a command like this:
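# (the original command wasn't preserved; a typical publish recipe is something like this)
rsync -avz --delete --exclude='.git' ./ user@example.com:/var/www/mysite/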
This way I'm only putting the necessary files on the server and transferring changes is incredibly fast and secure (ssh is used by default).
In rare cases where I need to deploy from a repo on the server, I'll clone it to the remote host and use rsync locally to copy the files. But I still avoid serving directly out of the repo. There's too much sensitive information in version control to even risk exposing it during a misconfiguration.
I'm curious what kind of sensitive information you're concerned about exposing. I don't mind deploying with rsync--in fact, I'll do it quite a lot when the remote host lacks git--but I'd usually prefer to use git where possible.
I typically deploy with a setup similar to the one you described, but instead of rsync'ing the files locally I just make a clone of the repo and serve that. Though it is an extra step to fetch/merge, it stops me from losing any changes that someone made to production without telling me.
I only recently used Capistrano to deploy a project and it was very satisfying, so I'm gravitating towards that as my default deployment method.
> I'm curious what kind of sensitive information you're concerned about exposing
+1. Secrets should not be in your VCS repo. I'd guess the parent is talking about user/pass or pub/priv key creds.
I really wish "we" had better tools for passing around config, and secrets in particular. Chefs data bags are close, but I still don't want the master knowing my secrets.
In my node.js application I use the excellent Up[1] library along with the Up-Hook[2] plugin.
This allows me to update my code locally, push to github and have my server automatically download the changes (using girror[3]) and update the code, all with zero downtime.
What I like most about it is that it's all nicely contained inside the same code as my application!
I like this girror thing, it looks like it will fit my non-node projects much better than Capistrano's awkward git integration. (For some reason you linked to a fork, but it's no longer current)
You can have git separate the .git directory and the working directory by setting GIT_WORK_TREE and GIT_DIR in the post-receive hook. I usually put the git directories in /var/local/git, and I have some scripts to automate everything, but essentially I use this mechanism for certain projects (and for testing).
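The minimal version of that hook, assuming the bare repo lives under /var/local/git and the site is served from /var/www/mysite:

#!/bin/sh
# /var/local/git/mysite.git/hooks/post-receive
GIT_WORK_TREE=/var/www/mysite GIT_DIR=/var/local/git/mysite.git git checkout -f master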
Surely the entire project doesn't consist of world-readable files? There's likely to be a htdocs or wwwroot subdirectory in there alongside .git that apache/nginx point to. Then, you have an extra level of breathing room for other files for the site (config files, session data, user uploads, ...)
When git software is easy for ordinary people to use and install on Windows, easy enough that people are using it to share porn and pirated material, then maybe it will be like FTP.
As it is, to run git on Windows you, best case, have to install a lightweight *nix environment: Cygwin, MinGW, whatever. I think we techies and power users don't realize how messed up that is if you want git to be more than a program for programmers. FTP was a protocol; you could make a native client for everything. Git is a great tool, but not yet comparable to FTP.
Once, I had to maintain a website on GoDaddy, where I only had FTP access. I hated it. We would have interns working on it, and giving them the FTP password sucked.
Having git as a proxy, hooked to sync master to ~/www and staging to ~/www/staging, was awesome.
If you want it: https://github.com/ezyang/git-ftp
If you do this, make sure you set the permissions such that it is not possible to make changes to the files on the server independently of the remote. Otherwise, next time you push you'll end up with conflicts and, if you're using an interpreted language, the version control markup that indicates where the conflicts are will break your code and bring down your website.
We use TeamCity and deploy right from our build server. We have a build configuration called "Deploy" that requires a couple of parameters: the artifact to deploy, the environment to deploy to, and the build number (so we can deploy past builds, i.e. roll back). Our application software is built of a series of distributed .war and .jar files, but each represents a complete "service" within our system and can be upgraded independently. This means we can deploy single artifacts and not cause conflicts within the overall system.
When we click the build button, a small Python script is called that rsyncs over ssh the already built artifact (pulled in from the existing TC build) over to the environment, which runs JBoss, and gets auto-deployed.
For our developers, our build process amounts to committing their code, getting an email from TeamCity when the build completes, then clicking the Deploy button. The tools take care of the rest, and we end up with a nice trail of logs available in case anything goes wrong, and a clear rollback strategy.
We also don't get an ever-growing git repository, or have to maintain a bunch of FTP configuration. Everything is done over ssh via passwordless pki. Best of all, it didn't take long at all to set up (the longest part was probably writing the python deployment script, which has some basic logic to know which environments live at which addresses, and also how to run some post-deploy commands for the 1 non-JBoss deployed .jar based service we have).
It's not perfect - we don't have an automated database migration mechanism right now, though in a distributed system like ours I'm not entirely sure one would work at this point (and we've got other plans to deal with that weakness anyway). I'd also like to move to Hudson or Jenkins at some point, as there is a real attraction to the rich ecosystem there (TeamCity is a commercial product).
One problem with this technique is that over time you will end up with more and more data on the production server, even stuff that has since been deleted. Even a small website can, over the course of a few years, end up with GBs of wasted space, which on production VPSes can be quite expensive.
Also, if your production server is breached, attackers can mine that data, for example for that accidentally checked-in password, or to search for security fixes that other sites might not have applied.
Using post-commit hooks to handle your deployment for you can have undesired effects if your scripts aren't perfect.
I would recommend looking into continuous integration over this method. I've been using TeamCity for just over a year and it is fantastic for a staging environment.
Reverting a push should be at least as fast and easy as the push itself. This minimizes downtime if you introduce a new bug. (Which I know you, the reader, would never do, but other people might, so this is for them.)
Using this workflow, if you break your site, reverting would require either a) fixing the bug and then doing another push or b) looking up and typing in the hash of the last known good commit. Neither of these are as fast and easy as "git push".
Our solution is to deploy with tags. Every push to production gets its own tag, and that is what we checkout into production. If the site breaks, you just checkout the previous tag name and you're reverted. This way reverting is exactly as easy and fast as pushing new code.
You could even streamline it a bit, by having a 'previous' branch that you automatically sync with the previous tag on deploy. So if something goes wrong you can always just `git checkout previous`.
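A rough sketch of the deploy and rollback steps, run in the production checkout (the tag name is made up):

prev=$(git rev-parse HEAD)        # remember what is live right now
git fetch origin --tags
git checkout deploy-42            # the new release tag
git branch -f previous "$prev"    # 'previous' now points at the release we just replaced
# rolling back, if needed, is a single command:
#   git checkout previous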
Best move I did for a past client was getting them to deploy using Assembla's FTP tool. There is essentially a post-commit hook in their SVN repo that automatically pushes to the test environment on every commit. Deployments to prod use the same tool, but it is manually triggered rather than on every commit. (Competing for best move, was getting them to have separate test and prod environments.)
If you (or an unfortunate client) are using a cheap shared host that only provides you with FTP, this approach at least guarantees that the only code on the site is what exists in your VCS.
I almost never FTP stuff except when I have to get a single, unimportant file somewhere or if we're sharing stuff between friends' houses.
Springloops has a useful deploy-by-ftp feature. When you commit a change using their Subversion host it automatically pushes it out to the development or production server.
It's easy for designers to use since they can update a jpg or two for a client, commit a change, then hit refresh on the live site and it shows up.
I know it's not perfect, but for small web shops and non-technical people, it works great.
As someone who uses FTP (it's all I've ever known, at work I just check my work into CVS and I have 0 idea how the build process works), what would be a better way to deploy a website?
It depends on the site, and the stack you are using to manage it, but ideally you would have the following features.
1. All assets are under version control.
2. Committing code of any sort (HTML, CSS, Javascript, Ruby, Python, PHP, ...etc) triggers the unit tests, and a failure of the unit tests prevents the commit from completing (a minimal hook sketch follows this list).
(Unit tests should take n < 3 seconds.)
3. Pushing the code to the deploy repo (or branch) triggers the acceptance tests (behavior-driven tests that look at the functionality of the site overall: account creation, login, profile_view, checkout); failure of the acceptance tests sends an email to the whole team. Code changes affecting security-sensitive modules should be considered failing until they have been manually qualified (e.g. anything touching passwords or money).
4. If the acceptance tests pass and no critical modules were affected by the commit, that commit gets deployed into the production environment. That fact is recorded on the dashboard and emailed to the team.
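A minimal sketch of the hook behind item 2, assuming a run_unit_tests.sh script exists at the repo root:

#!/bin/sh
# .git/hooks/pre-commit -- abort the commit if the unit tests fail
if ! ./run_unit_tests.sh; then
    echo "Unit tests failed; commit aborted." >&2
    exit 1
fi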
This is difficult to achieve but once reached, it's like being over the step on a boat that hydroplanes; friction is reduced and you get greater velocity for the power expended.
I built a system to deploy to our test servers that uses a git repository on the test server, clones the repo into a named folder, and checks out a specific branch. QA is able to use a webpage to select a branch, which is automatically checked out in its own folder and a subdomain or cookie is used to do isolated testing of that branch.
So for us, git is an important part of our deployment process, though there's more that goes on especially for production deployments.
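The core of it is not much more than this (the paths and naming scheme are invented for illustration):

#!/bin/sh
# check out the requested branch into its own folder on the test box;
# the web server then maps a subdomain (or a cookie) onto that folder
branch="$1"
git clone --branch "$branch" /var/local/git/app.git "/srv/test/$branch"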
I agree the blog post should warn about it - but it can be perfectly secure if you do it correctly. I usually put all public files inside a directory inside the repo, e.g. public_html, and/or block access to dot-files altogether in the server configuration. E.g. in Apache 2:
<Files ~ "^\.">
Order allow,deny
Deny from all
</Files>
I'd question the assumption that ftp was ever the old ftp, if by "ftp" we mean "deployment channel to production services". As soon as your web site is complicated enough to need a database, you have database migrations or their moral equivalent to worry about as part of the upgrade process, and simply dumping files in a directory somewhere doesn't cut it.
I've been looking into git (or dvcs generally) for deployment and have realised that deployment is like the inverse (dare I say dual :) of normal version control.
vc: metadata (permissions) is irrelevant.
dep: metadata is critical
vc: version everything by default (explicitly exclude)
dep: version by explicit inclusion
I have a doc with numerous links and notes that I should share sometime.
git is closer to rsync than to ftp. scp and ftp are file transfer protocols, while git and rsync are higher-level systems that can use other underlying file transfer protocols
I feel your exact method might be useful for test servers. I wouldn't want all my git pushes reflected on the live server. For deployment, I think it's better if one attaches the post-commit hook to a certain branch named, say, "deploy". Any push to the deploy branch would trigger a deployment, maybe?
That sounds exactly like a system I set up with Jenkins a while back. It would deploy to a test server and run the tests from any branch. If you pushed to the deploy branch, it would deploy to the real server assuming all the tests passed.
The actual deploying and testing was all done with Fabric, but the idea was the same.
I think that's a pretty good case for using a CI system like Jenkins, especially because it's really easy to set up. It didn't support Git by default, but there was a plugin that was both easy to install and fairly comprehensive.
I think posts like this tell you only part of the story. Ops is not just pushing code live.
It'd be great if they explained how to know which code is live (i.e. tag "releases"), how to revert to a previous version (e.g. when something goes wrong), what potentially needs to be restarted, etc.
Seriously, try it. History expansion works on input lines separated by the user asking the shell to run them, it can't cross reference command arguments on the same command line.
I have tried both; the difference is only that !$ echoes the expanded command, while $_ simply runs the expanded command. Both create a directory and cd into it.
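Concretely, with throwaway paths:

mkdir /tmp/newdir
cd !$     # history expansion: bash echoes the expanded line ("cd /tmp/newdir") before running it
mkdir /tmp/newdir
cd $_     # $_ is the last argument of the previous command; nothing is echoed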
I could do automated deployments via FTP (I wrote scripts to upload certain parts of a project during the build process) with a cron job on the server. Just check for changes and do the real "deployment" stuff on the server itself.
There was a post yesterday about deploying small sites and I mentioned that svn export was the way to go, not thinking that there really is no simple git equivalent.
You can do a one-liner with git archive but I don't think it's as easy.
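Something like this, with made-up destination paths:

git archive master | tar -x -C /path/to/export              # local equivalent of `svn export`
git archive master | ssh user@server 'tar -x -C /var/www'   # or straight onto a remote host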
While people are not too hot on the idea of pushing the git codeline to the production server, the same idea is actually pretty good for kicking off a build or running test suites after a git push.
afaik serving websites from your home dir is a bit odd, since the user running nginx/apache shouldn't have access to it; something like /var/www is way more common.
and assuming you don't just use git for deployment but also for its core purpose, version control, you should set the origin to the actual origin and name the remote you push to accordingly (production, dev, test, staging), and you shouldn't set it as the default push, to avoid accidentally deploying when you just wanted to check in the code to the origin repo!
saying that git is the new ftp is to my mind not correct.
I read that git doesn't handle big files very well, as the coder of the backup tool "bup" has written. That was why he invented bup.
Correct me if I am wrong; I would really love to replace ftp with git.
> That's not even a sufficient professional scenario for anything but the simplest of non-essential websites. Hell, this won't even do for the simplest of WordPress sites.
BTW, who on earth still uses FTP?