How to Deploy Software (zachholman.com)
552 points by jmduke on Mar 1, 2016 | 169 comments



One thing this glosses over is that you should have something monitoring your production systems to make sure that they are running correctly.

To start with, get something to monitor errors/exceptions and email you. To name a few services:

https://airbrake.io/

https://rollbar.com/

https://github.com/errbit/errbit (can be hosted on Heroku for free)

Also make sure that you have accessible logs that log useful information (timestamps, the user making the request, unique request ID). Then use syslog or a SaaS service to aggregate logs from all servers in one place, and keep them for as long as you can.
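
To make that concrete, here's a minimal sketch of what per-request structured logging could look like in Python (the app and field names are hypothetical; the point is that every line carries a timestamp, the acting user, and a request ID you can grep for across aggregated logs):

  import logging
  import uuid

  # Every line gets a timestamp, the acting user, and a per-request ID.
  logging.basicConfig(
      format="%(asctime)s %(levelname)s request_id=%(request_id)s user=%(user)s %(message)s",
      level=logging.INFO,
  )
  logger = logging.getLogger("myapp")

  def handle_request(user, path):
      # Hypothetical handler: tag all log lines for this request with one ID
      # so the aggregator can reconstruct the request's full story later.
      ctx = {"request_id": uuid.uuid4().hex, "user": user}
      logger.info("started %s", path, extra=ctx)
      try:
          ...  # actual work goes here
          logger.info("finished %s", path, extra=ctx)
      except Exception:
          logger.exception("failed %s", path, extra=ctx)
          raise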


Great point - a small note though, for those of us who don't like using other people's hosted services and have some of our own infrastructure to host - use Zabbix, Nagios, Cacti, Spiceworks, Munin, PRTG, etc. Lots of nice options. I know everyone loves cloud services these days, but I feel absolutely no temptation to put any of my business infrastructure in other people's hands when I don't absolutely have to.

I use a combination of Zabbix and a very aggressive Smokeping mesh deployed with Docker (10 pings per minute from each DC to every other DC) to monitor our worldwide nodes and be able to account for routing problems on the backbone rather than our nodes themselves. Very handy tool, particularly if you work with latency-sensitive applications. I've been surprised that some of the datacenters we use appear not to have access to anything similar...


I wouldn't describe Nagios as a 'nice' option. Every time I have to install or reconfigure it, it opens the wounds afresh (and I only have a relatively small deployment) :)


Hard to have a discussion about open source monitoring without mentioning OpenNMS. Best monitoring back end by far.


Go for Sensu, you won't look back, and it has therapeutic powers when it comes to monitoring ;)


https://getsentry.com/ is another option (my favourite) for exception monitoring.


It's also open-source https://github.com/getsentry/sentry


We use it for GitLab.com and love Sentry.


One of my clients got Graylog up and running pretty quickly.

Logs are only one part of monitoring though, and it's easy to miss the wood for the trees if you're drinking from the logs firehose. Some metrics monitoring is also vital, naturally I recommend http://prometheus.io as I'm one of the core developers :) https://blog.raintank.io/logs-and-metrics-and-graphs-oh-my/ discusses this a bit more.
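
For example, a minimal sketch of instrumenting a handler with the official Prometheus Python client (the metric and function names here are made up, purely illustrative):

  from prometheus_client import Counter, Histogram, start_http_server

  REQUESTS = Counter("myapp_requests_total", "Requests handled", ["status"])
  LATENCY = Histogram("myapp_request_seconds", "Request latency in seconds")

  @LATENCY.time()
  def handle(request):
      try:
          ...  # real work
          REQUESTS.labels(status="ok").inc()
      except Exception:
          REQUESTS.labels(status="error").inc()
          raise

  if __name__ == "__main__":
      start_http_server(8000)  # Prometheus scrapes http://host:8000/metrics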


I've used https://opbeat.com/ for a small side project and been fairly happy - they track exceptions, releases and performance.


Another error tracking system for your list: https://bugsnag.com/

It has been rock solid for us with no event limits and cheap pricing.

(not an employee, just a fan)


Same here, we just started using it and are happy with it!


The ELK stack is also good but will require some config - you'll need to pay for a subscription for alerting, though there are free alternatives like Yelp's ElastAlert on GitHub.


Personally I found the EHK (ES, Heka, Kibana) stack much nicer than ELK. I especially like that each node can run Heka locally and ship directly to ES without having some intermediary server we have to maintain. On top of that, Heka is great for general event streams and can grow quite a bit more into your infrastructure as a general tool than logstash (at least last time I used logstash, which was a few years ago). Heka can also monitor and send you alerts via email ("Hey this server got 3 exceptions in the past 5 minutes!") on certain conditions.

We do use ES as a core part of our infrastructure, so our only real barrier was setting up Heka.


> keep them for as long as you can.

Careful, you can run into Data Protection legal issues with that approach.


Data is a liability as well as an asset.


The operator of Pinboard says that companies should treat personal data like radioactive waste. Don't keep it around if you can avoid it, keep as little as is required, and keep what you do store secure and away from everything else.

It's a great way to think of it.


Shameless plug, but my co-founder and I are about to launch our powerful logging service this month. One of our features is session-specific logs and the ability to send alerts at user-defined thresholds.

We'd love to get some feedback on our product when we launch to beta, so if you're interested, please sign up for our email list at http://logdebug.com


Your website is still new, I see, so I imagine it will be more explanatory about how your service works. ;)

One thing about the plans that made me cringe was that "SSL" encryption was not available to the Startup plan. I feel like we're entering an era where HTTPS isn't really something that should be offered as a perk, ya know?


One suggestion is to remove the "Most Popular" banner above Small Business on your pricing. Since your site is plastered with "under construction", it's a turnoff seeing such false advertising.


Sure, good tip.

Though the pedantic butt-head in me wants to point out that out of all of our paying customers, they're all using that level of service. :) So technically...


So keep it. Ditch the under construction stuff instead if you have customers.


Let us know when it's actually out. This is a big pain of mine right now, but it's frustrating that I have to sign up to be notified. That pain might cease to exist by the time you're actually ready.


I'll vouch for Airbrake. It does an excellent job of identifying patterns and grouping similar errors, tracking resolution across deploys, and recording request context to help you figure out exactly what was going on, all while avoiding spamming you with error messages.

I haven't had much of a need to look at logs since I started using Airbrake three years ago. For the price, it's really hard to beat the value when you compare the price to various log aggregation systems.


A great list of resources - glad I ran into this post! I have rolled my own log scraping, exception monitoring and notification, but some of these services do a lot more.

I have been using https://www.pingdom.com for uptime monitoring and some basic perf/latency tracking. It works well and doesn't cost a lot, plus it can hit my services from multiple regions.


Let me start with a disclaimer that I work for Atatus. Front-end monitoring is the other end of the circle; it complements production monitoring. That is what we have done with Atatus: we monitor the front-end for both errors and performance issues - https://www.atatus.com/


> timestamps, the user making the request, unique request ID […] keep them for as long as you can

That's a sure way to go to jail.

Don't store which IP requested what for more than necessary — in many countries, there are limits of a few months on how long you may store metadata.


Can you elaborate? I find this incredibly hard to believe.

If it's true, you'd think it would be widely known and publicized in those countries (whichever ones they are), but I can't find any reference to such a policy existing anywhere.


Germany (and probably other countries in the EU as well) considers IP addresses PII, and you have to have consent to store them. Short-term logs might be justifiable as a technical necessity (but the lawyers I've heard talk about it didn't want to put a "safe" time frame on that), but certainly not "keep as long as possible".

That's the reason Google Analytics, Piwik and others have an option to blank out parts of the IP address.

Jail is a bit dramatic though (at least here); unless you do something bad with them, fines plus maybe damages would be the maximum, and it is quite unlikely you'll actually get hit. But if you are ever in a situation where this would come to light (= already in legal trouble), it could make things a lot more complicated, especially if you process data for other companies.


It falls under personal data about someone. Many (but not all) EU countries consider IP addresses to be personal information, and hence "store it forever for no real reason" is illegal. You need to have a legitimate, legal reason to store PI, you need the person's unambiguous, freely given consent, etc.

There is currently a court case as to whether dynamic IP addresses from ISPs count as personal information. But static IP addresses probably count. And all the other information (user X logged in at time Y and sent a message to user Z) would be PI too.


It's commonly known in business. I've heard of it repeatedly, usually in terms of "here's what our retention policy has to look like for European customers."


> That's a sure way to go to jail.

Not in (any of) the United States.


Reminds me of that scene in Snatch where Dennis Farina says "yeah, don't go to London."

Don't make any plans to have a physical EU footprint and you couldn't care less what the EU thinks of your internal policies. EU customers can take it or leave it. Their choice. It's not a warm stance to take, but you've probably got a dozen other balls in the air that you're juggling. EU data compliance probably isn't one you're worried about as a small startup.


On the other hand, if you comply with EU data policies, your customers might choose your app over another app that doesn’t

(especially european users, and businesses)


Of course. Such are the decisions and tradeoffs that any young startup needs to make. That's why I said that they could take it or leave it.


Too late to edit, but the actual quote was, "Yeah, don't go to England."


And that's why Safe Harbour was found illegal, and why it's probably illegal for EU companies to use those US services.


It could still be good practice to minimize all data when it doesn't serve a purpose, and I can't think of a reason not to delete the last byte of an IP address (except possibly in an internal setting where all clients are from the same subnet).
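
(For illustration, a tiny sketch of that kind of anonymization in Python, zeroing the host portion before anything is logged; the wiring is an assumption, not taken from any particular tool:)

  import ipaddress

  def anonymize_ip(ip: str) -> str:
      """Zero the last octet for IPv4, the last 80 bits for IPv6."""
      prefix = 24 if ipaddress.ip_address(ip).version == 4 else 48
      return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False).network_address)

  assert anonymize_ip("203.0.113.42") == "203.0.113.0"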

Personally, I don't see much of a problem when my IP is stored somewhere since you'd need a reference, but others have apparently thought more about it and want it to be anonymized.

(I've now come up with a scenario: if you have access to multiple such sets of data, you could connect accounts from different systems. I.e. when the youporn logs leak, someone could tie the data to your HN account, etc.)


> (I've now come up with a scenario: if you have access to multiple such sets of data, you could connect accounts from different systems. I.e. when the youporn logs leak, someone could tie the data to your HN account, etc.)

Combine that with leaks of a social network where you have real names and IP addresses, and you could actually identify the name, job, address, etc of everyone who watched a specific clip of porn.

That’s very bad.

Metadata. Not even once.


syslog and SEC are a good place to start before deciding to ship all your log data to a 3rd party.


[flagged]


Are you a Raygun employee? If so, please disclose that when you "recommend" a product.


Same username, Technical Marketer at Raygun

https://growthhackers.com/members/nicko84


Ouch. Definitely staying the hell away from those people.


Agreed. It's pretty spammy otherwise, especially with one comment in nearly two years and just a few Raygun-specific posts.


You should see what the Nylas team does on here, or the ABP guys.

Lots of spam, never announcing they're promoting their own product (until you call them out on it, then they admit it, at least), and both like to lie.


I really enjoyed this article. As an industry, when it comes to something essential like source control, we seem to have converged to a common set of practices and workflows. Deployment is arguably just as important, but I think the practices are very different on different teams. This article is like a more practical version of Continuous Delivery.

Three areas that I think would have been worth including:

1. Pre-production.

You deploy to test & other pre-prod environments more often than prod. They should use the same scripts/tools/processes as production deployments, only with different permissions.

2. Configuration.

Test and production environments will always have different config settings, so no team will ever be able to deploy to more than one environment without encountering this problem. I think there's still an open question around whether those configuration settings should live in the same source control as the code, in a different source control repository, or a dedicated system. Source control systems and sensitive values (passwords, API keys, etc.) don't always mix.

3. Build your binaries once.

The article is more focussed on dynamic languages, but for compiled languages, I think this is important. If you branch, compile, deploy to test, test it, get the all clear, then compile again and deploy to production, there's a lot of opportunity for differences between what you tested and what goes to production to sneak in.

In fact even for dynamic languages, this might be a valuable practice. What if the JS minifier on one build server is different from another, and the deployed script ends up being different in production from what was tested?

Disclaimer: I'm the founder of Octopus Deploy, and these practices might be biased towards enterprisey .NET/on-premises deployments rather than cloud hosted, dynamic language projects.


> 1. Pre-production

It interests me how many combinations of pre-prod environments exist, as I (naively) expected this to be standardized, but it's not (e.g. https://en.wikipedia.org/wiki/Deployment_environment#Environ... ). I can see the need for two vs. four environments for handling different stages of the life-cycle depending on business needs (and budget). Addressing how to choose the right set of environments for a given company/product would be useful.

I also want to say that your product is phenomenal and it has improved the quality of our shop's build and deploy pipeline by an order of magnitude. Thank you.


1. Pre-production

Pre-production should test production in a non-production environment. Specifically, if a system connects to, say, SalesForce, and you've been using a local sandboxed instance of SalesForce in dev and UAT, the pre-prod should connect to the real, live SalesForce. I have a pre-prod environment for my current project, and it's useless because org policy states that only prod environments can connect to live instances of anything. Completely invalidates the reason for having pre-prod.

2. Configuration ...will always have different config settings...

Yes, aside from pre-prod. Pre-prod and prod MUST be mirror images, INCLUDING config settings. Identical. The same in every way except for host names.


We use Octopus Deploy (great software, btw); most people use the configuration stuff mostly for connection strings. Everything else is the same.


> 3. Build your binaries once.

Assuming your builds are solidly reproducible like they should be, how do such differences "sneak in"?

Granted, truly reproducible builds in the first place are Really Hard(tm).


We put an explanation here (hope it's OK to post the link):

https://octopus.com/blog/build-your-binaries-once

A real world example of this is that when .NET 4.5 shipped, the compilers in .NET 4.5 for 4.0 code produced different output than the .NET 4.0 compilers would have. So installing a system-level update on Windows on a build server would mean Test and Production got different results:

http://blog.marcgravell.com/2012/09/iterator-blocks-missing-...

Also, releases move through environments at different times:

Monday: 1.0 goes to Test

Tuesday: 1.1 goes to Test

Wednesday: 1.0 goes to Prod

Between 1.0 and 1.1, perhaps you updated Node, or went to Python3 - so your build server had to be updated. Reproducing Monday's build is going to be more difficult the more time that goes by.

As you said, truly reproducible builds are hard. Why not just zip the artifacts and use the same files, instead of rebuilding?


Docker for CI will help. Update your base image when you upgrade Node/Python.


In theory, feature flags seem like a good idea. Until you reach a point where too many flags become difficult to test in an exponential tree of combinations. Also, it demands tight discipline to make sure each new flag properly isolates some new feature ... Has anyone had success with this idea and been able to 'tame' an explosion of flags in their codebase? Really curious.


I can't speak to them now, but it happened a number of times while I was at GitHub. We'd have a number of things staff-shipped for weeks or months (occasionally a year!) and it was just... there... and nothing happened much to it.

As you might expect, it wasn't really the fault of feature flags but rather indicative of a process problem. We'd either have to prod the team or person in charge of it, verify that work was actively pushing things forward to ship, and so on.

At various times we'd have, for lack of a better phrase, the "No (wo)man", who would come in and say "no" to a lot of things. One of the best ways to achieve this is to see that nothing has happened on a feature flag for awhile and then send a pull request to remove the feature flag (and the feature entirely). This got people out of the woodwork who said waiiittttt a minute let me just finish that up, or if no one vehemently disagreed with the pull then you could actually just remove it entirely. But it did take some explicit reflection on whether the flags were defensible to remain flags.


On my team of about 10 devs, we create / destroy about 5-8 feature flags a month.

The only thing you have to remember is: It's either going to production, or it's getting dropped.

There is an expiry date built into the feature flags framework (90 days), and either they need to be explicitly cleaned up and deprecated, or extended with a valid reason attached.

Also, we code review every commit. If any diff is coding around/through a feature flag, the first question is always: "Can you clean up that experiment first?"
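
A rough sketch of what such a built-in expiry could look like (the registry and 90-day window here are illustrative, not any specific framework):

  from datetime import date, timedelta

  MAX_AGE = timedelta(days=90)
  FLAGS = {
      # Hypothetical flag: creation date recorded when the flag is introduced.
      "new_billing_page": {"created": date(2016, 1, 15), "enabled": True},
  }

  def is_enabled(name):
      flag = FLAGS[name]
      if date.today() - flag["created"] > MAX_AGE:
          # Fail loudly: clean the flag up or extend it with a valid reason.
          raise RuntimeError("feature flag %r expired" % name)
      return flag["enabled"]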


Sounds very familiar. Are you in Seattle?


Something like that, yeah


Having worked with feature flags "at scale" (whatever that means :), here's some advice that I've learned from working with hundreds of developers across tens of teams and given to others. I'm leaving out experimental analysis (A/B testing, multi-armed bandit) because it just adds "...but consider how this affects your metrics".

  * Always be developing against the current running features. No brainer.
  * Design things so that they integrate feature flags, not work around them. This usually means pushing feature flag determination to more generic/common code.
  * Separate backend/frontend changes into separate feature flags when possible. Turn on backend changes early and often to better measure your feature's impact.
  * Give individual features their own flag, but also have a global flag that manages the entire experience. This makes it easier to manage your gradual dial up as well as shut off problematic features that would otherwise mess up the launch.
  * Be diligent about removing feature flags once they're turned on. Schedule it into sprint time, reward teams that remove them, make it a management mandate, whatever. Just get rid of them once they're no longer needed.
  * Invest in monitoring around your services that (ideally) can correlate failures with features. you should turn on features over the course of a few hours/days to mitigate customer impact in the event of failures and gain data about performance at 50/50.
I think the answer to your specific question of "testing every combination" is that you can't, easily. But by keeping the number of feature flags that are inactive low (< 150 is very liberal) for a given service, having everyone develop against the current running features + dev overrides, and using gradual dial up with integrated monitoring to catch poor interactions when the impact is small, you'll have mitigated a lot of your concerns.
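
One common way to implement the gradual dial-up is deterministic bucketing, so a given user stays in the rollout as the percentage rises; a sketch with illustrative names (not from the comment above):

  import hashlib

  def in_rollout(feature, user_id, percent):
      """Place user_id into a stable 0-99 bucket for this feature."""
      digest = hashlib.sha256(("%s:%s" % (feature, user_id)).encode()).hexdigest()
      return int(digest, 16) % 100 < percent

  # Dial from 5 to 50 to 100 over a few days by raising `percent`;
  # users already included stay included because the bucket never changes.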


I've had good results with them. We never had more than 5-7 flags at any given time. We would flag things by environment (dev/staging/prod) and/or by user (everyone/beta users/internal team).

Flags were there because of two main reasons:

* Our app was deployed more frequently than a service we depended on (e.g. every commit for app, nightly for service in staging, every 2 weeks for service in prod, etc)

* We wanted to have it rolled out to a subset of users for testing

We did not run into exponential trees of combinations because we rarely had two different flags interacting. Maybe it was a happy accident of trying to make work parallelizable or maybe because our feature flags never lasted more than a month or so.

The code was intentionally dumb and the flags were stored directly in the source code (not in a database table or another config file or something). Simple, stupid calls to `FeatureFlipper.isFooReportEnabled()`. We did not test this class because each method was a simple boolean check that the current user appeared in a list or ENV != prod.

We stubbed out `FeatureFlipper` when using it throughout the app. Stub the feature to be enabled, check behavior. Repeat for disabled.

Most of the features were simply hidden at the view level. For users, if there is no button, it doesn't exist. We didn't particularly care if the user would "guess" the url of a feature flagged page -- not worth the effort.

Doing a "full rollout" of the feature was a non-event. Just delete the method on FeatureFlipper and go fix the compile errors :)


I'm interested that you're having a problem with this, but you don't give much detail.

Why do you have so many feature flags at a single time? Do you have a team of 20-40 people or more? Are your features taking months to create? Are you trying to create a feature flag for every new line of code before deploying it?

A lot of the time code can be deployed without using a feature flag because it's small enough; maybe you are overlooking those cases?


Testing flag combinations for one. Also please re-read my post carefully. The details are there.


  >Also please re-read my post carefully. The details are there
Clearly there is something wrong with your project that you aren't mentioning. Plenty of other projects do flags without problems. Your team has issues.


GitHub now has a build (we have about 12 builds per github push) that runs all the tests with all feature flags enabled. This doesn't hit problems that only appear in the matrix of some enabled, some disabled, but in practice has worked pretty well and found a few minor issues.


> Deploying major new features to production should be as easy as starting a flamewar on Hacker News about spaces versus tabs

Great writing. Spaces all the way.


Elastic tabstops are clearly the best, but until we have good editor support I'm sticking with spaces.

http://nickgravgaard.com/elastic-tabstops/


For now, these decisions will be made for me by our gofmt overlords.


Seriously? Ugh. Tabs FTW.


I've actually thought about this a lot (for some reason).

Spaces are fixed. Tabs are adjustable via most user interfaces.

So, in theory, you'd think tabs are superior because everyone can have their own amount of spacing and that's that.

Therefore tabs > spaces

(I should probably leave now)


Until you want to align code on non-tab boundaries.


True...damn it.


But guys, this is what is being proved. Stop. Don't. Come back.


Would someone mind explaining the argument for spaces over tabs? Possibly naive young dev here.


How many spaces do you like to indent your code? Some people like 3 space indentations, some like 8 spaces, and some like 5. It's very personal (but anyone who doesn't use 3 is a reeky brazen-faced mammet).

A simple solution is to use tabs, because then every developer can set the tab distance however they like, and they are happy. It breaks down when you have code that is aligned beyond the indent, like this:

  if(a && b &&
     c && d &&
     e && f   ) {
In that case, increasing (or decreasing) the tab distance will ruin the alignment.

The best solution is to use tabs to the point of indentation, and then spaces thereafter, but a lot of code editors don't support that, so in practice it's hard to implement, so people use spaces to preserve their formatting when it gets uploaded to github.


Mixing sounds like the worst option. It's what has caused horrendous-looking code in the past, when you open someone else's code and they are mixed.

Python and spaces for me.


  > Python and spaces for me.
Sounds like you are a very open-minded individual. Not.


I say this from experience, not narrow mindedness.

Open a file where it is all spaces, and it looks the same on every machine. Open a file with a mix of spaces and tabs, and it often turns out an absolute mess.


OMG! 3 spaces? An ODD number? 2 or 4. Never odd.


You sir, are a reeky, brazen faced mammet!


Short version: spaces doesn't break over time.

Longer version: First understand that there is no tabs vs. spaces. There is only tabs + spaces vs. only spaces. (Because not all indentation lines up with tab stops, and someone may wish to line up assignments, lists, etc.) Only one developer in the history of your project who uses a different tab stop standard is then enough to mess up your indentation. And that's just one way to mess it up; in a sufficiently large project some creative developer will find another.

Spaces just work and ensure your guidelines are followed. The only downside is a few wasted bytes. It might have been a religious debate in a long-distant past when someone actually counted bytes, but today it's mostly young developers who don't know better who engage in it (with a few exceptions). Linus uses spaces. OpenSSL changed to spaces as part of cleaning up their codebase. It's the default behaviour of GNU indent. It is a good idea.


Right, but how many spaces?


4


Spaces always have the same width, so it's harder to screw up formatting and easier to enforce stuff like maximum line width.

Tabs width can be set to whatever you want, so everyone can use the spacing they prefer.


Spaces work better with my preferred IDE setup. Therefore spaces are inherently and objectively superior to tabs.


I personally like the idea of tabs better, but prefer spaces because it makes navigating the text less awkward when traversing tabs. It may sound stupid, but for my own stuff it's just more comfortable.


I take umbrage!!! (oh wait, I am a spaces person too)


Ah, but how many?


I like a mix of space's and tab's depending on if my line number is divisible by 3 or not.


2 1/2. Obviously you're using an editor that can do half-spaces?


Of course! I simply type one in with a half A-press.


For those who don't know the reference: https://www.youtube.com/watch?v=kpk2tdsPh0A. It's definitely in the hacker spirit.


En Space: U+2002



2 or 4


Tabs are obviously superior since every developer can visually resize them to be as small or as big as they wish. Personally, I can't work with small indentation. My eyes have a hard time following long straight lines.


Does anyone else work on systems that take 3 hours to back up the DB, an hour to deploy, an hour to start up, and a few hours for users to check out functionality before business opens on Monday? Not to mention the federal regulations about what paperwork is required and who can even access production. Maybe I'm on the wrong site.


Hah, we (hopefully!) get an update window every 18 months! And standing up a new production-like test environment starts with another $100K invoice...

And thank goodness Microsoft hard-coded in a 10 hour delay to ensure the KDS root key of a domain is propagated, even if I create it before I create additional domain controllers...

Such fun!


Off topic, but to create an immediately effective KDS root key, just set the effective time ten hours in the past. You can validate propagation by looking for the 4004 event in the KDS event log. This is probably not a good idea in production, but is useful when building/rebuilding a lab.

  Add-KdsRootKey -EffectiveTime ((Get-Date).AddHours(-10))
See https://technet.microsoft.com/en-us/library/jj128430.aspx


Luxury!


No federal regulations or 3 hours to backup the DB, but in my previous job we had to build a deb package on a build server, then copy the package to the repo server, and finally ssh into the test/preprod/prod server to do the deployment (using sudo apt-get update). And I didn't have access to the production servers..


All of that could be automated.


Hear, hear. I don't have any access to production. (Interestingly, that's just one more reason to fully automate stuff.)


The post is interesting, but it doesn't mention two major difficulties: zero downtime deploys and database migrations.


Yeah; I tried to stay away from really low-level aspects (just because they're hard to generalize across languages), and also just because the damn thing was so long already, ha. :)

As far as database migrations are concerned, GitHub (and others, of course) takes the approach of migrating before the code that uses those migrations goes out. In other words, as @herge says in a sibling comment, the code that gets pushed needs to support two branches of code and two branches of data simultaneously. It's certainly some extra work (and can be pretty gnarly depending on the scope of the migration), but once you get to a certain point it's kind of the only way to do no-downtime migrations.

There's many possibilities to help with the actual migration process, depending on what database you're using. With MySQL, for example, you can do something like the process in lhm: https://github.com/soundcloud/lhm

Zero-downtime deploys aren't super difficult in Rubyland anymore (many have written how they achieve it in Unicorn, for example), although I'm not as familiar with how other folk do it across other languages and platforms these days.


+1 -- treating the database and the backend as two separate services with their own release lifecycles and APIs is "obvious"...

...well, once someone says it out loud. :)

After that, of course it makes sense that they would have a need for an API compatibility window across at least two versions. It's exactly the same issues as supporting a backend and client side where you can't instantaneously force an update to all clients. With your DB, you're in control of the version, but you're certainly not in control of making it "instantaneously", so the same rules apply as when you're waiting for some curmudgeon user to update: API versioning and a support window.

Now, if only we had an automagic way to make it less painful for a project to support two different versions of an API from the same codebase....


We use Django migrations, and I still haven't found a way to do zero-downtime deploys and migrations short of doing the following:

If we are trying to deploy migration #1, first deploy a version B of the code that supports the db both before and after migration #1 (but provides the same set of features as before, maybe with the help of a feature flag that is set in the migration). Do the migration. Then deploy a version C of the code which removes the feature check above. But all this requires 2 different versions of the code and a lot of process just to ship out one migration. It gets combinatorially worse if you have more than one migration to deploy.


(I think I'm just expanding on what you say you already do - but I'd already written it out by the time I realised that this was exactly what you are doing, so I'm going to just leave it).

If you can, always make migrations backwards compatible with the previous version of the code, so they don't need to be rolled back if the code needs to be rolled back. Having a good migration rollback procedure is nice too, but usually unnecessary if testing has gone well.

If you need to add a new field to the model then always create it as nullable, whether or not you use a default value. That'll allow this particular database migration to run on previous versions of the code. You can test this by generating the migration SQL and running it on the database for your test/dev environment which is running your current production code.

Once the new code and migration is running in production and you're satisfied with it, immediately create a new change which makes the previous migration more strict (remove nullable, use a data migration or default value for existing nulls). Test, stage, deploy that.
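
A rough Django sketch of that two-step approach, with a hypothetical Order.tracking_code field (in a real project each class lives in its own migration file, both named Migration):

  from django.db import migrations, models

  class Migration(migrations.Migration):
      # Step 1, ships with release N: add the column as nullable so the SQL
      # also runs cleanly while release N-1 is still serving traffic.
      dependencies = [("shop", "0007_previous")]
      operations = [
          migrations.AddField(
              model_name="order",
              name="tracking_code",
              field=models.CharField(max_length=64, null=True),
          ),
      ]

  class TightenMigration(migrations.Migration):
      # Step 2, ships later once release N is everywhere: backfill, then tighten.
      dependencies = [("shop", "0008_add_tracking_code")]
      operations = [
          migrations.RunSQL(
              "UPDATE shop_order SET tracking_code = '' WHERE tracking_code IS NULL"
          ),
          migrations.AlterField(
              model_name="order",
              name="tracking_code",
              field=models.CharField(max_length=64, default=""),
          ),
      ]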

Having backwards compatible migrations leads into zero downtime deploys also. Once the data migration has run, your app servers can be running version X or X-1. Before you push new code to a node, do a graceful shutdown (allowing queued requests to complete) or remove the node from the cluster (haproxy socket api for example), update the code, bring the node back into the cluster.

Caveats:

- not all database migrations can be backward compatible, but most can be made so by breaking your change into two or more changes. First - make the change without strict integrity checks. Second, enforce integrity checks and provide defaults.

- zero downtime deploys requires multiple nodes (counter example would be welcome, but I can't think of one).


I wish there was a way to "pause" incoming requests in web servers. Most deployments (migrations + code) take less than a few seconds, and I'd be fine with some users having to wait 2 seconds for a request to finish over their request hitting a 500 (due to inconsistent code/database) or 503 (putting the site into maintenance mode).


Usually we aren't deploying a schema change that's really huge so we just go for it and let the application crash for those users who happen to hit a place where the code/schema are out of sync.

Zero (no crash) downtime deployments seems like too much effort for too little gain.


Agreed. Seems like a case of optimization for technical completeness rather than a business need.


Basecamp apparently uses an nginx/openresty plugin to pause requests during a deployment: https://github.com/basecamp/intermission


haproxy allows re-dispatching failed requests some number of times. If you have an extremely brief outage due to a deploy, redispatching failed requests 3 times may be sufficient. I imagine other load balancers have similar functionality.


One approach which can work pretty well in many apps is having something like a CDN or Varnish which can serve stale content when the backend is unreachable. That allows the code you need to bootstrap your app to be served as long as your edge cache is running, and your JavaScript can do some sort of retry+backoff for failed requests or even do things like check for image error states to trigger a reload.


Have your load balancer drain connections to a web server. When it has no requests in progress, deploy to it. When that's done, move on to the next web server.


This doesn't solve the problem when there is a DB migration.


This is covered by herge's parent comment. Deploy code that can work with either the old or new DB structure before you perform the migration.


Folks from Braintree gave a talk about how they did this. They'd queue up all the incoming requests in Redis, then replay them once the site was back up.


There is, sort of. If you're hosting on Linux, you can use iptables to artificially delay responses.

That and some other methods are described in a post on the Yelp engineering blog[1].

1: http://engineeringblog.yelp.com/2015/04/true-zero-downtime-h...


I don't think that's quite what raziel2p wants. Won't that still allow the web server to receive and process the request, but just delay its response?


Load Balancers.


> and a lot of process just to ship out one migration.

I would have agreed with you before working at GitHub, but the end result is that deploying is so easy that 3 deploys does not feel like "a lot of process." On most of our applications I can do this in 15 minutes or less.


Yep - unfortunately that seems to be the only way to handle it though.


For the part regarding database migrations, you can find an interesting podcast about SQL database (schema) evolution here [1]. Among other things, they talk about how to have multiple schema revisions coexisting at the same time. While hopefully most of us just need to support at most two revisions at a time, it gives a number of interesting ideas.

[1] http://www.se-radio.net/2012/06/episode-186-martin-fowler-an...


We have something similar. But I don't like branches; I prefer single-trunk development (I generally agree with Martin Fowler) + feature switches to isolate WIP features. We store all binaries built, so we just roll back to the previous binary, which is a single button click for us.


Curious... Why don't you like branches?


http://martinfowler.com/bliki/FeatureBranch.html

It's a controversial article though.


I'm pushing for toggles on my team; I haven't made a single feature branch in this repo, resulting in no merge conflicts (yay).

So far I'm the only one regularly doing it without feature branches, running with the idea that just because you can branch cheaply doesn't mean you should. Of course toggles are technical debt to be managed, but so are branches.

I've found it good practice to mention which toggles are available in the README along with their defaults (this could be generated)... they should be tracked and removed ASAP. I read a newer article that breaks them down into categories: http://www.infoq.com/news/2016/02/featuretoggles. Toggles over branches are showing value as we run different variants on staging without having to redeploy different builds, instead changing a launch variable. It's especially clear with 2+ WIP features. We're using environ with Clojure, which doesn't have any fancy runtime toggling, but that'd be another thing to look at.


The tech debt associated with branches disappears with the completion of the merge request. The tech debt with feature flags doesn't disappear until someone gets around to removing the (now) dead code.

Personally, I use both, based on what is best for the feature. Why not, after all?


I love branches in theory, but the main problem in practice has been around CI. It's a nightmare trying to manage a staging environment with feature branches involved... given better tools/a better build pipeline I'd go back. I agree, toggles are basically program branches compiled in... analogous to goto vs. if/else control structures.

But day-to-day, stakeholders want to test feature X that they've heard is going well but it's not stable enough for develop? Okay, let's [engineering time and $$$]. vs. adding FEATURE_NEWDB=true to the upstart script. We've already got automation engineers helping us out, but until the deployment problem is solved toggles are more practical in our case.


How hard is it to set your CI to run off a separate branch? Or to run CI for all branches which are checked into a repo (a common option I've seen implemented)


Nobody said it here yet, so I just wanted to mention that the design of this blog is really nice. I love the font, links, blockquotes and chapter title images.

It also rendered equally nicely on my android phone and my desktop browser.

I saw that the author has a GitHub repo for an older blog style for Jekyll, but I'd like to see a similar thing for this one.

Thanks


Thanks! Was thinking of open sourcing this soon as a one-off. I'll see what I can do. :)


It broke down halfway through on Android Firefox =(. Reader mode forever.


The font rendered with iceweasel on debian 8 is illegible.


"How to deploy web software".

The software world > web servers.

(Sorry to be picky, but some people seem to assume that all software is developed for the web these days whereas the web world is just a significant & vocal minority).


I think the text works fine for any kind of server software.

As he discussed, feature flags also work for downloadable software (desktop/mobile) - the multiple deployments obviously don't make that much sense in that case though.


A great deal of software is web based these days. You install apps on your phone or your desktop. You deploy to a server.


True, but it's still a 'sizeable minority'. As far as numbers of deployments go, the embedded software world massively outweighs server, client, mobile and desktop.

Unfortunately (or fortunately), we in the embedded software world tend to be less vocal than the others.


I wouldn't have thought deploy was the correct word for embedded.

If it's one piece of software, I would think of it as an install. If it's coordinating multiple pieces, it's a deployment.


Thanks for the writeup - very helpful. It's always good to get a view of how others are solving the same problems you're facing yourself.

That said, the article does come off a bit as trying to be authoritative, but at the same time it doesn't leave enough room for possibilities where alternative approaches may have merit as well (i.e. "this is how to do it" vs. "what worked well for us, ymmv"). Newbies that read this article will think that the principles described are the canonical way and even try and apply them in scenarios where alternatives may prove superior.

Other than that, a lot of good advice, well done!


I've been arguing for using Git Flow. Reading the post, I have to say the stand our lead takes against Git Flow and in favour of very cosy CI is perhaps stronger than I realised.

He argues for pushing about as often as possible. With our small team that's very doable: every push gets tested and linted by the 'blue' or 'green'. You're supposed to only push passing code, which you easily can by running the tests and lint locally. So instead of all the pain points mentioned in the post, you write passing code, pull and rebase on other passing code, and then push. Little code review, no worries about hasty reverting, few/early conflicts keeping us from tripping each other up or writing incompatible features.

The reason I argue for Git Flow? Our tree is an absolute mess. Most often it's a single chain of linearly scrambled features. In other words, removing one feature would be hard and require a bunch of legwork, not a couple of Git commands.

If anyone strongly feels there's a better way for a small team than lightning fast CI let me know!


I wish native Nodejs deployment were a solved problem, but there really is no comprehensive and universally used tool for deploying Nodejs using Nodejs. ShipIt, mentioned in the article, is barely a year old; it has a short feature list and a short list of users. PM2 (Keymetrics) is not bad but is buggy, and they seem a bit overwhelmed at the moment. Flightplan is decent but the syntax is more awkward than ShipIt. Every other common language has a stable deployment tool besides Nodejs.

I ended up going with Distelli, it's a SaaS but it's fantastic. These days deploys often involve more than just one app or language, and I really prefer a tool that can ship anything. Also, having a GUI to see deployment statuses is invaluable. With those requirements none of the Nodejs tools can stand up to the other, more mature utilities. And rather than have to write all my deploy logic in another language, I just purchase the service.


What issues have you run into with PM2? I'm running it in production myself, so I'm curious.


I've really enjoyed working with PM2 in production for the last 4 months... although our ramp up has been slow.


Thanks @doublerebel. For those interested Distelli is at https://www.distelli.com

Disclaimer: I'm the founder at @distelli


I'd be interested in hearing people's thoughts on deploying feature branches to production before merging them. I've generally followed more of a git-flow approach [1]. This seems to have the advantage that multiple feature branches can be grouped and deployed together - thus avoiding the problem in the article of the deploy queue becoming a bottleneck.

[1] http://nvie.com/posts/a-successful-git-branching-model/#crea...


At my old team, when we did the migration from SVN to Git, I set up the deployment workflow with mandatory pull requests for everything. And I included a small deployment tool in the QA system where every developer could just click a checkbox next to their branch, and that branch would be added to the QA system (which has a copy of the production data for extended testing).

Behind the scenes, it just does an octopus merge of all selected branches into master. Since the codebase was reasonably large, we almost never encountered problems with merge conflicts.


> multiple feature branches can be grouped and deployed together

I would respectfully argue that this is not a desirable feature.

Although you might save some time by deploying multiple branches at once, you muddy the waters of what to roll back and how.

I think a better idea is to make deploys easy, quick, and revertible so that you can deploy early and deploy often, and in the event of a rollback, you can rollback just the broken feature.


That's true, but it would seem like there would be other disadvantages too.

For example, when do you do testing? If we test as soon as the pull request is opened, we know that master is going to change a lot between now and when we finally deploy this code so the tests might not be valid.

If, on the other hand, we wait to test just before we deploy to live, we risk locking up the queue for too long. This might not be an issue if your tests only take a couple of minutes to run, but if you have lots of integration tests (like Facebook, for example [1]) then it could become a big issue.

Is the solution to this that you just accept that the codebase you're testing won't be exactly the same as what's deployed to production, along with the risk that comes with that?

[1] https://developers.facebooklive.com/videos/561/big-code-deve...


Our preferred deployment method at Honeybadger is to (almost) always merge to master before deploying. We will deploy a feature branch when we want it to be a little easier to rollback to a known good state (by deploying master) for changes that we are nervous about. Those deployments are rare, though, as we have an almost-production environment (it talks to all the production services, but no customers use it) for doing one last smoke test before unleashing code on customers. :)


Why worry about what is on master if you save your build artifacts? If you need to go back to the previous behavior, just redeploy the previous production build output.


Your code is written in golang. You've been compiling it with golang 1.5.1. Then golang 1.5.3 comes out with critical security fixes for its TLS code.

That is why you care what is on master: because you need to rebuild if your runtime changes.


We deploy straight from master, but tag each build.

If you need to rebuild a specific version, it's as easy as checking out the tag.


This feels like the correct strategy to me (and it's what we do too). Deploying many branches to production seems like a nightmare at any scale.


Debugging. Some bug only happens on customer X's installation. You need to know which version customer X is running so you can reproduce the bug in house and know what other impact it might have had.


Interesting article. Frank and honest. Good to read the experiences of others.

To get the disclaimer out of the way I'm a co-founder of Vamos Deploy. Our product addresses many of the deployment problems that have been discussed here so I thought I'd mention it. We are looking for feedback on the product and an early adopter or two - https://vamosdeploy.com

I'd like to cover some techie details here. Vamos Deploy encapsulates an application with its dependencies and runtime config so it can be deployed as one to any number of machines irrespective of the OS (well, Linux and Windows at the moment). This encapsulation is achieved by configuring a 'grid' with all the application package versions, library/runtime dependencies, runtime property values and local repository names (hostnames usually). When the grid is deployed (all via CLI) the respective local repos get updated. You can have multiple grids on a host (in a local repo), thus enabling multiple, differently configured, encapsulated applications that don't conflict. It avoids duplication by the grid sharing the underlying application packages and libraries in the local repo. There is an audit log of all actions for traceability and transparency. A simple ownership model prevents non-prod code getting into production and restricts who can deploy to production. It can be combined quite easily with any config management tool for release orchestration. You don't need RPM/Deb packages or to deal with Yum repos. We have concentrated on making it easy to learn and use so max benefit can be attained quickly.

I'll stop there. I'd be interested in anyone's feedback here, or https://vamosdeploy.com#contact for a chat.


The title of the article was "How to Deploy Software", but almost all of the advice only works for server side software where you have total control over the deployment environment.

I'd be much more interested to learn about how people develop mobile and web apps, where feature flags are far less useful as you need to push the entire app to the AppStore, so your iteration time is much slower.


I also recommend Zach's presentation https://speakerdeck.com/holman/how-github-uses-github-to-bui... [selfpromo:] When we were thinking about deploying our frontend builds, we got inspired by Ember's CLI Deploy pipeline (http://blog.firstiwaslike.com/deploying-ember-cli-apps/) and we've built something similar for our Webpack based app (https://github.com/productboard/webpack-deploy). Together with the Git flow methodology, we basically removed all friction from deploying new versions. Would love to hear your thoughts!


I don't know if I agree with branching on every deploy, particularly if you have a small team and use Mercurial, where named branches live forever. I wish the article discussed dependency management more.

Instead of branching we tag every deploy and use dependency management heavily (i.e. Maven, npm, etc). That is, the project that gets deployed never really has any branches but is composed of lots of smaller projects, each in their own repository, which may have branches but have to be released.

This approach cuts build time and improves coupling/cohesion, as well as facilitating a possible transition to OSS of useful components (ones that do not provide a competitive advantage and are not proprietary).

I have seen way too many projects that have this giant monolithic source tree (particularly PHP projects) and thus have to rely on branching much more heavily. I firmly believe this is the wrong approach.


> When you're ready to deploy your new code, you should always deploy your branch before merging. Always.

Does anyone actually do this? This seems counterproductive - what if there are multiple branches?


Assuming you have a product/release branch and many others, then I'm guessing the author means you should merge product into your own branch and deploy that, before merging your branch back into product (and deploying it).

That should work fine with multiple branches in most cases, so long as you have a system to stop anyone else deploying their branch while yours is running.


I've worked places where we will always deploy a feature branch to an ephemeral environment. I find quite a lot of bugs are actually caught by this.

But I've never deployed a feature branch into production like the blog post suggests. I had my own questions about this lower down in the comments.


I use deploynaut as my deployment tool, and I have to say it's made the process much smoother. Previously I'd simply use git to update a code base or SQL Workbench/pgAdmin to update a database.


The body text in this article is illegibly thin, please consider moving to weight 600 so that people can read your text. You've worked hard to write it, now it is time for people to read it. :-)


I didn't notice, but my NoScript blocked the web font and the default font is more legible with that color. In both cases changing the color from rgb(100, 100, 121) to rgb(80, 80, 100) is enough to improve the readability of the text. font-weight: 600 seems a little extreme.


For some perhaps the color of the text is the only complaint, but for me the font is also illegibly thin [1]. I got around it by disabling the web font, because just setting the font weight to 600 didn't fix the odd shapes [2].

The bold version of the font is also available as a separate font family (AvenirNextLTW01-Bold). It looks much more like a "normal" font weight and is incredibly readable [3].

[1]: http://i.imgur.com/uVHXptR.png [2]: http://i.imgur.com/IXwR6EU.png [3]: http://i.imgur.com/xgz7EtR.png


I noticed that thin fonts are not so thin on high DPI screens. Example: that font is more readable on my 9" 2500x1600 (approx) tablet than on my 1080p 15" laptop. Maybe they are designing for retina Macs.


http://spinnaker.io : deployment software made by Netflix (king of daily prod pushes), Google, and the community, for AWS, GCE, Azure, etc.


shoutout to sublime for development


I just double-click a command file on my desktop and I'm done.



