
The Evolution of Code Deploys at Reddit - spladug
https://redditblog.com/2017/06/02/the-evolution-of-code-deploys-at-reddit/
======
siliconc0w
It's nice to have a lot of options when managing deployments. Feature flags,
canary deploys, A/B or green/blue, etc. Canaries are really nice to catch the
majority of deploy issues (e.g. 'it doesn't start up' errors, obvious exception
spikes or performance regressions). Feature flags let you hand off control to
individual product teams and encourage them to create 'bite sized' changes
which can be flagged. Blue/green is also a nice way to reduce risk given more
complicated cross-service changes where having another 'known good' copy of
production around is helpful.
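
As a rough illustration of the feature-flag idea above, here is a minimal sketch of a percentage rollout with a per-user allowlist. It is not Reddit's actual system; the flag layout (`percent`, `users`) and the function names `bucket` and `is_enabled` are assumptions made up for this example.

```python
import hashlib


def bucket(user_id: str, flag_name: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, buckets) for a given flag.

    Hashing the flag name together with the user id means a user does not
    land in the same bucket for every flag.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def is_enabled(flag: dict, flag_name: str, user_id: str) -> bool:
    """Return True if the (hypothetical) flag config enables the feature."""
    # Explicitly allowlisted users always see the feature.
    if user_id in flag.get("users", ()):
        return True
    # Otherwise enable it for roughly `percent` percent of users.
    return bucket(user_id, flag_name) < flag.get("percent", 0)


# Example: turn a hypothetical "new_ui" flag on for about 10% of users.
new_ui = {"percent": 10, "users": ["some_admin"]}
print(is_enabled(new_ui, "new_ui", "some_admin"))
```

Because the bucket depends only on the flag name and user id, a given user keeps the same answer as the percentage is raised, which is what lets a team ramp a change up gradually.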

~~~
spladug
Feature flags are super useful. We didn't really have room to cover our use of
them in this post, but if you're curious you can see the system we currently
use for it here:
[https://github.com/reddit/reddit/blob/master/r2/r2/config/fe...](https://github.com/reddit/reddit/blob/master/r2/r2/config/feature/README.md)

------
technion
It also says a lot about how basic deployment schemes are perfectly good at a
reasonable scale.

I've seen many projects in the 1-2 server range become messy nightmares as
people obsessed over 2017-Reddit-style deployments.

~~~
spladug
Keep it simple and boring as long as you can, and even then go with the
boringest solution to your problems.

------
wyc
Kudos to the Reddit engineering team. I've been really enjoying their quality
posts as of late, and I wish more companies of that scale were as transparent
with their engineering problems and solutions. Thank you.

------
artursapek
Great write-up. I liked the stage-by-stage structure of this.

I built a tool for the team to coordinate deploy locks at my last job, similar
to the IRC bot described here. People seemed to like it a lot over the
previous system of just shouting at the rest of the room that they're taking
prod.
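
For readers wondering what such a deploy lock boils down to, here is a minimal in-memory sketch. It is an assumption-laden illustration, not the tool described above or Reddit's IRC bot: the `DeployLock` class and its method names are hypothetical, and a real tool would persist state and sit behind a chat or web interface.

```python
import threading
from typing import Optional


class DeployLock:
    """A single named lock on production, held by at most one person."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()  # guards _holder against races
        self._holder: Optional[str] = None

    def acquire(self, user: str) -> bool:
        """Take the lock if it is free. Returns True if `user` now holds it."""
        with self._mutex:
            if self._holder is None:
                self._holder = user
            return self._holder == user

    def release(self, user: str) -> bool:
        """Release the lock. Only the current holder may release it."""
        with self._mutex:
            if self._holder != user:
                return False
            self._holder = None
            return True

    def holder(self) -> Optional[str]:
        """Return who currently holds the lock, or None if it is free."""
        with self._mutex:
            return self._holder


lock = DeployLock()
print(lock.acquire("alice"))  # alice takes prod
print(lock.acquire("bob"))    # bob is refused while alice holds it
print(lock.release("alice"))  # alice hands prod back
```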

------
eriknstr
I think it's strange how little attention this got here on HN. Submitted by
one of the Reddit admins even. I'd ask you to do an AMA but I have no
questions and besides you lot are pretty good at answering questions whenever
they come up anyway so.

~~~
spladug
Thanks! Do feel free to ask if you think of something :)

~~~
pacaro
How do you handle one way gates? Clearly at each deploy there are two
different versions running concurrently, and as you make changes you do so
knowing that, but as the system evolves there are points in (code) time that
you can't go back to. Is this not a concern because you would never roll back
that far?

~~~
spladug
Yeah, rollbacks are more of adding a revert to the top of the pile so we just
make sure we roll back things that can be rolled back. This is important to
think about when planning deploys.
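
One way to picture "adding a revert to the top of the pile" is an append-only deploy log where a rollback is just another entry. The sketch below is purely illustrative and is not Reddit's tooling; `Change`, `DeployLog`, and the `reversible` marker for one-way changes are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    name: str
    reversible: bool = True        # False marks a one-way change
    reverts: Optional[str] = None  # name of the change this entry undoes


class DeployLog:
    """Append-only history: rollbacks add entries, they never remove any."""

    def __init__(self) -> None:
        self.entries: List[Change] = []

    def deploy(self, change: Change) -> None:
        self.entries.append(change)

    def rollback(self, name: str) -> Change:
        target = next((c for c in self.entries if c.name == name), None)
        if target is None:
            raise KeyError(name)
        if not target.reversible:
            raise ValueError(f"{name} cannot be rolled back")
        revert = Change(name=f"revert-{name}", reverts=name)
        self.entries.append(revert)  # the revert goes on top of the pile
        return revert

    def active(self) -> List[str]:
        """Names of deployed changes that have not been reverted."""
        reverted = {c.reverts for c in self.entries if c.reverts}
        return [
            c.name
            for c in self.entries
            if c.reverts is None and c.name not in reverted
        ]


log = DeployLog()
log.deploy(Change("add-column"))
log.deploy(Change("drop-old-table", reversible=False))
log.rollback("add-column")
print(log.active())
```

Marking a change as irreversible up front is one way to make the planning step explicit: the attempt to roll it back fails loudly instead of being discovered mid-incident.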

~~~
gusfoo
> Yeah, rollbacks are more of adding a revert to the top of the pile

Neat. It reminds me of the method used in gaming nowadays, which is to just
write a "savepoint" marker into the message stream instead of pausing the
entire game to save state.

------
lapitopi
If you're looking to build a deployment tool from scratch, please consider
Spinnaker first. I work for Netflix, and it's an open source Cloud Deployment
Tool they developed in-house
([http://www.spinnaker.io](http://www.spinnaker.io)).

It's excellent, I use it daily.

------
juanbrein
A lot of the features you built over the years are already built into tools
like Capistrano or Fabric. Any particular reason why you didn't use them in
the first place?

~~~
nameless912
Probably simple-is-usually-better.

I've been doing an eval for a new environmental auditing tool at work, and
I've found that most of the pre-built solutions out there (e.g. Ansible Tower,
Chef Server, some tools that we have written internally) will _mostly_ meet
our needs with some coercion, but I decided we should write our own anyway
because it gives us the flexibility to only use and maintain the features
we're actually going to use.

It's very possible (likely, even) that the Reddit guys looked at fabric or
capistrano, and decided either:

1) the tool didn't map to their model of deployments, or 2) the tool did too
much and would require more maintenance than a dead-simple solution they wrote
themselves.

It's all a matter of perspective.

~~~
spladug
Spot on.

------
mschuster91
A question out of curiosity: why did you choose to write your deployment tools
from scratch, instead of going with something like Jenkins?

And, how do/did you provision new servers? By hand, or did you use something
like Chef/Puppet?

~~~
spladug
> why did you choose to write your deployment tools from scratch, instead of
> going with something like Jenkins?

Each step along the way was basically just a small modification on the system
before. AFAIK Jenkins doesn't come with the ability to safely deploy code to
hundreds of servers out of the box, so building out the systems to make that
the case would've been more work than just adding to what existed and for
unknown benefit.

> And, how do/did you provision new servers? By hand, or did you use something
> like Chef/Puppet?

We use Puppet for configuration management. That's been the case for most
stuff since early 2011, for context. There's a lot more detail in our
semi-recent infra/ops AMA:
[https://www.reddit.com/r/sysadmin/comments/57ien6/were_reddi...](https://www.reddit.com/r/sysadmin/comments/57ien6/were_reddits_infraops_team_ask_us_anything/)

~~~
oblio
Jenkins would have bought you deployment queueing, which I see you developed.

On the other hand, since you weren't already using it for builds/running
tests, it would have added some overhead.

------
letientai299
Off topic.

Given the quality of the article, I really wonder why most of the Reddit open
source projects mentioned in it are not more popular.

Is that because of a lack of marketing?

