10+ Deploys Per Day: Dev and Ops Cooperation at Flickr

TimothyFitz · on Jan 21, 2010

Love that this presentation really nails the fact that continuous deployment is effectively a side-effect of truly integrating Dev and Ops.

(At IMVU our VP of Operations is also our VP of Engineering)

slillibri · on Jan 22, 2010

The actual presentation video is on blip.tv http://velocityconference.blip.tv/file/2284377/

Groxx · on Jan 21, 2010

Decent presentation. Didn't know they deployed (deplew?) so often, but makes sense. Staying ahead of the game is kind of why they are successful (as is true for nearly any web app, as switching is so easy).

The basics are:

  no fingerpointing
  understand each other
  automate automate automate every little thing
  automate automate automate every big thing
  profile the results

brlewis · on Jan 21, 2010

I agree with most of your comment, but switching is not so easy for social networks. Flickr is largely a social network for photography enthusiasts.

Groxx · on Jan 21, 2010

This is true. I was referring more to non-social-network web apps, like going between Google Docs and ZoHo, or switching version control hosts. There's typically a loss of some data there, but the really important stuff (raw content) can be pulled out and transferred with minimal effort.

Social networks are about the opposite, as a lot of the content isn't really "yours", it's a result of interaction. I made no reference to this in my comment though, so thanks for pointing it out :)

pt · on Jan 22, 2010

I used to believe that too. Until I saw all my Orkut friends switch to Facebook in a jiffy...

kaffeinecoma · on Jan 21, 2010

"Branching in code"? I'd be curious to know what is the average half-life of a developer there. How long until they throw their arms up and say "no more monkey-patching... I'm outta here!"

brown9-2 · on Jan 22, 2010

There was an article about Flickr's "we don't branch" mentality a few weeks ago: http://code.flickr.com/blog/2009/12/02/flipping-out/

From the outside it sounds like it would be a nightmare to manage. How could you sanely test all the permutations of "Feature X on, Feature Y off, Feature Z = 4"?

paulhammond · on Jan 22, 2010

[I work at flickr and gave the presentation linked to above]

You don't need to test every permutation. Pretty much all of the flags in the codebase are independent of each other - someone might be working on changes to our admin interface in some files while someone else is changing the way comments are displayed elsewhere. There's no overlap in the changes.

If the flags do interact with each other then in most cases the features will launch one after another, so you only need to test a handful of states (foo off bar off, foo on bar off, foo on bar on)
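To illustrate the sequential-launch case above, here is a minimal sketch in Python. The flag names foo and bar come from the comment; the function names are hypothetical and this is not Flickr's code. It assumes flags launch strictly in list order, so n flags give n + 1 states to test rather than 2^n.

```python
from itertools import product


def sequential_states(flags):
    """States reached when flags launch one after another, in list order.

    Assumption: a later flag is never on while an earlier one is off.
    """
    return [
        {flag: index < enabled for index, flag in enumerate(flags)}
        for enabled in range(len(flags) + 1)
    ]


def all_states(flags):
    """Every on/off combination, for comparison with the sequential case."""
    return [
        dict(zip(flags, values))
        for values in product([False, True], repeat=len(flags))
    ]


if __name__ == "__main__":
    for state in sequential_states(["foo", "bar"]):
        print(state)
```

With three interacting flags this is 4 states to test instead of 8, and the gap widens quickly as flags are added.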

If the flags interact with each other and the features will be launching at the same time (or the betas overlap) then you have to do the same amount of integration testing that you'd have to do with landing several branches into trunk at once. This is complex, we don't do it very often.

And we clean up flags once they're not needed any more, which minimizes the possible combinations.

I was skeptical about this before I started working at flickr, now I can't imagine working any other way.

mmastrac · on Jan 22, 2010

How do you guys handle plumbing work on different layers? Do you ever launch something that doesn't correspond to a specific user feature (say a cleanup to make something more reliable or efficient)?

We're looking at what we can do to adopt continuous deployment in our own shop. I suspect that it requires a bit of rethinking how development happens.

paulhammond · on Jan 22, 2010

All the time. For example, we switched from one video transcoding backend to another recently. Having a single config flag used to choose which codepath a video went through meant we could launch it for staff only at first. Then, as we rolled it out to more people we could very easily switch to the old codepath if we found issues.
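A rough sketch of what a single flag guarding two codepaths could look like, in Python. Everything here is an assumption for illustration: the flag name new_transcoder, the config fields, and the hash-based percentage bucketing are hypothetical, not a description of how Flickr implements it.

```python
import hashlib

# Hypothetical flag config; names and fields are illustrative only.
FLAGS = {
    "new_transcoder": {"enabled": True, "staff_only": True, "percent": 0},
}


def flag_on(name, user_id, is_staff, flags=FLAGS):
    """Decide whether a flag is on for this user."""
    cfg = flags.get(name)
    if cfg is None or not cfg["enabled"]:
        return False  # kill switch: everyone gets the old codepath
    if is_staff:
        return True  # staff see the new codepath first
    if cfg["staff_only"]:
        return False
    # Stable bucket in 0..99 so a user stays on the same side between requests.
    digest = hashlib.sha256(f"{name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < cfg["percent"]


def transcode(video, user_id, is_staff, flags=FLAGS):
    """Route a video through the old or new backend (both stubbed here)."""
    if flag_on("new_transcoder", user_id, is_staff, flags):
        return f"new:{video}"
    return f"old:{video}"
```

Rolling back is then a config change (set enabled to False) rather than a code deploy, which is the property the comment describes.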

We've used flags for changes at all layers within the application code itself - whether to use the more optimized javascript, the new css sprites, the new database access layer, the new database schema, the new spam detection system etc etc.

You get even more out of config flags when deploying non-user facing changes - nobody knows if you roll it back so you can turn it off and on as many times as you need to get detailed data on exactly what impact the new code has on your metrics (both business and infrastructure).
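As a sketch of the on/off comparison described above, assuming each measurement is recorded together with the flag state at the time (the sample format and function name are hypothetical):

```python
from collections import defaultdict
from statistics import mean


def summarize(samples):
    """Average a metric per flag state.

    samples: iterable of (flag_on, value) pairs, e.g. (True, 182.0) for a
    request served in 182 ms while the flag was on.
    """
    by_state = defaultdict(list)
    for flag_on, value in samples:
        by_state[flag_on].append(value)
    return {state: mean(values) for state, values in by_state.items()}
```

Flipping the flag several times and comparing the two averages helps separate the effect of the new code from unrelated drift in traffic over the same period.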

One thing we don't use flags for is changes in the layers below the application - the OS, web server, php libraries etc. It's much easier to roll these out server by server.

lecha · on Jan 21, 2010

Does anyone have a link to the video recording of this presentation?

dnsworks · on Jan 22, 2010

The negative side-effect is that people tend to take what they want from presentations, and present them as fact or proof of best practices. Since this presentation, I know systems administrators at a number of rather poorly engineered but large start-ups who have said, "Our developers saw that and said, 'See, we should break the service all of the time and you should get in the way of deployments even if it means having to wake you up at 3am every night'" ...

Jallspaw · on Jan 22, 2010

Sounds like they weren't understanding the presentation, because that certainly isn't at all the type of communication or collaboration we have at Flickr.

The video of the presentation, which might give more context, is here: http://velocityconference.blip.tv/file/2284377/

(please pardon my potty mouth on the video)

dnsworks · on Jan 22, 2010

I've watched it several times, it's a great video. I've just heard several stories now (often when asking admin friends if they watched it) of developers using it as an excuse for bad behavior, cherry picking a couple of points.

gcb · on Jan 22, 2010

And yet, it has been ages since I saw a cool new feature at flickr.

The last ones I recall were the flashy album manager and video, both of which I rarely use.