

10+ Deploys Per Day: Dev and Ops Cooperation at Flickr - brown9-2
http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr

======
TimothyFitz
Love that this presentation really nails the fact that continuous deployment
is effectively a side-effect of truly integrating Dev and Ops.

(At IMVU our VP of Operations is also our VP of Engineering)

------
slillibri
The actual presentation video is on blip.tv
<http://velocityconference.blip.tv/file/2284377/>

------
Groxx
Decent presentation. Didn't know they deployed (deplew?) so often, but makes
sense. Staying ahead of the game is kind of why they are successful (as is
true for nearly any web app, as switching is so easy).

The basics are:

      no fingerpointing
      understand each other
      automate automate automate every little thing
      automate automate automate every big thing
      profile the results

~~~
brlewis
I agree with most of your comment, but switching is not so easy for social
networks. Flickr is largely a social network for photography enthusiasts.

~~~
Groxx
This is true. I was referring more to non-social-network web apps, like going
between Google Docs and ZoHo, or switching version control hosts. There's
typically a loss of _some_ data there, but the really important stuff (raw
content) can be pulled out and transferred with minimal effort.

Social networks are about the opposite, as a lot of the content isn't really
"yours", it's a result of interaction. I made no reference to this in my
comment though, so thanks for pointing it out :)

------
kaffeinecoma
"Branching in code"? I'd be curious to know what is the average half-life of a
developer there. How long until they throw their arms up and say "no more
monkey-patching... I'm outta here!"

~~~
brown9-2
There was an article about Flickr's "we don't branch" mentality a few
weeks ago: <http://code.flickr.com/blog/2009/12/02/flipping-out/>

From the outside it sounds like it would be a nightmare to manage. How could
you sanely test all the permutations of "Feature X on, Feature Y off, Feature
Z = 4"?

~~~
paulhammond
[I work at flickr and gave the presentation linked to above]

You don't need to test every permutation. Pretty much all of the flags in the
codebase are independent of each other - someone might be working on changes
to our admin interface in some files while someone else is changing the way
comments are displayed elsewhere. There's no overlap in the changes.

If the flags do interact with each other then in most cases the features will
launch one after another, so you only need to test a handful of states (foo
off bar off, foo on bar off, foo on bar on)

If the flags interact with each other and the features will be launching at
the same time (or the betas overlap) then you have to do the same amount of
integration testing that you'd have to do with landing several branches into
trunk at once. This is complex, we don't do it very often.

And we clean up flags once they're not needed any more, which minimizes the
possible combinations.
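
To make the counting above concrete, here is a rough sketch in Python. It is not Flickr's code; the flag names `foo` and `bar` come from the example above, and the helper names `all_states` and `sequential_states` are made up for illustration. It compares every on/off combination with the smaller set of states you can actually reach when features launch one after another.

```python
from itertools import product


def all_states(flags):
    """Every on/off combination of the flags: 2**n states."""
    return [
        dict(zip(flags, combo))
        for combo in product([False, True], repeat=len(flags))
    ]


def sequential_states(flags):
    """States reachable when flags launch one at a time, in list order.

    With n flags this yields n + 1 states instead of 2**n.
    """
    return [
        {name: index < launched for index, name in enumerate(flags)}
        for launched in range(len(flags) + 1)
    ]


if __name__ == "__main__":
    flags = ["foo", "bar"]
    print(len(all_states(flags)), "states if every combination is tested")
    for state in sequential_states(flags):
        print(state)
```

For two flags the difference is small (four states versus three), but it grows quickly: ten flags launched in sequence need eleven states rather than 1024. Flags that do not interact at all can each be tested on and off in isolation.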

I was skeptical about this before I started working at flickr, now I can't
imagine working any other way.

~~~
mmastrac
How do you guys handle plumbing work on different layers? Do you ever launch
something that doesn't correspond to a specific user feature (say a cleanup to
make something more reliable or efficient)?

We're looking at what we can do to adopt continuous deployment in our own
shop. I suspect that it requires a bit of rethinking how development happens.

~~~
paulhammond
All the time. For example, we switched from one video transcoding backend to
another recently. Having a single config flag used to choose which codepath a
video went through meant we could launch it for staff only at first. Then, as
we rolled it out to more people we could very easily switch to the old
codepath if we found issues.
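
A minimal sketch of that kind of flag, in Python rather than whatever the real codebase uses. Everything named here (`FLAGS`, `new_transcoder`, `flag_on`, `transcode`, the `staff_only` and `percent` settings) is hypothetical and only illustrates the idea of launching to staff first, widening the rollout, and falling back to the old codepath by editing config.

```python
import zlib

# Hypothetical flag config; names and values are illustrative only.
FLAGS = {
    "new_transcoder": {"enabled": True, "staff_only": True, "percent": 0},
}


def flag_on(name, user_id, is_staff, flags=FLAGS):
    """Decide whether a flag is on for this user."""
    cfg = flags.get(name)
    if cfg is None or not cfg["enabled"]:
        return False  # unknown or disabled flag: everyone gets the old path
    if is_staff:
        return True  # staff see an enabled flag first
    if cfg["staff_only"]:
        return False
    # Stable bucket in 0..99 per (flag, user), so a user's path doesn't flap.
    bucket = zlib.crc32(f"{name}:{user_id}".encode()) % 100
    return bucket < cfg["percent"]


def transcode(video, user_id, is_staff, flags=FLAGS):
    """Route a video through the new or old codepath based on the flag."""
    if flag_on("new_transcoder", user_id, is_staff, flags):
        return f"new:{video}"  # stand-in for the new backend
    return f"old:{video}"  # stand-in for the old backend


if __name__ == "__main__":
    print(transcode("clip.mov", user_id=1, is_staff=True))
    print(transcode("clip.mov", user_id=2, is_staff=False))
```

Rolling back is then a config change (`"enabled": False`) rather than a code deploy, and widening the rollout means setting `staff_only` to `False` and raising `percent`.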

We've used flags for changes at all layers within the application code itself
- whether to use the more optimized javascript, the new css sprites, the new
database access layer, the new database schema, the new spam detection system
etc etc.

You get even more out of config flags when deploying non-user facing changes -
nobody knows if you roll it back so you can turn it off and on as many times
as you need to get detailed data on exactly what impact the new code has on
your metrics (both business and infrastructure).

One thing we don't use flags for is changes in the layers below the
application - the OS, web server, php libraries etc. It's much easier to roll
these out server by server.

------
lecha
Does anyone have a link to the video recording of this presentation?

------
dnsworks
The negative side-effect is that people tend to take what they want from
presentations, and present them as fact or proof of best practices. Since this
presentation, I know systems administrators at a number of rather poorly
engineered but large start-ups who have said "Our developers saw that and
said, 'See, we should break the service all of the time and you should get in
the way of deployments even if it means having to wake you up at 3am every
night'" ...

~~~
Jallspaw
Sounds like they weren't understanding the presentation, because that
certainly isn't at all the type of communication or collaboration we have at
Flickr.

The video of the presentation, which might give more context, is here:
<http://velocityconference.blip.tv/file/2284377/>

(please pardon my potty mouth on the video)

~~~
dnsworks
I've watched it several times, it's a great video. I've just heard several
stories now (often when asking admin friends if they watched it) of developers
using it as an excuse for bad behavior, cherry picking a couple of points.

------
gcb
And yet, it's been ages since I saw a cool new feature at Flickr.

The last ones I recall were the flashy album manager and video, both of which
I rarely use.

