Decent presentation. Didn't know they deployed (deplew?) so often, but makes sense. Staying ahead of the game is kind of why they are successful (as is true for nearly any web app, as switching is so easy).
The basics are:
understand each other
automate automate automate every little thing
automate automate automate every big thing
profile the results
This is true. I was referring more to non-social-network web apps, like going between Google Docs and ZoHo, or switching version control hosts. There's typically a loss of some data there, but the really important stuff (raw content) can be pulled out and transferred with minimal effort.
Social networks are about the opposite, as a lot of the content isn't really "yours", it's a result of interaction. I made no reference to this in my comment though, so thanks for pointing it out :)
[I work at flickr and gave the presentation linked to above]
You don't need to test every permutation. Pretty much all of the flags in the codebase are independent of each other - someone might be working on changes to our admin interface in some files while someone else is changing the way comments are displayed elsewhere. There's no overlap in the changes.
If the flags do interact with each other then in most cases the features will launch one after another, so you only need to test a handful of states (foo off bar off, foo on bar off, foo on bar on).
If the flags interact with each other and the features will be launching at the same time (or the betas overlap) then you have to do the same amount of integration testing that you'd have to do when landing several branches into trunk at once. This is complex; we don't do it very often.
And we clean up flags once they're not needed any more, which minimizes the possible combinations.
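To make the counting argument above concrete, here is a minimal Python sketch (not Flickr's actual tooling; the function names are made up for illustration). It assumes interacting flags launch strictly in list order, and compares the states you would test in that case against every on/off permutation.

```python
from itertools import product


def all_states(flags):
    """Every on/off combination of the flags: 2**n states."""
    return [dict(zip(flags, bits))
            for bits in product([False, True], repeat=len(flags))]


def sequential_states(flags):
    """States reached when features launch one after another, in list
    order (an assumption of this sketch): only n + 1 states."""
    return [{flag: i < launched for i, flag in enumerate(flags)}
            for launched in range(len(flags) + 1)]


if __name__ == "__main__":
    for state in sequential_states(["foo", "bar"]):
        print(state)
    print(len(all_states(["foo", "bar", "baz", "qux"])), "permutations vs",
          len(sequential_states(["foo", "bar", "baz", "qux"])), "sequential states")
```

For two flags this yields exactly the three states listed above; the gap against the full permutation count widens quickly as more flags interact, which is one reason cleaning up dead flags helps.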
I was skeptical about this before I started working at flickr, now I can't imagine working any other way.
All the time. For example, we switched from one video transcoding backend to another recently. Having a single config flag used to choose which codepath a video went through meant we could launch it for staff only at first. Then, as we rolled it out to more people, we could very easily switch to the old codepath if we found issues.
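A rough sketch of what a flag like that could look like, in Python rather than whatever the real codebase uses. Everything here is hypothetical: the flag name `new_video_transcoder`, the `staff_only` and `percent` fields, and the `transcode` function are invented for illustration, and the percentage bucketing is just one common way to widen a rollout.

```python
import zlib

# Hypothetical flag config; names and fields are illustrative only.
FLAGS = {
    "new_video_transcoder": {"enabled": True, "staff_only": True, "percent": 0},
}


def flag_on(name, user_id, is_staff, flags=FLAGS):
    """Decide whether this user gets the flagged codepath."""
    cfg = flags.get(name)
    if cfg is None or not cfg["enabled"]:
        return False          # kill switch: everyone falls back to the old path
    if is_staff:
        return True           # staff see the feature as soon as it is enabled
    if cfg["staff_only"]:
        return False
    # Stable 0..99 bucket per user, so a user stays in or out between requests.
    bucket = zlib.crc32(f"{name}:{user_id}".encode()) % 100
    return bucket < cfg["percent"]


def transcode(video, user_id, is_staff, flags=FLAGS):
    """Route a video through the new or old backend (both stubbed here)."""
    if flag_on("new_video_transcoder", user_id, is_staff, flags):
        return f"new:{video}"
    return f"old:{video}"
```

Rolling out further would mean setting `staff_only` to false and raising `percent`; rolling back is a single config change (`enabled` to false) rather than a code deploy.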
You get even more out of config flags when deploying non-user facing changes - nobody knows if you roll it back so you can turn it off and on as many times as you need to get detailed data on exactly what impact the new code has on your metrics (both business and infrastructure).
One thing we don't use flags for is changes in the layers below the application - the OS, web server, PHP libraries etc. It's much easier to roll these out server by server.
The negative side-effect is that people tend to take what they want from presentations, and present them as fact or proof of best practices. Since this presentation, I know systems administrators at a number of rather poorly engineered, but large, start-ups who have said, "Our developers saw that and said, 'See, we should break the service all of the time and you should get in the way of deployments even if it means having to wake you up at 3am every night'" ...
I've watched it several times; it's a great video. I've just heard several stories now (often when asking admin friends if they watched it) of developers using it as an excuse for bad behavior, cherry-picking a couple of points.