1) This represents a heterogeneous set of services, where the engineers in charge of each service were more or less free to use whatever technology stack they were most comfortable with.
2) Use Puppet for provisioning everything. (There was some subtlety Scott went into about being able to bring up a bunch of servers quickly, which I have since forgotten; see the sketch after this list.)
3) Everything shuts down the day after the election. Not much time was spent future-proofing code with a well-known expiration date.
4) Testing, testing, testing. The system had serious problems in 2008, and they didn't want a repeat in 2012.
5) Engineers and sysadmins had a lot of latitude to make changes in order to respond to issues as they developed. It sounded like this was essentially Harper Reed's management style (he seems like a cool guy to work for).
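I've forgotten the Puppet subtlety too, but the "bring up a bunch of servers quickly" part is easy to picture. Here's a minimal sketch in Python with boto3 (my choice of tool, not necessarily what OFA used; the AMI ID, instance type, and counts are placeholders): one API call requests a whole batch of pre-baked instances, and Puppet converges the rest of the config after boot.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # One call can request a whole batch; with a pre-baked AMI the boxes
    # boot in minutes and Puppet converges any remaining configuration.
    resp = ec2.run_instances(
        ImageId="ami-12345678",   # placeholder: a pre-baked base AMI
        InstanceType="m1.large",  # placeholder instance type
        MinCount=20,              # fail the request unless we get at least 20
        MaxCount=50,              # take up to 50 if capacity allows
    )
    ids = [i["InstanceId"] for i in resp["Instances"]]
    ec2.get_waiter("instance_running").wait(InstanceIds=ids)
    print(f"{len(ids)} instances running")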
Fred Brooks, nine women / one month, etc.
I'm bothered that we as a society have trouble recognizing the point and purpose of overhead.
Source: a visit to the OFA offices in 2010. It may have changed by now, since there's no third term, but I doubt it.
This is obviously not ideal. I hope that OFA 4.0 will release a bunch of it as open source and it will help make the next cycle easier.
The diagram follows the style of the AWS Reference Architectures. I have yet to find a published set of diagram components that compares well to that style.
Edit: Juan Domenech has a small custom-made component palette in PNG form. And it looks like the original diagrams are custom Adobe Illustrator drawings.
Oh well, I guess GAE is for toy projects, ahem.
What I can see makes me a bit curious.
- The frontend Web boxes are "big", while (some of) the backend App boxes are small. I'd have thought it would be the reverse: frontend horizontally scaled first, backend vertically scaled first.
- Sometimes they use Zones ABC, sometimes AB. This is cool, but a lot of the "front" infrastructure is AB (including www), so I'm not sure what advantage having ABC on the backend pieces provides. Are these super-critical for some special reason? I guess they might also be pieces that are not, or are only partially, replicated to the secondary site. (See the autoscaling sketch after this list.)
- The failover US-West site uses Asgard along with Puppet, but the primary US-East one doesn't. I guess this is for managing failure scenarios?
- Prodigious use of AWS services, including internal ELBs all over the place. SQL Server and PostgreSQL sneak into a few places.
- Couldn't find the CDN!
- It looks like staging and testing are complete replicas (hard to tell; the resolution isn't quite there). They're big, though. This is fine, but it raises the question of why the secondary site is only a partial replica. If you provision the whole lot for staging, you'd figure you'd run the secondary the same way?
- The Data Warehouse runs on the secondary site, but it is only accessed from the primary. Interesting. I wonder why they didn't just put it on the primary?
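For what it's worth, the AB-versus-ABC choice is just a parameter on the Auto Scaling group. A hedged sketch in Python with boto3 (group, launch-config, and ELB names are made up for illustration) of how a "critical" tier would be spread across three zones behind a classic ELB:

    import boto3

    asg = boto3.client("autoscaling", region_name="us-east-1")

    asg.create_launch_configuration(
        LaunchConfigurationName="app-lc",  # hypothetical name
        ImageId="ami-12345678",            # placeholder AMI
        InstanceType="m1.large",
    )

    # Spreading the group across three AZs ("ABC") rather than two ("AB")
    # means losing a whole zone costs a third of capacity, not half.
    asg.create_auto_scaling_group(
        AutoScalingGroupName="app-asg",    # hypothetical name
        LaunchConfigurationName="app-lc",
        MinSize=3,
        MaxSize=30,
        AvailabilityZones=["us-east-1a", "us-east-1b", "us-east-1c"],
        LoadBalancerNames=["app-elb"],     # hypothetical internal classic ELB
    )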
Zones: most apps were built for 2 AZs at the start; apps deemed "critical" and "doable" were flipped to 3 near the election.
Asgard: deployments had been tested with Asgard in East, but the rapid deployment in West is where this approach was actually used. Thanks, Netflix!
Services: some that are critical but missed the chart are IAM, SNS, and CloudWatch (for autoscaling), and yes, a bit of CloudFront was used for images sent out in SES (transactional only!) emails. (A sketch of the CloudWatch-driven autoscaling pattern follows below.)
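To make the "CloudWatch (for autoscaling)" point concrete, here is the usual pattern sketched in Python with boto3 (group, policy, and alarm names are hypothetical, and this is the generic recipe rather than the campaign's actual config): a scaling policy on the group, fired by a CPU alarm.

    import boto3

    asg = boto3.client("autoscaling", region_name="us-east-1")
    cw = boto3.client("cloudwatch", region_name="us-east-1")

    # A simple scale-out policy: add two instances when triggered.
    policy = asg.put_scaling_policy(
        AutoScalingGroupName="app-asg",
        PolicyName="scale-out-on-cpu",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=2,
    )

    # Alarm: average CPU across the group above 70% for ten minutes
    # invokes the scaling policy.
    cw.put_metric_alarm(
        AlarmName="app-asg-high-cpu",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "AutoScalingGroupName", "Value": "app-asg"}],
        Statistic="Average",
        Period=300,
        EvaluationPeriods=2,
        Threshold=70.0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[policy["PolicyARN"]],
    )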
Thanks for the good questions!
"Oh, you run 10 servers and 2 load balancers across these 2 sub-systems? Here's 2 servers and a toy whistle for staging."
But then again, I suppose that's what you get when you can dump hundreds of millions of dollars on a problem over a couple years.
So in reality, it might be 2 staging environments and 12 testing environments. There was no real set number of environments.
But overall, this is great work, and it shows what's possible to those in the enterprise who still think Amazon is a bookseller and clouds are basically things in the sky you don't want on your golf days. This will only help further the progression toward far less wasted cost and time in IT. (Unless you're a Republican, I suppose. Or maybe especially if you're a Republican, you should be studying this for 2016 and considering an alternative to hiring Very Serious IT Consultants.)
B) For costs, you can check the FEC filings... I cannot recall what they were month to month, but they'll be accurately reflected quarterly there, so you can get a rough estimate. It was like a kabillion times more in October 2012 than in June 2011. The important part for us is that the costs scaled with our demand.
AMAZON $58,525.33 18-Oct-12
AMAZON WEB SERVICES $144,955.12 5-Nov-12
AMAZON WEB SERVICES $150,000.00 18-Dec-12
AMAZON WEB SERVICES $150,000.00 18-Dec-12
AMAZON WEB SERVICES $47,887.28 18-Dec-12
AMAZON WEB SERVICES $135.75 18-Dec-12
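(For scale: those six line items alone sum to roughly $551,503.)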
[traffic] -> [hosted server] -> [Amazon CloudFront files]
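That is, the origin serves the HTML while the heavy static files come from CloudFront. A trivial illustration in Python (the distribution domain is a placeholder) of how pages end up referencing CDN-hosted files:

    # Hypothetical: the origin emits HTML whose static-asset URLs point at
    # a CloudFront distribution, so browsers fetch the files from the CDN.
    CDN_HOST = "d1234example.cloudfront.net"  # placeholder distribution domain

    def cdn_url(path: str) -> str:
        """Rewrite a local static path to its CDN equivalent."""
        return f"https://{CDN_HOST}/{path.lstrip('/')}"

    print(cdn_url("/static/img/logo.png"))
    # https://d1234example.cloudfront.net/static/img/logo.png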
You want a citizen to be able to receive emails and give you money, then move smoothly to calling other citizens through the website, being provided with a walk sheet to go door-to-door in her own neighborhood, eventually leading a small team of other volunteers.... all of these require interaction with the database of other supporters. Ideally the campaign has over a hundred million people all contacting each other and urging each other to vote a certain way.
Let's not forget the CapEx advantage: it's not like the Obama organization can afford heavy five- or low six-figure expenditures on infrastructure when who even knows if they'll be able to raise a Series A or will even be around in a year.
AWS is good for organizations that need to minimize upfront costs, orgs too small to support suitable ops staff, overflow traffic, and people who hope they're about to hockey-stick or have no idea if they'll even be around in a year. It's also good for people busy wasting limited partners' money.
It is, however, a pretty bad choice for a campaign with huge coffers, a large staff, certainty about the length of time they'll be in operation, and a very predictable seasonal use pattern. And of course they do it all with money from small contributors.
I'm not saying that AWS, or cloud models in general, aren't a wonderful tool in a lot of cases. But we're training a whole generation of web jockeys who reflexively go to AWS because they think ops and infrastructure are scary.
I thought this was pretty obvious.
> Every dollar you save on your server infrastructure is a cell phone plan for a lonely field office in Texas struggling to make a dent in a hopeless battle but they're there because they care.
I have no idea what you're trying to convey here.
EDIT: I'll add that our origin offload for WWW hovered around 98% (i.e., roughly 98% of requests were served from the CDN cache, so only about 2% ever hit the origin).
Just as a data point in support of the kind of traffic that US presidential elections can drive for election-related things: I happened to be working on the web team of a major American newspaper around the '08 election, and we saw something like 1.2Bn page views in the month leading up to the election. (Traffic analytics wasn't my team, so that's a vague memory of being told that five years ago. Grain of salt, etc.) This is for simple news/op-ed, mind you, and only one source of many for that information. I have no problem believing that the main web presences of the two candidates would see even more traffic, and they'd likely be more complicated under the hood than "hit homepage, follow article link, read article."
I also wonder whether, in reality, there were only one or two instances in each zone for each site behind the ELBs, with the others added to show how scalable the infrastructure is; or whether their number actually corresponded to the average number of instances they ended up having to spin up.
(which then links back to the map)
I'd love to describe a network in (something like) graphviz dot notation and get a diagram that looks like this out the other end. Usually I wind up just using Visio and doing it by hand because work pays for it anyway, and it's actually very nice.
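You can get most of the way there from code today. For instance, the graphviz Python package wraps dot, so a network described in a few lines comes out as a rendered diagram on the other end (node names here are made up):

    from graphviz import Digraph  # pip install graphviz; needs the dot binary

    g = Digraph("network", format="png")
    g.attr(rankdir="LR")  # left-to-right layout, like the AWS diagrams

    g.node("elb", "ELB", shape="box")
    for az in ("a", "b", "c"):
        g.node(f"web_{az}", f"web ({az})", shape="box")
        g.edge("elb", f"web_{az}")

    g.node("db", "PostgreSQL", shape="cylinder")
    for az in ("a", "b", "c"):
        g.edge(f"web_{az}", "db")

    g.render("network")  # writes network.png next to the dot source

It won't match the AWS Reference Architecture icons, but the describe-then-render workflow is exactly the dot-notation idea.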
I think the only architecture that can be copied is the replication in staging and test.
Otherwise: very interesting to see such a large infrastructure used for a campaign. I wonder how it compares to an average campaign here in Germany. I'd assume only a tenth of the servers (while spending the same amount of money :)) are used here. I can't see anything special on the web before elections other than some WordPress-like web servers.
Edit: D'oh: http://en.wikipedia.org/wiki/Project_Narwhal (disappointed it's not an easter egg)
Ubuntu 12.10 + chrome 26.0.1410.63
If this were our tax dollars at work, I would say it's a good investment, since it worked and got Obama elected. By contrast, Romney's online infrastructure (ORCA), which in the context of this comment can be interpreted as non-tax dollars (i.e., private money), crashed on election day, and Romney did not get elected.
I would rather have my tax dollars spent on something that works than have something privately funded that doesn't work.
which it's totally not; the campaign was entirely funded by donations, and Obama did not participate in the federal matching-funds scheme.
That was a literal question, not rhetorical or sarcastic. I'm well aware of the difference between campaign funds and tax dollars.
nness didn't make it clear what "This" was in his "This was made by Amazon", so I was trying to get more detail to determine whether "This" was just the picture or "This" was the entire infrastructure. I was just curious if Amazon ran this farm for free.
I caught up recently by drilling my Aussie wife for her US citizenship civics test - she learned all 100 questions by heart.
I made her Web flash cards, too: http://www.verticalsysadmin.com/US.html
I dropped out of college, too. Now I don't have an EDUCATION section on my resume. Plenty of work experience though.