Obama Campaign AWS Infrastructure In One Diagram (awsofa.info)
297 points by sethbannon on Apr 14, 2013 | 104 comments

Harper Reed and Scott VanDenPlas did a talk / Q&A at the Chicago AWS meetup group a few months ago. My hazy recollection of the salient points of the talk:

1) This represents a heterogeneous set of services where the engineers in charge of each service were more-or-less free to use whatever technology stack they were most comfortable with.

2) Use puppet for provisioning everything. (There is some subtlety Scott went into about being able to bring up a bunch of servers quickly which I have since forgotten).

3) Everything shuts down the day after the election. Not much time spent future-proofing the code when it has a well known expiration date.

4) Testing testing testing. The system had serious problems in 2008 and they didn't want a repeat in 2012.

5) Engineers and sysadmins had a lot of latitude to make changes in order to respond to issues as they developed. It sounded like this is essentially Harper Reed's management style (he seems like a cool guy to work for).

The subtlety you are referring to in #2 is that we moved from puppet for deployments to puppet to configure template nodes. We baked AMIs from those template nodes and deployed those, drastically shortening our time from boot to serving traffic.


Kinda odd saying that it has an expiration date. Aren't the same systems required for the next election?

I was at a talk by Rayid Ghani, the chief scientist for the Obama campaign. He said that they shut down everything after the campaign simply because they can't afford to keep on all the staff full-time for the 4 years, but they don't throw away the code.

With the billions of dollars pumped into election campaigns, it's odd that a political party wouldn't keep at least a small team on the code to continually test, update, refactor, and generally improve the quality.

Fred Brooks, nine women / one month, etc.

The problem is that political parties are fueled by donations, and it's hard to get people to donate for something like that. Not to mention you'd be competing with local/state/House/Senate elections with actual candidates for that fundraising money. I've donated politically in the past and even having only given small amounts, I still get emails daily and calls monthly asking me to donate to about half a dozen groups (many of which I've never donated to). Heck, wounded firefighters and cancer-stricken kids have a low conversion rate for their fundraising - "money for technical systems to help a candidate to be named later" is even worse.

This will seem a bit OT, but it is relevant: http://www.ted.com/talks/dan_pallotta_the_way_we_think_about...

I'm bothered that we as a society have trouble recognizing the point and purpose of overhead.

The DNC does the job of folding any successful and reusable parts from the campaign into a continuing technology stack.

They keep a small team, just not the full battalion.

Source: visit to OFA offices in 2010. May have changed by now because there is no third term, but I doubt it.

Obama For America, Inc. is an actual not-for-profit corporation[1], and will likely be wound down before the next election. The long-term election-to-election databases and such are all run by the DNC Services Corporation.

1. http://www.corporationwiki.com/Illinois/Chicago/obama-for-am...

The same systems are not used for the next election. Bitrot, differing requirements and a general distrust of technology all lead the political machine to rebuild each cycle.

This is obviously not ideal. I hope that OFA 4.0 will release a bunch of it as open source and it will help make the next cycle easier.

With 3 and a bit years of downtime to sort out the crud and weirdness between uses.

Previous Discussion: https://news.ycombinator.com/item?id=5542368

The diagram follows the style of the AWS Reference Architectures[0]. I have yet to find a published set of diagram components that compare well to that style.

Edit: Juan Domenech[1] has a small custom-made component palette in PNG form. And it looks like the original diagrams are custom Adobe Illustrator drawings[2].

[0] http://aws.amazon.com/architecture/

[1] http://blog.domenech.org/2012/06/aws-diagrams-palette-v10.ht...

[2] http://blog.domenech.org/2012/05/aws-diagrams.html

Kind of wish GAE had some sort of whitepaper, reference architecture, or best practices/patterns as well.

Oh well, I guess GAE is for toy projects, ahem.

Wish this had a legend. Obviously the box sizes and positioning mean something.

What I can see makes me a bit curious.

- Frontend Web boxes are "big", whereas some of the backend App boxes are small. I'd have thought it would be the reverse -- frontend horizontally scaled first, backend vertically scaled first.

- Sometimes they use Zones ABC, sometimes AB. This is cool, but a lot of the "front" infrastructure is AB (including www), so I'm not sure what the advantage of having ABC on backend pieces is. These are obviously super-critical for some special reason? I guess they also might be pieces that are-not/only-partially replicated to the secondary site.

- The failover US-West site is using Asgard along with Puppet, but the primary US-East one isn't. I guess this is for managing failure scenarios?

- Prodigious use of a lot of AWS services. Including internal ELBs all over the place. SQL Server and PostgreSQL sneak in a few places.

- Couldn't find the CDN!

- Looks like staging and testing are a complete replica (hard to tell, the resolution isn't quite there). They're big though. This is fine, but raises the question why the secondary site is just a part replica? If you provision the lot to stage, you'd figure you'd run the secondary the same way?

- The Data Warehouse runs on the secondary site, but it is only accessed from the primary. Interesting. Wonder why they didn't just put it on the primary?

Box sizes: this system supported lots of different programming languages/frameworks/pre-built OSS, and as a result, some parts ran nice and lean (python) while others chewed memory with reckless abandon (magento). Another factor was supplying enough network bandwidth to some hosts, hence larger sizes.

Zones: most apps were built for 2 AZ's at the start; apps deemed "critical" and "doable" flipped to 3 near the election.

Asgard: deployments had been tested with Asgard in East, but the rapid deploy in west actually used this approach. Thanks Netflix!

Services: some which are critical but missed the chart are IAM, sns, cloudwatch (for autoscaling), and yes, a bit of cloudfront was used on images sent out on SES (transactional only!) emails.

Thanks for the good questions!

Cool - Thanks for the replies!

I remember reading somewhere that they were using Akamai as their CDN provider (at least for their donation page).

Ask Scott why they ended up going with Akamai and how they "reached out" to the campaign. Funny story.

We used Akamai, Cloudfront, and Level3 in various combinations.

We used a lot of Akamai. For all user facing services.

The most surprising thing about this is that test and staging are 1:1 duplicates of production. At the companies I've worked at dealing in scale, that has never been the case.

"Oh, you run 10 servers and 2 load balancers across these 2 sub-systems? Here's 2 servers and a toy whistle for staging."

But then again, I suppose that's what you get when you can dump hundreds of millions of dollars on a problem over a couple years.

With Puppet, and with AWS, presumably it's possible to flip a switch, and 5 minutes later have a replica which is around for the 30m you want to test it for, at 1/250th of the monthly cost.

That part is a little more nebulous. When deploying the entire full stack of all applications is automated, rolling a new environment is very simple. So, for instance, if I wanted to load test a specific end point in our API, I would just build a new instantiation of the API and test against that in a protected and isolated environment.
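
A rough back-of-the-envelope check of that fraction, as a sketch only. It assumes a 30-day month and that usage is rounded up to whole billing hours; both are assumptions for illustration, not figures from the campaign, and the function name is made up.

```python
import math

HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def replica_cost_fraction(test_minutes, billing_unit_minutes=60):
    """Fraction of one month's always-on cost for a short-lived replica.

    Assumes usage is rounded up to whole billing units (hourly by default).
    """
    billed_units = math.ceil(test_minutes / billing_unit_minutes)
    billed_hours = billed_units * billing_unit_minutes / 60
    return billed_hours / HOURS_PER_MONTH


fraction = replica_cost_fraction(30)
print(f"30-minute test = 1/{round(1 / fraction)} of a month")
```

Under these assumptions a 30-minute test bills as one hour, about 1/720 of a month, so a figure of 1/250th would leave room for roughly three hours of setup, testing, and teardown.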

So in reality, it might be 2 staging environments and 12 testing. There was no real set number of environments.

OK, a few of the boxes were smaller, but basically yes, running staging and test took big advantage of paying only for what you use.

This would be much easier to read if the entire thing wasn't slanted.

Really jazzed by CloudOpt, which I didn't know about. I had a multi-region architecture supporting AWS offshore development teams and HA between US-East & US-West, and this would have been a godsend. We had Windows Servers as dev workstations in Singapore and devs were based in India and Canada -- latency was actually fine to the desktop, and we used Amazon's internet connection to haul data back to US-East before hitting our data centre. This was satisfactory for most connectivity (throughput was fine) but occasionally the encapsulated routing would cause packet retry pains that traditional WAN acceleration may have helped with a bunch.

But, overall, this is great work, and shows what's possible to those in the enterprise that think that Amazon is still a bookseller, and clouds are basically things in the sky you don't want on your golf days. This will only help further the progression to way less wasted cost & time in IT. (Unless you're a republican, I suppose. Or maybe especially if you're a republican, you should be studying this for 2016 and considering an alternative to hiring Very Serious IT Consultants.)

Cloudopt rocked, but it's being renamed Union Bay Networks or something. http://www.linkedin.com/company/union-bay-networks

I would really be interested in knowing a) if I had a copy of the Puppet scripts, how long would it take to get this server configuration up and running, and b) how much would the AWS costs be per month?

A) That's a bit nuanced and depends on the application you are referring to (there are ~200 represented here), but if you had the scripts and the repos and built from a vanilla AMI, a box would boot and configure within a few minutes. We shortened the boot times greatly by using puppet to configure a template node which we then used to create a generic versioned AMI per autoscale environment and deployed with Asgard. If you had the AMIs, an EBS backed instance would boot and serve traffic in anywhere between 15 and 30 seconds or so.

B) For costs, you can check the FEC filings... I cannot recall what they were month to month, but it will be accurately reflected quarterly there, so you can get a rough estimate. It was like a kabillion times more in October 2012 than it was in June 2011. The important part for us is that the costs scaled with our demand.

Here are the numbers I pulled:

AMAZON $58,525.33 18-Oct-12

AMAZON WEB SERVICES $144,955.12 5-Nov-12

AMAZON WEB SERVICES $150,000.00 18-Dec-12

AMAZON WEB SERVICES $150,000.00 18-Dec-12

AMAZON WEB SERVICES $47,887.28 18-Dec-12

AMAZON WEB SERVICES $135.75 18-Dec-12
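
For anyone who wants the totals, here is a small sketch that sums the payments exactly as listed above. It treats the two identical $150,000.00 entries as separate payments, which is an assumption; the list covers only the rows quoted in this comment, not the full filings.

```python
from collections import defaultdict
from decimal import Decimal

# (payee, amount in USD, date) copied from the comment above
payments = [
    ("AMAZON", "58525.33", "18-Oct-12"),
    ("AMAZON WEB SERVICES", "144955.12", "5-Nov-12"),
    ("AMAZON WEB SERVICES", "150000.00", "18-Dec-12"),
    ("AMAZON WEB SERVICES", "150000.00", "18-Dec-12"),  # assumed distinct payment
    ("AMAZON WEB SERVICES", "47887.28", "18-Dec-12"),
    ("AMAZON WEB SERVICES", "135.75", "18-Dec-12"),
]

# Decimal avoids floating-point rounding on currency amounts
by_date = defaultdict(Decimal)
for _payee, amount, date in payments:
    by_date[date] += Decimal(amount)

total = sum(by_date.values())

for date, subtotal in by_date.items():
    print(f"{date}: ${subtotal:,.2f}")
print(f"Total listed: ${total:,.2f}")
```

The listed rows add up to $551,503.48, of which $348,023.03 is dated 18-Dec-12.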

Man, I didn't realize big sites were this complex under the hood. That site wasn't even complicated; if I recall correctly it was just some info pages and payment. I don't know anything about backend stuff. All my projects are like this:

[traffic] -> [hosted server] -> [Amazon CloudFront files]

I believe this includes all of their internal tools for campaigning and organizing and suchlike, not just the public website.

They're not internal; that's the point.

You want a citizen to be able to receive emails and give you money, then move smoothly to calling other citizens through the website, being provided with a walk sheet to go door-to-door in her own neighborhood, eventually leading a small team of other volunteers.... all of these require interaction with the database of other supporters. Ideally the campaign has over a hundred million people all contacting each other and urging each other to vote a certain way.

I am sure that was deliberate design. They want the common case to be as frictionless as possible.

In total, it's nearly 200 apps!

That just drives home how much simpler things have become with no SLA IaaS. Instead of spending 20+ hours racking an east coast/west coast pair of small cages or perhaps just 2 full racks each, you can save time and money by going with simple and glitch proof scale out systems driven by one button provisioning of hundreds of low performing vms. As an added bonus if your san or ethernet collapses you can safely sleep through the pager - somebody else will undoubtedly post the problem on their well monitored support forum.

Let's not forget the capEx advantage - It's not like the Obama organization can afford heavy five or low six figure expenditures on infrastructure when who even knows if they'll be able to raise a series A or will even be around in a year.

The campaign isn't a tech startup. It's a people startup that uses tech to facilitate the things that people do.

Perhaps that was a bit overly broad and not quite representative of what I meant to convey. The problems faced by a campaign and the problems faced by a tech start up are inclusive but not all encompassing. Every dollar you save on your server infrastructure is a cell phone plan for a lonely field office in Texas struggling to make a dent in a hopeless battle but they're there because they care. Are you gonna tell them they can't have phones?

I guess I was being too dry, but I was suggesting that such massive deployments in multiple regions cost far more in ongoing fees and engineering effort than an equivalently performing environment that is managed by the end user (campaign). And yet with all that expense you still end up with less reliability and transient issues that you end up compensating for by even more over-provisioning and rearchitecture.

AWS is good for organizations that need to minimize upfront costs, orgs too small to support suitable ops staff, overflow traffic, people who hope they're about to hockeystick growth or have no idea if they'll even be around in a year. Also it's good for people busy wasting limited partners' money.

It is, however, a pretty bad choice for a campaign with huge coffers, a large staff, certainty about the length of time they'll be in operation, and a very predictable seasonal use pattern. And of course they do it all with money from small contributors.

I'm not saying that AWS or especially cloud models aren't a wonderful tool in a lot of cases. But we're training a whole generation of web jockeys who are reflexively going to AWS because they think ops and infrastructure are scary.

> The problems faced by a campaign and the problems faced by a tech start up are inclusive but not all encompassing.

I thought this was pretty obvious.

> Every dollar you save on your server infrastructure is a cell phone plan for a lonely field office in Texas struggling to make a dent in a hopeless battle but they're there because they care.

I have no idea what you're trying to convey here.

There is an existing fundraising infrastructure, an existing polling infrastructure, an existing media market infrastructure, all of these are highly successful and competitive and get candidates elected. When you're introducing sprawling new web based architectures into this mix you've got to prove the efficacy of every minute decision and improvement. There is very little trust.

So... you're saying that a tech startup does not have to contend with existing infrastructures? Do I have that correct?

Isn't "uses tech to facilitate the things that people do" what a tech startup does?

It's interesting to note that the boweb-production servers of "www" don't have read-replicas (asynchronous replication) for read scalability. Anyone know why? Typically, one would use read-replicas to alleviate read workloads on the master.

Most likely the www servers are serving almost entirely static, cached content. Hell, they might not actually see any actual public traffic - they might just serve as CDN origins.

This is 100% correct. We also generated static snapshots of the origin and mirrored to S3. In the instance of something going awry with our primary origin servers, we could fail back to S3 as an origin.

EDIT: I'll add that our origin offload for WWW hovered around 98%.
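
To put the offload figure in perspective, a minimal sketch of the arithmetic plus the fail-back idea described above. The request count, origin names, and function names are all hypothetical placeholders, not anything from the campaign's setup.

```python
def origin_requests(total_requests, offload_ratio):
    """Requests that still reach the origin when the CDN absorbs offload_ratio."""
    return round(total_requests * (1 - offload_ratio))


def pick_origin(primary_healthy, primary="primary-origin", fallback="s3-snapshot"):
    """Choose which origin the CDN should pull from (names are placeholders)."""
    return primary if primary_healthy else fallback


# Hypothetical load: one million requests at 98% offload
print(origin_requests(1_000_000, 0.98))
print(pick_origin(primary_healthy=False))
```

At 98% offload, only about 1 request in 50 reaches the origin servers, which is why a static S3 mirror can plausibly stand in as an origin when the primary is unavailable.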

Who would have thought the infrastructure behind Obama's site and whatever else they're running was so big. I always assumed it would be running maybe one or two instances, not require a whole freakin' diagram to explain.

As pointed out down-thread, there's a lot more to this than a simple website (voter analytics/GOTV stuff just for starters), and the potential user base/traffic load is enormous (ramping up particularly as you near election day).

Just as a data point in support of the kind of traffic that US presidential elections can drive for election-related things, I happened to be working on the web team for a major American newspaper circa the '08 election and we saw something like 1.2Bn page views in the month leading up to the election. (Traffic analytics wasn't my team, so that's a vague memory of being told that from five years ago. Grain of salt, etc.) This is for simple news/op-ed, mind you, and only one source of many for that information. I have no problems believing that the main web presences of the two candidates would see even more traffic, and they'd likely be more complicated under the hood than "hit homepage, follow article link, read article."

It's interesting that apart from the DynamoDB and the SQL Warehouse (Redshift?) there's a pretty large LevelDB instance serving the sites/services.

I also wonder if in reality there were only one or two instances in each zone for each site behind the ELBs and the others were added to show how scalable the infrastructure is; or if their number actually did correspond to the avg. number of instances they ended up having to spin up.

My guess is that they wouldn't have used Redshift as it only became available for beta a few weeks before the election.

They used Vertica, which they didn't seem to be too pleased about. I think they would have liked to use Redshift if it had been available.

This is sized during the last few days; prior to that definitely smaller scale (same groups, fewer nodes)

Where did this lovely map come from and how can I generate my own?

It was made by Miles Ward on the AWS team, but you can make a similar one at http://www.madeiracloud.com/ (and provision directly from the diagram / overlay monitoring info)

It links to an Amazon case study:


(which then links back to the map)

I tried to figure out what software this was from, but I gave up. I did find that Wikipedia has a nice list of network diagram software here http://en.wikipedia.org/wiki/Comparison_of_network_diagram_s...

I'd love to describe a network in (something like) graphviz dot notation and get a diagram that looks like this out the other end. Usually I wind up just using Visio and doing it by hand because work pays for it anyway, and it's actually very nice.
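
As a rough sketch of that idea, a few lines of Python can turn a list of connections into DOT text. The topology and names below are hypothetical, and Graphviz will lay this out as a flat graph, not in the isometric style of the diagram.

```python
def to_dot(name, edges):
    """Render (source, target) pairs as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical topology for illustration only
edges = [
    ("CDN", "ELB"),
    ("ELB", "web-a"),
    ("ELB", "web-b"),
    ("web-a", "db"),
    ("web-b", "db"),
]
dot_text = to_dot("stack", edges)
print(dot_text)
```

Saving the output to a file and running `dot -Tpng stack.dot -o stack.png` should produce an image, assuming Graphviz is installed.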

It's good ol' Inkscape :)

Thanks for the answer! I was just playing with Inkscape the other day, I'll make a point of learning it for real next time I need to put something like this together.

Here is a talk from AWS re:Invent, BDT 207: Big Data and the US Presidential Campaign [1]. Really interesting. They cover the diagram too.

[1] http://www.youtube.com/watch?v=X1tJAT7ioEg

That's me at the podium!

haha me too!

me too

no, you're down front with the goofy glasses. answering most of the questions. being smart. i'm the one futzing with the mouse and asking awkward questions like a wank.


That is your logic? I don't care if this infrastructure could send Obama back in time; it doesn't mean it is right for your needs. Some people glorify "successful" architectures, but it is at least as important (if not more so) to pay attention to your particulars.

my sarcasm detector is clicking.

I think the only architecture that can be copied is the replication in staging and test.

Serious question: does the 3D mode add any information? I am distracted by it, but perhaps the size of the blocks has some attached meaning? Is there a 2D variant available?

Otherwise: very interesting to see such a large infrastructure used for the campaign. I wonder how it compares to an average campaign here in Germany. I'd assume only a 10th of the servers (while spending the same amount of money :)) used here. I can't see anything special on the web before elections other than some WordPress-like web servers.

It will be interesting to see how Organizing for Action ( OFA ), the follow-on group works out. So far they are raising money in decent chunks ( http://www.latimes.com/news/politics/la-pn-organizing-for-ac... )

Did anyone else notice the blue narwhals in the Staging and Testing areas? What's up with that?

Edit: D'oh: http://en.wikipedia.org/wiki/Project_Narwhal (disappointed it's not an easter egg)

It is in a way. The department heads were very worried about the scrutiny and tone of the press in covering project narwhal and actively discouraged its use as a codename for the project.

Mother of god...

Is this complexity really necessary?

Ask Mitt Romney that question

Yes and no. This diagram lays out how to do this kind of thing cheaply and scale the systems up and down quickly. It allows the campaign to bring these systems and services "in house" instead of contracting 3rd parties to manage adding boxes at great expense at the last minute based on best guesses of expected traffic. It allows you to respond to the spikes in traffic that are expected during a campaign and during promotions that would otherwise bring your site down or have you paying for unused bandwidth.

Is this not rendering for anyone else? Latest ubuntu + chrome. All I get is black.

Same for me, along with several other sites.

Ubuntu 12.10 + chrome 26.0.1410.63

WFM, Ubuntu 12.04 + chromium 25.0.1364.160

They misspelled "Wall Street".

Your tax dollars at work.

Please tell me that you weren't educated in the United States. I can't understand how else someone could misunderstand such a fundamental concept of our election process.

Ah, yes, because somehow wasting all that money on an election campaign is okay so long as it is voluntary taxation. Then I remember that this is the same group of intellectuals that voted up "Star Trek-like computer with voice interface" (a 15 line mashup that is neither a computer, nor Star Trek-like) to the top of HN. God bless America.

Do you actually think this is how taxes work?

Also the same group that created Y Combinator, Hacker News, and the Internet. If you don't like it, you can see yourself out.

We created the Internet?

Al Gore is American :)

This comment fails on many levels. Here is another:

If this was our tax dollars at work[0], I would say it's a good investment since it worked and got Obama elected. By contrast, Romney's online infrastructure (ORCA), which in the context of this comment can be interpreted as non-tax dollars (i.e. private money), crashed on election day, and Romney did not get elected.

I would rather have my tax dollars spent on something that works than have something privately funded that doesn't work.

[0] which it's totally not, the campaign was entirely funded by matching donations and Obama did not participate in the federal election money scheme.

No, just campaign dollars.

If by tax dollars you mean the money that I willfully donated to President Obama's campaign; then by all means, yes, my “tax dollars.”

Campaigns in America aren't publicly funded (that is, they aren't funded by tax money). The money to build this and to run the rest of Obama's campaign was raised by the campaign itself.

The only part our tax dollars paid for was the DARPA research that made all that technology possible. The rest was financed by campaign contributions. The Obama campaign refused public funding.

This was made by Amazon.

Amazon hosted all these servers for free?

All of you should know that this was paid with campaign dollars, not government funds. To pretend otherwise is an incredible indictment of our education system and an example of the assault on k-12 civics courses.

You severely misunderstood my comment.

That was a literal question, not rhetorical or sarcastic. I'm well aware of the difference between campaign funds and tax dollars.

nness didn't make it clear what "This" was in his "This was made by Amazon", so I was trying to get more detail to determine if "This" was just the picture or if "This" was the entire infrastructure. I was just curious if Amazon ran this farm for free.

We have K-12 civics courses?

We used to at least. I had classes about government every year from 7th to 12th. If we don't anymore, that is very disturbing.

I attended school in USA grades 7 - 11 and I did not have a civics class. Is civics part of the standard curriculum?

Texas public school, we studied government in the 12th year.

That explains Texas in a nutshell. ;)

Yeah, with the size of government now, you need half the school day dedicated to civics K-12. The Obama crowd would love that.

CA public school. Same.

Thank you. I dropped out in the 11th grade (went to college) which explains why I missed civics in the 12th. (I went to school in California.)

I caught up recently by drilling my Aussie wife for her US Citizen civics test - she learned all 100 questions by heart.

I made her Web flash cards, too: http://www.verticalsysadmin.com/US.html

I dropped out of college, too. Now I don't have an EDUCATION section on my resume. Plenty of work experience though.
