It really bothers me that none of the official responses have adequately addressed DRM, which is the real controversy and what created the server crisis in the first place.
To each their own, but honestly I don't view DRM as "the real controversy". To me, the more important thing is that I paid them for a product that doesn't really work properly or reliably. Had SimCity shipped with an online play requirement but servers reliable enough that the requirement was unnoticeable, I doubt I would have cared all that much.
I think it's also technically incorrect to say online DRM is the root of the technical issues. The server capacity needed to support online DRM would seem to be much lower than the capacity needed to support online storage of city state and inter-city interaction. Had the online component ONLY been about DRM, I doubt they would have had capacity issues at all.
I appreciate the fact that a lot of people are militantly opposed to DRM (the more invasive the DRM, the stronger the opposition). And I don't necessarily disagree. But it also seems like the fundamental problem with SimCity isn't the DRM so much as the game's online gameplay combined with totally inadequate server capacity. The lesson shouldn't be "don't use DRM" so much as "do a better job planning server capacity".
The server-side component is not solely DRM. It is similar to WoW and League of Legends in that it actively correlates data between different cities and does some of the processing.
If the servers only handled DRM, do you really think too many people using Cheetah speed would be melting them?
Quick summary: They have a server shortage. They are adding new servers but this takes time (three days, apparently). Also, they're disabling Cheetah speed.
If only there were a way to start a virtual server in less than three days...
Just because you're using EC2 (or any other "cloud" platform) doesn't mean you can simply add new servers to your infrastructure; your application architecture needs to cater for this. EC2 gives you the ability to architect your application so that it can take advantage of this, but if you don't, you have to do everything manually. Cloud is not a silver bullet.
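A toy illustration of what "your application architecture needs to cater for this" means. This is a hypothetical sketch, not anything EA has described: the class names, the `SharedStore` stand-in for a real database or cache, and the round-robin "load balancer" are all made up for the example. A server that keeps city state in its own memory can't simply be cloned, because requests landing on the new copy won't see the old state; a server that keeps state in a shared store can be.

```python
from itertools import cycle


class SharedStore:
    """Hypothetical stand-in for an external store (database, cache) every instance can reach."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


class StatefulServer:
    """Keeps city state in local memory, so only the instance that saw earlier requests knows it."""

    def __init__(self):
        self.cities = {}

    def add_population(self, city_id, amount):
        self.cities[city_id] = self.cities.get(city_id, 0) + amount
        return self.cities[city_id]


class StatelessServer:
    """Reads and writes city state through the shared store, so any instance can serve any request."""

    def __init__(self, store):
        self.store = store

    def add_population(self, city_id, amount):
        total = self.store.get(city_id, 0) + amount
        self.store.put(city_id, total)
        return total


def route(servers, requests):
    """Spread requests round-robin across instances, like a naive load balancer."""
    return [server.add_population(city_id, amount)
            for server, (city_id, amount) in zip(cycle(servers), requests)]


requests = [("springfield", 100)] * 4

stateful = route([StatefulServer(), StatefulServer()], requests)
store = SharedStore()
stateless = route([StatelessServer(store), StatelessServer(store)], requests)

print(stateful)   # [100, 100, 200, 200]  (each instance has its own, diverging copy)
print(stateless)  # [100, 200, 300, 400]  (instances are interchangeable)
```

The catch, of course, is that the shared store now takes all the load, which is exactly where things tend to break.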
If they're truly using EC2, and they haven't architected their infrastructure to be able to horizontally scale at the push of a button, then they're incompetent. Period.
It's forgivable that it doesn't happen automatically, as automatic provisioning in response to load isn't always an easy problem to solve, but if it's going to take them 3 days to bring up new machines on EC2 to handle load, they're doing it wrong. The ability to do this sort of thing quickly is one of the top few reasons to use something like EC2.
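For what "provisioning in response to load" boils down to once the instances are interchangeable, here's a minimal sketch of a proportional scaling rule: resize the pool so per-instance load returns to a target. The function name, parameters, and numbers are hypothetical; this is the general idea behind target-tracking style autoscaling, not AWS's exact algorithm.

```python
import math


def desired_instances(current, observed_load_per_instance, target_load_per_instance,
                      min_instances=1, max_instances=100):
    """Return how many instances are needed to bring per-instance load back to the target."""
    if current <= 0 or target_load_per_instance <= 0:
        raise ValueError("current and target_load_per_instance must be positive")
    total_load = current * observed_load_per_instance
    wanted = math.ceil(total_load / target_load_per_instance)
    # Clamp to configured bounds so a spike (or a bad metric) can't scale without limit.
    return max(min_instances, min(max_instances, wanted))


print(desired_instances(10, 90, 60))                     # 15: scale out
print(desired_instances(4, 30, 60))                      # 2: scale in
print(desired_instances(50, 95, 50, max_instances=80))   # 80: capped
```

The arithmetic is the easy part; the hard part is making sure a freshly launched instance is actually useful the moment it comes up.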
They very likely DID do something to bring up servers quickly. I bet they can spin up stuff like NA East 3 rather fast. Just because there is a DB somewhere that can't handle 2304982309482 people trying to slam Cheetah mode while 230492340 are editing trade depots, which is just a scenario they didn't happen to simulate well enough in load testing, doesn't mean they're utterly incompetent.
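A back-of-the-envelope model of why spinning up more servers quickly doesn't help if a shared database is the choke point. The capacities below are made-up round numbers for illustration only: the app tier scales with the number of servers, the database doesn't, and the system as a whole serves whichever is smaller.

```python
def effective_throughput(app_servers, per_server_capacity, db_capacity):
    """Requests/sec the whole system can serve: limited by the app tier or the DB, whichever is lower."""
    return min(app_servers * per_server_capacity, db_capacity)


PER_SERVER = 1_000   # hypothetical requests/sec one app server can handle
DB_CAPACITY = 5_000  # hypothetical requests/sec the shared database can absorb

for n in (2, 5, 10, 20):
    print(n, effective_throughput(n, PER_SERVER, DB_CAPACITY))
# 2 2000
# 5 5000
# 10 5000
# 20 5000
```

Past five servers in this toy model, every new instance is wasted money until the database side is fixed.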
Servers hosted outside EA are pretty new to EA, from what I understand.
Sometimes the weirdest crap melts when you throw real numbers of real, non-tester people at things. Weird crap you cannot test for. It often has little to do with raw power and more to do with small architectural inefficiencies you cannot easily fix.
It does seem safe to say that technically they really dropped the ball. The inadequate estimation of server load combined with the (apparently difficult) task of adjusting to the actual load suggests at least a few groups of people didn't do their jobs properly (or the company denied them the resources to do so).
Yes, I realise that; I just would ordinarily have assumed that a choice to use EC2 also included consideration of scalability, automatic spin-up, etc.
They're not necessarily using it for whatever the bottleneck is. They might do raw compute there but be blocked on a DB query or write that's hosted internally, or on a database that is just taking a while to partition properly.
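On why partitioning "properly" can take a while: here's a hypothetical sketch (nothing EA has said about its schema) that assigns cities to database shards by hashing the city ID. It uses `hashlib` rather than Python's built-in `hash()`, since the latter is salted per process for strings. With plain modulo hashing, going from 4 to 5 shards moves roughly four out of five cities, and all of that data has to be copied while the game is live. Consistent hashing is the usual alternative when you expect to add shards, because it moves only a fraction of the keys.

```python
import hashlib
from collections import Counter


def shard_for(city_id: str, num_shards: int) -> int:
    """Map a city ID to a shard index with a stable hash, so every process agrees."""
    digest = hashlib.sha256(city_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


cities = [f"city-{i}" for i in range(1000)]  # hypothetical city IDs

# How evenly do 1000 cities spread over 4 shards?
spread = Counter(shard_for(c, 4) for c in cities)
print(sorted(spread.items()))

# Add a fifth shard: with modulo hashing most cities land on a different shard,
# so their data would have to be migrated to another database.
moved = sum(1 for c in cities if shard_for(c, 4) != shard_for(c, 5))
print(f"{moved} of {len(cities)} cities move when going from 4 to 5 shards")
```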