

Shout Out to the Server Teams - siglesias
http://stevestreza.com/2012/05/16/shout-out-to-the-server-teams/

======
rurounijones
I kind of take issue with the "No amount of load testing could adequately
prepare the server team behind Diablo 3 for firepower of this magnitude".

If your load-testing does not prepare you for the worst then your load-testing
plan is garbage.

They already knew how many copies had been pre-ordered and could make a pretty
good guess how many copies would be sold and activated on the first night.
Take that worst case estimate, now double it and test for that.
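
As a rough sketch of that arithmetic (the function name, parameters, and sales figures below are hypothetical and are not Blizzard's numbers), the load-test target could be worked out like this:

```python
import math


def load_test_target(preorders: int, extra_day_one_sales: int,
                     peak_concurrency: float = 1.0,
                     safety_factor: float = 2.0) -> int:
    """Return the number of concurrent users a load test should simulate.

    The worst case assumes that a fraction `peak_concurrency` of everyone
    holding a copy is online at once (1.0 = all of them). That estimate is
    then multiplied by `safety_factor` (2.0 = "now double it").
    """
    if preorders < 0 or extra_day_one_sales < 0:
        raise ValueError("sales figures must be non-negative")
    worst_case = (preorders + extra_day_one_sales) * peak_concurrency
    return math.ceil(worst_case * safety_factor)


# Made-up figures, for illustration only.
print(load_test_target(preorders=1_000_000, extra_day_one_sales=500_000))
```

The only real inputs are the sales figures the publisher already has; everything else is a knob for how pessimistic the plan should be.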

"But the cost of supporting worst-case scenarios!!" some may cry. This is
where rented servers / cloud setups are useful for elastic scale without
breaking the bank.

There are companies that can simulate load from users across the globe. I have
no doubt that Blizzard would have the connections / influence / cash to set up
a kick-arse load-test system.

~~~
stevestreza
Behaviorally there will be differences between how your users act and how your
load test plan is executed. A small difference in the two can cause problems
you didn't expect. That's the point I was trying to make.

~~~
rurounijones
For some things I can agree with you, usually web-apps with many possible
branching execution paths where it is hard to know exactly what will be the
most common use case to test for.

However, in this case what can the users do that differs from the plan? The
most basic example would be logging in; there are not that many execution
paths I can think of for that one.

As for the integration / reporting during gameplay, this is all done via a
strictly defined API which is called by the game client in relatively
predictable ways (game started, player does X, authentication ping every x
seconds, achievement unlocked). Unlike a web-app where users can do whatever
they want, the API usage flow is basically controlled by Blizzard.
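
To illustrate why such a flow is easy to script for a load test, here is a minimal sketch of one simulated client session. The call names (`login`, `game_started`, `auth_ping`, `game_ended`) and the timings are assumptions made up for the example, not the real Battle.net API:

```python
from typing import Iterator, Tuple


def client_session(duration_s: int, ping_every_s: int) -> Iterator[Tuple[int, str]]:
    """Yield (second, call_name) pairs for one simulated game client.

    The sequence is fixed: log in, start a game, send an authentication
    ping every `ping_every_s` seconds, then end the game.
    """
    if ping_every_s <= 0:
        raise ValueError("ping_every_s must be positive")
    yield (0, "login")
    yield (0, "game_started")
    for t in range(ping_every_s, duration_s + 1, ping_every_s):
        yield (t, "auth_ping")
    yield (duration_s, "game_ended")


# One hypothetical 60-second session with a ping every 20 seconds.
calls = list(client_session(duration_s=60, ping_every_s=20))
for second, name in calls:
    print(second, name)
```

A load generator would run many such sessions in parallel, so the main variable left to explore is the number of sessions.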

That is why I am not impressed with Blizzard on this one; they control
basically the entire use of this API apart from one thing, the number of users
trying to use it at any one time, which is where the load testing plan should
have worked.

------
burke
I'm very curious what it is in Diablo that's causing the issues. It seems a
bit odd to me that they're experiencing issues this severe for two reasons:

1) Diablo was developed from the fairly unusual position of knowing it would
face a launch to millions of users before development ever started. I'd have
imagined having this "scale it to infinity" mentality from the starting line
would have helped a lot.

2) The whole game is conceptually VERY easy to shard. Unlike WoW, there is
very little interaction between players that are not in a party together, and
the maximum party size is four.
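
A minimal sketch of what sharding by party could look like, assuming each party has a string ID and the game servers are numbered from 0 (both are assumptions for illustration; nothing here is known about Blizzard's actual design):

```python
import hashlib


def shard_for_party(party_id: str, num_shards: int) -> int:
    """Map a party ID to a shard index in the range [0, num_shards).

    A cryptographic hash is used only because it is stable across
    processes and machines, unlike Python's built-in hash() for strings.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(party_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Every player in the same party lands on the same shard.
print(shard_for_party("party-42", num_shards=16))
```

Plain modulo hashing moves most parties to a different shard when `num_shards` changes, so a real deployment might prefer consistent hashing or a lookup table.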

I wonder whether the failures have anything to do with the achievement
tracking/broadcasting. It's the only component I can think of that breaks out
of these obvious sharding boundaries, and I can kind of imagine how large
friend lists might cause problems. Additionally, it seems achievement progress
was lost from the time leading up to one of the downtimes.

I know it's easy to speculate from here, and there are probably very
legitimate reasons for all of this. As an outsider, it seems like these
particular failures are things that, in general, just happen. Still, I would
have expected Blizzard in particular to be better prepared for this. It's a
bit surprising.

~~~
barkerja
No idea if this stands true, but did just see this on Reddit:
<http://i.imgur.com/efw0N.png>

~~~
burke
I saw that too. I tend to mistrust anything originating on 4chan, but it would
explain the authentication server blowing up.

I'm looking forward to reading (hoping they release) a post-mortem on this.

------
SeoxyS
I know all about servers being under so much load that everything falls apart.
Working at a nascent mobile ad network whose traffic doubles every month, and
whose monthly number of requests amount to 10 figures, I know all about it.

And yet… I feel like Blizzard could have made an effort to make its _single
player_ game run offline. The multiplayer is fantastic, but give us something
to fall back on.

~~~
soup10
Blizzard very intentionally made it online only to combat piracy. Not only
that, but I read they have units, loot, and map layouts generated server side
to make it much more difficult for the crackers to release something playable.
The servers are doing a lot of work, so it's not surprising in the slightest
that they are having lots of launch day issues.

It's a pretty sad case of putting business concerns over user experience. If
it wasn't an anti-piracy thing they would have happily made an offline mode
because it would drastically reduce the server load and all the support and
development costs that massive multiplayer games have.

~~~
nodata
> The servers are doing a lot of work, so it's not surprising in the slightest
> that they are having lots of launch day issues.

If it's not surprising in the slightest, then they should have planned to
scale.

------
sriramk
My favorite Blizzard launch story actually involves Microsoft.

Years ago, before the days of the cloud and well-understood fail over
mechanisms, a very enterprise-y product happened to share datacenter space
with Blizzard. One fine day, Blizzard shipped an update to WoW and from what I
hear, it took down networking across the DC and left everyone scrambling.

Try explaining to your customers that your business critical service just went
down because Azeroth got a new continent.

------
nchuhoai
I can't stress enough how I admire and respect server/dev ops people. Their
job is among the hardest and people definitely overlook their importance way
too often. I wouldn't even know how I would go about finding them.

------
dols
Shout out to the dev teams who create buggy services that crash all the time.
You keep the server teams employed.

~~~
fsniper
Really? How shallow is that thinking? I'm a sysadmin and most of our problems
are not related to buggy software.

~~~
jefe78
Another sysadmin reporting in. OP doesn't have a clue.

------
InclinedPlane
This whole episode has been a massive face plant for Blizzard.

Consider: they have access to all of the sales data so they know how many
copies of the game could potentially be played at launch.

They ran an open beta so they should have a good idea of how everything scales
relative to total simultaneous user count.

They have extensive experience with all aspects of patching, scaling, and
server operations through World of Warcraft and Starcraft II.

They intentionally decided to go with a "single player" experience that
required connectivity and incurred server load.

Given all of that, there really is no excuse to fail as hard as they did on
launch day. It is 2012, the standards are pretty high for getting things right
with digital distribution and with online games. More so, if you make a bold
decision to force connectivity for a single player game you damned sure better
get it right or you are going to destroy your credibility.

Blizzard is enormously lucky that they have a very strong history of
compelling games; these sorts of issues could easily cause an upstart game
studio to go out of business.

~~~
siglesias
Couldn't they have sold more digital copies than anticipated? I know I didn't
preorder (not like I was worried about stock outs or anything), just bought it
in the morning when it came out.

------
zobzu
Yeah, SRE people are just fine. But whichever decision maker decided it was a
good idea to run a single player game online just needs to get a clue. And
don't worry, people who pirate will use a server emulator as they've done for
every previous such protection.

------
matt4711
It is probably too expensive (development wise) to scale for this many
concurrent users as it will only happen once (or twice in case of expansions)
during the whole lifetime of the game.

~~~
InclinedPlane
"Expensive" is incurring a tremendous amount of bad PR for a game on launch
day. "Expensive" is having the launch of one of your flagship games be forever
used as the butt of a joke. "Expensive" is turning your own customer base
against you by having people who took days off work or stayed up until well
after midnight to play one of the most anticipated games of the last decade
being thwarted in their attempts. "Expensive" is people deciding to put off
their purchases, perhaps forever, because they see the problems people are
having. "Expensive" is having your brand reputation dented to such a degree
that it affects future sales of all of your games.

That's expensive.

Compared to that servers are cheap.

~~~
patio11
What do you think this incident reads like in Vivendi's annual report? I'm
thinking "We made a few more mountains of money with the enormously successful
release of Diablo 3. Fans love it and monetization is six times previous
records for the series on a per-copy-sold basis or 200 times higher per copy
played."

WoW also had launch issues. Players complained. Money hats were made.

~~~
Arelius
Money hats: Blizzard is making them right now.

But yes, if anyone realistically thinks these server issues significantly
alter sales, they must have also forgotten the SC2, WoW, and Diablo II launches.

~~~
InclinedPlane
In the near term Blizzard isn't going to be going out of business, nor is it
going to have a shortage of money hats. But make no mistake, this is a serious
issue and it has tarnished their reputation. They still have plenty of excess
reputation at the moment but if they continue to take a cavalier attitude
towards customer satisfaction then there will be another incident like this,
and another, and another, until it really starts hurting their bottom line in
a way they can't ignore.

~~~
Arelius
Did it really tarnish their reputation when SC2, WoW, or Diablo II launched?
It's such a transient thing that I seriously suspect that they could do this
forever and never affect sales in the slightest.

------
jalada
Given how much people love Blizzard, I wonder if they really care /that/ much
about launch day issues?

They have massive amounts of experience in this field, and I'm sure they had
the capability to make launch day run much smoother, so why didn't it? Perhaps
they thought 'unprecedented demand for new game forces it temporarily offline'
sounds like a nice headline in the paper.

It's just one day after all, and all the people who play on launch day have
already spent their money, and probably aren't really the type to get a
refund.

~~~
corin_
People's love of Blizzard makes it worse, not better, when there are problems.

I was at the London launch event Monday night (working not buying), the first
person in the queue had been camped there since Saturday lunchtime. He said
that by the time he got his copy signed at 11pm he had time to get a train
home, get to work with ten minutes to spare, work a twelve hour shift, then go
home and spend the night playing the game.

A lot of people care a lot about games like this, and that includes how soon
they can play it.

------
Fizzadar
+1 for server teams! sysadmins never get appreciated enough.

However, as far as the Diablo 3 launch goes, some thoughts:

+ Blizzard have been doing this for years

+ They know how many people have pre-ordered and pre-installed the game

+ The game is singleplayer, yet they decided upon this online requirement (no
offline play, thanks Blizzard)

All in all it's pretty annoying to purchase something and not be able to play
it because they simply haven't upgraded their infrastructure for the load they
should have expected.

------
AnthonyJoseph
Does anyone have any information on what their infrastructure looks like? What
do they use to manage their servers? I guess a lot of this information would
be "proprietary" but scaling something this large would be a great read. I
manage about 1000 non-critical (think kiosks) servers and deploy the code to
them, and it is relatively painless. I would be curious to know how the big
boys do things.

------
viraptor
Strange to see them called "server teams". Devops - maybe. Devs - someone's
got to fix the actual code issues. Ops - if it's a platform configuration
issue. But whenever I read "server team", I'm thinking of the DC ops racking
the actual hardware.

Is it a common name for devops in other companies?

~~~
stevestreza
I wasn't writing this for the Hacker News audience, I wrote it so it could be
understood by anyone. In this context I used "server team" to refer to the
umbrella of devops (which is opaque industry jargon).

