

Scaling Guild Wars for massive concurrency - experiment0
http://www.codeofhonor.com/blog/scaling-guild-wars-for-massive-concurrency?

======
hythloday
Always enjoy these articles. I do think that the simulation strategy deserves
a bit more discussion, and the pitfalls of replays are glossed over slightly.

Simulating players makes end-to-end testing much easier (for example, it
exercises your login and disconnect flows, both of which are often written
without much care for performance). It also practically demands that you write
a scripting console for your game, which is an invaluable resource for error
testing, debugging and development by both your engine and content developers.
It's true that human players are more unpredictable than bots, but that's not
a drawback, because what you want to stress-test is the hottest part of your
game. For these purposes, you can write a small test level (think of an arena)
and have bots spawn at either end and run towards each other, attacking until
they die. This isn't at all representative of how players will play the game,
but it will definitely catch performance regressions.
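
The arena test described here is simple enough to sketch. What follows is a minimal, hypothetical illustration in Python. The arena width, speed, attack range and damage values are made up, and so are `Bot`, `step` and `run_arena`. Two teams spawn at opposite ends, each bot walks toward its nearest enemy and attacks it once in range, and the run ends when one side is wiped out. Throughout, the slowest tick is tracked, since that is the number you would compare across builds.

```python
import time
from dataclasses import dataclass

# Hypothetical arena parameters, chosen only for illustration.
ARENA_WIDTH = 100.0
SPEED = 1.0
ATTACK_RANGE = 2.0
DAMAGE = 10


@dataclass
class Bot:
    x: float
    hp: int = 100


def alive(team):
    return [b for b in team if b.hp > 0]


def step(team_a, team_b):
    """One server tick: each living bot either hits its nearest enemy or walks toward it."""
    for attackers, defenders in ((team_a, team_b), (team_b, team_a)):
        for bot in alive(attackers):
            targets = alive(defenders)
            if not targets:
                return  # the other side is wiped out; nothing left to do this tick
            target = min(targets, key=lambda d: abs(d.x - bot.x))
            gap = target.x - bot.x
            if abs(gap) <= ATTACK_RANGE:
                target.hp -= DAMAGE
            else:
                bot.x += SPEED if gap > 0 else -SPEED


def run_arena(bots_per_side, max_ticks=10_000):
    """Spawn two teams at opposite ends and let them fight to the death.

    Returns (ticks taken, slowest tick in seconds). The slowest tick is the
    figure to track from build to build to catch performance regressions.
    """
    team_a = [Bot(x=0.0) for _ in range(bots_per_side)]
    team_b = [Bot(x=ARENA_WIDTH) for _ in range(bots_per_side)]
    worst = 0.0
    for tick in range(max_ticks):
        if not alive(team_a) or not alive(team_b):
            return tick, worst
        start = time.perf_counter()
        step(team_a, team_b)
        worst = max(worst, time.perf_counter() - start)
    return max_ticks, worst


if __name__ == "__main__":
    for n in (10, 50, 200):
        ticks, worst = run_arena(n)
        print(f"{n} bots/side: {ticks} ticks, slowest tick {worst * 1000:.2f} ms")
```

In a real build pipeline the slowest-tick figure (or a percentile of tick times) would be compared against a stored baseline, and the build flagged when it regresses.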

Replays depend on two factors: deterministic playback and unchanging content.
Deterministic playback is possible (though extremely difficult) to achieve,
but it really has to be a priority from the beginning: it needs its own test
suite, and fixing its regressions has to be a priority too. It's impossible to
retrofit onto an existing game (sometimes by design, as non-determinism is
occasionally mistaken for randomness). Unchanging content is, obviously,
unlikely during development and much more likely during betas, but during
those periods you also have much more programmer and content-developer time
and attention to devote to ancillary features like expanding bot programming.

~~~
kevingadd
The upside for the simulation problem is that many multiplayer games have
something approaching bot AI to begin with - enemy AI - so you can use that as
a starting point for those sorts of tests. I'm not sure if that was ever done
at scale for GW testing, but it would have been possible...

My understanding is that the replays were possible because the game engine was
architected from the beginning with the server 100% in control of game events
and the clients relying on message-passing to communicate with the server.
This means that the recording captured on the server is a definitive recording
of what actually happened in the game. Clients could desync, and a replay
might not reproduce that, but otherwise replays were pretty accurate.
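
Here is a minimal sketch of why a server-authoritative design makes the server's recording definitive. It assumes a toy protocol in which clients send movement requests. `Intent`, `AuthoritativeServer`, `replay` and the one-unit speed limit are all hypothetical and are not Guild Wars code. The server logs every input and decides the outcome itself, so feeding the log into a fresh server reproduces the same state no matter what any client believed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Intent:
    """What a client asks for. It is a request, not a result."""
    client_id: int
    dx: int


@dataclass
class AuthoritativeServer:
    positions: dict = field(default_factory=dict)
    recording: list = field(default_factory=list)

    def receive(self, intent: Intent) -> int:
        # Record every input before acting on it, so the log is complete.
        self.recording.append(intent)
        # The server alone decides the outcome. A hypothetical speed limit of
        # 1 unit per message means a client asking for dx=5 moves only 1.
        step = max(-1, min(1, intent.dx))
        pos = self.positions.get(intent.client_id, 0) + step
        self.positions[intent.client_id] = pos
        return pos  # sent back to the client as the authoritative answer


def replay(recording):
    """Feed a recording into a fresh server and return the resulting state."""
    server = AuthoritativeServer()
    for intent in recording:
        server.receive(intent)
    return server.positions


if __name__ == "__main__":
    live = AuthoritativeServer()
    live.receive(Intent(client_id=1, dx=5))   # asks for 5, server grants 1
    live.receive(Intent(client_id=2, dx=-3))  # asks for -3, server grants -1
    live.receive(Intent(client_id=1, dx=1))
    print(live.positions, replay(live.recording))
```

A client that predicted its own movement locally could disagree with these positions (a desync), but the recording only ever contains what the server received, so replaying it always yields the server's version of events.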

I'm not sure how the unchanging content problem was solved. I think everything
was revision-controlled aggressively enough that it would have been possible
to set up a local server instance using old content, and it's also possible
that most ordinary changes (altering a texture, a map, etc) wouldn't actually
change the results of a replay. Definitely a challenge, though.

Replays came in handy later on for other features (like letting players watch
others' PvP matches in-game) so they probably ended up carrying their weight
in terms of development effort. I do wonder if they felt like a burden
initially, though, before their value became clear...

~~~
endianswap
Dealing with the many scenarios that could break replays was problematic, but
being able to run several instances of the same game at the same time and
occasionally compare them for desyncs helps. One such scenario: using a
pointer value (address) as a hash table or sort key means that, depending on
allocation order, game entities may be iterated in a different order from one
run to the next.
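
Below is a small sketch of that multi-instance desync check, built on a toy simulation. The `Sim` class and the `first_desync` helper are hypothetical. Two instances start from the same seed, advance in lockstep, and have their state checksums compared every few ticks. The sketch avoids the pointer-key pitfall by iterating entities in order of a stable id assigned at spawn time, and it gives each instance a private `random.Random` instead of sharing global RNG state.

```python
import hashlib
import random
import struct


class Sim:
    """A toy simulation written to be deterministic for a given seed."""

    def __init__(self, seed, n_entities=50):
        # A private RNG per instance, never the process-global random state.
        self.rng = random.Random(seed)
        # Entities keyed by a stable id assigned at spawn time, not by id(obj),
        # which in CPython is a memory address and depends on allocation order.
        self.entities = {eid: [0, 100] for eid in range(n_entities)}  # eid -> [x, hp]

    def tick(self):
        for eid in sorted(self.entities):  # iterate in a stable, explicit order
            ent = self.entities[eid]
            ent[0] += self.rng.randint(-1, 1)
            ent[1] -= self.rng.randint(0, 2)

    def checksum(self):
        # A stable digest (hashlib) rather than hash(), whose results for
        # str/bytes are salted per process in Python 3.
        h = hashlib.sha256()
        for eid in sorted(self.entities):
            x, hp = self.entities[eid]
            h.update(struct.pack("<iii", eid, x, hp))
        return h.hexdigest()


def first_desync(a, b, ticks, check_every=10):
    """Step two instances in lockstep; return the first checked tick where they differ."""
    for t in range(1, ticks + 1):
        a.tick()
        b.tick()
        if t % check_every == 0 and a.checksum() != b.checksum():
            return t
    return None


if __name__ == "__main__":
    print(first_desync(Sim(seed=42), Sim(seed=42), ticks=500))  # identical runs: None
    bad = Sim(seed=42)
    bad.entities[7][1] -= 1  # inject a one-point divergence in entity 7's hp
    print(first_desync(Sim(seed=42), bad, ticks=500))           # reported at tick 10
```

Replacing `sorted(self.entities)` in `tick` with an order that varies between runs (which is effectively what an address-based key does) would make the two instances draw random numbers for different entities, and the checksum comparison would flag it.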

------
BryantD
Great read.

I disagree with him on bots, though. CCP has a highly evolved bot system which
is tied into their build system, so that unit tests can include player
behavior. This isn't directly relevant to load testing (although CCP does that
with bots as well), but it certainly demonstrates why bots are worth the
effort.

And if you've got a full-fledged QA department, bot writing is a good QA
engineer task. This is also a decent way to give ambitious QA guys a coding
task that doesn't have the potential to affect customer-facing game play.

Without violating NDAs I can say I've worked at MMO companies that found bots
invaluable for load testing. Replays will definitely catch things that bots
won't, but if you're thinking about how many players you can fit in a zone,
etc., etc. bots are great.

~~~
endianswap
Pat's speaking for himself here, of course, as evidenced by the fact that bots
were crucial for load testing GW2 (they even ran on Amazon EC2 so that they
could scale up dramatically).

------
darklajid
A very interesting read, but I'm missing one step:

Game recording is established as a favorable way to make games better, to
avoid a crappy beta experience or problems right after launch.

How are you going to get decent recordings in the first place? Playing with a
couple of developers seems a lousy way to jump-start a good selection of
games, doesn't it? How is this chicken-and-egg problem solved? I'm missing a
way to bootstrap the process: how do you load a good set of (varied,
representative) replays?

~~~
hythloday
A game the scale of GW will have a dedicated QA department with at least a
half-dozen testers, who are constantly playing the game anyway.

~~~
kevingadd
This is pretty much exactly how it worked when I was there. I imagine when the
team was smaller, they leveraged a combination of developer recordings +
recordings from volunteer testers (the game had a group of volunteer testers
in the high hundreds by the time I started working on it).

In particular, QA recordings were used a lot to reproduce tricky bugs and race
conditions, since players usually weren't savvy enough to spot them.

The volunteer tester group was also useful for load testing, since it gave us
a way to throw a hundred or so players at one of our development servers and
record all of them simultaneously. It gave a reasonable approximation of what
the load might look like on a single production server and it meant we could
also capture recordings of how the servers (and clients) behaved under load
instead of just having 4-8 developers playing on them.

~~~
BryantD
Any comments on the infrastructure for capturing the replays? Did everything
get recorded? I'm used to recording all actions, ability uses, etc. but
recording movement isn't something I've seen done all that often and it seems
like it'd be another order of magnitude of data.

~~~
endianswap
Think of a web server with a fresh SQL database with no data (other than
whatever is pre-populated there by the installation). Simply record all
incoming messages to the server over a period of time and save them to disk.
Then, when you want to replay that recording, start with a fresh database
again and replay all the messages. If your system is deterministic (no race
conditions, generally not relying on a wall clock [there are tricks for this,
since it's a harder problem[1]]), you're good to go. The key is designing
everything to support this from the beginning.

[1] Done right, you can play a minutes-long recording back in seconds, which
is very useful when you are trying to fix a bug and want to verify the fix
against a known-bad replay.
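
The comment doesn't say what the wall-clock tricks are. This sketch assumes one common approach: treat time as just another input. The server reads the clock once per incoming message and records that value alongside the message, and game logic only ever uses the recorded value. Replay can then skip sleeping entirely, which is what makes fast playback possible. `Game`, `handle_live`, `replay` and the one-second cast cooldown are all hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Game:
    cooldown_s: float = 1.0
    last_cast: dict = field(default_factory=dict)
    casts: int = 0

    def on_message(self, now, player, action):
        # Game logic never calls time.* itself; "now" arrives as an input.
        if action != "cast":
            return
        last = self.last_cast.get(player)
        if last is None or now - last >= self.cooldown_s:
            self.last_cast[player] = now
            self.casts += 1


def handle_live(game, log, player, action, clock=time.monotonic):
    now = clock()                      # read the wall clock exactly once...
    log.append((now, player, action))  # ...and record it with the message
    game.on_message(now, player, action)


def replay(log):
    game = Game()                      # fresh state, like a fresh database
    for now, player, action in log:    # no sleeping between messages
        game.on_message(now, player, action)
    return game


if __name__ == "__main__":
    game, log = Game(), []
    for player in ["a", "a", "b"]:
        handle_live(game, log, player, "cast")  # real wall-clock timestamps
    print(game.casts, replay(log).casts)         # same count, replayed instantly
```

Because the cooldown check only sees recorded timestamps, a replay reaches exactly the same decisions no matter how fast it runs.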

------
carlisle_
I was confused about the part where he mentions a rack full of 1Us was
generating too much heat and the solution was to put fewer servers in a rack. I
can't think of a situation where there is nothing at fault besides the
servers. The chillers should be able to keep up with 1Us like that, and it's
likely there was something else going on.

The situation we're running into at my job is with double-density servers
(in our case, Dell C6105s): those are the ones we have trouble cooling. Maybe
somebody with more experience than me can explain why cooling 1Us would ever
be a legitimate problem?

~~~
illuminate
"I can't think of a situation where there is nothing at fault besides the
servers. The chillers should be able to keep up with 1Us like that, and it's
likely there was something else going on.

...

Maybe somebody with more experience than me can explain why cooling 1Us would
ever be a legitimate problem?"

The situation in server labs, hardware- and infrastructure-wise, was much
worse seven years ago. Is it that hard to imagine?

~~~
carlisle_
I was in high school 7 years ago so yes, it is hard to imagine because I
really don't know how things used to be. I do appreciate the condescending
tone though.

~~~
illuminate
It wasn't condescension; the design and supporting environment really were
that bad.

