

Synchronous RTS Engines and a Tale of Desyncs - forrestthewoods
http://altdevblogaday.com/2011/07/09/synchronous-rts-engines-and-a-tale-of-desyncs/

======
DCoder
The C&C games, at least the older pre-Generals ones, work the same way: they
hash the full state each frame and send it over. When a desync is detected,
the game dumps the checksums of the last 256 frames, along with the current
frame's checksum for each object, into a log file, and the logs from different
players can then be compared. The logs also contain brief printouts of most
objects on the map, and thanks to these printouts we actually found an
uninitialized field and a few more serious bugs.
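To make the C&C-style scheme concrete, here is a minimal Python sketch. It is not the C&C code: the `Unit` fields, the SHA-1 digest and the report layout are assumptions for illustration. Each frame's digest goes into a ring buffer capped at 256 entries, and a mismatch produces a log containing that history plus a checksum per object.

```python
import hashlib
import struct
from collections import deque
from dataclasses import dataclass

HISTORY_FRAMES = 256  # keep the last 256 frame checksums, as described above


@dataclass
class Unit:  # hypothetical object layout; integer fields only
    uid: int
    x: int
    y: int
    hp: int
    target: int  # uid of the current target, -1 if none


def hash_units(units):
    """Hash selected fields of each unit, visiting units in stable uid order."""
    h = hashlib.sha1()
    for u in sorted(units, key=lambda u: u.uid):
        h.update(struct.pack("<5i", u.uid, u.x, u.y, u.hp, u.target))
    return h.hexdigest()


class SyncChecker:
    def __init__(self):
        # (frame, digest) pairs; the deque silently drops the oldest entry
        self.history = deque(maxlen=HISTORY_FRAMES)

    def record(self, frame, units):
        digest = hash_units(units)
        self.history.append((frame, digest))
        return digest

    def check(self, frame, local_digest, remote_digest, units):
        """Return None if in sync, else a log that can be diffed across players."""
        if local_digest == remote_digest:
            return None
        lines = [f"DESYNC at frame {frame}"]
        lines += [f"frame {f}: {d}" for f, d in self.history]
        lines += [f"object {u.uid}: {hash_units([u])} {u}" for u in units]
        return "\n".join(lines)
```

Hashing units in `uid` order rather than container order matters here; otherwise two clients that merely store their objects in a different order would report a false desync.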

Cheaters also found a small loophole in the hashing. Because only specific
fields are hashed rather than entire objects, one important field had been
missed, and it gave cheaters an unfair advantage by letting them see the
current targets of their opponents' units. Fortunately, it is easy enough to
fix once you know about it.
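The field-list loophole can be shown in a few lines. In this hypothetical sketch, `HASHED_FIELDS` stands in for whatever list a real engine keeps, and `target` has been left out, so two clients that disagree about it still produce identical checksums.

```python
import hashlib
import struct

# Hypothetical field list; "target" was accidentally left out.
HASHED_FIELDS = ("uid", "x", "y", "hp")


def field_hash(units, fields):
    """Hash only the named integer fields of each unit, in uid order."""
    h = hashlib.sha1()
    for u in sorted(units, key=lambda u: u["uid"]):
        for name in fields:
            h.update(struct.pack("<i", u[name]))
    return h.hexdigest()


honest = [{"uid": 1, "x": 0, "y": 0, "hp": 100, "target": -1}]
tampered = [{"uid": 1, "x": 0, "y": 0, "hp": 100, "target": 7}]

# The divergent field is invisible to the check...
print(field_hash(honest, HASHED_FIELDS) == field_hash(tampered, HASHED_FIELDS))  # True
# ...until it is added to the hashed set.
with_target = HASHED_FIELDS + ("target",)
print(field_hash(honest, with_target) == field_hash(tampered, with_target))  # False
```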

------
skittlebrau
The Age of Empires/Age of Mythology series (and Halo Wars) work in an
analogous fashion. By the time we were working on AoM, we had a more
sophisticated hashing system that allowed the actual program flow to be hashed
to some extent (via lots of manual insertion of sync statements in relevant
spots). This gave you some insight into how a given client got into a certain
state compared to the others. If you just hash the final state, it can be
quite mysterious how it ended up wrong. For example, there might be
hundreds of actions that can change the hitpoints of a unit. If you see
hitpoints go out of sync, it helps narrow the search space for the bug a lot
to know something like "it was catapult 123's AOE curve calculation for archer
456" that diverged.
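A rough sketch of the idea behind sync statements, using a hypothetical `SyncLog` class rather than anything from the actual AoM code: each `sync()` call folds a tagged record into a running CRC-32, so the per-frame comparison stays cheap, and also keeps a readable trail, so that after a mismatch the first differing record points at the code path that diverged.

```python
import zlib


class SyncLog:
    """Hypothetical per-client log: each sync() folds a tagged record into a
    running CRC-32 and keeps a readable trail for post-mortems."""

    def __init__(self):
        self.crc = 0
        self.trail = []

    def sync(self, tag, *values):
        entry = f"{tag}:" + ",".join(str(v) for v in values)
        self.crc = zlib.crc32(entry.encode(), self.crc)
        self.trail.append(entry)


def first_divergence(a, b):
    """Index and entries of the first point where two clients' trails differ."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i, a[i], b[i]
    if len(a) != len(b):  # one trail is a prefix of the other
        return n, a[n] if n < len(a) else None, b[n] if n < len(b) else None
    return None


def aoe_damage(log, attacker, target, distance, falloff):
    """Integer AoE falloff, with a sync statement recording how it was computed."""
    damage = max(0, 100 - distance * falloff)
    log.sync("aoe", attacker, target, distance, damage)
    return damage


a, b = SyncLog(), SyncLog()
aoe_damage(a, 123, 456, 3, 10)
aoe_damage(b, 123, 456, 3, 10)
aoe_damage(a, 123, 457, 5, 10)
aoe_damage(b, 123, 457, 5, 11)  # simulate a client with a buggy falloff value
print(first_divergence(a.trail, b.trail))
# (1, 'aoe:123,457,5,50', 'aoe:123,457,5,45')
```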

------
roryokane
I don’t think quitting has to be the only option when you get a desync error.
Couldn’t you show the user the “Desync Detected” dialog box, but then let them
choose a player whose version of the game state should be considered the
correct one? Then you would wait a few minutes while the most-chosen player
sends everyone their full copy of the game state, and the game could resume
from that state. A long pause in the middle of a game may frustrate the user,
but not as much as losing all of their progress so far.

Players would pick who holds the master copy based on who is least likely to
be cheating. In the case of a tied vote, the master copy could instead be
chosen with a deterministic random number generator.
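One way the vote could be resolved, as a sketch under assumptions: every client receives the same `votes` map and a `shared_seed` (for example the match seed), and ties are broken by a seeded PRNG over a sorted candidate list so that every client picks the same master. Python's `random.Random` stands in for whatever fixed PRNG a game would ship; a real engine would likely want its own generator so the pick cannot change between runtime versions.

```python
import random
from collections import Counter


def choose_master(votes, shared_seed):
    """votes maps each voter's id to the player id they trust most.
    Every client runs this with the same votes and seed, so all agree."""
    counts = Counter(votes.values())
    top = max(counts.values())
    # Sort tied candidates so the result does not depend on dict order.
    tied = sorted(player for player, c in counts.items() if c == top)
    rng = random.Random(shared_seed)  # deterministic PRNG, seeded identically everywhere
    return rng.choice(tied)
```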

~~~
nopassrecover
I think the point was that resyncs (exchanging the full game state) are too
expensive (hence the prohibition on "join in game"). As you say, if people are
prepared to wait a few minutes it's less of an issue. Having said that, this
explains load screens when joining multiplayer games much better for me (the
first load is a state sync, but after that an input sync could be sufficient).

~~~
algorias
I think we'd be talking about seconds here, not minutes. 1K units might be a
lot to update in real time while doing a bunch of rendering etc., but a
dedicated sync pause shouldn't really take very long in isolation.

------
Periodic
Posts like this make me want to switch fields again! Video games can have so
many interesting problems to solve.

One thing it does make me wonder about is the difference between pure code and
code with side effects. Knowing that large regions of your state-transformation
code are pure would go a long way towards limiting the possible scope of these
desync bugs. The fact that so many of them come from uninitialized variables
and struct padding is interesting as well, and makes me wonder what the
language could do to prevent these sorts of problems.

------
sehugg
My A*teroids clone "Comet Busters!" did this way back when, although you were
limited to two sides (each of which could have 1-3 players) and the only
inputs were thrust, fire, left, right, "special ability". This was around the
time DOOM had come out or was about to come out.

Everyone had to exchange their inputs in one frame and then evaluate them in
the next, so you were always a frame behind (1/24 of a second or so, or maybe
1/12). All math was fixed point, which made it a bit easier. There was no
resync; the game would just end if the checksums didn't line up, so lots of
bugs were discovered this way :)
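A toy model of that lockstep loop, assuming a 16.16 fixed-point format, a single hypothetical thrust bit and one-dimensional ships (fire, turning and the special ability are omitted). Inputs exchanged on frame N are applied on frame N + 1, and, as described above, there is no resync: the run simply stops at the first checksum mismatch.

```python
FP_SHIFT = 16                        # 16.16 fixed point: 1.0 == 1 << 16
THRUST_ACCEL = (1 << FP_SHIFT) // 8  # 0.125 units per frame per frame
INPUT_DELAY = 1                      # inputs exchanged on frame N apply on N + 1
THRUST = 0x1                         # hypothetical input bit


class Sim:
    """One client's copy of the simulation (1-D ships only, for brevity)."""

    def __init__(self, n_players):
        self.pos = [0] * n_players
        self.vel = [0] * n_players
        self.queued = {}  # frame -> tuple of every player's input bits

    def exchange(self, frame, all_inputs):
        # Everyone's inputs for `frame` are now known; apply them one frame later.
        self.queued[frame + INPUT_DELAY] = tuple(all_inputs)

    def step(self, frame):
        inputs = self.queued.pop(frame, (0,) * len(self.pos))
        for i, bits in enumerate(inputs):
            if bits & THRUST:
                self.vel[i] += THRUST_ACCEL
            self.vel[i] -= self.vel[i] >> 4  # integer drag, identical everywhere
            self.pos[i] += self.vel[i]

    def checksum(self):
        c = 0
        for v in self.pos + self.vel:
            c = (c * 31 + v) & 0xFFFFFFFF
        return c


def run(script_a, script_b):
    """Run two clients in lockstep; return the first desynced frame, or None."""
    a, b = Sim(2), Sim(2)
    for frame, (inp_a, inp_b) in enumerate(zip(script_a, script_b)):
        a.step(frame)
        b.step(frame)
        if a.checksum() != b.checksum():
            return frame  # no resync: the game just ends
        a.exchange(frame, inp_a)
        b.exchange(frame, inp_b)
    return None


script = [(THRUST, 0), (THRUST, THRUST), (0, THRUST), (0, 0)]
bad = [(THRUST, 0), (THRUST, THRUST), (0, 0), (0, 0)]  # client b misses an input
print(run(script, script))  # None
print(run(script, bad))     # 3
```

The one-frame delay is visible in the second run: the input that differs on frame 2 only shows up in the checksums on frame 3, when it is applied.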

Lots of time was spent figuring out how to handle the start of a game, since
either player could initiate it and we had to agree on a common screen size,
which affected gameplay.

~~~
wlievens
Did you write _Comet Busters!_? That's awesome! I played that game so many
times I can still hear the sound effects in my head over a decade later. I
really loved the version where you could blow the other players up.

Kudos to you :)

------
scottlu2
In the iPhone version of Warfare Incorporated, the testers were somewhere else
geographically, which made this problem even more fun. To find these problems
you need a history, on each client, of the state that led up to the divergence.
When hashing game state, WI clients maintain a log of the state contributing
to the hash at a useful level of detail. The log gets trimmed as the server
acknowledges hash matches. On hash mismatch, clients post this log to a server
for analysis.
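A sketch of that log-and-trim approach, with a hypothetical `DesyncLog` class. The server acknowledgement and the upload are assumed to happen elsewhere, so the sketch only keeps per-frame detail, trims it when hashes are acknowledged, and serializes whatever remains for the report.

```python
import json


class DesyncLog:
    """Hypothetical client-side log of what went into each frame's hash.
    Entries are dropped once the server confirms the hashes matched."""

    def __init__(self):
        self.frames = {}  # frame -> list of detail strings (insertion-ordered)

    def add(self, frame, detail):
        self.frames.setdefault(frame, []).append(detail)

    def acknowledge(self, frame):
        """Server reports all hashes up to and including `frame` matched: trim."""
        for f in [f for f in self.frames if f <= frame]:
            del self.frames[f]

    def mismatch_payload(self, frame):
        """Serialize the untrimmed history for upload; transport is out of scope."""
        return json.dumps({"desync_frame": frame, "history": self.frames})
```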

------
cdcarter
"Adding insult to injury, one of the most common cases is an uninitialized
variable."

Somebody needs to turn on compiler warnings!

~~~
nhaehnle
... and run Valgrind (or equivalent) occasionally.

We're using the parallel simulation technique in Widelands. Only using integer
arithmetic, compiling with -Wall and running Valgrind occasionally has kept us
very safe. The only recent desyncs I can remember were cases where either (a)
somebody added a new feature and messed up the network protocol (which is
usually very obvious, because the desync happens right after the new feature is
used) or (b) somebody new to the code used a std::set<Foo*> to iterate over a
collection of objects. Basing simulation decisions on pointer values is a bad
idea.
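The `std::set<Foo*>` trap has a direct Python analogue: a `set` of plain objects iterates in an order derived from their default hashes, and CPython bases those on `id()`, i.e. the memory address. The hypothetical `Unit` class below carries a stable `uid` that is the same on every client; iterating in `uid` order keeps the decision deterministic.

```python
class Unit:
    def __init__(self, uid, hp):
        self.uid = uid  # stable id, identical on every client
        self.hp = hp


def first_damaged_unstable(units):
    # Like iterating a std::set<Foo*>: a set of plain objects is ordered by
    # default hashes derived from memory addresses, so two clients can visit
    # the units in different orders and pick different ones.
    for u in set(units):
        if u.hp < 100:
            return u
    return None


def first_damaged(units):
    # Deterministic: always visit units in stable-id order.
    for u in sorted(units, key=lambda u: u.uid):
        if u.hp < 100:
            return u
    return None
```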

------
coenhyde
Nice read. I assumed this was how large RTSes handled their multiplayer, but
now I know for sure.

------
ryandvm
I wonder if network connectivity has improved to the point that it would be
better to run a single gameplay engine on the server and let the clients
simply run the visuals.

Semi-related: does JavaScript have the chops to power a multiplayer RTS? I
keep thinking to myself that something comparable to the original Age of
Empires would be quite doable in HTML5. Not only would it run on modern phones
and tablets, a Google Maps style touch interface would be ideal for flinging
around the map and controlling units.

~~~
nhaehnle
The mod-turned-hobbyist-turned-free-software RTS 0 A.D. uses JavaScript for a
significant part of the game code. The engine itself is in C++. Some alphas
have been released: <http://wildfiregames.com/0ad/>

Talking about the _entire_ game in the browser, it clearly depends on how far
you're willing to compromise in the feature set. Path-finding will likely be
the most painful part for a purely human-vs-human game, and opponent AI will
probably add some headaches as well.

------
fishtastic
This explains many mysteries from StarCraft 1: bad replay files after
patching, small replay file sizes, and low network traffic in multiplayer even
on big maps.

The hash values sent between players also hint at the mechanism behind the
drop hack. My guess is that the hacker would pretend to be the victim and send
bad hash values to every other player. The hash check would fail and the
victim would just end up disconnected from the game. I haven't heard of such a
hack in SC2; I wonder how they fixed it.

------
baq
I had an experience very similar to the Demigod one in Spring RTS (an
open-source RTS engine). It took two days with a replay in hand, and I don't
mean that in a bad way: we didn't use a custom allocator, so I could use the
Windows debug heap calls.

