Synchronous RTS Engines and a Tale of Desyncs (altdevblogaday.com)
83 points by forrestthewoods on July 10, 2011 | 18 comments

The C&C series, at least the older pre-Generals games, works the same way: it hashes the full state every frame and sends the hash over. When a desync is detected, it dumps the checksums of the last 256 frames, along with the current frame's checksum for each object, into a log file, and these logs can then be compared. The logs also contain brief printouts of most objects on the map, and we actually found an uninitialized field and a few more serious bugs thanks to these printouts.
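
To make the scheme concrete, here is a minimal sketch of a per-frame checksum history with a dump on desync. It is not the actual C&C code: the dict-based object layout, the `DesyncRecorder` name, and the use of CRC32 are all assumptions made for illustration. Only the 256-frame window comes from the description above.

```python
import zlib
from collections import deque

HISTORY = 256  # number of past frames to keep, as described above


def object_checksum(obj: dict) -> int:
    """CRC32 over an object's fields in sorted-key order (hypothetical layout)."""
    data = ";".join(f"{k}={obj[k]}" for k in sorted(obj)).encode()
    return zlib.crc32(data)


def frame_checksum(objects: list) -> int:
    """Fold the per-object checksums into one value for the whole frame."""
    crc = 0
    for obj in objects:
        crc = zlib.crc32(object_checksum(obj).to_bytes(4, "little"), crc)
    return crc


class DesyncRecorder:
    def __init__(self) -> None:
        # (frame, checksum) pairs; the deque drops the oldest entry automatically.
        self.history = deque(maxlen=HISTORY)

    def record(self, frame: int, objects: list) -> int:
        crc = frame_checksum(objects)
        self.history.append((frame, crc))
        return crc

    def dump(self, objects: list) -> str:
        """Text log: recent frame checksums, then one line per current object."""
        lines = [f"frame {f} {c:08x}" for f, c in self.history]
        lines += [
            f"object {i} {object_checksum(o):08x} {o}"
            for i, o in enumerate(objects)
        ]
        return "\n".join(lines)
```

Each player would call `record` once per frame, exchange the returned value, and write `dump(...)` to disk when the values disagree. Diffing two such logs shows the first frame and the objects that differ.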

Cheaters also found a small loophole in the hashing. Because only specific fields are hashed rather than entire objects, one important field was missed, which gave cheaters the unfair advantage of seeing the current targets of their opponents' units. Fortunately, it is easy enough to fix once you know about it.
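
One way to avoid missing a field is to derive the hashed field list from the type itself instead of maintaining it by hand. The sketch below does that with a dataclass. The `Unit` layout and the `target_id` field are hypothetical, and this is not how the original games did it.

```python
import zlib
from dataclasses import dataclass, fields


@dataclass
class Unit:  # hypothetical unit layout, for illustration only
    unit_id: int
    x: int
    y: int
    hitpoints: int
    target_id: int  # the kind of field that is easy to forget by hand


def unit_checksum(unit: Unit) -> int:
    """Hash every declared field, in declaration order."""
    crc = 0
    for f in fields(unit):
        value = getattr(unit, f.name)
        crc = zlib.crc32(f"{f.name}={value};".encode(), crc)
    return crc
```

A field added to `Unit` later is picked up automatically. The trade-off is that anything that should not be hashed (render-only state, say) has to be kept out of the class or excluded explicitly.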

The Age of Empires/Age of Mythology series (and Halo Wars) work in an analogous fashion. By the time we were working on AoM, we had a more sophisticated hashing system that allowed the actual program flow to be hashed to some extent (via lots of manual insertion of sync statements in relevant spots). This allowed you to have some insight about how a given client got into a certain state compared to the others. If you just hash the final state, it can be quite mysterious how it ended up wrong. For example there might be hundreds of actions that can change the hitpoints of a unit. If you see hitpoints go out of sync, it helps narrow the search space for the bug a lot to know something like "it was catapult 123's AOE curve calculation for archer 456" that diverged.
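
A rough sketch of what hashing the program flow might look like: a `sync` call placed by hand at interesting points, feeding both a running hash and a readable trace. The `SyncTrace` class and its method names are invented for this example and are not taken from the AoM code.

```python
import zlib


class SyncTrace:
    """Running hash plus a readable record of every sync point that was hit."""

    def __init__(self) -> None:
        self.crc = 0
        self.entries = []  # list of (tag, value) tuples

    def sync(self, tag: str, value: int) -> None:
        # Called by hand at interesting spots in the simulation code.
        self.entries.append((tag, value))
        self.crc = zlib.crc32(f"{tag}:{value};".encode(), self.crc)


def first_divergence(a: SyncTrace, b: SyncTrace):
    """Return (index, entry_a, entry_b) for the first differing sync point, else None."""
    for i, (ea, eb) in enumerate(zip(a.entries, b.entries)):
        if ea != eb:
            return i, ea, eb
    if len(a.entries) != len(b.entries):
        # One client hit more sync points than the other.
        i = min(len(a.entries), len(b.entries))
        ea = a.entries[i] if i < len(a.entries) else None
        eb = b.entries[i] if i < len(b.entries) else None
        return i, ea, eb
    return None
```

Only `crc` needs to go over the wire each frame. The `entries` list stays local and is compared after the fact, which is where a line like "catapult 123 AOE on archer 456" would show up as the first mismatch.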

I wouldn’t think that if you get a desync error the only option is to quit. Couldn’t you give the user the “Desync Detected” dialog box, but then allow them to choose a player whose version of the game state should be considered the correct one? Then you would wait a few minutes while the most-chosen player sends all players all state in their copy of the game, and then the game can resume from that state. It may frustrate the user to have a long pause in the middle of their game, but not as much as it would frustrate them to have all of their game progress so far lost.

You would choose a player to be considered to have the master copy by who is least likely to be cheating. In the case of a tie in votes, the player with the master copy could be chosen with a deterministic random number generator instead.
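
As a sketch of the voting idea, assuming every client sees the same votes and shares a seed (for example the frame number of the desync, which is my assumption and not something the article specifies):

```python
from collections import Counter


def lcg(seed: int) -> int:
    """One step of a 32-bit linear congruential generator (Numerical Recipes constants)."""
    return (1664525 * seed + 1013904223) % 2**32


def choose_master(votes: dict, seed: int) -> int:
    """votes maps voter id -> chosen player id; seed must be identical on every client."""
    tally = Counter(votes.values())
    top = max(tally.values())
    # Sort the tied candidates so every client indexes into the same order.
    tied = sorted(p for p, n in tally.items() if n == top)
    if len(tied) == 1:
        return tied[0]
    # Use the high bits; the low bits of this kind of generator are weak.
    return tied[(lcg(seed) >> 16) % len(tied)]
```

The important property is that the result depends only on the votes and the seed, never on arrival order or local state, so every client picks the same master without another round trip.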

I think the point was that resyncs (exchanging game state) are too intensive (hence the prohibition on "join in game") - as you say if people are prepared to wait a few minutes it's less of an issue. Having said that, this explains load screens on joining multiplayer games much better for me (first load is a state sync, but subsequently input-sync could be sufficient).

I think we'd be talking about seconds here, not minutes. 1K units might be a lot to update in real time while doing a bunch of rendering etc, but a dedicated sync pause shouldn't really take very long in isolation.

I think this would essentially amount to having save/load game for multiplayer, and how often have you seen games have one of those?

Still, desync errors are pretty common in early versions of RTS games. I remember pulling my hair out while playing Starcraft and getting desynced non-stop. Or maybe it was the cheaters!

Multiplayer save/load is not prevented by technical issues as much as it is by development time constraints and the ideal that such a situation is unnecessary. A scenario editor is not that far removed from dumping the current game state.

Robustness is no incentive either, since people will put up with a lot of bugs from games.

Posts like this make me want to switch fields again! Video games can have so many interesting problems to solve.

One thing it does make me wonder about is the difference between pure code and code with side-effects. Knowing large regions of your state-transformation code is pure would go a long way towards limiting the possible scope of these desync bugs. The fact that many of them are uninitialized variables and padding is interesting as well and makes me wonder what the language could do to prevent these sorts of problems.

My A*teroids clone "Comet Busters!" did this way back when, although you were limited to two sides (each of which could have 1-3 players) and the only inputs were thrust, fire, left, right, "special ability". This was around the time DOOM had come out or was about to come out.

Everyone had to exchange their inputs in one frame and then evaluate them in the next, so you were always a frame behind (1/24 of a second or so, or maybe 1/12). All math was fixed point which made it a bit easier. There was no resync; the game would just end if the checksums didn't line up, so lots of bugs were discovered this way :)
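
For anyone curious what that loop looks like, here is a toy reconstruction in Python, not the original code. The 24.8 fixed-point format, the single thrust input, and the `Peer` and `Ship` names are all assumptions. The parts taken from the description are the one-frame input delay, integer-only math, and ending the game on a checksum mismatch.

```python
import zlib

FP_SHIFT = 8  # 24.8 fixed point: 1.0 is stored as 256 (assumed format)
THRUST = 1 << (FP_SHIFT - 2)  # 0.25 units per frame per frame


class Ship:
    def __init__(self) -> None:
        self.x = 0   # position, fixed point
        self.vx = 0  # velocity, fixed point

    def step(self, thrust: bool) -> None:
        if thrust:
            self.vx += THRUST
        self.x += self.vx


class Peer:
    """One machine's full copy of the simulation (both ships)."""

    def __init__(self) -> None:
        self.ships = [Ship(), Ship()]
        self.pending = None  # inputs exchanged last frame, applied this frame

    def frame(self, exchanged_inputs: list) -> int:
        # Apply the inputs everyone agreed on during the previous frame.
        if self.pending is not None:
            for ship, thrust in zip(self.ships, self.pending):
                ship.step(thrust)
        self.pending = list(exchanged_inputs)
        state = ",".join(f"{s.x}:{s.vx}" for s in self.ships).encode()
        return zlib.crc32(state)


def run(frames: list) -> bool:
    """Drive two peers with the same inputs; False means a desync ended the game."""
    a, b = Peer(), Peer()
    for inputs in frames:
        if a.frame(inputs) != b.frame(inputs):
            return False
    return True
```

Because everything is integer arithmetic, two peers fed the same inputs cannot drift apart through rounding, which is the "a bit easier" part.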

Lots of time was spent figuring out how to handle start-of-game, since either player could initiate and we had to agree on a common screen size since it affected game play.

Did you write Comet Busters!? That's awesome! I played that game so many times I can still hear the sound effects in my head over a decade later. I really loved the version where you could blow the other players up.

Kudos to you :)

In the iPhone version of Warfare Incorporated, testers were geographically somewhere else which made this problem even more fun. A history of the state on each client that led up to the divergence is needed to find these problems. When hashing game state, WI clients maintain a log of the state contributing to the hash at a useful level of detail. The log gets trimmed as the server acknowledges hash matches. On hash mismatch, clients post this log to a server for analysis.
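
A minimal sketch of that kind of log, with trimming on acknowledgement. The `HashLog` class, its method names, and the use of CRC32 are made up for this example and are not the Warfare Incorporated implementation. The upload itself is left out.

```python
import zlib
from collections import OrderedDict


class HashLog:
    """Per-frame record of what went into each state hash, kept until acknowledged."""

    def __init__(self) -> None:
        self.frames = OrderedDict()  # frame -> (hash, detail lines)

    def record(self, frame: int, details: list) -> int:
        h = zlib.crc32("\n".join(details).encode())
        self.frames[frame] = (h, details)
        return h

    def on_ack(self, frame: int) -> None:
        # The server confirmed a match up to `frame`, so older detail can go.
        for f in [f for f in self.frames if f <= frame]:
            del self.frames[f]

    def on_mismatch(self) -> list:
        """Build the report a client would post to the server."""
        report = []
        for f, (h, details) in self.frames.items():
            report.append(f"frame {f} hash {h:08x}")
            report.extend(f"  {d}" for d in details)
        return report
```

Trimming on ack keeps memory bounded to the frames still in flight, while the report on mismatch still covers everything since the last known-good frame.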

"Adding insult to injury, one of the most common cases is an uninitialized variable."

Somebody needs to turn on compiler warnings!

... and run Valgrind (or equivalent) occasionally.

We're using the parallel simulation technique in Widelands. Using only integer arithmetic, compiling with -Wall, and running Valgrind occasionally have kept us very safe. The only recent cases of desync I can remember are when either (a) somebody added a new feature and messed up the network protocol (which is usually very obvious because the desync happens just after the new feature is invoked) or (b) somebody new to the code used a std::set<Foo*> to iterate over a collection of objects. Basing simulation decisions on pointer values is a bad idea.
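
The same trap exists outside C++. As an illustration in Python (not Widelands code; the `Unit` class and `pick_target` functions are hypothetical): a set of plain objects is ordered by identity-based hashes, which in CPython derive from memory addresses, so the usual fix is to iterate in order of an ID that the simulation itself assigns.

```python
class Unit:
    def __init__(self, unit_id: int, hitpoints: int) -> None:
        self.unit_id = unit_id  # assigned by the simulation, identical on every client
        self.hitpoints = hitpoints


def pick_target_unsafe(units: set):
    # Iteration order of a set of plain objects follows their identity hashes,
    # which can differ between processes and machines.
    for unit in units:
        if unit.hitpoints > 0:
            return unit
    return None


def pick_target(units: set):
    # Same decision, but visited in an order every client agrees on.
    for unit in sorted(units, key=lambda u: u.unit_id):
        if unit.hitpoints > 0:
            return unit
    return None
```

In C++ the equivalent fix would be a comparator on a stable ID rather than the default pointer ordering.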

Nice read. I assumed this is how massive RTSes handled their multiplayer, but now I know for sure.

I wonder if network connectivity has improved to the point that it would be better to run a single gameplay engine on the server and let the clients simply run the visuals.

Semi-related: Does Javascript have the chops to power a multiplayer RTS? I keep thinking to myself that something comparable to the original Age of Empires would be quite doable in HTML5. Not only would it run on modern phones and tablets, a Google Maps style touch interface would be ideal for flinging around the map and controlling units.

The mod-turned-hobbyist-turned-free-software RTS 0 A.D. uses JavaScript for a significant part of the game code. The engine itself is in C++. Some alphas have been released: http://wildfiregames.com/0ad/

Talking about the entire game in the browser, it clearly depends on how far you're willing to compromise in the feature set. Path-finding will likely be the most painful part for a purely human-vs-human game, and opponent AI will probably add some headaches as well.

This explains many mysteries from Starcraft 1: bad replay file after patching, small replay file sizes, low network traffic usage from multiplayer even on big maps.

The hash values sent between players also hint at the mechanism behind the drop hack. My guess is that the hacker would pretend to be the victim and send bad hash values to every other player. The hash check would fail and the victim would just end up disconnected from the game. I haven't heard of such a hack in SC2; I wonder how they fixed it.

I had an experience very similar to the Demigod one in Spring RTS (an open source RTS engine). It took two days with a replay in hand, and I don't mean that in a bad way: we didn't use a custom allocator, so I could use the Windows debug heap calls.
