1500 Archers on a 28.8: Network Programming in Age of Empires and Beyond (2001) (gamasutra.com)
343 points by tosh 8 months ago | 95 comments

From another Ensemble Studios interview with Gamasutra (emphasis mine):

"6. We did not plan for a patch. The version 1.0a patch, even though it was a success, was problematic in that as a company we had not planned for it. THE GENERAL ARGUMENT IS THAT IF YOU KNOW YOU ARE GOING TO NEED TO RELEASE A PATCH, THEN YOU SHOULDN'T BE SHIPPING THE GAME IN THE FIRST PLACE."

Those were the days!

(Author of the original article here) It was really pretty naive - nobody would plan for an RTS now without planning a series of patches to do the inevitable adjustments. Even for AOK we planned to patch and adjust. The 'general argument' was a MS publisher stance as quoted by Matt Pritchard - to do a patch in those days through the MS system meant a lengthy and expensive full test process, rollout, creation of a patch system, etc - so it was something you planned and budgeted for. The concept of 'day-one-patch' would have been pretty horrific. Patching is now something you integrate, plan for and expect - because we aren't shipping on gold masters to a printing company.

Even if we lived in a world with gold masters there would still be one reason for patches: Balancing.

My thinking is to expect patches when making a multiplayer game. There's just no practical way around it.

The Command & Conquer approach (used by them and some others in the era of gold masters) was to release expansions to the game that incorporated the balance and other fixes as part of the release cycle of the game. In those days they didn't do independent patches because of the logistical complexity. It was an entirely different ecosystem before the Internet was common-place.

This is one of the reasons why Brood War exists.

Welcome to HN :)

Which is a ridiculous statement - I remember the number of 90's games that were utterly unplayable in that they'd crash constantly or do other ridiculous things (The remake of Temple Of Elemental Evil would actually delete your C: drive upon uninstall). I feel like things haven't gotten worse as far as the need for day 1 patches, they've just gotten more pragmatic on their deployment.

As a dev on modern AAA titles - the main driving reason behind Day 1 patches is that you do the first submission to Sony/MS/Nintendo 2-3 months before the game is about to be released. We just did a 1st sub on a title coming out in November. That means you still have potentially hundreds of people on the project for few months until the game actually comes out - so they might as well keep working on the game and their work is released as day 1 patch. Obviously we don't purposefully leave bugs and features until day 1 patch - but in the extra time we might as well add a few things and fix some bugs.

I'd never assume you left them on purpose. Have you worked with Steam? What sort of turn around time do they require?

Steam is super quick, especially with a company of our size, we basically have the necessary permissions to publish when we want to, it requires very little from Valve.

What do Sony/MS/Nintendo do in those 2-3 months?

Compliance testing. They usually take a few weeks to come up with a list of things that are non-compliant that we have to fix for release, then we do 2nd submission, and then they take another few weeks to test that. But even if the game passes on 1st sub, it still takes weeks for the game to be actually burnt to discs and distributed across the world.

Ugh, the constant bugs and crashes were the reason I never finished Fallout 2.

I gladly accept the occasional 10gb update in return for a game that doesn’t crash multiple times a day or week.

That almost-relevant username.

Anyway, I don't get how updates are so big these days. The games as well, but I guess with some lazy programming... but the updates? I really wonder how hard it is to reliably patch a binary without replacing the whole binary, since apparently it outweighs the additional bandwidth cost for pretty much every company that does any sort of software patching.

One problem you face here is versioning. You can have tiered systems where you have a patch from version 1 to 2 to .. n. And then if somebody wants to go from version 1 to n, you just incrementally apply each and every single patch. Or you can just have a replacement 'patch' that means no matter what version or setup you're on, you're good to go following the exact same process.

And in some cases this is not only easier, but also can provide major performance benefits. For instance imagine an update changes something about a local database that requires some expensive process like reindexing or something. And then another update carries out further changes, and so on. Going straight from 1 to n instead of 1 to 2 to ... n can be vastly more efficient from a user perspective.
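To make the trade-off above concrete, here is a minimal sketch of choosing between an incremental chain and a full replacement. The patch sizes, the `INCREMENTAL` table, and the function names are made up for illustration; a real updater would also weigh install-time costs such as the reindexing mentioned above.

```python
from typing import Dict, List, Tuple

# Hypothetical patch catalogue: (from_version, to_version) -> download size in MB.
INCREMENTAL: Dict[Tuple[int, int], int] = {(1, 2): 40, (2, 3): 55, (3, 4): 30}
FULL_REPLACEMENT_MB = 110  # one package that works from any installed version


def incremental_chain(installed: int, target: int) -> List[Tuple[int, int]]:
    """Return the ordered patches needed to walk from installed to target."""
    return [(v, v + 1) for v in range(installed, target)]


def plan_update(installed: int, target: int) -> Tuple[str, int]:
    """Pick whichever option downloads less data."""
    chain_mb = sum(INCREMENTAL[step] for step in incremental_chain(installed, target))
    if chain_mb <= FULL_REPLACEMENT_MB:
        return ("incremental", chain_mb)
    return ("full", FULL_REPLACEMENT_MB)


print(plan_update(3, 4))  # a single small step
print(plan_update(1, 4))  # three steps add up to more than the full package
```

With these made-up numbers, a user one version behind gets the small patch, while a user three versions behind is better served by the single replacement package.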

But if you're seeing huge updates then more often than not it's probably data, not code, though the same story applies there.

It's not only binaries that are patched. Data files, which can be multiple GB in size, are patched too. You can either delta patch them, which requires significant CPU time, or replace them entirely. Most devs prefer the latter.
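A rough sketch of what block-level delta patching looks like, assuming fixed-size blocks and in-place changes only. `BLOCK`, `make_delta`, and `apply_delta` are hypothetical names, not any real patcher's API. Hashing every block of a multi-GB file is where the CPU time goes.

```python
import hashlib
from typing import Dict, List

BLOCK = 4  # tiny block size for illustration; real tools use much larger blocks


def blocks(data: bytes) -> List[bytes]:
    """Split data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def make_delta(old: bytes, new: bytes) -> Dict[int, bytes]:
    """Map block index -> new contents, for blocks that are new or changed."""
    old_hashes = [hashlib.sha256(b).digest() for b in blocks(old)]
    delta: Dict[int, bytes] = {}
    for i, b in enumerate(blocks(new)):
        if i >= len(old_hashes) or hashlib.sha256(b).digest() != old_hashes[i]:
            delta[i] = b
    return delta


def apply_delta(old: bytes, delta: Dict[int, bytes], new_block_count: int) -> bytes:
    """Rebuild the new file from the old blocks plus the changed ones."""
    old_blocks = blocks(old)
    return b"".join(
        delta[i] if i in delta else old_blocks[i] for i in range(new_block_count)
    )


old = b"AAAABBBBCCCC"
new = b"AAAAXXXXCCCCDD"
delta = make_delta(old, new)
print(delta)
print(apply_delta(old, delta, len(blocks(new))) == new)
```

An insertion near the start of a file shifts every later block, so this naive scheme would resend almost everything. That is one reason replacing the whole file can look attractive.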

Game patches aren't GB in size because of code.

Yeah, it's rare I drop out of a game due to an error now, but during DirectX 5 it would happen multiple times an hour.

Heh, well it is called the Temple of Elemental Evil, so that is to be expected.

You must have been terribly disappointed with NetHack.

Didn't expect to talk about BBS doors today!

I know I was.

And in 2018 with Internet connections a bit better than a 28.8k modem and hardware slightly more powerful than Pentium 90, the HD remake of AoE2 fails to deliver a reliable multiplayer gameplay, having lag of multiple seconds and at least as many out-of-sync bugs as the original game did 20 years ago. It should be closer to 150 than 1500 archers with the HD edition.

This is one of the many reasons why I refuse to buy AoE2. The "new" version does not offer multi-platform support, it does not offer a stable multiplayer, it requires half a gigabyte of RAM instead of the previous 32MB, and it still costs as much as I pay these days for some of the smaller but brand new games (not one they made profit on for almost twenty years already without any maintenance).

if you play it on voobly versus steam, there is almost no lag and plays super smooth

the unreliable gameplay is mainly on steam. if you go play aoe2 on voobly, the lag basically disappears and feels smooth as butter.

If you own the hd version, there is a patch to freely convert it to voobly.

This might be a side-effect of CPUs being much less deterministic than the older hardware, leading to even harder to track down out-of-sync bugs.

CPU timing might be more non-deterministic, but the result should still be the same...

As a low latency coder, I found it fascinating.

It turns out the game is synchronized by each player having the commands for each 200ms "turn" a couple of turns in advance, and then playing the actions so that the same thing happens on all players' machines. That includes sending random seeds around. And then there's a load of provisions for lost packets, slow machines and so forth.
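A minimal sketch of that turn scheduling, with no real networking. The class name, the command shape, and the two-turn delay are assumptions for illustration; only the 200ms turn length comes from the description above. The key property is that a turn runs only when every player's commands for it have arrived, and that commands are applied in the same order everywhere.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TURN_MS = 200     # turn length quoted above
DELAY_TURNS = 2   # assumed: commands issued now execute two turns later


class LockstepPeer:
    def __init__(self, player_ids: List[int]) -> None:
        self.player_ids = sorted(player_ids)
        self.turn = 0
        # scheduled[turn][player_id] -> actions that player issued for that turn
        self.scheduled: Dict[int, Dict[int, List[str]]] = defaultdict(dict)
        # Nobody could have issued commands for the first turns, so pre-fill them.
        for t in range(DELAY_TURNS):
            self.scheduled[t] = {p: [] for p in self.player_ids}
        self.log: List[Tuple[int, str]] = []

    def issue(self, player_id: int, actions: List[str]) -> int:
        """Schedule a player's actions for a future turn; return that turn."""
        target = self.turn + DELAY_TURNS
        self.scheduled[target][player_id] = actions
        return target

    def step(self) -> bool:
        """Run the current turn, or stall if any player's commands are missing."""
        current = self.scheduled[self.turn]
        if not all(p in current for p in self.player_ids):
            return False
        for p in self.player_ids:  # fixed order on every machine
            for action in current[p]:
                self.log.append((p, action))
        del self.scheduled[self.turn]
        self.turn += 1
        return True
```

In this sketch each player has to send a (possibly empty) command list for every turn, since a missing list is indistinguishable from a lost packet and stalls the simulation.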

Is this how most games do it? I would think something like WoW couldn't do this, and indeed sometimes I'd see glitches where a character would blink (like the spell) to somewhere new.

Welcome to the world of game network programming :) it's an exciting place!

Yes, you have to be more-or-less real time, so you must compensate for latency, unreliable/slow connections, jitter, etc. If you're used to the web and the request/response model, you have to throw all of this out the window. The 200ms delay "hack" is pretty much standard practice, the window will differ from game to game (smaller in FPS's), but it's usually there.

Most games use UDP, since transmission of any single packet doesn't have to be reliable, and in case some packets are lost, it's cheaper to re-calculate the state diff and resend one slightly larger packet instead of two (or more) standard-size packets. Sometimes this can result in a "blink".
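A sketch of the "resend one slightly larger diff" idea, with the sockets left out. `DeltaSender`, the packet fields, and the dictionary-shaped state are hypothetical. The sender always diffs against the last snapshot the receiver acknowledged, so a lost packet is never retransmitted; its changes simply show up again in the next diff.

```python
from typing import Dict

State = Dict[str, int]  # hypothetical world state: entity name -> position


def diff(old: State, new: State) -> State:
    """Entries of `new` that are missing from `old` or have a different value."""
    return {k: v for k, v in new.items() if old.get(k) != v}


class DeltaSender:
    def __init__(self) -> None:
        self.history: Dict[int, State] = {0: {}}  # seq -> snapshot at that seq
        self.last_acked = 0
        self.seq = 0

    def build_packet(self, current: State) -> dict:
        """Diff the current state against the last acknowledged snapshot."""
        self.seq += 1
        self.history[self.seq] = dict(current)
        base = self.history[self.last_acked]
        return {"seq": self.seq, "base": self.last_acked,
                "changes": diff(base, current)}

    def on_ack(self, seq: int) -> None:
        """The receiver confirmed it has snapshot `seq`."""
        if seq > self.last_acked:
            self.last_acked = seq
            for stale in [s for s in self.history if s < seq]:
                del self.history[stale]
```

This toy version ignores removed entities and packet reordering, both of which a real protocol would have to handle.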

With sending around seeds and other "secret" data, you have to make a trade-off, since sending too much allows for cheating (map hacks, wall hacks, etc), but sending too little will create unpleasant surprises (enemy "teleporting" from around the corner).

Also often it's cheaper to run most of the calculations on the client (even including the critical stuff like hit tests, damage calculation, etc), and only occasionally verify the results on the server - especially in MMO's. Clients that are found suspicious get verified more often, and eventually get penalized / kicked.

Source: never actually wrote a networked game, but love reading about this stuff.

Am network game developer for over a decade. Stuck in desync. Send packets.

> Source: never actually wrote a networked game, but love reading about this stuff.

Got any favourite sources where I can learn more? It sounds pretty interesting!

Sorry for a late reply! Some good starting pointers:

https://www.gamedev.net/ https://www.reddit.com/r/gamedev/ https://www.reddit.com/r/truegamedev/

Some interesting case studies are anything by Id Software (Quake etc), and Lineage (that's mostly tales of a friend who is a hardcore player and a developer; he'd have the relevant source code open in a separate window while playing).

Cool, thanks.

No, most games don't do it this way now. When I created this type of networking in 1994 it was to solve the particular problem of lots of units and low bandwidth. Now that bandwidth is much less of a consideration, games typically use an authoritative server (even if that server is a 'headless' process that runs on a machine with a client). All clients send turns to the server and it sends out the authoritative results to all the clients. Unreal and Unity both have some documentation, I believe, on how their networking works at the high level - they are really adequate for most cases.

Some links:


Unreal Networking Architecture: http://unreal.epicgames.com/Network.htm

88 points by jbrennan on Jan 8, 2011 | 12 comments


Creating a simple multiplayer example: https://unity3d.com/es/learn/tutorials/s/multiplayer-network...

I need to start saving HN goldmine threads like that as well. Thanks

There are still limitations to this architecture and instances when you have to run something else. If I were doing an RTS with thousands of units, I'd still prefer the lock-step version described above, for example. Or, for a more concrete example, right now I'm revamping multiplayer architecture for a game with a huge open world, and the only way to get it to work without paying for a dedicated server farm is to run a distributed authority key-value storage.

It depends on the genre and the specific game. RTSs can do it this way and some fighting games do, although some fighting games use a variation that involves predicting player input and rewinding the game state on misprediction in order to reduce perceived latency. FPSs tend to use a totally different model that involves sending only the most recent relevant information to each player and trying to balance server authoritative cheat-prevention with allowing clients a little leeway in order to make the twitchy gameplay feel better. There's a great paper on Tribes' networking model that is still largely relevant for FPSs.

I believe WoW (and most MMOs) instead take input from each player and run the simulation completely on the server. Although often enemy and NPC positions aren't synchronized exactly between players. There's also much more tolerance for latency in crowded situations - it doesn't really matter if the ~30 other player characters lag behind their actual position in a town by even 5 or 10 seconds, you only need players in your party or who are fighting against you to be low latency.

This is essentially the way eve online does it (or did when I last played). The simulation is entirely server side, the client interpolates and every so often gets the authoritative version from the server. You always knew you’d dropped when your ship would still be flying around shooting at things but wouldn’t take any input.

They had another interesting wrinkle to make it handle huge populations. Having enough players in the same region would trigger “time dilation” and slow down the simulation. In a big fight (thousands of players), it could take 10 real minutes for one minute of game to pass. It made big battles a slog, but at least they were possible.
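A toy version of the idea, assuming the server compares how long a tick takes against a fixed budget. The function names, the budget, and the 10% floor (chosen to match the 10:1 ratio mentioned above) are assumptions, not EVE's actual implementation.

```python
def dilation_factor(load_ms: float, budget_ms: float, floor: float = 0.1) -> float:
    """Fraction of real time by which game time advances, clamped to [floor, 1]."""
    if load_ms <= budget_ms:
        return 1.0
    return max(floor, budget_ms / load_ms)


def advance(game_time_s: float, real_dt_s: float, factor: float) -> float:
    """Move the game clock forward by a dilated slice of real time."""
    return game_time_s + real_dt_s * factor


print(dilation_factor(500, 1000))    # quiet system: full speed
print(dilation_factor(4000, 1000))   # overloaded: quarter speed
print(dilation_factor(50000, 1000))  # huge fight: clamped at the floor
```

At the floor, ten real minutes advance the game clock by about one minute, which matches the experience described above.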

My understanding was that WoW would do essentially what's described here for the player's character and entities in the immediate area (simulate them for the next few frames), but the server also maintained a view of the world. The client would occasionally sync with the server, and that's when you would see the blinks.

If you're curious, this[0] is an actively maintained implementation of a WoW server.

[0] https://github.com/mangoszero/server

I worked on a big commercial MMO engine for about a decade.

In our implementation, the player runs ahead on the client (client autonomous) but server verified (actions replayed on the server). The new authoritative server position for the player is sent back to the client, and the client replays whatever player movements last made since the response to a movement is heard back from the server, with the player resynchronizing to that point in time where you moved against server objects and playing forward from that point transparently; the client (and server) maintain a short queue of movement history for each moving entity. Thus, if you ran into something on the server but it didn't obstruct your movement much, you would tend to blip much less. The physics framerate was very low compared to graphics framerate, and there would be some degradation in updates received on the client by distance as throttled by the server based on area of interest management. Position updates represented most of the bandwidth of the game. Everything is UDP based with different forms of reliability options on top.

NPCs were "server authoritative" and their actions are replayed on the client. Interpenetrations are resolved via rigid body physics resolution on the client if something blips, but the server is ultimately the source of truth (nothing can interpenetrate on the server), so if a rigid body resolution on the client doesn't resolve some condition, the eventual resynchronization of player position from the server would make it happen at some point.

It worked out pretty well most of the time; certainly you can construct many scenarios where it goes awfully bad from the perspective of a client (on the server everything is always fine), but we preferred the illusion of immediate feedback/low latency versus this queuing up everything to take place N milliseconds in the future, and we didn't need exact reproducibility between clients, just eventual (and hopefully pretty quick) consistency.
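The "client runs ahead, server verifies, client replays" loop described above can be sketched in one dimension. `PredictedPlayer`, the integer positions, and the sequence numbers are assumptions for illustration, not the engine's actual design.

```python
from collections import deque
from typing import Deque, Tuple

Move = Tuple[int, int]  # (sequence number, dx); 1-D movement for brevity


class PredictedPlayer:
    def __init__(self) -> None:
        self.x = 0
        self.next_seq = 1
        self.pending: Deque[Move] = deque()  # moves the server has not confirmed

    def move(self, dx: int) -> Move:
        """Apply the move locally right away and remember it for replay."""
        m = (self.next_seq, dx)
        self.next_seq += 1
        self.pending.append(m)
        self.x += dx
        return m  # this is what would be sent to the server

    def on_server_state(self, acked_seq: int, server_x: int) -> None:
        """Snap to the authoritative position, then replay unconfirmed moves."""
        while self.pending and self.pending[0][0] <= acked_seq:
            self.pending.popleft()
        self.x = server_x
        for _, dx in self.pending:
            self.x += dx


p = PredictedPlayer()
for _ in range(3):
    p.move(5)
print(p.x)  # predicted position after three moves

# The server says move 1 was blocked at x=3; moves 2 and 3 are replayed on top.
p.on_server_state(acked_seq=1, server_x=3)
print(p.x)
```

If the server agrees with the prediction, the replay lands on the same position and the player sees nothing. A visible "blip" only happens when the authoritative position differs.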

Were there collisions between players and players/NPCs? Or only on terrain.

It's mostly RTSs that can afford that model, due to the relaxed requirements on input frequency.

Many other games require higher frequency such as driving, shooting and anything that has physical simulations.

They instead use prediction and reconciliation and aim for the highest update frequency

Counter strike has servers at 120hz iirc.

Here’s some pro/cons of the approach http://www.gabrielgambetta.com/client-side-prediction-server...

edit: try the live demo too http://www.gabrielgambetta.com/client-side-prediction-live-d...

Reminds me of this GDC talk:


Similar approach, with some extra fanciness (you see the input at a fixed delay, always, and roll back if necessary to handle mispredict).

> Part of the difficulty was conceptual -- programmers were not used to having to write code that used the same number of calls to random within the simulation

Can anyone grok this? I can't see why this (each player's simulation making the same number of calls to random) would ever not be the case if all players are running the same patch of the game and are executing the same commands.

(Original author here) Here was a really hard-to-find bug: in some instances more than one quantity of fish could be placed in the same location. That meant the game would work fine until someone fished that same fish the second time, and then the fishing boats would diverge in the different simulations. The world sync check only counted the tile contents, so we didn't see it. There were actually two RNGs in the game. One was synchronized with the same start seed on all machines (basically the same random pool) for combat and whatnot; the other was unsynchronized and used for animation variance and other things that weren't gameplay related. Not knowing when to use which one (e.g. an animal's facing seems like animation, but it definitely affected gameplay if the animal could be hunted) could alter the code path and cause an out-of-sync condition.

Why even keep the unsynchronized RNG? I'm guessing it was more performant when used appropriately?

If you don't have an unsynchronized RNG, then any unsynchronized content can't use an RNG, and unsynchronized content is important for improving a game's experience by using local resources for things that don't really need to be shared or sent over a network.

For example, maybe in a FPS, part of the non-gameplay-critical graphics use particle generators for a cool effect that not all players see (because it's behind a building for some of them and thus doesn't even need to be rendered); if these generators used a synchronized RNG, then all players would have to do computations for every particle effect happening anywhere, just so that the combat and more game-important RNG values would be in synch when they really need to be.

There were a lot of extra checks and code involved in the synchronized RNG - the results were loggable (to track down sync failures) for instance so bloating that up with every random fire flicker from a burning castle would have been crazy.
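A small sketch of the two-stream arrangement, using Python's standard `random.Random`. The class and method names are hypothetical, and the per-draw log is only a stand-in for the loggable sync checks mentioned above.

```python
import random
from typing import List, Tuple


class GameRandom:
    """Two independent streams: one shared by all peers, one purely local."""

    def __init__(self, shared_seed: int) -> None:
        self.sim = random.Random(shared_seed)  # gameplay: must match on all peers
        self.cosmetic = random.Random()        # visuals: seeded locally
        self.sim_log: List[Tuple[str, int]] = []  # draws kept for desync hunting

    def sim_randint(self, lo: int, hi: int, reason: str) -> int:
        """Gameplay draw: logged, and identical on every peer."""
        value = self.sim.randint(lo, hi)
        self.sim_log.append((reason, value))
        return value

    def cosmetic_uniform(self, lo: float, hi: float) -> float:
        """Visual-only draw: cheap, unlogged, free to differ between peers."""
        return self.cosmetic.uniform(lo, hi)


a, b = GameRandom(42), GameRandom(42)
for _ in range(10):
    a.cosmetic_uniform(0.0, 1.0)  # only peer A has a burning castle on screen
rolls_a = [a.sim_randint(1, 6, "attack") for _ in range(5)]
rolls_b = [b.sim_randint(1, 6, "attack") for _ in range(5)]
print(rolls_a == rolls_b)
```

Because the cosmetic draws never touch the synchronized stream, the two peers stay in agreement no matter how many fire flickers either one renders.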

I wonder if that’s something a type system (or Hungarian notation!) could have helped with.

You might want something randomized in animation -- and you might therefore only run it if something is onscreen.

If you're using a global random pool, you've desynced here unless all players have the same thing on screen.

What they mean is that if you're doing anything that samples random, don't do it inside of something like a loop whose length is dependent on the local game state that isn't sync'd on the current turn; especially in the graphics engine for doing particle effects or picking random animation frames. Like if you sample random every animation update frame to pick which fire graphic to use on a torch or bonfire (very common), but you suffer graphic slowdown and sample it not enough or too many times, that could vary between clients unintentionally.

Once it comes time to simulate that next turn, if you have something different than other clients because of a missed update or graphics lag, even if object positions and the random seed are going to be "fixed" by another turn update, all future interactions with any objects that over- or under-sampled random will be wrong and could create further sync problems.
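The failure mode described above can be reproduced in a few lines. This is an illustrative sketch, not game code: `run_turn` and the frame counts are made up, and the only point is that a per-frame draw from the gameplay stream makes the next gameplay roll depend on local frame rate.

```python
import random


def run_turn(seed: int, frames_rendered: int, separate_streams: bool) -> int:
    """Simulate one turn on one machine and return its gameplay roll."""
    sim_rng = random.Random(seed)     # shared seed, identical on all peers
    fx_rng = random.Random() if separate_streams else sim_rng
    for _ in range(frames_rendered):  # frame count depends on local hardware
        fx_rng.random()               # e.g. pick a flame sprite for this frame
    return sim_rng.randint(1, 100)    # e.g. a combat roll


fast_pc, slow_pc = 30, 12  # frames rendered during the same turn

# Cosmetic draws come from their own stream: both peers get the same roll.
print(run_turn(7, fast_pc, True) == run_turn(7, slow_pc, True))

# Cosmetic draws share the gameplay stream: the two rolls will usually differ.
print(run_turn(7, fast_pc, False), run_turn(7, slow_pc, False))
```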

For instance, anything that's based on the current user's window rather than global state.

Ah yep, that makes sense. I wasn't thinking in terms of what had to happen for actual gameplay, just simulation state.

For anyone complaining about aoe2 lag, it is mainly just the version on steam.

if you apply the compatibility patch, you can load your steam copy onto voobly for free and play with next to no lag. The difference is night and day plus better features with Voobly.

Thanks. Is there any reason why they don't just fix the Steam version?

>For RTS games, 250 milliseconds of command latency was not even noticed -- between 250 and 500 msec was very playable, and beyond 500 it started to be noticeable.

I wonder if the rise of competitive RTS has changed this guideline. In SC2 people will start to complain if their ping to the server is more than about 130ms, and anything above 150ms starts to become painfully noticeable.

In Brood War being able to play on "LAN Latency" was always a huge deal, to the point that unauthorized third party ladder systems enabled players to play at "LAN Latency" even when battle.net official didn't support it.

Human cognitive reaction times are a lot more latent than connections are now. The quality of play perceived by the players in the studies I did back in the day was much more about consistency of responsiveness and not that direct number of milliseconds. Best possible Human reaction time for cognitive tasks is still around 250msec for most players - it would be great to see updated information for tournament players (who are a different class of player) on what their actual perception-to-action time is in their favorite games. AOK and beyond code used an adaptive scaling system to go faster when the network would reliably move packets more quickly - so it would auto-adjust to 'lan speed' (actually a combination of the best render speed of the slowest PC plus an estimate of the round-trip latency). Also - the command confirmation is not waiting for RT latency - you are getting confirmation when the command goes into the local buffer - 'command accepted' - once that happens it is going to execute so you get the confirm bark sound from the unit or building queue - or the movement arrow triggers. The games actually do their command simultaneously when all machines are executing the turn.
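One way to read the adaptive scaling described above, as a hedged sketch: take the slowest machine's frame time, add a round-trip estimate, and clamp the result. The function name, the use of a plain sum, and the floor and ceiling values are all assumptions; the comment only says the turn length was "a combination" of the two measurements.

```python
from typing import Dict


def next_turn_length_ms(
    frame_time_ms: Dict[str, float],  # best recent frame time per machine
    rtt_ms: Dict[str, float],         # round-trip estimate per machine
    floor_ms: float = 50.0,           # assumed lower bound
    ceiling_ms: float = 1000.0,       # assumed upper bound
) -> float:
    """Pick a turn length that the slowest machine and worst link can keep up with."""
    slowest_frame = max(frame_time_ms.values())
    worst_rtt = max(rtt_ms.values())
    return min(ceiling_ms, max(floor_ms, slowest_frame + worst_rtt))


# Internet game: one slow PC and one distant player.
print(next_turn_length_ms({"a": 16, "b": 50}, {"a": 120, "b": 80}))

# LAN game: everything is fast, so the floor decides.
print(next_turn_length_ms({"a": 16, "b": 20}, {"a": 1, "b": 2}))
```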

>Human cognitive reaction times are a lot more latent than connections are now. [...] Best possible Human reaction time for cognitive tasks is still around 250msec for most players

This is an important realization. Our brain can perceive reasonably fast actions, but our reaction is much slower. Under good conditions we can easily tell a 60fps animation from a 30fps animation from a 10fps slideshow, but the fastest reaction time we can manage is around 100ms (the time one frame is visible at 10fps).

We are reasonably tolerant of latency because our brain has quite high latency, and all our actions have to account for that (for example the point in time where you decide to release a ball you are throwing is very different from the point it is actually released). On top of that, many real-world interactions behave similarly to latency (e.g. springs). What throws us off is inconsistent latency, because then we are suddenly not able to predict when to perform an action in order to have the effect at the desired point in time.

Yeah, there are a few nuances to latency numbers.

The 250ms pure read-react time deals with arbitrary events, but when we can chunk reactions into a practiced technique our precision goes way up, to nearly the individual millisecond: thus musicians can play rapid passages with unusual rhythms in time if they have time to plan and prepare, but they lose this ability if dealing with unusually high latency (extreme reverb, amp across the stage, digital audio with huge buffer sizes). The technique, after all, is based on fast confirming feedback that your execution is correct.

And like you say, "bouncy" latency is even more disruptive. We can adjust to a small and consistent lag, but inconsistency will degrade any level of skill.

Agreed on all counts.

> And like you say, "bouncy" latency is even more disruptive

The technical term for this is "jitter". Networking and telecoms people pay a lot of attention to this metric, both for the reasons you cite and because jitter is much more noticeable than high-but-constant latency in voice or video communications.
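To put numbers on the difference, here is a small sketch that reports both average latency and jitter for a series of samples. Jitter is computed as the mean absolute difference between consecutive samples, which is one simple definition among several; the sample values are invented.

```python
from statistics import mean
from typing import Dict, List


def latency_stats(samples_ms: List[float]) -> Dict[str, float]:
    """Average latency plus jitter (mean change between consecutive samples).

    Needs at least two samples.
    """
    deltas = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    return {"mean": mean(samples_ms), "jitter": mean(deltas)}


steady = [200, 200, 200, 200]  # slow but predictable
bouncy = [40, 160, 30, 170]    # faster on average, but erratic

print(latency_stats(steady))
print(latency_stats(bouncy))
```

The second connection has half the average latency of the first, yet it is the one a player would find harder to adapt to.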

It's not about any competitive RTS: StarCraft (1 and 2) have a game design and balance that emphasizes speed and reaction time. More "contemplative" RTSes with less or slower micromanagement can get away with higher latency.

The "command latency" you feel in RTS games is more closely related to RTT than ping, so those numbers are doubled already. Besides, people's expectations of performance and low latency have changed over the last 20 years.

A command latency over 30ms hurts a lot. And I don't even play competitively. 250ms was not noticed? Absolute nonsense. A quarter of a second is gigantic when it comes to latency.

But that was a time when sub 100ms latency for networked gaming had never really been experienced. If you never played faster you wouldn't notice it when it's slower, I'd think.

People played lan games all the time with very low latency - I actually had to patch the code specially for the exhibition match at the AOK ReleaseToManufacture party because the Lan they were using had <1 msec latency - we had never had close to that even in our office so my code didn't handle it well and performed poorly on that optimal network.

LAN times were typically 14 to 20 msec ping on our blazing fast 10Mbits office lan at Ensemble Studios.

I specifically said networked....

I guess I could have specified networked over the Internet......

The word "Networked" could mean either a LAN (Local Area Network) or an internet connection.

The word "networked" is instead used to distinguish from "non-networked" gaming, which involves playing on a single PC. This could be either a single-player game like the Starcraft campaign, a turn-based hotseat game like Civilization hotseat, or a simultaneous shared-keyboard game like Achtung, die Kurve!

A classic. I've been working on a minimal Python implementation of some of the concepts here: https://github.com/eamonnmr/openlockstep

Nice! Is this in a working state?

For the interested, the unannounced game referenced in the article, "RTS3", is in fact Age of Mythology, released in 2002.

Would it be possible to apply Delta CRDT's to something like game netcode?

I've been wondering the same. Thanks for asking.

I've forgotten all the auto-taunts except "Sure, blame it on your ISP!"

14: start the game already!

My personal favorite. !14 !14 !14

My friends always used to do:

A) Build a navy! B) Stop building a navy! A) Build a navy! B) Stop building a navy!

Nice town. I'll take it.


The competitive community is very small, but it's growing (which is amazing for a game released almost 2 decades ago).

Don't try this at home kids. Floating point determinism is a huge problem, especially with different chips/architectures/compilers out there.

I wouldn't be surprised if their game state ran completely on integer types to avoid the awful x87 instructions.
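The thread does not say whether Age of Empires actually did this. As a sketch of the general idea only, here is 16.16 fixed-point arithmetic on plain integers; the format and the helper names are arbitrary choices for illustration.

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # the value 1.0 in 16.16 fixed point


def to_fixed(x: float) -> int:
    """Convert once, e.g. when loading data, not inside the simulation."""
    return round(x * ONE)


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point numbers using integer operations only."""
    return (a * b) >> FRAC_BITS


def fixed_to_float(a: int) -> float:
    """For display only; the simulation never sees this value."""
    return a / ONE


speed, dt = to_fixed(1.5), to_fixed(2.0)
print(fixed_to_float(fixed_mul(speed, dt)))

# Integer addition gives the same answer however it is grouped.
a, b, c = to_fixed(0.1), to_fixed(0.2), to_fixed(0.3)
print((a + b) + c == a + (b + c))
```

The cost is limited range and precision, which has to be budgeted for when choosing how many fractional bits to keep.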

I'd love to hear more about this, any links I can look at?


Here's a decent primer on it. In general chips implement floating point math differently and you could google terms like "floating point register" and "floating point stack" to get you started. Most importantly, floating point math is not generally associative due to precision/rounding, so the order in which operations are grouped can change the result.
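A quick demonstration of the rounding problem with ordinary doubles. Strictly speaking, swapping the two operands of a single addition is safe; what changes the result is regrouping, which a compiler or a different instruction set may do behind your back.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c
right = a + (b + c)

print(left == right)   # False
print(left, right)     # 0.6000000000000001 0.6
print(a + b == b + a)  # True
```

In a lockstep game, a one-bit difference like this between two machines is enough to make their simulations diverge over time.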


If only it scaled to 300 archers over broadband.

This game is amazing and has aged so well. The only downside is the network pauses and out-of-sync errors that inevitably happen 40 minutes into a two-player game.

I can't remember the last time I didn't set the population limit to 75 in an effort to alleviate network issues.

Play on Voobly instead, it's so much more stable.

Why? What has Steam done that makes it so bad?

No I found the Steam version to be a faithful reproduction of the original. I had these out-of-sync issues back in the days of passing age2 around as a folder at LANs.

It's not really Steam that is bad, but the AOE2 game's netcode that is poor. There are a lot fewer 'wait for sync' issues with Voobly.


My first 'network' game was Fire Power on the Amiga - it must have felt like exploring a new world writing those early network games.
