The other problem with naively measuring the frame duration is that you'll get sub-millisecond jitter, because modern operating systems are neither "hard realtime" nor "soft realtime". This in turn introduces micro-stutter, because your game frames will be timed slightly differently than when your new frame shows up on screen (due to the fixed display refresh rate - unless of course a variable refresh rate like G-Sync is used).
This is also my main pet peeve on the web. You can't measure a precise frame duration (made much worse by Spectre/Meltdown mitigations), but you also can't query the display refresh rate.
In the 80's we took perfectly smooth scrolling and animations for granted, because most 80's home computers and game consoles were proper hard-realtime systems. Counter-intuitively this is harder to achieve on modern PCs that are many thousand times faster.
The last time I implemented a timing loop I came up with an approach that would mitigate jitter and oscillation artifacts caused by OS overhead (though it doesn't solve the problem of the video refresh itself being misaligned). One of my major assumptions was that I could run the game at a higher frequency than the refresh rate and therefore rely on downsampling my output, versus the norm in many higher-end games where a lower base tick rate is interpolated up to the refresh rate.
1. Derive the ideal number of elapsed frames: elapsed time since start divided by the frame duration.
2. Run updates to "catch up" to ideal frames.
3. Observe "rubberbanding" artifact as game oscillates between too slow and too fast.
4. Add limits to how many catchup frames are allowed per render frame to stop the rubberbanding. Implement a notion of "dropped frames" instead, so that our ideal time accounting has a release valve when too much catchup is needed.
5. Now apply a low-pass filter to the amount of catchup frames over time, e.g. if you have a tick of 120hz but your monitor is at 60hz you are likely to see an occasional high frequency oscillation in how many tick frames are supplied per render frame, like [2,3,2,3,2,3]. Filtering can make this [2,2,2,3,3,3]. (It's been a while since I did this, so I don't recall my exact algorithm)
The end result of this effort is that wall-clock time is obeyed over the long run (minus dropped frames), but the supplied frames are also able to maintain the same pacing for longer periods, which makes the moment-to-moment experience very predictable, hence smooth.
While I haven't tried it, I think the equivalent thing when interpolating would be to freeze a certain issued delta time pace for some number of frames.
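For concreteness, here's a minimal sketch of steps 1-5 in TypeScript. All names are mine, and the low-pass filter is a plain moving average standing in for whatever the original filter was (the comment above says the exact algorithm isn't remembered):

  const TICK_HZ = 120;               // simulation rate, higher than refresh
  const TICK_MS = 1000 / TICK_HZ;
  const MAX_CATCHUP = 5;             // step 4: cap on catch-up ticks per render frame

  const startMs = performance.now();
  let ticksDone = 0;                 // ticks actually simulated
  let droppedTicks = 0;              // step 4: release valve for the accounting
  const history: number[] = [];      // step 5: recent ticks-per-render samples

  function ticksForThisRender(nowMs: number): number {
    // step 1: ideal elapsed ticks = elapsed time / tick duration
    const idealTicks = Math.floor((nowMs - startMs) / TICK_MS);

    // step 2: how many ticks we'd need to run to catch up
    let pending = idealTicks - (ticksDone + droppedTicks);

    // step 4: limit catch-up, book the excess as dropped frames
    if (pending > MAX_CATCHUP) {
      droppedTicks += pending - MAX_CATCHUP;
      pending = MAX_CATCHUP;
    }

    // step 5: low-pass filter the count so the pacing changes slowly
    // ([2,3,2,3,...] becomes more like [2,2,2,3,3,3,...])
    history.push(pending);
    if (history.length > 8) history.shift();
    const smoothed = Math.round(history.reduce((a, b) => a + b, 0) / history.length);

    // stay within one tick of the ideal count so wall-clock time still
    // wins over the long run
    const ticks = Math.max(0, Math.min(pending + 1, Math.max(pending - 1, smoothed)));
    ticksDone += ticks;
    return ticks;
  }

  function renderFrame() {
    const n = ticksForThisRender(performance.now());
    for (let i = 0; i < n; i++) simulateOneTick(TICK_MS);  // fixed-size steps
    draw();                                                // downsample to the display
    requestAnimationFrame(renderFrame);
  }

  function simulateOneTick(dtMs: number) { /* game update at fixed dt */ }
  function draw() { /* render current state */ }
  requestAnimationFrame(renderFrame);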
I didn't have "real time" but did have an online (as in 1 s/s) animation of telemetry data, and the product was such that "smoother than smooth" just meant we increased the maximum allowed number of charts until we were just back to "smooth". While adding a little latency helped from moment to moment, over a period of minutes you would see a pattern of stutters that was hard to ignore once you were aware of it.
Ultimately I had to fill the screen as best I could; if the paint happened ahead of the deadline I had to decide between painting ahead on the next couple of frames or doing cleanup work. There was never enough time for both.
> In the 80's we took perfectly smooth scrolling and animations for granted, because most 80's home computers and game consoles were proper hard-realtime systems.
Proper hard realtime means the software is designed to meet stringent time deadlines. If a deadline is missed then the system has failed.
Soft real time means you tolerate missing one or more deadlines if the system is designed to handle it.
The 80's hardware only ran the game code so there was never any CPU contention. There was no kernel, scheduler, threads or processes. The programmers could wrap their heads round the simpler hardware and use all available tricks to optimize every clock tick to do useful work.
Nowadays we have stupidly cheap multicore GHz CPUs for a few dollars, with gigabytes of RAM, so you brute-force your way through everything on a general-purpose OS like Linux.
Yes, and also the video and audio chips were "cycle-synchronized" with the CPU (e.g. it was guaranteed that a fixed number of CPU cycles after the vsync interrupt you'd always end up at the exact same video raster position etc...).
OTOH, making the hardware components "asynchronous" and the timings "unpredictable" is what enabled today's performance (e.g. by introducing caches and pipelines).
> This is also my main pet peeve on the web. You can't measure a precise frame duration (made much worse by Spectre/Meltdown mitigations), but you also can't query the display refresh rate.
Vulkan has extensions for measuring frame timings. I suspect DirectX 12 does too, given how similar it is to Vulkan.
This isn't the problem; those timers exist and are accessible (unless you're in a web browser). The problem is that your code is essentially running at an unpredictable point in time within the current frame. E.g. if you measure the duration between two successive executions of the same line of code at the start of your per-frame function, you'll get a tiny jitter and never exactly 16.66667 ms (assuming a 60Hz display). I'm not sure what the exact reason is, but I guess there are many things that can throw off this sort of time measurement in a modern operating system (for instance process/thread scheduling).
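For what it's worth, you can watch the same effect in a browser with nothing more than requestAnimationFrame (and there the timestamps are additionally coarsened as a Spectre mitigation, as mentioned above):

  // Log the delta between successive frame callbacks on a nominally 60Hz
  // display: it hovers around 16.67ms but is never exactly that.
  let last = 0;
  function onFrame(now: DOMHighResTimeStamp) {
    if (last !== 0) {
      console.log((now - last).toFixed(3));  // e.g. 16.302, 17.115, 16.410, ...
    }
    last = now;
    requestAnimationFrame(onFrame);
  }
  requestAnimationFrame(onFrame);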
Modern CPUs already have several counters for 'nominal' cycles (i.e. cycles at a constant frequency), e.g. the TSC instructions and performance-monitoring counters.
Converting nominal cycles to time can be unreliable in some cases but not impossible.
I feel like there are some ideas in Fintech that haven't quite leaked over into the gaming space yet.
Has anyone investigated the idea of sacrificing an entire high priority thread to a timer loop that busy waits over a collection of pending timers? Most gaming PCs these days have more than enough cores to get away with the idea of a full-time high-precision timer thread.
In my experiments with this idea, I have had no trouble getting jitter under 100 µs in managed languages (C#/Java). Millions of scheduled timers seem to be no problem, assuming mild optimizations are employed (i.e. ordering by next desired execution tick). With all of this magic turned on, I still have 31 other cores to play with. Task.Delay looks like a preschool art project by comparison to the precision I get out of my (arguably dumb-as-hell) DIY timer impl.
I have used this approach to schedule client frame draws in a custom UI framework, and it is indistinguishable from butter on my computer and across reasonable networks. In my prototypes, the client input events are entirely decoupled from the frame draws by way of a separate ringbuffer/batching abstraction. There is actually no locking in my architecture. Draws back to the client use immutable snapshots of state, so current updates can continue unfettered. The state is only ever mutated from a single thread.
Technically, I sacrifice ~2 threads in my architecture, as the ringbuffer also uses a busy wait technique.
All of this said, if you are willing to burn more power, use more cores, and think outside of the box just a bit, you can do some pretty amazing shit.
Consider these ideas at broader scale too - What if you could amortize the cost of that power-hungry timer thread for thousands of gamers instead of just 1? Are there maybe some additional benefits to moving 100% of the game state to the server and shipping x265 frames to the end users?
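For illustration, the core of such a busy-wait timer loop might look roughly like this (sketched in TypeScript rather than C#; in a real setup it would live on its own dedicated high-priority thread/worker, and all the names here are made up):

  type PendingTimer = { dueMs: number; fire: () => void };

  const pending: PendingTimer[] = [];

  function schedule(delayMs: number, fire: () => void) {
    pending.push({ dueMs: performance.now() + delayMs, fire });
    pending.sort((a, b) => a.dueMs - b.dueMs);   // keep ordered by next due time
  }

  function timerThreadLoop() {
    for (;;) {                                    // busy wait: never sleeps
      const now = performance.now();
      while (pending.length > 0 && pending[0].dueMs <= now) {
        pending.shift()!.fire();                  // fire everything that is due
      }
      // no Sleep()/Task.Delay here: spinning is what buys the low jitter,
      // at the cost of pinning one core at 100%
    }
  }

  // e.g. schedule(1000 / 60, () => { /* draw the next frame */ });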
It's much easier than that, you just need to use the display refresh rate as your time base.
In the end, the only important event is when your new frame shows up on screen, and this is (a) very precise, and (b) out of your control anyway, at least for traditional fixed-refresh-rate displays.
It doesn't really matter where exactly within a frame-time-slice your per-frame code runs, or how long it takes, as long as it fits comfortably within one display refresh interval; the basic timer resolution that matters is the display refresh rate.
PS: it's more difficult if you also need to minimize input and networking latency; that's where it may make sense to use separate threads, I guess. But on desktop PCs (versus game consoles), so much also depends on the operating system playing ball...
PPS: busy looping probably isn't a good idea when power consumption matters (e.g. on mobile)
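A minimal sketch of what "display refresh as the time base" can look like on the web, assuming a fixed 60Hz display and per-frame work that comfortably fits in one refresh interval:

  const FRAME_MS = 1000 / 60;     // one refresh interval on the assumed display

  let frameCount = 0;             // the only clock the game logic ever sees

  function onVsync() {
    frameCount++;
    update(FRAME_MS);             // always step by exactly one refresh interval
    render();
    requestAnimationFrame(onVsync);
  }

  function update(dtMs: number) { /* advance game state by one frame */ }
  function render() { /* draw */ }
  requestAnimationFrame(onVsync);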
> Consider these ideas at broader scale too - What if you could amortize the cost of that power-hungry timer thread for thousands of gamers instead of just 1? Are there maybe some additional benefits to moving 100% of the game state to the server and shipping x265 frames to the end users?
I don't know of any online game that doesn't track the official game state on the server. This is required for anti-cheat reasons, if nothing else. However, networking would typically be optimized to send small packets of information, and typically over UDP. If a packet is missed, the next packet will just be overwriting it anyway, so no big deal. Clients would basically just locally simulate game state and then reconcile that against the game state being received from the server.
Also, rendering has to be done per-client anyway, since each one has a different camera state. So there's no economy of scale for doing that on the server rather than the client. In fact, there's probably anti-economy: Servers that aren't rendering don't need expensive video cards unless they're using them for physics.
I have the understanding that more modern games do sometimes use TCP sockets. And obviously modern bandwidth has made streaming frames realistically possible. Hence the recent emergence of streaming game services.
"Serverless" as in one of the peers is nominated as the Source of Truth? Or truly decentralized state? Because the first option is just changing who owns the server.
> The joy of "discovering" this approach was short lived: This system is very well known in the game industry and is actually the keystone of games such as Starcraft II or id Software engines !
Is it very well known in the game industry? I have trouble finding anything about it.
Yup, it has been well known for a long time. Fun fact: I discovered that technique myself around 1991 (and still have the ASM + C source code for the DOS game using it). The first time I remember it being publicly described was on the GamaSutra website, in a post-mortem for the first "Age of Empire" game. GamaSutra was/is a big deal among game devs, so the technique became "famous" when that post-mortem was published.
I wrote about it in a thread here called: "Why bugs might feel “impossible”" about two months ago and someone commented that the game TerraNova (from 1996) had a fully deterministic engine.
BTW TFA says StarCraft 2 (2010) used the technique, but Warcraft 3 (2002) was already using a deterministic engine. Save files for Warcraft 3 games (including multiplayer ones over the Internet) consisted only of the player inputs and the "tick" at which they happened. This makes for tiny save files, even for very long games.
So basically: several of us independently discovered that technique in the nineties. There may have been games in the eighties already using such a technique but I don't know any.
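A rough sketch of what that kind of input-plus-tick recording looks like; newDeterministicSim and the input type are placeholders, the point being that the save/replay is nothing but (tick, input) pairs fed to a deterministic fixed-step simulation:

  type InputEvent = { tick: number; input: string };

  const recording: InputEvent[] = [];          // this IS the save file

  function recordInput(tick: number, input: string) {
    recording.push({ tick, input });
  }

  function replay(totalTicks: number) {
    const sim = newDeterministicSim();         // same seed, same fixed timestep
    let next = 0;
    for (let tick = 0; tick < totalTicks; tick++) {
      while (next < recording.length && recording[next].tick === tick) {
        sim.apply(recording[next].input);      // re-inject input at the same tick
        next++;
      }
      sim.step();                              // deterministic: identical result every run
    }
  }

  // placeholder standing in for an actual deterministic simulation
  function newDeterministicSim() {
    return { apply(_input: string) { /* ... */ }, step() { /* ... */ } };
  }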
It was more than 20 years ago so I don't remember: quickly skimming through it I don't think so. I still have emails somewhere I exchanged back then with the dev who wrote the article so I may be able to find the date.
"the internet never forgets things" is more of an caution than a promise.
Don't say stupid things that will look even worse in 10 years or sometimes fairly mainstream things that will look terrible in 40 years, because the internet never forgets.
But, you still need to archive stuff you want to reference later. Even if it's still out there, it might be much harder to find.
Yeah, extremely well known. Doesn't mean everyone uses fixed time steps though, or talks about it a lot when they do. But you'll find the concept crops up a lot in physics engines, deterministic networking/gameplay, and in discussions of engine architecture, amongst other places.
Whether you call it a step or slice doesn’t matter, tick is another common term for the same concept. Fabien’s solution is mentioned in Glenn’s. Glenn just takes it a stage further afterward and allows the fractional time remaining to be accounted for when rendering.
As I understand it, the inner while loop is essentially playing "catch up" with the outer while.
For simplicity's sake, let's assume that simulation_time is incremented by 1 in the inner while (simulation_time += 1U).
Further, let us assume that simulation_time is 42 in the first iteration. Now, the inner while needs to execute 42 times until it fails the loop condition (42 < 42).
On the second iteration, simulation_time is equal to 58 (42 + 16). The inner while executes (42 < 58) 16 times now.
On the third iteration, simulation_time is equal to 59 (58 + 1). The inner while executes (58 < 59) 1 time.
The real_time cannot stay the same, as it polls the number of ticks since SDL was initialized. So apparently, simulation_time is always smaller than real_time.
(If for some reason, real_time isn't incremented in an iteration, then the inner loop will not get executed. However, I don't see a case where it can happen.)
Now, inside the update function there is a position update akin to current_position += speed * 16U. With the above iterations:
- The first iteration, causes 42 update calls (so current_position is updated 42 times as well).
- The second iteration, causes 16 update calls.
- The third, calls update 1 time.
So we are advancing the position of something variable times. (We are also executing the inner while in variable amounts.)
Maybe I am missing something here, but why doesn't the non-constant number of updates cause jitter? I really don't understand why it works at all to be honest. I am trying to understand it fully.
The simulated time is incremented by 16ms per inner loop iteration, i.e. at roughly a 60Hz rate. You're correct that the simulated time catches up to the real time. Just wrong about the number of sub-steps and how the timers proceed. Technically Fabien's solution will likely run slightly ahead of real time because of the inner loop condition.
You do get jitter from a changing number of sub-steps per variable step which is why you want to do things like the interpolation in Glenn’s solution. This is also why you still want to do everything in your power as a game developer to reduce the variance in real step size.
> This is also why you still want to do everything in your power as a game developer to reduce the variance in real step size.
Well, the main reason is that for either variable time slices or fixed time slices with interpolation, the time you want for the simulation/interpolation is that of the current frame, but you don't have that until you have rendered it. So what you do is assume that it will take about as long as the previous frame, which is a bad prediction if the frame time variance is too big. If you really want to eliminate the jitter, you'd need to simulate/interpolate for a time that you are guaranteed to be able to render the frame in, and then make sure that the frame is displayed when that time elapses and not before. This of course increases latency, so there is always a tradeoff.
Another cause for jitter is that even measuring the time between two successive frames can be difficult with modern GPUs - the CPU time at the start of the game loop doesn't really cut it: https://medium.com/@alen.ladavac/the-elusive-frame-timing-16...
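For reference, the fixed-timestep-plus-interpolation loop being referred to (Glenn's "fix your timestep") has roughly this shape; the state type, simulate, lerp and render are placeholders:

  const DT = 1000 / 60;                 // fixed simulation step in ms

  type State = { x: number };           // placeholder game state

  let accumulator = 0;
  let previous: State = { x: 0 };
  let current: State = { x: 0 };

  function frame(frameDeltaMs: number) {
    accumulator += frameDeltaMs;        // typically the previous frame's duration
    while (accumulator >= DT) {
      previous = current;
      current = simulate(current, DT);  // always a fixed-size step
      accumulator -= DT;
    }
    const alpha = accumulator / DT;     // leftover fraction of a step
    render(lerp(previous, current, alpha));
  }

  function simulate(s: State, dtMs: number): State { return { x: s.x + 0.1 * dtMs }; }
  function lerp(a: State, b: State, t: number): State { return { x: a.x + (b.x - a.x) * t }; }
  function render(_s: State) { /* draw */ }

Note that frameDeltaMs is normally the previous frame's measured duration, which is exactly where the prediction problem described above comes in.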
> The simulated time is incremented by 16ms per inner loop iteration, i.e. at roughly a 60Hz rate. You're correct that the simulated time catches up to the real time. Just wrong about the number of sub-steps. Technically Fabien's solution will likely run slightly ahead of real time because of the inner loop condition.
Can't the inner while take 1 ms or 1 ns for an iteration? I don't see how the inner while's execution time is roughly 16 ms.
Okay, if my number of sub-steps is wrong, then I am really missing something here. Just trying to understand what exactly is wrong with my thinking.
It is supposed to be so simple, yet I have real trouble understanding it currently. I am basically stuck now.
The inner loop can take as little time as it likes to actually do the update, but it is accounted for in simulated time by adding 16 to the timer, and accounted for in the update by passing in 16 as the amount of time to update for.
Real time is updated once at the start of the step then simulated time catches up in 16ms increments. So you get as many update calls as there are 16ms increments to simulated time before it exceeds real time.
This is all very simple but also pretty finicky to think through. I’d replicate the code in the language of your choice and step through it.
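In that spirit, here's the loop being discussed, replicated in TypeScript with performance.now() standing in for SDL_GetTicks() (both count milliseconds), so it can be stepped through:

  // start the simulated clock at "now" so the first frame isn't one huge catch-up
  let simulationTime = performance.now();
  let realTime = 0;

  function gameLoop() {
    realTime = performance.now();   // read once per outer iteration
    while (simulationTime < realTime) {
      simulationTime += 16;         // simulated time jumps forward 16ms...
      update(16);                   // ...and the world is advanced by those 16ms
    }
    // note: simulationTime is now typically a bit AHEAD of realTime,
    // which is the "slightly ahead of real time" point made above
    render();
    requestAnimationFrame(gameLoop);
  }

  function update(dtMs: number) { /* fixed 16ms step */ }
  function render() { /* draw */ }
  requestAnimationFrame(gameLoop);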
I think I finally got it now. I set myself up with roadblocks preventing me from seeing the tree in a forest.
After learning what delta time (and variable timesteps) are, I finally understood what the core issue (frame rate independence) is and why you need a fixed timestep for it.
I learned about it reading the quake 2 sourcecode. In school or tutorial articles, they always teach the first technique, I find. "Delta timing" is what I knew it as.
Related: If anyone is interested, I've been working on an ECS based mini-game framework in Rust [0] that runs with this idea. It has a 1200 hz simulation clock, 60 hz rendering clock, and 150 hz physics integrator (Euler). I've also posted some example mini-games that use it [1].
Please note that non-60Hz displays are surprisingly common (not just "gamer monitors" with 120, 240Hz etc..., but also odd refresh rates like 72, 75, 85, 90 and 100Hz)
Ideally the 'render clock' would query the display refresh rate from the operating system, but this is harder than it sounds. The next best option is to measure the frame time, and then round to the next "common" refresh rate (AFAIK that's the only way it works on the web for instance).
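A sketch of that "measure, then snap" fallback; the list of candidate rates and the averaging are illustrative:

  const COMMON_HZ = [60, 72, 75, 85, 90, 100, 120, 144, 240];

  function guessRefreshRate(frameDeltasMs: number[]): number {
    // average a handful of measured frame deltas, then round to the
    // closest "common" refresh rate instead of trusting the raw number
    const avg = frameDeltasMs.reduce((a, b) => a + b, 0) / frameDeltasMs.length;
    const measuredHz = 1000 / avg;
    let best = COMMON_HZ[0];
    for (const hz of COMMON_HZ) {
      if (Math.abs(hz - measuredHz) < Math.abs(best - measuredHz)) best = hz;
    }
    return best;
  }

  // deltas around 16.5-16.9ms => 60, around 6.9-7.0ms => 144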
Regarding "missing" a collision due to the delta-time being too large, I thought the game industry standard is to "backtrack" (in math, not in ticks) to determine the correct collision point and go from there. Granted, that is a bit more complicated than just keeping the delta-t/tick consistently small, but it is more correct.
Could be—it's been a while since I read about the technique.
I thought a game engine doing this figured out the collisions when the trajectories changed so then it's just a matter of rendering those known effects (rather than the more naive approach of checking each tick or frame whether anything's collided). Note that doing this at impulse can cause the game to slow down when many objects are moved at the same time, like an explosion. I think it works pretty well because most game objects are stationary, or even immovable; also moving objects are rarely subject to new accelerations—they stay ballistic.
But IANAGameDev, so do correct me if you know something about this technique.
  t=0
    cube1 p=5,0 v=0,0 m/s
    cube2 p=1,0 v=2,0 m/s
    // Game knows cube1 and 2 will collide at t=2

  t=1
    cube1 p=5,0 v=0,0 m/s
    cube2 p=3,0 v=2,0 m/s
    cube3 p=0,0 v=5,0 m/s
    // Game knows cube1 and 2 AND 3 will collide at t=2

  t=2
    all cubes hit at 5,0
It's pretty difficult to do quickly and with numerical robustness, specifically when you have rotation. Here's a talk by the creator of Box2D while he was working for Blizzard on continuous collision detection.
Thanks, that was an interesting talk. Apparently what I was describing is "time of impact" as that talk describes here: https://youtu.be/7_nKOET6zwI?t=495
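As a toy illustration of the time-of-impact idea (for points on a line with constant velocity; real engines have to handle extents and rotation, which is where it gets hard, as the talk explains):

  // Solve p1 + v1*t == p2 + v2*t for t instead of checking overlap once per tick.
  function timeOfImpact1D(p1: number, v1: number, p2: number, v2: number): number | null {
    const relVel = v2 - v1;
    if (relVel === 0) return null;      // same velocity: they never close the gap
    const t = (p1 - p2) / relVel;
    return t >= 0 ? t : null;           // only future impacts count
  }

  // cube1 at x=5 standing still, cube2 at x=1 moving at +2/s:
  console.log(timeOfImpact1D(5, 0, 1, 2));   // 2, matching the worked example above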
A lot of game physics engines are still pretty bad at this, which is why being able to clip through walls is such a common issue. Even in games where the physics are more or less solid, there are usually many bugs where fixed animations allow clipping through stuff (e.g. cover or hurdle-jumping animations), because those simply don't interact with the physics engine at all - you only need to get to a start point that places the animation's end point out of bounds and be able to trigger the animation.
On my simple games I essentially do this (below), which seems to work, but I have concerns about it cascading out of control if a frame takes too long. Any tips for dealing with that? So far I just aggressively profile to make sure my game logic has a good time buffer before it hits my timestep.
  const TIMESTEP = 1/60;
  let since_last_frame = 0.0;

  function update(dt) {
    since_last_frame += dt;
    while (since_last_frame > TIMESTEP) {
      tick();
      since_last_frame -= TIMESTEP;
    }
  }

  function tick() {
    // handle game logic at a fixed timestep
  }
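One common guard against that cascade (echoing the "limit how many catch-up frames are allowed" idea further up the thread; a sketch reusing since_last_frame, TIMESTEP and tick() from the snippet above, not a drop-in fix) is to cap the ticks per update and discard the backlog:

  const MAX_TICKS_PER_UPDATE = 5;

  function update(dt: number) {
    since_last_frame += dt;
    let ticks = 0;
    while (since_last_frame > TIMESTEP && ticks < MAX_TICKS_PER_UPDATE) {
      tick();
      since_last_frame -= TIMESTEP;
      ticks++;
    }
    // if we hit the cap, drop the remaining backlog instead of trying to
    // catch it all up; the game slows down momentarily but can't spiral
    if (ticks === MAX_TICKS_PER_UPDATE) since_last_frame = 0;
  }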
I’m not a gamedev at all so I never really dug into it, but the issue I’ve always had there was how to handle input when executing multiple simulation steps within a single “execution frame”.
Depends on how input can be handled: you could re-poll between each sub-step, but practically just sending the same input to every sub-step works. Usually you aren't doing many sub-steps per step anyway.
Tick rate usually means how many times a server calculates the state of the world in an online game. So a client can have 300fps, but the server might only calculate 128 ticks per second. If your computer's FPS dips below the server's tick rate (or your internet is bad) you start missing information. The client usually calculates things as if it has perfect information, and then when a tick comes in and tells your computer it is wrong, the client corrects itself. So the lower the tick rate on the server, the more times this happens (since your computer has made hundreds of calculations since the last tick came in) and it seems slower / choppier.
Some games solved the low-tickrate / low-fps problem by timestamping user input independently and interpolating to the simulation, i.e. you can still hit things even with just say 10 ticks per second.