Controlling Time and Framerate

Reading

Read this article on managing the passage of time in game loops:

Controlling Time in Rust

If you haven't done so already, now is a good time to refactor your engine's huge run function. For example, you might split it up into:

  • some winit-related code
  • WGPU initialization, bind group and pipeline creation, etc
  • update(), which updates the engine and game state
  • render(), which draws the game and its sprites using WGPU

So your engine's core loop might look something like:

let event_loop = EventLoop::new();
let window = WindowBuilder::new()
    // ...
    .build(&event_loop)
    .unwrap();
let mut renderer = Renderer::new(&window);
event_loop.run(move |event, _, control_flow| {
    // ...
    Event::MainEventsCleared => {
        // ...
        update_game(&mut game_state);
        renderer.render(&game_state);
        // ...
    }
});

At what frame rate does this execute? The answer, since we don't define it explicitly, is "it depends". Let's measure it and see what we learn.

In Rust we measure time using the std::time module. A moment in time is represented by std::time::Instant, and Instant's elapsed() method computes the high-resolution difference between that moment and the present. It yields a std::time::Duration, which converts easily to e.g. an f64 of seconds via as_secs_f64(). So you can do something like the following to see how often your game is updating:

// Initialize last frame timer
let mut last_frame = std::time::Instant::now();
event_loop.run(move |event, _, control_flow| {
    // ...
    Event::MainEventsCleared => {
        // Print elapsed time
        println!("DT: {:?}", last_frame.elapsed());
        // Reset last frame timer
        last_frame = std::time::Instant::now();
        // Proceed as before
        update_game(&mut game_state);
        renderer.render(&game_state);
        // ...
    }
});

On my machine, I see numbers like "16.15217ms" or "17.105362ms". Observe that 1/60 of a second is 16 and two-thirds milliseconds—so we're running at approximately 60 frames per second, give or take. If you use a fancy 144Hz monitor, you may see numbers closer to 7ms, or if your system is specially configured you might see numbers close to 5ms. The important thing is that this behavior is a system-dependent default. If you don't mind small variations in timing between frames and some choppiness in movement (especially for continuously-moving objects), this is fine.
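If you'd rather see a framerate than a raw duration, take the reciprocal of the elapsed time in seconds. A minimal sketch (the fps helper here is our own, not part of std):

```rust
use std::time::Duration;

// Convert one frame's duration into frames per second.
fn fps(dt: Duration) -> f64 {
    1.0 / dt.as_secs_f64()
}

fn main() {
    // A 16.67ms frame corresponds to roughly 60 fps.
    let dt = Duration::from_micros(16_667);
    println!("{:.1} fps", fps(dt));
}
```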

But what if you do mind? And how does the code we have enforce that "roughly 60 frames per second (or 144, or…)" rate? And what happens to the game simulation if there's a stall for some reason?

WGPU and Vsync

What exactly does this frame interval depend on—what is it that "blocks" until the next frame? What's happening here is that something in the code is waiting for a vertical blank period, an old-timey term from the days of cathode ray tubes (where the electron beam took actual time to be redirected by magnets back to the top of the screen). When rendering waits until the screen is ready to receive another image, we say rendering is vertically synced.

For an experiment, try to find out where the waiting is happening. Take these two lines of code:

let start = Instant::now();
//...
dbg!(start.elapsed());

And wrap them around any few lines of code that you think might be introducing the delay. As a hint, the main part of the delay is not happening because of anything in winit or the event processing code. Really, try it!

  • [ ] OK, I tried it!

What you'll find is that the culprit is acquiring the next surface image from the swapchain, whatever that is. Recall that the swapchain is how WGPU negotiates with the graphics card and display adapter to get us images to draw into. We initialize the swapchain with a certain number of images which are our framebuffer render targets (here, framebuffer refers to the actual, displayed framebuffer, not our miniature 2D framebuffer). WGPU needs to do two things with these images:

  1. Allow our application to render into them
  2. Give them to the operating system to put on the monitor

There are lots of valid choices for balancing these two concerns:

  • WGPU could let us draw into the image (1) while it's being rendered (2), but this could cause visible tearing when half the image is from one time point and the rest is from another
  • If we had several images, WGPU could let us draw into one while the other is being rendered.
    • We could use them like a queue, filling up images with data and handing them over (then waiting for images to come back from presentation)
      • If our renderer were sometimes slow, WGPU could present the in-progress frame
      • If our renderer were very fast, WGPU could replace the previously-queued images with fresh ones

In WGPU, the presentation mode of a surface dictates which of these approaches is used. Respectively, they are:

  • Immediate mode (wgpu::PresentMode::Immediate)
  • If using multiple images:
    • FIFO mode (First In, First Out; wgpu::PresentMode::Fifo, the default)
      • Relaxed FIFO mode (wgpu::PresentMode::FifoRelaxed)
      • Mailbox mode (wgpu::PresentMode::Mailbox)

For more information about their tradeoffs, check out this Vulkan tutorial; besides tearing, latency and energy usage are also important considerations.

So if we're stuck waiting for WGPU to give us a swapchain image, that's because the default presentation mode is FIFO. Something like this happens:

  1. We draw the first frame very quickly.
  2. WGPU gives us the second frame right away.
  3. We draw the second frame very quickly.
  4. WGPU is still presenting the first frame, so we need to wait for it to be ready again.
  5. Finally we quickly draw into the first frame again.
  6. … But now we have to wait for the second frame!

Since our renderer is way faster than the presentation interval, after we blow through the queue once we spend most of our time waiting. This is great from the perspective of power consumption, but not great from the perspective of predictable, consistent frame times.

What do you think would happen if you made the number of images in the FIFO queue really high? Would we spend more time waiting, less time, or about the same?

Try changing the presentation mode in the surface configuration to mailbox or immediate (or perhaps, AutoVsync or AutoNoVsync) and see what happens to the framerate and time spent acquiring a swapchain image. Especially take a look at what happens to the movement of your characters. You may also want to search the WGPU documentation for methods and constants involved with presentation mode.
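Concretely, the presentation mode lives in the present_mode field of the surface configuration. Here's a sketch of what that might look like—field names are from wgpu's SurfaceConfiguration, but they vary somewhat across wgpu versions, and the surrounding variables (surface_format, size, surface, device) are stand-ins for whatever your init code already has:

```rust
let config = wgpu::SurfaceConfiguration {
    usage: wgpu::TextureUsages::RENDER_ATTACHMENT,
    format: surface_format,
    width: size.width,
    height: size.height,
    // Try Fifo (vsync, the default), Mailbox, Immediate,
    // or the fallback-friendly AutoVsync / AutoNoVsync.
    present_mode: wgpu::PresentMode::Mailbox,
    alpha_mode: wgpu::CompositeAlphaMode::Auto,
    view_formats: vec![],
};
surface.configure(&device, &config);
```

Note that Mailbox and Immediate aren't guaranteed to be supported on every platform, which is exactly what the Auto variants are for.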

Game Loops Revisited

Since rendering takes time (both in terms of our code and the operating system's), updating our game state takes time, and we can't do these things simultaneously (otherwise we may show inconsistent data on screen), we need a way to synchronize or otherwise schedule game updates with respect to rendering.

Our goals here are to achieve a consistent frame rate for smooth motion and to minimize latency. Energy consumption is also worth minimizing. In fact, these three goals trade off against each other: the most energy efficient option is to render as infrequently as possible, while if we render often we can achieve smoothness at the cost of latency by beautifully interpolating between several prior frames (once we know what motion occurred it's easy to average it out). We can even reduce latency by trading away smoothness if we extrapolate previous game states to future predictions, or guess what player or opponent inputs might be (correcting the display if our guess ends up wrong).

The rest of these notes will illustrate two points in this trade-off space. We'll use two notions of time here: wall-clock time (or "real time") and simulated game time (or "time steps").

Lockstep Rendering

The simplest way to synchronize updates and rendering is just to perform a render after every update. This bounds latency tightly as we only ever draw the most recently simulated time step, and we draw it as soon as WGPU says we can.

update_game(&mut game_state);
renderer.render(&game_state);

Oh, that looks familiar!

This technique comes with some drawbacks:

  1. If the user has somehow disabled vsync or is using a high refresh rate monitor, we actually have no idea how many game updates will happen per second!
  2. Frames don't all take exactly the same amount of time, so we may have jittery, "janky" motion. This is exacerbated if occasionally a frame takes way more or less time than usual.

We might try to mitigate these issues by figuring out how much time has passed and then simulating exactly that much time; but this complicates our simulation code (we need sophisticated integration for position and velocity for instance) and it's hard to imagine a simulation that works well when some timesteps are shorter than five milliseconds and others are longer than 100. Maybe it would make sense to cap the maximum game step duration, and execute several shorter updates in one cycle to avoid instability; but then we're prone to a death spiral where one or two slow simulation frames can lead to cascading slowdowns, grinding the game to a halt.
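To see why variable timesteps complicate simulation code, consider a per-frame update like vel *= 0.99: the frame duration is baked into that constant, so the simulation behaves differently at different framerates. A sketch of the dt-aware correction (damp and BASE_DT are illustrative names, not from the engine code above):

```rust
// The damping factor 0.99 was tuned assuming a 1/60 s step.
const BASE_DT: f32 = 1.0 / 60.0;

// Frame-rate-independent version of `vel *= 0.99`:
// scale the exponent by how many "base" steps actually elapsed.
fn damp(vel: f32, dt: f32) -> f32 {
    vel * 0.99_f32.powf(dt / BASE_DT)
}

fn main() {
    // One double-length step should equal two normal steps.
    println!("{}", damp(1.0, BASE_DT * 2.0)); // ~0.9801, i.e. 0.99 * 0.99
}
```

This works for exponential damping, but other quantities need their own corrections, which is exactly the complication the paragraph above warns about.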

We're on the right track though—it certainly seems like we need to reason about time explicitly to balance smoothness with latency.

"Renderer Produces Time"

One popular technique (that you'll see in the readings) is to use an accumulator that "fills up" with elapsed time, and when we have enough "stored" time we get to simulate one game step. In this scheme, the renderer (which must run on its own schedule to maintain the display framerate) acts as the clock that determines when the game should update: we say the renderer produces time (adding it to the accumulator), and the simulation consumes it (in fixed-size chunks).

We could implement our accumulator like this:

// During initialization...
let mut acc = 0.0_f32;
let mut prev_t = Instant::now();
// Let's clock the game at 60 simulation steps per second
const SIM_DT : f32 = 1.0/60.0;
// Later on, in our loop...
  let now = Instant::now();
  // Read the clock once; calling elapsed() and then now() separately
  // would silently drop the time that passes in between.
  acc += (now - prev_t).as_secs_f32();
  prev_t = now;
  while acc >= SIM_DT {
    update_game(&mut game_state);
    // NOTE: This is when you should swap "new" keys and "old" keys for input handling!
    // Otherwise you'll see several frames in a row where a key was just pressed/released.
    input.next_frame();
    acc -= SIM_DT;
  }
  renderer.render(&game_state);

What happened? Well, even though rendering still drives the simulation, we have decoupled the simulation framerate from the rendering framerate. So we could update the game a hundred times per second, or fifty, or any rate we like. This also means that no matter what the rendering framerate is—30, 60, 144 frames per second, whatever—the game will always progress at the same rate each second.

This does, however, have important implications for latency and smoothness. Imagine that our game is updating 60 times per second. Most of the time, the elapsed rendering duration won't be exactly 1/60th of a second. If it's a little more, that's no problem; we'll update the accumulator and bank the extra time, updating once. But if it's a little less, we might not have enough time saved up to progress the simulation. On average we'll tick 60 times per second, but in the moment we'll see some rendered frames that are identical to the previous frame (no game updates) and some that represent two frames worth of movement (double updates).
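You can see this 0-and-2 pattern without a renderer at all by feeding the accumulator synthetic frame times. A minimal sketch (steps_for_frame is a hypothetical helper wrapping the while loop from the code above):

```rust
const SIM_DT: f32 = 1.0 / 60.0;

// Add one frame's elapsed time to the accumulator and
// return how many simulation steps we get to run.
fn steps_for_frame(acc: &mut f32, elapsed: f32) -> u32 {
    *acc += elapsed;
    let mut steps = 0;
    while *acc >= SIM_DT {
        steps += 1;
        *acc -= SIM_DT;
    }
    steps
}

fn main() {
    let mut acc = 0.0_f32;
    // Frame times hovering around 1/60 s (16.67 ms)...
    for elapsed in [0.016, 0.018, 0.015, 0.020] {
        // ...yield an alternating 0-update / 2-update pattern.
        println!("{}", steps_for_frame(&mut acc, elapsed));
    }
}
```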

An inconsistent update rate can cause visual stuttering, and we might also have problems with latency if we get unlucky with timing (since we often render a slightly stale state, user input might arrive almost a full two rendering frames before the player sees any feedback). We can fix these problems in three ways:

  1. Crank up the simulation frame rate. A stream of 0-, 1-, and 2-update frames has more apparent variation than a stream of 3-, 4-, and 5-update frames.
  2. Ignore small deviations in render time—if the elapsed frame time is "roughly" 1/60, round it to 1/60 (or 1/15, 1/30, 1/144). This can cause game time to drift from real time over a long enough duration but can be pretty effective.
  3. Account for leftover time using interpolation.

Frame Time Fudging

A brutal but effective solution is to ignore minor deviations in frame time, locking them to a "reasonable" rate.

Code for that looks something like this, chock-full of magic numbers:

// Simulate at 60 hz
const DT: f32 = 1.0 / 60.0;
// 0.2ms is "close enough" to a target frame time
const DT_FUDGE_AMOUNT: f32 = 0.0002;
// Snap-to frame times of 1/15, 1/30, 1/60, 1/120, 1/144
const TIME_SNAPS: [f32; 5] = [15.0, 30.0, 60.0, 120.0, 144.0];

// Later on, when we're updating our accumulator:

// compute elapsed time since last frame
let mut elapsed = prev_t.elapsed().as_secs_f32();
// snap time to a nearby vsync framerate
for &snap in TIME_SNAPS.iter() {
    if (elapsed - 1.0 / snap).abs() < DT_FUDGE_AMOUNT {
        elapsed = 1.0 / snap;
        break;
    }
}
//...
acc += elapsed;

This has the effect of causing a little drift between wall-clock time and "number of timesteps * DT", so it's a place where speed-runners could get tangled up. But as long as you periodically reset your accumulator to 0 (e.g., after loading a level) this drift will be limited.
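How big can that drift get? In the worst case every frame gets adjusted by the full fudge amount, all in the same direction, so a quick back-of-the-envelope bound (using the constants above):

```rust
fn main() {
    const DT_FUDGE_AMOUNT: f32 = 0.0002;
    // At 60 rendered frames per second, snapping every frame can
    // shift game time by at most this much per wall-clock second:
    let max_drift = DT_FUDGE_AMOUNT * 60.0;
    println!("{} s/s", max_drift); // up to 12 ms of drift per second, worst case
}
```

In practice the snaps mostly cancel out, so real drift is far smaller than this bound.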

Death Spirals

Another important concern when simulation and rendering are decoupled is the possibility of a "death spiral": One frame takes a little too long, so we end up needing to simulate two frames; but each of these takes a little too long, so next time we need to simulate three frames; and so on. Eventually we might get stuck having to simulate thousands of frames in a single time step, and we'll never catch up!

This can be avoided by noticing if the elapsed time is above some threshold (the maximum number of frames you can reasonably simulate in one render step) and forcing the accumulator to 0 to help catch up. The game will visibly stagger and slow down, but it's better than the game freezing and crashing.

const DT: f32 = 1.0 / 60.0;
// If we're 10 simulation steps behind, give up and just do one step.
const DT_MAX: f32 = DT * 10.0;

// Just after time snapping...
// Death spiral prevention
if elapsed > DT_MAX {
    acc = 0.0;
    elapsed = DT;
}
acc += elapsed;
// Now we'll just do one frame.

Interpolating Game State

The problem of seeing sometimes no update and sometimes double updates occurs because the game updates in discrete steps. This discreteness is actually desirable, but can we find a trick to render "in between" the old game state and the next one? For some aspects of game state it's easy to see how to do this: assuming smooth motion we can take the halfway point of each object's old and new position. Likewise if the camera moves or if objects scale or rotate smoothly. We can interpolate anywhere between two floating point numbers or vectors (or even integers, if we don't mind truncation) using something like this:

fn interpolate(&self, other: &Self, ratio: f32) -> Self {
  *other * ratio + *self * (1.0 - ratio)
}
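Here's that idea made concrete for a 2D position, the most common thing you'll interpolate (Vec2 is a stand-in for whatever vector type your engine uses):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec2 {
    x: f32,
    y: f32,
}

impl Vec2 {
    // Linear interpolation: ratio = 0.0 gives self, 1.0 gives other.
    fn interpolate(&self, other: &Self, ratio: f32) -> Self {
        Vec2 {
            x: other.x * ratio + self.x * (1.0 - ratio),
            y: other.y * ratio + self.y * (1.0 - ratio),
        }
    }
}

fn main() {
    let old_pos = Vec2 { x: 0.0, y: 0.0 };
    let new_pos = Vec2 { x: 10.0, y: 4.0 };
    // Halfway between two simulation steps:
    println!("{:?}", old_pos.interpolate(&new_pos, 0.5)); // Vec2 { x: 5.0, y: 2.0 }
}
```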

Not all state can be interpolated (for example, the creation and destruction of objects), so we may want to split RenderState from GameState. In fact, some state (like audiovisual feedback state) might reasonably only exist in the RenderState. (This last point is key—we can show the animation and particle effect and play the sound effect before running any game simulation, in our "rendering" code, before the character's actual physics state changes! So perceived latency can be made extremely low.) Our game loop will need a few alterations.

// During initialization...
let mut acc = 0.0_f32;
let mut prev_t = Instant::now();
// Let's clock the game at 60 simulation steps per second
const SIM_DT : f32 = 1.0/60.0;
// Later on, in our loop...
  let now = Instant::now();
  acc += (now - prev_t).as_secs_f32();
  prev_t = now;
  while acc >= SIM_DT {
    std::mem::swap(&mut previous_render_state, &mut render_state);
    update_game(&mut game_state, &mut render_state);
    input.next_frame();
    acc -= SIM_DT;
  }
  let ratio = acc / SIM_DT;
  let rstate = previous_render_state.interpolate(&render_state, ratio);
  renderer.render(&game_state, &rstate);

Of course, it's up to you: if render states are large, consider passing the old and new states and ratio through to the render functions rather than producing a new render state. Also notice the use of std::mem::swap to safely and conveniently swap the old and new states!

The key move here is the computation of ratio as acc / SIM_DT, which treats the remaining time in the accumulator after the simulation updates as a factor for blending between the old and new game states. This means we'll always be rendering slightly old data, but never more than one frame old. And, importantly, this latency will be disguised by the buttery smoothness of the interpolation.

This was a long one! The key takeaways are that we can get more predictable rendering behavior by taking charge of the presentation mode, and either do some heuristics and fudging to get fairly smooth movement or structure our code carefully to reach the state of the art in balancing latency with smoothness.