3D Animation/How Do I…?

Key Points

  • Today's Lab: Animation Lab
    • [ ] Why is animation important in a 3D game?
    • [ ] UV Animation
      • [ ] What are UVs again, and how can we do animation with them?
      • [ ] How do our model vertices/our vertex shader need to change to support UV animation?
      • [ ] How can we set up our model to support UV animation?
      • [ ] How is UV animation similar to and different from sprite animation?
    • [ ] Skeletal Animation
      • [ ] What is a rig or skeleton?
      • [ ] How is a model rigged, and what does that mean?
      • [ ] How do our model vertices/our vertex shader need to change to support skinned animation?
      • [ ] What is a joint or bone, and how does it relate to animation?
      • [ ] Think of two different ways to compute a local-to-global transform for a joint in a skeleton (one top-down, one bottom-up)
      • [ ] What are the key transformation steps for doing skeletal animation?
      • [ ] How is skeletal animation similar to and different from paperdoll animation?
    • [ ] How could we drive both UV and skeletal animation using the same system?

Animation, Revisited

In 3D, the meaning of "animation" is stretched a bit further than in the 2D games we've been working with so far. Since objects are made of a bunch of triangles, we can animate their geometry in arbitrarily complex ways—and that's before we think about what to paint on those triangles! While physics or pre-programmed object movements can give a similar effect, there are two main approaches to animation per se in 3D renderers: UV (or texture) animation and skeletal (also called skinned) animation.

UV Animation

UV animation is named for the \(u\) and \(v\) texture coordinates used in texture mapping. Instead of animating the physical positions of the triangles making up a mesh, UV animation changes the UV-mapping over time. This can be done to achieve an effect like sprite animation (remember, we did sprite animation by changing the offset in a spritesheet) or for effects like water or moving light (by smoothly adjusting the UVs over time). Other effects that rely on scaling, rotating, or otherwise manipulating UVs are also types of UV animation, but we'll focus just on translating UVs.

UV animation often depends on defining the right texture addressing mode; remember that when we define a texture sampler (to pick texels from a texture), we also define what should happen if texture accesses are out-of-bounds. Effects that depend on scrolling a texture (while keeping the mesh triangles stationary) will need to make use of repeating modes to achieve the correct effect.
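For concreteness, here's a minimal sketch of creating a repeating sampler with vulkano, assuming a version that still provides the Sampler::simple_repeat_linear convenience constructor (newer versions build samplers from a SamplerCreateInfo whose address_mode fields can be set to SamplerAddressMode::Repeat); device here is the Arc<Device> we already have from setup:

use vulkano::sampler::Sampler;

// Linear filtering with Repeat addressing on u, v, and w, so texture
// coordinates outside [0, 1] wrap around instead of clamping to edge texels.
let repeat_sampler = Sampler::simple_repeat_linear(device.clone());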

Generally a UV animation works by applying some offset uniformly to the texture coordinates of a batch of vertices in a model (either by directly modifying its ModelVertex data and re-uploading the buffer, or in a shader). A UV animation might look something like this:

struct UVAnim {
    target_uvs: Vec<usize>,      // which vertices' UVs to alter...
    timings: Vec<f32>,           // the time at which each keyframe occurs
    uv_offsets: Vec<(f32, f32)>, // how much to change the UVs for each timing
    interpolate: bool,           // whether to move smoothly or jump between timings
}

To sample a UV offset, we can determine which frame we're in and interpolate:

let t = self.timings.last().unwrap().min(t);
let kidx = self
    .timings
    .iter()
    .zip(self.timings[1..].iter())
    .position(|(t0, t1)| t >= *t0 && t <= *t1)
    .unwrap();
let t0 = self.timings[kidx];
let t1 = self.timings[kidx + 1];
let tr = (t - t0) / (t1 - t0);
// tr is between 0.0 and 1.0.
// Now let's use that ratio to determine how
// off0-ish and how off1-ish the sample should be
let off0 = self.uv_offsets[kidx];
let off1 = self.uv_offsets[kidx + 1];
let off = if self.interpolate {
    lerp(off0, off1, tr)  // Linear intERPolation; off0 + (off1 - off0)*tr
} else {
    off0
};

But how do we apply off to each of the vertices in self.target_uvs? We probably shouldn't modify the model's vertex data in place, since we'd lose the original texture coordinate information the moment we adjusted it by an offset. We could instead copy the model's vertex data, add the sampled offsets to the copy's texture coordinates, and upload that. A better approach is to add the UV offsets as an additional field on ModelVertex and change the shader to add those offsets to the texture coordinates; this won't work with instancing (every instance would end up sharing the same offsets), but otherwise it's a good solution.
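A sketch of what that could look like (the uv_offset field name is just illustrative; it starts at (0, 0) and gets overwritten with the sampled offset whenever the animation changes, so the original tex_coords are never lost):

#[repr(C)]
#[derive(Copy, Clone, Debug, bytemuck::Pod, bytemuck::Zeroable)]
pub struct ModelVertex {
    position: [f32; 3],
    tex_coords: [f32; 2],
    // the currently sampled animation offset for this vertex; the vertex
    // shader samples the texture at tex_coords + uv_offset
    uv_offset: [f32; 2],
}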

If we want to support instancing, we need to provide instance-specific UV offsets some other way: we could put them in the instance data, write the UV offsets for all instances into a texture, or adopt a more restrictive animation scheme (for example, offsetting every vertex of an instance by the same UV offset rather than using per-vertex offsets, or allowing only up to four distinct offsets, with each model vertex recording which of those four offsets it should use; we'll sketch the last of these below).

Since a model might have thousands or tens of thousands of vertices, and we may not want to animate UVs on all of them, we generally have two approaches available to us: either we give each vertex information describing which texture offset, if any, it should use; or we break our model into several meshes which are animated and drawn separately. This is often done for things like characters' faces or equipment they're holding.
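Here's a sketch of one such restricted scheme, combining the instance-data idea from the previous paragraph with a per-vertex "which offset" index (all names here are illustrative; the shader would look up the instance's uv_offsets entry using the vertex's uv_slot and add it to tex_coords):

#[repr(C)]
#[derive(Copy, Clone, Debug, bytemuck::Pod, bytemuck::Zeroable)]
pub struct ModelVertex {
    position: [f32; 3],
    tex_coords: [f32; 2],
    // which of the instance's four offsets this vertex follows
    uv_slot: u32,
}

#[repr(C)]
#[derive(Copy, Clone, Debug, bytemuck::Pod, bytemuck::Zeroable)]
pub struct InstanceData {
    model: [[f32; 4]; 4],
    // up to four distinct UV offsets per instance; slot 0 can stay at
    // (0.0, 0.0) to mean "not animated"
    uv_offsets: [[f32; 2]; 4],
}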

Skeletal Animation

We saw skeletal animation back in the Blender demo. Besides the mesh geometry, a model might have something called a pose, rig, or skeleton. These terms all name the structure that actually gets animated: a hierarchy of joints (also called bones) which have a neutral bind pose (often the character standing with arms spread, like a capital T) and can be transformed away from that pose. Every vertex of the mesh is skinned: an artist defines how strongly each vertex is attached to nearby bones, and when a bone is transformed, the vertices attached to it follow along. In practice, a single vertex is connected to a relatively small number of bones (usually four or fewer), with a numerical weight for each such bone. The weights must add up to 1.0, and the vertex's final position is the weighted average of the positions each of its bones would carry it to. Our vertex data needs to carry per-vertex bone weights:

#[repr(C)]
#[derive(Copy, Clone, Debug, bytemuck::Pod, bytemuck::Zeroable)]
pub struct ModelVertex {
    bone_weights: [f32; 4], // 32*4 bits, could get away with 16-bit weights probably
    position: [f32; 3],
    bone_ids: [u8;4], // 32 bits to encode 4 8-bit values, could fit into the last slot of the previous line for extra credit
    // tex coords, etc
}

Skeleton joints are usually defined as SRT (scale, rotate, translate) transforms. If they only allow uniform scaling in XYZ they're also called "similarities", and if they don't allow scaling at all they're called "isometries". Just like with a scene graph, we want to compute each joint's joint-to-model (root) transform. We also need each joint's inverse bind transform (the inverse of its bind-pose joint-to-model transform): composing the animated joint-to-model transform with the inverse bind transform tells us how the joint has moved relative to the bind pose, which is exactly the change we need to apply to the skinned vertices.

pub struct Rig {
    // every joint, laid out flat during loading
    joints: Vec<Joint>,
    // every inverse bind transform, one per joint
    ibms: Vec<Mat4>,
    // for each node in the joint hierarchy, which joint number is it?
    nodes_to_joints: BTreeMap<usize, u8>,
}

pub struct Joint {
    // Which joint numbers are the children of this joint? (255 marks an empty slot)
    children: [u8; 4],
    // The bind pose of this joint, in its local (parent-relative) coordinate frame
    transform: Similarity3,
}
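Putting weights, joints, and inverse bind transforms together: if a vertex \(v\) (in model space, in the bind pose) is skinned to joints \(j_1, \dots, j_4\) with weights \(w_1, \dots, w_4\), linear-blend skinning computes its animated position as

\[
v' = \sum_{i=1}^{4} w_i \, M_{j_i} \, B_{j_i}^{-1} \, v,
\]

where \(B_{j_i}\) is joint \(j_i\)'s bind-pose joint-to-model transform (so \(B_{j_i}^{-1}\) is the inverse bind transform stored in ibms) and \(M_{j_i}\) is its current animated joint-to-model transform.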

Ideally, we'll be able to do vertex skinning on the GPU, so we'll want a way to pass sampled animation bone data to the vertex shader:

#[repr(C)]
#[derive(Copy, Clone, Debug, bytemuck::Pod, bytemuck::Zeroable, Default)]
pub struct Bone {
    // rotation, encoded as the four components of a rotor
    pub rotation: [f32; 4],
    // xyz = translation, w = uniform scale
    pub translation_scale: [f32; 4],
}

We might imagine that a model defines how many bones it has, and our instance data has an array of that many Bone structs. Unfortunately, we can't easily set up instance data of arbitrary size like that. Instead, we want to use a storage buffer and bind it for use by the GPU:

// make sure device_extensions includes khr_storage_buffer_storage_class: true.

// self.bpool = CpuBufferPool::new(device.clone(), BufferUsage::storage_buffer())
// self.spool = SingleLayoutDescSetPool::new(...)

let chunk = self.bpool.chunk(bones);
// create a descriptor set from the pool that binds the bone count for this model
// (binding 0) and the chunk of bone data (binding 1)

Then in our shader:

struct Bone {
  // field order matches the Rust Bone struct: rotation first, then translation+scale
  vec4 rot;
  vec4 pos; // xyz = translation, w = uniform scale
};
layout(set = 0, binding = 0) uniform BoneCount {
  uint bone_count;
};
layout(std430, set = 0, binding = 1) buffer Bones {
  Bone bones[];
} bones;

// vertex data
layout(location=0) in vec4 a_bone_weights;
layout(location=1) in vec3 a_position;
layout(location=2) in uint a_bone_ids;

// instance data
layout(location=3) in mat4 model; // occupies locations 3, 4, 5, and 6

// camera matrices; this set/binding choice is just one possible layout
layout(set = 1, binding = 0) uniform Camera {
  mat4 u_proj;
  mat4 u_view;
};

// rotate a vector by a rotor.
vec3 rotor_rot(vec4 self, vec3 vec) {
  // from ultraviolet
  // see derivation/rotor3_rotate_vec_derivation for a derivation
  // f = geometric product of (self)(vec)
  float s = self.x;
  float xy = self.y;
  float xz = self.z;
  float yz = self.w;
  float fx = s * vec.x + xy * vec.y + xz * vec.z;
  float fy = s * vec.y - xy * vec.x + yz * vec.z;
  float fz = s * vec.z - xz * vec.x - yz * vec.y;
  float fw = xy * vec.z - xz * vec.y + yz * vec.x;

  // result = geometric product of (f)(self~)
  return vec3(s*fx+xy*fy+xz*fz+yz*fw,
              s*fy-xy*fx-xz*fw+yz*fz,
              s*fz+xy*fw-xz*fx-yz*fy,
              );
}


void main() {
  uint first_bone = uint(gl_InstanceIndex) * bone_count;
  vec3 new_vertex = vec3(0,0,0);
  // accumulate the weighted sum of the four bones' transformed positions
  for (int idx=0; idx < 4; idx++) {
    // unpack this weight's bone ID (assuming bone_ids[0] is packed as the lowest byte of a_bone_ids)
    int bone = int((a_bone_ids >> (8*idx)) & 0x000000FF);
    float weight = a_bone_weights[idx];
    // weighted rotate-then-translate-by-(rotated)-disp the a_vertex...
    Bone bone_dat = bones.bones[first_bone+bone];
    vec4 rot = bone_dat.rot;
    vec3 disp = bone_dat.pos.xyz;
    new_vertex += (rotor_rot(rot, a_position) + disp)*weight;
  }
  gl_Position = u_proj * u_view * model * vec4(new_vertex.xyz, 1.0);

}

That's skinning done and dusted; now we can talk about animation. We can actually use a structure for 3D keyframe animation similar to the one we used for 2D animation:

pub struct Anim {
    trans_targets: Vec<u8>, // The joints this animation will translate
    trans_keys: Vec<Vec3>, // for each frame, one translation for each target
    rot_targets: Vec<u8>, // The joints this animation will rotate
    rot_keys: Vec<Rotor3>, // for each frame, one rotation for each target
    timings: Vec<f32>, // the time at which each keyframe occurs
}

Loading an animation will require some custom code on top of russimp to interpret the animation data. When we sample an animation we want to propagate the new transform information to every bone, imagining that each bone is in its bind pose to start:

// in impl Animation
pub fn sample(&self, mut t: f32, rig: &Rig, bones: &mut [Bone]) {
    assert!(self.duration() > 0.0);
    assert!(t >= 0.0);
    while t >= self.duration() {
        t -= self.duration();
    }
    let t = self.timings.last().unwrap().min(t);
    let kidx = self
        .timings
        .iter()
        .zip(self.timings[1..].iter())
        .position(|(t0, t1)| t >= *t0 && t <= *t1)
        .unwrap();
    let t0 = self.timings[kidx];
    let t1 = self.timings[kidx + 1];
    let tr = (t - t0) / (t1 - t0);
    // so far so similar.

    //let key_count = self.timings.len();
    // oh look, we're going to figure out what joints of the rig will move
    let ttgt_count = self.trans_targets.len();
    let rtgt_count = self.rot_targets.len();

    if ttgt_count > 0 {
        // there are trans_targets.len() translation targets per keyframe,
        // so trans_keys holds trans_targets.len() * timings.len() entries in total
        let tfrom = &self.trans_keys[(ttgt_count * kidx)..(ttgt_count * (kidx + 1))];
        let tto = &self.trans_keys[(ttgt_count * (kidx + 1))..(ttgt_count * (kidx + 2))];
        // Interpolate, for each target translation, the new translation for this bone
        for ((tgt, from), to) in self.trans_targets.iter().zip(tfrom.iter()).zip(tto.iter()) {
            let tl = from.lerp(*to, tr);
            bones[*tgt as usize].translation_scale[..3].copy_from_slice(&[tl.x, tl.y, tl.z]);
        }
    }
    // there are rot_targets.len() rotation targets per keyframe,
    // so rot_keys holds rot_targets.len() * timings.len() entries in total
    if rtgt_count > 0 {
        let rfrom = &self.rot_keys[(rtgt_count * kidx)..(rtgt_count * (kidx + 1))];
        let rto = &self.rot_keys[(rtgt_count * (kidx + 1))..(rtgt_count * (kidx + 2))];
        // Same deal here, but for rotations
        for ((tgt, from), to) in self.rot_targets.iter().zip(rfrom.iter()).zip(rto.iter()) {
            bones[*tgt as usize].rotation = (from.nlerp(*to, tr)).into();
        }
    }

    // right now all bones have their positions set in joint-local terms.
    // we need to go from top to bottom to fix that.
    // After this process, every bone's transform data will represent a bone-to-root transform,
    // which we can use to modify vertices (since they're in the model's root coordinate space too).
    for (ji, j) in rig.joints.iter().enumerate() {
        let b = bones[ji];
        // transform all direct child bones by this bone's transformation.
        let bt = Vec3::new(
            b.translation_scale[0],
            b.translation_scale[1],
            b.translation_scale[2],
        );
        let bs = b.translation_scale[3];
        let btrans = Similarity3::new(bt, b.rotor(), bs);
        for &ci in j.children.iter() {
            if ci == 255 {
                break;
            }
            let b2 = &mut bones[ci as usize];
            let b2t = b2.translation_scale;
            let b2trans = btrans
                * Similarity3::new(Vec3::new(b2t[0], b2t[1], b2t[2]), b2.rotor(), b2t[3]);
            // augment b2: compose this bone's transform onto b2's local transform
            b2.translation_scale = [
                b2trans.translation.x,
                b2trans.translation.y,
                b2trans.translation.z,
                b2trans.scale,
            ];
            b2.rotation = b2trans.rotation.into_quaternion_array();
        }
        // but then we need to multiply by the inverse bind matrix to
        // turn this bone into a "change in vertex translations"
        let ibm = rig.ibms[ji];
        // (scale is assumed to be 1.0 here; the vertex shader ignores it too)
        let post_ibm: Mat4 = Mat4::from(Similarity3::new(
            Vec3::new(
                b.translation_scale[0],
                b.translation_scale[1],
                b.translation_scale[2],
            ),
            Rotor3::from_quaternion_array(b.rotation),
            1.0,
        )) * ibm;
        let transl = post_ibm.w.truncate();
        let rotn = Mat3::new(
            post_ibm.x.truncate(),
            post_ibm.y.truncate(),
            post_ibm.z.truncate(),
        );
        let b = &mut bones[ji];
        b.translation_scale[..3].copy_from_slice(&[transl.x, transl.y, transl.z]);
        b.rotation = rotn.into_rotor3().into_quaternion_array();
    }
}

This code works if only one animation is playing, but in general we want to blend the effects of multiple animations. To do that we'd need to separate the step of sampling the animation data from the step of propagating transform data down through the rig. The game engines book has some good discussion of animation blending!
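As a rough sketch of that separation (none of these helpers exist in the code above: sample_local would be the first half of sample, writing joint-local transforms into its own buffer; blended would lerp translation and scale and nlerp rotation; propagate would be the parent-to-child pass plus the inverse-bind step), blending two animations might look like:

// Hypothetical sketch: sample each animation into its own joint-local pose,
// blend the poses, and only then propagate through the rig.
let mut pose_a = vec![Bone::default(); rig.joints.len()];
let mut pose_b = vec![Bone::default(); rig.joints.len()];
walk_anim.sample_local(t, rig, &mut pose_a);
wave_anim.sample_local(t, rig, &mut pose_b);
let blend = 0.25; // how much of the wave pose to mix into the walk pose
let mut pose: Vec<Bone> = pose_a
    .iter()
    .zip(pose_b.iter())
    .map(|(a, b)| a.blended(b, blend))
    .collect();
rig.propagate(&mut pose); // joint-local -> model space, then apply inverse binds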

Activity: How Do I…?

  • Random groups
  • Three questions per group
  • For example:

How do I… get started?

How do I… render 2D graphics?

How do I… render transparent things?

How do I… load a scene?

How do I… make particle effects?