Key Points

  • Today's Lab: Camera Cloning
  • [ ] How are 3D vertices transformed into positions on a 2D screen?
  • [ ] What is the difference between a perspective and orthographic projection?
  • [ ] What do we need to know to describe a camera shot in a 3D game?
  • [ ] What are some challenges of positioning cameras in 3D spaces?
  • [ ] What are some additional challenges introduced by interactivity?
  • [ ] In your own words, explain how one or more fixed camera angles could work in a game and what information they need.
  • [ ] In your own words, explain how first-person cameras work and what information they need.
    • [ ] How is our geometry stuff from last week relevant here?
  • [ ] In your own words, explain how orbit cameras work and what information they need.

Check-in: Progress on 3D Games

Pairs of teams should get together and discuss what you've been up to with your 3D game. What have you implemented so far? What roadblocks are you up against? Can you work together to find a way to solve them, or to work around them by tweaking the design?

What has teamwork been like? Can you think of ways to improve it or make it more equitable?

3D Cameras

Think back to our discussion of 3D scenes. The extra cool thing about transformation matrices is that they can be inverted. We can compute a local-to-world transform, but we can also compute a world-to-local transform by inverting each matrix along the way from the world frame of reference to the object's frame. This is important when we need to, say, put a coffee cup in a character's hand; but it's also important for viewing the scene in the first place! If we have a transform that positions a virtual camera in the world somewhere, and we know the world positions of all the objects in the scene, then we need a way to transform those world positions into the camera's frame of reference—this is just the inverse of the camera's transformation. From there we can apply perspective projection and map the 3D world onto a 2D viewport.
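
In code terms: if cam_to_world is the matrix that places the camera in the world, the view transform is just its inverse (a minimal sketch, assuming an ultraviolet-style Mat4 with its inversed method):

let world_to_cam = cam_to_world.inversed(); // the "view" matrix
// Now any world-space point can be expressed relative to the camera:
let p_camera = world_to_cam * p_world;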

(If you want a refresher on matrix transformations, check out this nicely illustrated article.)

After we apply the inverse camera transformation to every vertex in the scene (on the GPU, natch) we know where every object will be with respect to the camera—but we don't yet know what the camera will "see". Real digital cameras have a rectangular image sensor in their body which measures light (focused by the camera lens) from the scene. You can imagine that there is a four-sided pyramid shape projecting from the camera out into the world, and anything contained within the planes of the pyramid is in principle viewable by the camera. Computers don't like infinitely small points, so we put a near plane in too, where the camera's image sensor would be. What the camera sees—and what we'll eventually map onto the viewport—is the visible portion of the scene from the perspective of that near plane, normalized to fit into that pyramidal shape (the camera frustum). We call this final coordinate space clip space. By playing with the relative distances between the left and right planes (or top and bottom planes) we can achieve many effects simulating camera field of view and other properties. (Santell's site has some good visualizations of this as well; here are a couple more.)

This frustum shape is what gives us the sense of perspective we need to make a scene feel 3D: farther objects are smaller (the plane they're on has to be shrunken to map onto the near plane) and nearer objects are larger (their size on their plane is closer to the size they'll be when projected onto the near plane). Since scene vertices are all defined in terms of homogeneous coordinates, we can apply a transformation which scales vertices' homogeneous w coordinate depending on the distance from the camera (their z coordinate!), and then divide out that w coordinate when returning to 3D coordinates to achieve sizes varying with distance. In the special case where our far plane is just the same size as our near plane, we have what's called an orthographic projection (parallel lines stay parallel).
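
To make that w trick concrete, here's a minimal sketch of a standard symmetric perspective matrix (OpenGL-style, column-major); this is illustrative Rust, not frenderer's actual projection code:

// fov_y: vertical field of view in radians; aspect: width / height;
// near/far: distances to the near and far clip planes.
fn perspective(fov_y: f32, aspect: f32, near: f32, far: f32) -> [[f32; 4]; 4] {
    let f = 1.0 / (fov_y / 2.0).tan();
    [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        // This column copies -z into w (note the -1.0)...
        [0.0, 0.0, (far + near) / (near - far), -1.0],
        [0.0, 0.0, (2.0 * far * near) / (near - far), 0.0],
    ]
}
// ...so that dividing x, y, z by w afterwards shrinks distant points;
// that division is exactly the "divide out that w coordinate" step above.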

In some sense this is where we have the key payoff of using homogeneous coordinates for everything: translations, scaling, and rotations all use one kind of matrix, which means that the camera projection code can uniformly (homogeneously) transform any object's location and size in space.

To sum up, object vertices go through a chain of transformations up to the point where they're drawn onto the screen:

Model space ★ model matrix ⏩ world space ★ view matrix ⏩ view space ★ projection matrix ⏩ clip space
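
In code, that chain is just matrix multiplication applied right-to-left to each homogeneous vertex (a sketch; Mat4/Vec4 names here are illustrative):

// model, view, and projection are 4x4 matrices; the vertex gets w = 1.0.
let clip_pos = projection * view * model * Vec4::new(x, y, z, 1.0);
// The GPU then divides clip_pos by its w to reach normalized device coordinates.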

Interactive Cameras

So that's 3D graphics programming—at least, that's how we get vertices from the world to the screen. Somehow we define a camera transformation (a matrix, or an eye position/at direction/up direction, or a position and a rotor) and parameters like frustum plane positions (maybe determined via field of view variables), and we get a way to go from world triangles to screen pixels. But how do we decide where to put the camera and what to point it at? Especially in an interactive setting, we might want the player to move the camera around, or have the camera follow the player character through space; we might have certain aspects of our game level that are meant to be viewed up close and others that are never meant to be near the camera, or viewed from behind.

In today's lecture we'll outline a couple types of cameras and how to implement them.

Fixed Cameras

The simplest way to make an interactive camera is not to make an interactive camera. Games like Resident Evil or Final Fantasy VII use fixed perspectives in each room or zone to frame shots the way the level designers intended. Since each room has a fixed camera location and orientation, that information can be provided in advance. If a zone is very large, or if cuts between cameras are not desirable, we can also create a transition zone between camera regions where the camera's position and rotation are interpolated from one shot to the next along some series of points (the further into the transition zone the player is, the closer we get to the target camera shot, until we're entirely in the new camera zone). A minimal version of the per-room lookup is sketched below.
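
Here's that lookup as a sketch; RoomCamera, AABB, and contains are illustrative names, not frenderer API:

// Each room stores the shot its designer authored for that region.
struct RoomCamera {
    zone: AABB,     // the region of the level this shot covers
    camera: Camera, // fixed position, orientation, and field of view
}

// Every frame, use whichever room's zone contains the player.
fn current_camera(rooms: &[RoomCamera], player_pos: Vec3) -> Option<&Camera> {
    rooms
        .iter()
        .find(|room| room.zone.contains(player_pos))
        .map(|room| &room.camera)
}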

One important question this brings up is how character control works: do directional inputs (e.g., on a joystick or the WASD keys) move the player character forward, back, left, and right relative to the character? Or up, down, left, and right relative to the screen? For example, if I were to hold up on a joystick, would my character move upwards on the screen, or would it move forward relative to its current facing? Because we know the camera matrix and the player character's local-to-world transform, we can easily convert directional vectors one way or the other—but we have to think about what feels best, especially if we have transitions between multiple camera angles.
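
For instance, the screen-relative option might look like this (a sketch; camera_yaw is a hypothetical rotor giving the camera's rotation about the world y axis):

// Turn a 2D directional input into a world-space move direction where
// "up" on the stick means "away from the camera" on screen.
fn screen_relative_move(input_x: f32, input_y: f32, camera_yaw: Rotor3) -> Vec3 {
    // Interpret the input as a direction on the ground plane, in camera
    // space (sign conventions depend on your camera's forward axis)...
    let local = Vec3::new(input_x, 0.0, input_y);
    // ...then rotate it into world space with the camera's yaw.
    camera_yaw * local
}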

In frenderer, you tell a RenderState at render time what its camera should be like (with set_camera), but you may want a Camera struct in your world for persistence from frame to frame. Camera has public fields for its field of view and its position and orientation in space. When entering a room you'd set the camera parameters accordingly; if you wanted to interpolate between two camera configurations, it would be best to define each room's shot as a Camera (or transform) and use Camera::interpolate to drive a movement from one to the other (either on a timer or based on the player's position).
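
Driving that by player position might look roughly like this (a sketch; it assumes Camera::interpolate(a, b, r) blends two cameras at ratio r, so check the actual signature in frenderer):

// room_a_cam and room_b_cam are the authored shots on either side of the
// transition zone; zone_start/zone_end are where the zone begins and ends
// along the player's path. All names here are illustrative.
let t = ((player_pos.z - zone_start) / (zone_end - zone_start)).clamp(0.0, 1.0);
let cam = Camera::interpolate(room_a_cam, room_b_cam, t);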

Aside: A lerp, or linear interpolation, is a function that takes two "points" describing endpoints of a "line segment" and a ratio r between 0 and 1, and returns a value which is the r-weighted average of the two endpoints. So, a lerp between 5 and 10 at 0.5 would be 7.5, or a lerp between (0,0) and (10,10) at 0.25 would be (2.5, 2.5). Lerps only make sense for certain data types—interpolating between rotations, for example, has to happen around the great circle of a sphere rather than along a line (only normalized rotors are valid rotations), so slerp is the spherical analogue (and nlerp is the slightly less accurate but much more efficient normalized linear version).
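
In code, a lerp is a one-liner (a generic sketch):

// r = 0.0 returns a, r = 1.0 returns b, and values in between blend linearly.
fn lerp(a: f32, b: f32, r: f32) -> f32 {
    a + (b - a) * r
}
// lerp(5.0, 10.0, 0.5) == 7.5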

First-Person Cameras

The next simplest way to implement a camera is to lock its position and orientation to the player character's. In first-person games, the camera is placed at roughly chest or eye level with respect to the player, and its rotation in the xz plane (its yaw) is fixed to the character's orientation (generally controlled by changes in the mouse position). Since first-person characters usually only rotate in xz, the mouse also controls the pitch of the camera (and maybe the vertical angle of the player character's pointer, which is often some boring gun). Some games have characters that don't move like bipedal humanoids, but have independent movement and viewing directions, so only the camera position is locked to the player's position.

Our camera will now definitely need an update function, called every frame, to keep its position in sync with the player's; at this point it's also helpful to provide an update_camera function which pushes our FPCamera's state into the engine's camera:

pub struct FPCamera {
    pub pitch: f32,
    player_pos: Vec3,
    player_rot: Rotor3,
}
impl FPCamera {
    fn new() -> Self {
        Self {
            pitch: 0.0,
            player_pos: Vec3::zero(),
            player_rot: Rotor3::identity(),
        }
    }
    fn update(&mut self, input: &frenderer::Input, player: &Player) {
        let MousePos { y: dy, .. } = input.mouse_delta();
        self.pitch += DT as f32 * dy as f32 / 10.0;
        // Make sure pitch isn't directly up or down (that would make the
        // look direction parallel to the up vector, which breaks look_at)
        self.pitch = self.pitch.clamp(-PI / 2.0 + 0.001, PI / 2.0 - 0.001);
        self.player_pos = player.trf.translation;
        self.player_rot = player.trf.rotation;
    }
    fn update_camera(&self, c: &mut Camera) {
        // The camera's position is offset from the player's position.
        let eye = self.player_pos
        // So, <0, 25, 2> in the player's local frame will need
        // to be rotated into world coordinates. Multiply by the player's rotation:
            + self.player_rot * Vec3::new(0.0, 25.0, 2.0);

        // Next is the trickiest part of the code.
        // We want to rotate the camera around to face the way the player is
        // facing, then rotate it further to pitch it up or down.

        // We need to turn this rotation into a target vector (at) by
        // picking a point a bit "in front of" the eye point with
        // respect to our rotation.  This means composing two
        // rotations (player and camera) and rotating the unit forward
        // vector around by that composed rotation, then adding that
        // to the camera's position to get the target point.
        // So, we're adding a position and an offset to obtain a new position.
        let at = eye + self.player_rot * Rotor3::from_rotation_yz(self.pitch) * Vec3::unit_z();
        *c = Camera::look_at(eye, at, Vec3::unit_y());
    }
}
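
Per frame, usage would look something like this (a sketch; the exact set_camera call depends on your frenderer version):

// Each tick: read input, sync with the player, then hand off to the engine.
fp_camera.update(&input, &player);
fp_camera.update_camera(&mut camera);
render_state.set_camera(camera); // signature assumed, as described above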

Orbit/Over the Shoulder Cameras

A trickier type of character camera is sometimes called a follow or over-the-shoulder camera. These cameras are positioned so that the player character and their feet are visible; they're useful for action games where precise positioning is important. They'll also try to do things like lead the player character's movement so the player can see what's coming up (assuming what's in front is more relevant than what's behind) and catch up to the player character's position after it comes to a stop.

In these notes we'll discuss a simpler form of the follow camera called the orbit camera. While the first-person camera was positioned high up in the player character's body, the orbit camera is held behind and above the player character, looking slightly downwards at the character. The player can tilt the camera up or down (pitch), move it closer to or further from the player (changing its distance), or orbit the camera around the axis defined by the character's up direction (yaw). You can imagine that there is a selfie stick attached to the top of the character's head, and the player controls the angle and length of that stick.

pub struct OrbitCamera {
    pub pitch: f32,
    pub yaw: f32,
    pub distance: f32,
    player_pos: Vec3,
    player_rot: Rotor3,
}

Why are we using yaw/pitch angles here? We only have two rotational degrees of freedom, and (for now) we don't need to interpolate between camera rotations.

impl OrbitCamera {
    fn new() -> Self {
        Self {
            pitch: 0.0,
            yaw: 0.0,
            distance: 50.0,
            player_pos: Vec3::zero(),
            player_rot: Rotor3::identity(),
        }
    }
    fn update(&mut self, events: &frenderer::Input, player: &Player) {
        let MousePos { x: dx, y: dy } = events.mouse_delta();
        self.pitch += (DT * dy) as f32 / 10.0;
        self.pitch = self.pitch.clamp(-PI / 4.0, PI / 4.0);

        self.yaw += (DT * dx) as f32 / 10.0;
        self.yaw = self.yaw.clamp(-PI / 4.0, PI / 4.0);
        self.distance += events.key_axis(Key::Up, Key::Down) * 5.0 * DT as f32;
        self.player_pos = player.trf.translation;
        self.player_rot = player.trf.rotation;
        // TODO: when player moves, slightly move yaw towards zero
    }
    fn update_camera(&self, c: &mut Camera) {
        // The camera should point at the player (you could transform
        // this point to make it point at the player's head or center,
        // or at point in front of the player somewhere, instead of
        // their feet)
        let at = self.player_pos;
        // And rotated around the player's position and offset backwards
        let camera_rot = self.player_rot * Rotor3::from_euler_angles(0.0, self.pitch, self.yaw);
        let offset = camera_rot * Vec3::new(0.0, 0.0, -self.distance);
        let eye = self.player_pos + offset;
        // To be fancy, we'd want to make the camera's eye an object
        // in the world whose rotation is locked to point towards the
        // player, and whose distance from the player is locked, and
        // so on---so we'd have player OR camera movements apply
        // accelerations to the camera which could be "beaten" by
        // collision.
        *c = Camera::look_at(eye, at, Vec3::unit_y());
    }
}

We could just as well write:

pub struct OrbitCamera {
    distance: f32,
    rot: Rotor3,
    player_pos: Vec3,
    player_rot: Rotor3,
}

And then pitch the camera up or down by multiplying rot by a Rotor3 representing a small yz rotation, or orbit the camera around by multiplying rot by a Rotor3 representing a small xy rotation. While the example code uses pitch and yaw angles and direct mouse control of those angles, the Rotor3-based approach would make it easy to perform smooth transitions between rotations:

pub struct OrbitCamera {
    distance: f32,
    rot: Rotor3,
    target_rot: Option<Rotor3>,
    rot_timer: f32,
    rot_duration: f32,
    player_pos: Vec3,
    player_rot: Rotor3,
}
impl OrbitCamera {
    //...
    fn orbit_to(&mut self, r: Rotor3, duration: f32) {
        self.target_rot = Some(r);
        self.rot_duration = duration;
        self.rot_timer = 0.0;
    }
    fn update(&mut self, input: &Input, player: &Player) {
        if let Some(tgt) = self.target_rot {
            self.rot_timer += DT as f32;
            if self.rot_timer >= self.rot_duration {
                // Transition finished: snap to the target and clear it so
                // update_camera stops interpolating.
                self.rot = tgt;
                self.target_rot = None;
                self.rot_timer = 0.0;
                self.rot_duration = 0.0;
            }
        } else {
            // use events to rotate rot up/down or left/right
            // or move distance in/out
            // or set a target rotation and duration with orbit_to
        }
        self.player_pos = player.trf.translation;
        self.player_rot = player.trf.rotation;
    }
    fn update_camera(&self, c: &mut Camera) {
        // The camera should point at the player
        let at = self.player_pos;
        // If we have a target rotation, nlerp towards it; otherwise use self.rot
        let r = self
            .target_rot
            .map(|tgt| self.rot.lerp(tgt, self.rot_timer / self.rot_duration).normalized())
            .unwrap_or(self.rot);
        // And rotated around the player's position and offset backwards
        let eye = self.player_pos + (self.player_rot * r * Vec3::new(0.0, 0.0, -self.distance));
        *c = Camera::look_at(eye, at, Vec3::unit_y());
    }
}

This kind of camera is convenient for players since they can control their view separately from the movement of the character, but still keep the camera focused on the character. If the camera gradually points itself towards the character's facing direction when not being manually controlled, players who aren't interested in camera control can ignore the camera entirely, while fine-grained control remains available to those who want it. A minimal version of that auto-centering is sketched below.
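
Assuming the yaw/pitch OrbitCamera from above, the idea is to decay yaw toward zero whenever the player is moving and the mouse is idle (player_vel is a hypothetical velocity field):

// Inside OrbitCamera::update, after reading the mouse delta:
let player_is_moving = player_vel.mag_sq() > 0.01;
if player_is_moving && dx == 0.0 && dy == 0.0 {
    // Ease the camera back behind the player a little bit each frame.
    self.yaw -= self.yaw * 2.0 * DT as f32;
}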

One important consideration here is that while collision stops the player from getting into awkward situations (halfway inside an obstacle, say), the presented code offers no such guarantee for the camera. With camera control, a player can put the camera inside a wall or behind an obstacle, making it impossible to see the character or showing parts of the level that were meant to be hidden.

To determine whether we have a clear view of the player, raycasts or sphere-casts are generally used from the camera's eye to various points on the player. If those raycasts hit something else before hitting the player, the player is occluded and the camera needs fixing up: orbit it back to where it was before, move it closer to the player until the occlusion no longer occurs, or give the intervening obstacles cutouts or other visual effects so the character's silhouette remains visible. This writeup describes some of the major design decisions for follow cameras in the Unity3D setting, but the overall exposition generalizes well.
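
A sketch of that visibility test (World and its raycast method are hypothetical; assume raycast returns the first hit within max_t along a ray, or None):

// Check whether any of several sample points on the player is visible
// from the camera's eye.
fn player_visible(eye: Vec3, sample_points: &[Vec3], world: &World) -> bool {
    sample_points.iter().any(|&p| {
        let to_player = p - eye;
        let dist = to_player.mag();
        // Visible if nothing is hit before the ray reaches the player.
        world.raycast(eye, to_player / dist, dist).is_none()
    })
}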

A quick efficiency aside: since a raycast is as expensive as gathering collision contacts, it's worthwhile to have one stage of processing that determines what raycasts will be necessary during a frame, then conduct the raycasts in a separate step, and then allow interested entities to process the results of those raycasts in a third step. This way we limit the number of trips through the collision geometry and keep the cache happy.
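
That three-phase structure might look like this (all types and methods here are illustrative, not a real API):

struct RayRequest {
    origin: Vec3,
    dir: Vec3,
    max_t: f32,
}

fn run_raycast_phases(world: &mut World) {
    // Phase 1: every interested entity queues the rays it will need.
    let requests: Vec<RayRequest> = world.gather_ray_requests();
    // Phase 2: answer them all in one pass through the collision geometry.
    let hits: Vec<Option<Hit>> = world.collision.raycast_batch(&requests);
    // Phase 3: entities consume their results, with no further traversal.
    world.deliver_ray_results(&hits);
}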

Other Thoughts on Cameras

There are many more types of cameras that we haven't explored in depth. A fly camera is like a first-person camera, but allows for translational movement as well as rotation. Top-down or bird's-eye-view cameras, orthographic cameras with an elevated angle (so-called isometric cameras), and more are appropriate for different types of games. Camera design is also essential for 2D games, though we mostly ignored it earlier.

Camera design is game design, so one of the best venues for publications on the topic is the Game Developers Conference (GDC). If you're interested in cameras I can recommend the classic GDC talk 50 Game Camera Mistakes. When your game has many types of cameras in it, composing lots of cameras becomes a challenge—but it is really key to cinematic third-person games. It's not uncommon to have dedicated camera programmers. Finally, camera special effects can be a very helpful polish tool for establishing game feel.

Today's Lab: Camera Cloning

Today's lab will have a reverse-engineering component and a relatively small programming component. We'll do it in project teams.

First, work with two partners to find three game cameras that are interesting and that are different from the provided basic FPCamera and OrbitCamera examples. They should be games you can play and experiment with. They don't have to be 3D games but at least one should be.

Second, describe in fine detail how each of these cameras works with respect to the camera's position and orientation, the character's movement, the player's control over the camera, and the larger environment the camera operates in. This description should be sufficient for someone who hasn't played the game you're talking about to produce an implementation of the camera. Take note of what input data are necessary for these cameras and how they depend on context.

Finally, pick one of these camera types and implement it using frenderer or your own engine/renderer. If it requires collision detection or ray casting, try to implement or fake that too! Basic ray-plane and ray-sphere collision code is in chapter 5 of Real-Time Collision Detection, and you can find code online as well (or in next week's collision notes). Efficiency isn't too much of a concern at this point. Be sure that you make the necessary changes both to how the player moves and to how the camera moves! Player character, camera, and controls aren't fully separable notions, so don't forget that they all contribute to a gameplay experience!