Week 11: 3D Cameras
Key Points

- Today's Lab: Camera Cloning
- [ ] How are 3D vertices transformed into positions on a 2D screen?
- [ ] What is the difference between a perspective and orthographic projection?
- [ ] What do we need to know to describe a camera shot in a 3D game?
- [ ] What are some challenges of positioning cameras in 3D spaces?
- [ ] What are some additional challenges introduced by interactivity?
- [ ] In your own words, explain how one or more fixed camera angles could work in a game and what information they need.
- [ ] In your own words, explain how first-person cameras work and what information they need.
- [ ] How is our geometry stuff from last week relevant here?
- [ ] In your own words, explain how orbit cameras work and what information they need.
- [ ] How is `engine3d` different from `engine2d`?
- [ ] How are `engine3d` and the `camera3d` example different from our `collision3d` starter?
Check-in: Progress on 3D Games
Pairs of teams should get together and discuss what you've been up to with your 3D game. What have you implemented so far? What are the roadblocks you're up against? Can you figure out together a way to solve them or work around them by tweaking the design?
What has teamwork been like? Can you think of ways to improve it or make it more equitable?
3D Cameras
Last week the notes included this little buried treasure:
> The extra cool thing about matrices is that they can be inverted. We can compute a local-to-world transform as above, but we can also compute a world-to-local transform by inverting each matrix along the way from the world frame of reference to the object's frame. This is important when we need to, say, put a coffee cup in a character's hand; but it's also important for viewing the scene in the first place! If we have a transform that positions a virtual camera in the world somewhere, and we know the world positions of all the objects in the scene, then we also need a way to transform those world positions into the camera's frame of reference—this is just the inverse of the camera's transformation. From there we can apply perspective projection and map the 3D world onto a 2D viewport.
(If you want a refresher on matrix transformations, check out this nicely illustrated article.)
After we apply the inverse camera transformation to every vertex in the scene (on the GPU, natch) we know where every object will be with respect to the camera—but we don't yet know what the camera will "see". Real digital cameras have a rectangular image sensor in their body which measures light (focused by the camera lens) from the scene. You can imagine that there is a four-sided pyramid shape projecting from the camera out into the world, and anything contained within the planes of the pyramid is in principle viewable by the camera. Computers don't like infinity, so we add a far plane to that pyramid; computers also don't like infinitely small points, so we put a near plane in too, where the camera's image sensor would be. What the camera sees—and what we'll eventually map onto the viewport—is the visible portion of the scene from the perspective of that near plane, normalized to fit into that pyramidal shape (the camera frustum). We call this final coordinate space clip space. By playing with the relative distances between the left and right planes (or top and bottom planes, or near and far planes) we can achieve many effects simulating camera field of view and other properties. (Santell's site has some good visualizations of this as well; here are a couple more.)
This frustum shape is what gives us the sense of perspective we need to make a scene feel 3D: farther objects are smaller (the plane they're on has to be shrunken to map onto the near plane) and nearer objects are larger (their size on their plane is closer to the size they'll be when projected onto the near plane). Since scene vertices are all defined in terms of homogeneous coordinates, we can apply a transformation which scales vertices' homogeneous w coordinate depending on the distance from the camera (their z coordinate!), and then divide out that w coordinate when returning to 3D coordinates to achieve sizes varying with distance. In the special case where our far plane is just the same size as our near plane, we have what's called an orthographic projection (parallel lines stay parallel).
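To make the divide-by-`w` trick concrete, here is a minimal, self-contained sketch in plain Rust (not the engine's actual projection code, which builds a full 4×4 matrix): a toy projection that copies a point's `z` into `w`, so dividing through by `w` shrinks points that are farther from the camera.

```rust
// Toy perspective projection: treat p as the homogeneous point (x, y, z, 1),
// apply a "matrix" whose only job is to set w = z, then divide out w.
fn project(p: [f32; 3]) -> [f32; 2] {
    let (x, y, w) = (p[0], p[1], p[2]);
    // Perspective divide: return to 2D coordinates by dividing out w.
    [x / w, y / w]
}

fn main() {
    // Two points with the same x/y extent, one twice as far from the camera:
    let near = project([1.0, 1.0, 2.0]);
    let far = project([1.0, 1.0, 4.0]);
    // The farther point lands closer to the center of the image, i.e. it
    // appears smaller.
    println!("near = {:?}, far = {:?}", near, far); // near = [0.5, 0.5], far = [0.25, 0.25]
}
```

A real projection matrix also remaps `z` into the near/far range for depth testing, but the size-varies-with-distance effect comes entirely from this divide.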
In some sense this is where we have the key payoff of using homogeneous coordinates for everything: translations, scaling, and rotations all use one kind of matrix, which means that the camera projection code can uniformly (homogeneously) transform any object's location and size in space.
To sum up, object vertices go through a chain of transformations up to the point where they're drawn onto the screen:
model space × model matrix → world space × view matrix → view space × projection matrix → clip space
Interactive Cameras
So that's 3D graphics programming—at least, that's how we get vertices from the world to the screen. Somehow we define a camera transformation (a matrix, or an eye position/at direction/up direction, or a position and a quaternion) and parameters like frustum plane positions (maybe determined via field of view variables), and we get a way to go from world triangles to screen pixels. But how do we decide where to put the camera and what to point it at? Especially in an interactive setting, we might want the player to move the camera around, or have the camera follow the player character through space; we might have certain aspects of our game level that are meant to be viewed up close and others that are never meant to be near the camera, or viewed from behind.
In today's lecture we'll outline a couple types of cameras and how to implement them.
Fixed Cameras
The simplest way to make an interactive camera is not to make an interactive camera. Games like Resident Evil or Final Fantasy 7 use fixed perspectives in each room or zone to frame shots the way the level designers intended. Since each room has a fixed camera location and orientation, that information can be provided in advance. If a zone is very large, or if cuts between cameras are not desirable, we can also create a transition zone between the zones where the position and rotation of the camera will be interpolated from one shot to another along some series of points (the further into the transition zone the player is, the closer we get to the target camera shot, until we're entirely in the new camera zone).
One important question this brings up is how character control works: do directional inputs (e.g., on a joystick or `wasd` keys) move the player character forward, back, left, and right relative to the character? Or up, down, left, and right relative to the screen? For example, if I were to hold up on a joystick, would my character be moved upwards on the screen or would it move forward relative to its current facing? Because we know the camera matrix and the player character's local-to-world transform, we can easily convert directional vectors one way or the other—but we have to think about what feels best, especially if we have transitions between multiple camera angles.
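As a sketch of that conversion (plain Rust, with a hypothetical helper name and a specific rotation convention as assumptions): "up" on the stick becomes "forward" by rotating the input vector by the character's yaw; going the other way just applies the inverse rotation.

```rust
// Rotate a screen/stick-relative 2D input (x, z) about the character's up
// axis by its yaw angle, yielding a character-relative move direction.
fn stick_to_world(stick: [f32; 2], yaw: f32) -> [f32; 2] {
    let (s, c) = yaw.sin_cos();
    [c * stick[0] + s * stick[1], -s * stick[0] + c * stick[1]]
}

fn main() {
    // Holding "up" (0, 1) while the character faces 90 degrees to the right
    // yields a world direction along the character's facing, ~[1.0, 0.0]:
    let dir = stick_to_world([0.0, 1.0], std::f32::consts::FRAC_PI_2);
    println!("{:?}", dir);
}
```

With a full 3D camera you'd do the same thing with the camera's view matrix instead of a single yaw angle.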
In our new `engine3d` setup, you can call `camera_mut()` on an `Engine` to get a mutable reference to a camera, which has public fields for its field of view, position, target, and up vector. When entering a room you'd want to set the camera parameters; if you wanted to interpolate between two camera configurations, it would be best to define those per-room as a camera position and rotation and use `cgmath::Vector3::lerp` and `cgmath::Quaternion::slerp` to synchronize a movement from one to another (either on a timer or based on the player's position).
Aside: A `lerp`, or linear interpolation, is a function that takes two "points" describing the endpoints of a "line segment" and a ratio `r` between 0 and 1, and returns a value which is the `r`-weighted average of the two endpoints. So, a lerp between 5 and 10 at 0.5 would be 7.5, and a lerp between (0,0) and (10,10) at 0.25 would be (2.5, 2.5). Lerps only make sense for certain data types—interpolating between rotations, for example, has to happen around the great circle of a sphere rather than along a line (only normalized quaternions are valid rotations), so `slerp` is the spherical analogue (and `nlerp` is the slightly less accurate but much more efficient normalized linear version).
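A minimal sketch of `lerp` in plain Rust, matching the numbers above (cgmath provides its own versions of these on vectors and quaternions):

```rust
// r-weighted average of two endpoints: r = 0 gives a, r = 1 gives b.
fn lerp(a: f32, b: f32, r: f32) -> f32 {
    a + (b - a) * r
}

// Componentwise lerp for 2D points.
fn lerp2(a: [f32; 2], b: [f32; 2], r: f32) -> [f32; 2] {
    [lerp(a[0], b[0], r), lerp(a[1], b[1], r)]
}

fn main() {
    println!("{}", lerp(5.0, 10.0, 0.5)); // 7.5
    println!("{:?}", lerp2([0.0, 0.0], [10.0, 10.0], 0.25)); // [2.5, 2.5]
}
```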
First-Person Cameras
The next simplest way to implement a camera is to lock its position and orientation to the player character's. In first-person games, the camera is placed at roughly chest or eye level with respect to the player, and its rotation in the `xz` plane is fixed to the character's orientation (generally controlled by changes in the mouse position). Since first-person characters generally only rotate in `xz`, the mouse also controls the pitch of the camera (and maybe the vertical angle of the player character's pointer, which is often some boring gun). Some games have characters that don't move like bipedal humanoids, but have independent movement and viewing directions, so only the camera position is locked to the player's position.
Our camera will now definitely need an `update` function which is called every frame to synchronize its position with the player's, and at this point it's helpful to provide an `update_camera` function which synchronizes the engine's camera with our `FPCamera`:

```rust
pub struct FPCamera {
    pub pitch: f32,
    player_pos: Pos3,
    player_rot: Quat,
}
impl FPCamera {
    fn new() -> Self {
        Self {
            pitch: 0.0,
            player_pos: Pos3::new(0.0, 0.0, 0.0),
            player_rot: Quat::new(1.0, 0.0, 0.0, 0.0),
        }
    }
    fn update(&mut self, events: &engine3d::events::Events, player: &Player) {
        let (_dx, dy) = events.mouse_delta();
        self.pitch += dy / 100.0;
        self.pitch = self.pitch.clamp(-PI / 4.0, PI / 4.0);
        self.player_pos = player.body.c;
        self.player_rot = player.rot;
    }
    fn update_camera(&self, c: &mut engine3d::camera::Camera) {
        // The camera's position is offset from the player's position
        c.eye = self.player_pos + Vec3::new(0.0, 0.5, 0.0);
        // This is the trickiest part of the code, since it relies on
        // some knowledge of matrix math.
        // We rotate the camera around the way the player is facing,
        // then rotate it more to pitch it up or down. Since
        // engine3d::camera::Camera needs eye, target, and up vectors,
        // we need to turn this rotation into a target vector by
        // picking a point a bit "in front of" the eye point with
        // respect to our rotation. This means composing two
        // rotations (player and camera) and rotating the unit forward
        // vector by that composed rotation, then adding that
        // to the camera's position to get the target point.
        c.target = c.eye
            + self.player_rot
                * Quat::from(cgmath::Euler::new(
                    cgmath::Rad(self.pitch),
                    cgmath::Rad(0.0),
                    cgmath::Rad(0.0),
                ))
                * Vec3::unit_z();
    }
}
```
Orbit/Over the Shoulder Cameras
A trickier type of character camera is sometimes called a follow or over the shoulder camera. These types of cameras are positioned so that the player and their feet are visible and are useful for action games where precise positioning is important. They'll also try to do things like lead the player character's movement so the player can see what's coming up (assuming what's in front is more relevant than what's behind) and catch up to the player character's position after it comes to a stop.
In these notes we'll discuss a simpler form of the follow camera called the orbit camera. While the first-person camera was positioned high up in the player character's body, the orbit camera is held behind and above the player character, looking slightly downwards at the character. The player can tilt the camera up or down (pitch), move it closer to or further from the player (changing its distance), or orbit the camera around the axis defined by the character's up direction (yaw). You can imagine that there is a selfie stick attached to the top of the character's head, and the player controls the angle and length of that stick.
```rust
pub struct OrbitCamera {
    pub pitch: f32,
    pub yaw: f32,
    pub distance: f32,
    player_pos: Pos3,
    player_rot: Quat,
}
```
Why are we using yaw/pitch angles here? We only have two rotational degrees of freedom and we don't need to interpolate camera positions. We could just as well write:
```rust
pub struct OrbitCamera {
    distance: f32,
    rot: Quat,
    player_pos: Pos3,
    player_rot: Quat,
}
```
And then pitch the camera up or down by multiplying `rot` by a `Quat` representing a small `zy` rotation, or orbit the camera around by multiplying `rot` by a `Quat` representing a small `xy` rotation. While the example code in the `camera3d` starter uses pitch and yaw angles and direct mouse control of those angles, the quaternion-based approach would make it easy to perform smooth transitions between rotations:
```rust
pub struct OrbitCamera {
    distance: f32,
    rot: Quat,
    target_rot: Option<Quat>,
    rot_timer: f32,
    rot_duration: f32,
    player_pos: Pos3,
    player_rot: Quat,
}
```
```rust
impl OrbitCamera {
    // ...
    fn orbit_to(&mut self, q: Quat, duration: f32) {
        self.target_rot = Some(q);
        self.rot_duration = duration;
        self.rot_timer = 0.0;
    }
    fn update(&mut self, events: &engine3d::events::Events, player: &Player) {
        if let Some(tgt) = self.target_rot {
            self.rot_timer += engine3d::DT;
            if self.rot_timer >= self.rot_duration {
                // The transition is over: snap to the target and clear it,
                // so update_camera goes back to using self.rot directly
                // (and so we never divide by a zero duration below).
                self.rot = tgt;
                self.target_rot = None;
                self.rot_timer = 0.0;
                self.rot_duration = 0.0;
            }
        } else {
            // use events to rotate rot up/down or left/right,
            // or move distance in/out,
            // or set a target rotation and duration with orbit_to
        }
        self.player_pos = player.body.c;
        self.player_rot = player.rot;
    }
    fn update_camera(&self, c: &mut engine3d::camera::Camera) {
        // The camera should point at the player
        c.target = self.player_pos;
        // If we have a target rotation, slerp towards it; otherwise use self.rot
        let r = self
            .target_rot
            .map(|tgt| self.rot.slerp(tgt, self.rot_timer / self.rot_duration))
            .unwrap_or(self.rot);
        // And rotate around the player's position and offset backwards
        c.eye = self.player_pos + (self.player_rot * r * Vec3::new(0.0, 0.0, -self.distance));
    }
}
```
This kind of camera is convenient for players since they can control their view separately from the movement of the character, but still keep the camera focused on the character. If the camera gradually points itself towards the character's facing direction when not being manually controlled by the player, it gives players who aren't interested in camera control a way to ignore the camera while still offering fine-grained control.
One important consideration here is that while collision stops the player from getting into awkward situations (halfway inside an obstacle, say) the presented code offers no such guarantee for the camera. With camera control a player can put the camera inside of a wall or behind an obstacle, making it impossible to see the character or showing parts of the level that were meant to be hidden.
To determine whether we have a clear view of the player, raycasts or sphere-casts are generally used from the camera's eye to various points on the player. If those raycasts hit something else before hitting the player, that means the player is occluded and the camera's position should be corrected—either orbited back to where it was before, moved closer to the player until the collision would no longer occur, or the intervening obstacles should get cutouts or other visual effects to allow the character's silhouette to remain visible. This writeup describes some of the major design decisions for follow cameras in the Unity3D setting, but the overall exposition is very effective.
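The core occlusion test can be sketched in a few lines of plain Rust, here against a single sphere obstacle (a real engine would raycast against the whole collision world; the function name is hypothetical):

```rust
// Does a ray from `origin` along unit direction `dir` hit a sphere at
// `center` with radius `radius` before reaching distance `max_t`?
// Solves |origin + t*dir - center|^2 = radius^2 for t.
fn ray_hits_sphere(origin: [f32; 3], dir: [f32; 3], center: [f32; 3], radius: f32, max_t: f32) -> bool {
    let oc = [origin[0] - center[0], origin[1] - center[1], origin[2] - center[2]];
    let b = oc[0] * dir[0] + oc[1] * dir[1] + oc[2] * dir[2];
    let c = oc[0] * oc[0] + oc[1] * oc[1] + oc[2] * oc[2] - radius * radius;
    let disc = b * b - c;
    if disc < 0.0 {
        return false; // the ray's line misses the sphere entirely
    }
    let t = -b - disc.sqrt(); // nearest intersection along the ray
    // Only hits between the camera and the player (0 <= t <= max_t) count.
    t >= 0.0 && t <= max_t
}

fn main() {
    // Camera at the origin looking down +z at a player 10 units away,
    // with a unit-sphere obstacle halfway between them:
    let occluded = ray_hits_sphere([0.0; 3], [0.0, 0.0, 1.0], [0.0, 0.0, 5.0], 1.0, 10.0);
    println!("player occluded: {}", occluded); // player occluded: true
}
```

In practice you'd cast toward several points on the player (head, feet, shoulders) so a thin pole between camera and player doesn't trigger a full correction.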
A quick efficiency aside: since a raycast is as expensive as gathering collision contacts, it's worthwhile to have one stage of processing that determines what raycasts will be necessary during a frame, then conduct the raycasts in a separate step, and then allow interested entities to process the results of those raycasts in a third step. This way we limit the number of trips through the collision geometry and keep the cache happy.
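The three-phase scheme above might be organized like this (a sketch with hypothetical names; the actual cast against the collision world is stubbed out with a closure):

```rust
struct RayRequest { origin: [f32; 3], dir: [f32; 3], requester: usize }
struct RayHit { requester: usize, hit: bool }

struct RaycastQueue {
    requests: Vec<RayRequest>,
    results: Vec<RayHit>,
}

impl RaycastQueue {
    fn new() -> Self {
        Self { requests: Vec::new(), results: Vec::new() }
    }
    // Phase 1: entities enqueue the rays they'll need this frame.
    fn request(&mut self, origin: [f32; 3], dir: [f32; 3], requester: usize) {
        self.requests.push(RayRequest { origin, dir, requester });
    }
    // Phase 2: cast everything in one trip through the collision geometry
    // (the closure stands in for the real ray-vs-world test).
    fn cast_all(&mut self, cast: impl Fn(&RayRequest) -> bool) {
        self.results = self
            .requests
            .drain(..)
            .map(|r| RayHit { hit: cast(&r), requester: r.requester })
            .collect();
    }
    // Phase 3: interested entities read back their results.
    fn results_for(&self, requester: usize) -> impl Iterator<Item = &RayHit> {
        self.results.iter().filter(move |h| h.requester == requester)
    }
}

fn main() {
    let mut q = RaycastQueue::new();
    q.request([0.0; 3], [0.0, 0.0, 1.0], 0);
    q.request([0.0; 3], [0.0, 1.0, 0.0], 0);
    // Pretend only rays pointing "up" hit something:
    q.cast_all(|r| r.dir[1] > 0.5);
    let hits = q.results_for(0).filter(|h| h.hit).count();
    println!("hits: {}", hits); // hits: 1
}
```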
Other Thoughts on Cameras
There are many more types of cameras that we haven't explored in depth. A fly camera is like a first-person camera, but allows for translational movement as well as rotation. Top-down or birds-eye-view cameras, orthographic cameras with an elevated angle (so-called isometric), and more are appropriate for different types of games. Camera design is also essential for 2D games, though we mostly ignored it earlier.
Camera design is game design, so one of the best venues for publications on the topic is the Game Developers' Conference. If you're interested in cameras I can recommend the classic GDC talk 50 camera mistakes. When your game has many types of cameras in it, composing lots of cameras becomes a challenge—but it is really key to cinematic third-person games. It's not uncommon to have dedicated camera programmers. Finally, camera special effects can be a very helpful polish tool for establishing game feel.
Engine3D Starter
For this week's starter we'll generalize the code from last week into a crate called `engine3d`. Set up your workspace like so:
- A `threed` folder, with a `Cargo.toml` like this one:

  ```toml
  [workspace]
  members = [
      "roguelike", "triangle", "triangle-tex", "number-guessing-game",
      "extreme-number-guess-challenge", "interactive-drawing", "scene2d",
      "scene2d-offscreen", "run-wasm", "engine2d", "test2d",
  ]
  exclude = ["target", "content"]
  resolver = "2"
  [
  ```