L04 - Scenes

Lecture 04: Scenes (slides)

Learning Objectives

By the end of this lecture, you will be able to:

set up a camera to look in more general directions,
use homogeneous coordinates to represent all our transformations with a 4x4 matrix,
compound rotations, scalings and translations of objects in your scene into a model matrix,
transform the normal vector of objects in your scene using a normal matrix.

Up to now we have always been pointing our virtual camera in the $-z$ direction when setting up our scenes. But what if we want to point our camera somewhere else? Maybe we want to take a picture of a bird in the sky or point a telescope at a particular planet. We'll need to extend our ray tracer to handle this more general viewing setup.

Furthermore, the bird we are taking a picture of is probably moving, and might also be rotating if it's turning directions. The Earth is rotating and also orbiting the sun. In order to represent this movement, we'll start with some base model and transform the points on the surface when taking a virtual picture.

Setting up more general scenes today consists of (1) setting up a general view and then (2) transforming our models.

General Views: pointing a camera at a particular point to "look at."

When the camera is pointing in a more general direction, the main thing that changes is the 3d coordinates of each pixel. Let's first review what we had previously, but make a few notational changes. We'll now call these pixel coordinates $(x_c, y_c, z_c)$ with the $c$ subscript denoting the fact that this is relative to our camera (eye) position.

$$ x_c = -\frac{w}{2} + w \frac{(i + 0.5)}{n_x}, \quad y_c = -\frac{h}{2} + h \frac{(n_y - 0.5 - j)}{n_y}, \quad z_c = -d. $$

where $w$ and $h$ are the width and height of the image plane and $d$ is the distance from the eye to the image plane, as we had previously. We will actually break up the 3d pixel coordinates calculation by recycling this derivation, and then using a change of basis to calculate the 3d coordinates (in 3d "world" space). We'll call the coordinates above our "camera coordinates". If we can find a way to transform "camera coordinates" into "world coordinates" then we have what we're looking for.
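As a minimal sketch, the camera-space formula above can be written as a small function. The name `pixelToCamera` and its argument order are assumptions made for this illustration, not part of the course code.

```javascript
// Camera-space coordinates of the center of pixel (i, j).
// w, h: width and height of the image plane; d: distance from eye to image plane;
// nx, ny: number of pixels in each direction (j = 0 is the top row).
// The function name and argument order are hypothetical.
function pixelToCamera(i, j, nx, ny, w, h, d) {
  const xc = -w / 2 + (w * (i + 0.5)) / nx;
  const yc = -h / 2 + (h * (ny - 0.5 - j)) / ny;
  const zc = -d;
  return [xc, yc, zc];
}

// A made-up 4x2 pixel image on a 4x2 image plane at distance 1.
const topLeft = pixelToCamera(0, 0, 4, 2, 4, 2, 1);     // [-1.5, 0.5, -1]
const bottomRight = pixelToCamera(3, 1, 4, 2, 4, 2, 1); // [1.5, -0.5, -1]
```

Note that the pixel centers are symmetric about the camera's $z$ axis, which is what the half-pixel offsets are for.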

The question we then need to answer is: for a point $(x_c, y_c, z_c)$ defined with respect to a camera (in camera space), how do we define the $(x, y, z)$ coordinates of the pixel in 3d (world) space?

Inputs: camera (eye) position, point to look at, and "up" direction.

Let's first define the camera inputs. Like before, a camera has a position $\vec{e}$ defined in world space. Similarly, our camera points in a certain direction, which we have called the gaze, $\vec{g}$. Instead of prescribing a gaze, we'll prescribe a point to "look at", called $\vec{a}$ (also defined in world space). Note that the gaze can be inferred from this as $\vec{g} = \vec{a} - \vec{e}$.

We will also need to specify an upwards direction (in world space), which we will denote as $\vec{up}$. We need this because the camera can still be rotated about its gaze direction, which could give images that are tilted or upside down; specifying which direction is "up" fixes this.

Expressing camera axis vectors in world space.

To determine the world coordinates of a pixel, we need to represent the three vectors in our camera system in these world coordinates. Let's label the x-axis of our camera system as $\vec{u}$, the y-axis of the camera as $\vec{v}$ and the z-axis of the camera as $\vec{w}$. Remember that $\vec{w}$ points into the camera.

We know that $\vec{w}$ is opposite our gaze direction, so:

$$ \vec{w} = -\frac{\vec{g}}{\lVert\vec{g}\rVert}. $$

The vector $\vec{u}$ is orthogonal to the gaze $\vec{g}$. We have many choices for the direction, so we'll use the "up" vector $\vec{up}$ to make it unique:

$$ \vec{u} = \frac{\vec{g}\times\vec{up}}{\lVert\vec{g}\times\vec{up}\rVert}. $$

Since the last vector must be orthogonal to both of these (for an orthonormal basis), we have:

$$ \vec{v} = \vec{w}\times\vec{u}. $$

This means that the 3d pixel coordinates $\vec{q}$ are

$$ \vec{q} = \vec{e} + x_c \vec{u} + y_c \vec{v} + z_c\vec{w}. $$

The addition of $\vec{e}$ at the beginning is due to the fact that our camera is offset from the origin of the world system. There are different ways to think about this. I like to think about starting at the origin of the world system (bottom-left triad in the image above) and then first taking a step to the camera $\vec{e}$. Then ask yourself: how much should I step along the $\vec{u}$ direction? ($x_c$), how much along the $\vec{v}$ direction? ($y_c$) and how much should we step along the $\vec{w}$ direction? ($z_c$)
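The construction of $\vec{u}$, $\vec{v}$, $\vec{w}$ and $\vec{q}$ can be sketched in plain JavaScript as follows. The helper names (`cameraBasis`, `cameraToWorld`) are hypothetical, and the sketch assumes that the up vector is not parallel to the gaze (otherwise the cross product is zero and cannot be normalized).

```javascript
// Small vector helpers on plain 3-element arrays.
const sub = (a, b) => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (a) => {
  const len = Math.hypot(a[0], a[1], a[2]);
  return [a[0] / len, a[1] / len, a[2] / len];
};

// Camera axes expressed in world space (hypothetical helper).
function cameraBasis(eye, lookAt, up) {
  const g = sub(lookAt, eye);          // gaze: g = a - e
  const gn = normalize(g);
  const w = [-gn[0], -gn[1], -gn[2]];  // w = -g / |g|
  const u = normalize(cross(g, up));   // u = (g x up) / |g x up|
  const v = cross(w, u);               // v = w x u
  return { u, v, w };
}

// q = e + xc * u + yc * v + zc * w
function cameraToWorld(eye, basis, p) {
  const { u, v, w } = basis;
  return [0, 1, 2].map((k) => eye[k] + p[0] * u[k] + p[1] * v[k] + p[2] * w[k]);
}

// Made-up camera at (5, 0, 0) looking at the origin.
const eye = [5, 0, 0];
const basis = cameraBasis(eye, [0, 0, 0], [0, 1, 0]);
const q = cameraToWorld(eye, basis, [1, 2, -3]); // approximately (2, 2, -1)
```

In this example the camera looks down the world $-x$ direction, so a pixel three units in front of the eye ($z_c = -3$) ends up at $x = 2$ in world space.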

We can also rewrite this using a change-of-basis matrix $\mathbf{B}$ in which the columns of $\mathbf{B}$ are the camera axis vectors expressed in the world system:

$$ \vec{q} = \mathbf{B}\ \vec{p} + \vec{e}, \quad \mathrm{where}\quad \mathbf{B} = \left[\begin{array}{ccc} u_x & v_x & w_x \\ u_y & v_y & w_y \\ u_z & v_z & w_z \\ \end{array}\right], $$

where $\vec{p} = (x_c, y_c, z_c)$, i.e. the pixel coordinates relative to the camera.

What is the effect on the ray direction?

The 3d coordinates of a pixel are now $\vec{q} = \mathbf{B}\ \vec{p} + \vec{e}$. Therefore the direction from the eye to the pixel is $\vec{r} = \vec{q} - \vec{e} = \mathbf{B}\ \vec{p}$.
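Here is a small numerical check of the matrix form, assuming a made-up camera at $\vec{e} = (5, 0, 0)$ that looks at the origin with $\vec{up} = (0, 1, 0)$, for which $\vec{u} = (0, 0, -1)$, $\vec{v} = (0, 1, 0)$ and $\vec{w} = (1, 0, 0)$. The helper `matVec3` is a hypothetical name.

```javascript
// Columns of B are the camera axes u, v, w expressed in world space.
const e = [5, 0, 0];
const B = [
  [0, 0, 1],  // u_x, v_x, w_x
  [0, 1, 0],  // u_y, v_y, w_y
  [-1, 0, 0], // u_z, v_z, w_z
];

// 3x3 matrix times 3-vector (row-major nested arrays).
const matVec3 = (M, x) => M.map((row) => row[0] * x[0] + row[1] * x[1] + row[2] * x[2]);

const p = [1, 2, -3];                   // pixel in camera coordinates
const r = matVec3(B, p);                // ray direction r = B p        -> [-3, 2, -1]
const q = r.map((rk, k) => rk + e[k]);  // pixel in world space q = B p + e -> [2, 2, -1]
```

The eye position only enters the pixel position $\vec{q}$; the ray direction depends on the camera orientation alone.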

Hint: just use `glMatrix`!

The equations above have a lot of cross products and normalizations. Luckily, glMatrix has a function to build this for us (assume eye, up and a are all vec3's):

let C = mat4.targetTo(mat4.create(), eye, a, up);

where C is used to denote the camera transformation.

Wait, why are we using mat4? $\mathbf{B}$ is a 3x3 matrix. Actually we can represent the whole transformation (change-of-basis and translation by $\vec{e}$) using homogeneous coordinates - more on this below.

We can use C to get the 3d coordinates of the pixel sample (using vec4.transformMat4), but we will need to subtract the eye to get the ray direction $\vec{r}$.

Transformation matrices.

Generally when you create a model, its origin will not coincide with where you want to place the model in your scene. Furthermore, your model might be too big (or small) and might be rotating. So we may need to transform our input model in order to place it or animate it within our scene. In the subsections below, we'll discuss scalings, rotations and translations.

All of these transformations can be compounded into a model matrix, which represents how a model in your scene should be transformed when rendering it.

Scaling

If your model is too small, you might need to scale it to make it bigger when placing it in your scene. To scale a point on a surface $\vec{p} = (x, y, z)$ by some constant factor $s$, all the coordinates get multiplied by $s$: $\vec{p}^* = s\ \vec{p} = (sx, sy, sz)$.

We can write this transformation nicely using a matrix

$$ \vec{p}^* = \left[ \begin{array}{c} x^* \\ y^* \\ z^* \end{array} \right] = \underbrace{\left[ \begin{array}{ccc} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & s \end{array} \right]}_{\mathbf{S}_s} \left[ \begin{array}{c} x \\ y \\ z \end{array} \right] = \mathbf{S}_s \vec{p} $$

We can also have different values in the diagonal entries if the stretching factor is not the same in each dimension:

$$ \vec{p}^* = \left[ \begin{array}{c} x^* \\ y^* \\ z^* \end{array} \right] = \left[ \begin{array}{ccc} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & s_z \end{array} \right]\left[ \begin{array}{c} x \\ y \\ z \end{array} \right] = \mathbf{S}_{s_x,s_y,s_z} \vec{p} $$
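A minimal sketch of both scaling matrices on plain nested arrays is shown below. The names `scalingMatrix` and `apply3` are assumptions for illustration.

```javascript
// 3x3 scaling matrix with (possibly different) factors along x, y and z.
function scalingMatrix(sx, sy, sz) {
  return [
    [sx, 0, 0],
    [0, sy, 0],
    [0, 0, sz],
  ];
}

// 3x3 matrix times 3-vector.
const apply3 = (M, p) => M.map((row) => row[0] * p[0] + row[1] * p[1] + row[2] * p[2]);

const uniform = apply3(scalingMatrix(3, 3, 3), [1, 2, 4]);     // [3, 6, 12]
const stretched = apply3(scalingMatrix(2, 1, 0.5), [1, 2, 4]); // [2, 2, 2]
```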

Rotation

Rotations are specified by an axis and an angle. For example, consider a rotation by an angle $\theta$ about the z-axis, as in the image on the left below. For a vector $\vec{p} = (x, y)$ with length $\ell$, we need to figure out the new coordinates ($\vec{p}^*$) after a rotation by $\theta$. Note that we can express $\vec{p} = (x, y) = (\ell\cos\alpha, \ell\sin\alpha)$, where $\alpha$ is the angle between $\vec{p}$ and the x-axis. The length of the vector doesn't change when it is rotated by an angle $\theta$. The new coordinates of the point $\vec{p}^*$ are then:

$$ \begin{array}{l} x^* = \ell\cos(\alpha + \theta) = \ell\cos\alpha\cos\theta - \ell\sin\alpha\sin\theta = x\cos\theta - y\sin\theta, \\ y^* = \ell\sin(\alpha + \theta)\ = \ell\sin\alpha\cos\theta + \ell\cos\alpha\sin\theta = x\sin\theta + y\cos\theta.\end{array} $$

Thus the transformation can be expressed as:

$$ \vec{p}^* = \left[ \begin{array}{c} x^* \\ y^* \end{array} \right] = \left[ \begin{array}{cc} \cos\theta & -\sin\theta \\ \sin\theta & \phantom{-}\cos\theta \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] = \mathbf{R}^{2d}_{\theta,z} \vec{p}. $$

In 3d, we have the following rotation matrices about the $x$, $y$ and $z$ axes (by some angle $\theta$), respectively:

$$ \mathbf{R}_{\theta,x} = \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \phantom{-}\cos\theta \end{array}\right], $$

$$ \mathbf{R}_{\theta,y} = \left[ \begin{array}{ccc} \phantom{-}\cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{array} \right], $$

$$ \mathbf{R}_{\theta,z} = \left[ \begin{array}{ccc} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \phantom{-}\cos\theta & 0 \\ 0 & 0 & 1 \end{array} \right]. $$

To rotate about an arbitrary axis, you can calculate a change-of-basis which aligns your axis with one of the x, y or z-axes, perform the rotation using one of the three matrices above, and then perform the inverse of the change-of-basis to go back to your original coordinate system.

One important note about rotation matrices is that they are orthogonal, which means that the inverse of a rotation is the transpose. Intuitively, the inverse of a rotation by some angle $\theta$ is a rotation by some angle $-\theta$. Please try replacing $\theta$ with $-\theta$ in the equations above - you should get the transpose of the original rotation matrix.
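The orthogonality claim can be checked numerically with a short sketch. The helper names (`rotationZ`, `transpose`, `multiply`) are hypothetical; only the z-axis rotation is shown, and the other two axes work the same way.

```javascript
// Rotation by theta (radians) about the z-axis, as a row-major 3x3 array.
function rotationZ(theta) {
  const c = Math.cos(theta);
  const s = Math.sin(theta);
  return [
    [c, -s, 0],
    [s, c, 0],
    [0, 0, 1],
  ];
}

const transpose = (M) => M[0].map((_, j) => M.map((row) => row[j]));
const multiply = (A, B) =>
  A.map((row) => B[0].map((_, j) => row.reduce((sum, a, k) => sum + a * B[k][j], 0)));

const theta = Math.PI / 3;
const R = rotationZ(theta);
const shouldBeIdentity = multiply(transpose(R), R); // identity, up to round-off
const backwards = rotationZ(-theta);                // same entries as transpose(R)
```

Because the inverse is just the transpose, undoing a rotation never requires a general matrix inversion.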

Translation

A translation consists of a shift by some vector $\vec{t} = (t_x, t_y, t_z)$. We would like to also express translations as a matrix transformation similar to our previous transformations. We can do so by adding another row and column to our matrices. That is, instead of working with 2x2 matrices in 2d or 3x3 matrices in 3d, we will work with 3x3 matrices in 2d and 4x4 matrices in 3d. This also means that we need to augment our 3d coordinates to 4d, which we will do using homogeneous coordinates, as described below.

Homogeneous coordinates: append a 1 to your points.

Homogeneous coordinates are a trick that allows us to represent translations as a transformation matrix, similar to what we did for scalings and rotations. This is useful, because it means that any transformation of a point or vector can be represented by a matrix multiplication, instead of doing special tricks for each type of transformation. It also means that we can compound transformations into a single matrix and forget about the specifics of what it represents.

The main idea of homogeneous coordinates is to append a 1 as a new homogeneous coordinate to any point. That is, a point in 3d $\vec{p} = (x, y, z)$ would be represented in homogeneous coordinates as (with the superscript $h$ standing for homogeneous):

$$ \vec{p}^h = \left[\begin{array}{c}x \\ y \\ z \\ 1\end{array}\right]. $$

What does this get us? Well, let's consider the following transformation of $\vec{p}$:

$$ \underbrace{\left[\begin{array}{cccc}1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y\\ 0 & 0 & 1 & t_z\\ 0 & 0 & 0 & 1\end{array}\right]}_{\mathbf{T}} \left[\begin{array}{c} x \\ y \\ z \\ 1\end{array}\right] = \left[\begin{array}{c} x + t_x \\ y + t_y\\ z + t_z\\ 1\end{array}\right]. $$

This is a translation of $\vec{p}$ by $\vec{t} = (t_x, t_y, t_z)$! A general transformation will have the form:

$$ \left[\begin{array}{cccc} m_{0,0} & m_{0,1} & m_{0,2} & t_x \\ m_{1,0} & m_{1,1} & m_{1,2} & t_y\\ m_{2,0} & m_{2,1} & m_{2,2} & t_z \\ 0 & 0 & 0 & 1\end{array}\right] $$

where the nine $m_{i,j}$ ($0 \le i, j \le 2$) values represent scaling, rotation (and maybe a reflection) and the $t_x$, $t_y$, $t_z$ values represent the translation that follows after performing the scaling, rotation, etc. When building these, remember that the bottom row is $(0, 0, 0, 1)$, so the bottom-right entry is 1.
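To see why the translation is applied after the scaling and rotation part, it can help to write the upper-left 3x3 block as a single matrix, here called $\mathbf{A}$ (a name introduced only for this illustration), and multiply out in blocks:

$$ \left[\begin{array}{cc} \mathbf{A} & \vec{t} \\ \vec{0}^T & 1 \end{array}\right] \left[\begin{array}{c} \vec{p} \\ 1 \end{array}\right] = \left[\begin{array}{c} \mathbf{A}\vec{p} + \vec{t} \\ 1 \end{array}\right]. $$

For instance, with $\mathbf{A} = \mathbf{S}_2$ (uniform scaling by 2), $\vec{t} = (1, 0, 0)$ and $\vec{p} = (1, 1, 1)$, the result is $2\,(1, 1, 1) + (1, 0, 0) = (3, 2, 2)$.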

Vectors in homogeneous coordinates

If points have an extra coordinate equal to 1 in homogeneous coordinates, what is this extra coordinate equal to for vectors?

Solution

For a point $\vec{p}$ with homogeneous coordinates $\vec{p}^h = (x_p, y_p, z_p, 1)$ and $\vec{q}$ with homogeneous coordinates $\vec{q}^h = (x_q, y_q, z_q, 1)$, the vector $\vec{u} = \vec{q} - \vec{p}$ from $\vec{p}$ to $\vec{q}$ in homogeneous coordinates is $\vec{u}^h = (x_q - x_p, y_q - y_p, z_q - z_p, 0)$. So the value of this homogeneous coordinate is 0 for vectors.
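A practical consequence is that a translation moves points but leaves vectors (directions) unchanged. The following sketch illustrates this with plain arrays; `translation` and `apply4` are hypothetical helper names.

```javascript
// 4x4 translation matrix (row-major nested arrays).
function translation(tx, ty, tz) {
  return [
    [1, 0, 0, tx],
    [0, 1, 0, ty],
    [0, 0, 1, tz],
    [0, 0, 0, 1],
  ];
}

// 4x4 matrix times homogeneous 4-vector.
const apply4 = (M, x) => M.map((row) => row.reduce((sum, m, k) => sum + m * x[k], 0));

const T = translation(1, 2, 3);
const movedPoint = apply4(T, [4, 5, 6, 1]); // point (last coordinate 1):  [5, 7, 9, 1]
const sameVector = apply4(T, [4, 5, 6, 0]); // vector (last coordinate 0): [4, 5, 6, 0]
```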

Compounding transformations: matrix multiplications read from right-to-left!

So we know how to build up these individual transformation matrices, but we might want to combine a few transformations. Think about first transforming a point $\vec{p}$ by $\mathbf{M}_1$. That gives us a new vector $\vec{p}^{\ast}$. Then maybe we multiply $\vec{p}^*$ by a new transformation matrix $\mathbf{M}_2$ and get $\vec{p}^{**}$, and so on:

$$ \mathbf{M}\vec{p} = \mathbf{M}_n\cdots(\mathbf{M}_3(\mathbf{M}_2(\mathbf{M}_1\vec{p}))) = \left(\mathbf{M}_n\cdots\mathbf{M}_3\mathbf{M}_2\mathbf{M}_1\right)\vec{p}. $$

This means that we can combine transformations by continuing to multiply the result by another transformation. Or, written an alternative way, we can just multiply all the matrices and then take the resulting transformation matrix and multiply $\vec{p}$ by that one.
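Since matrix multiplication is not commutative, the order of the factors matters. The sketch below, with hypothetical helpers `multiply` and `apply`, compares "scale, then translate" with "translate, then scale" on the same point.

```javascript
const multiply = (A, B) =>
  A.map((row) => B[0].map((_, j) => row.reduce((sum, a, k) => sum + a * B[k][j], 0)));
const apply = (M, x) => M.map((row) => row.reduce((sum, m, k) => sum + m * x[k], 0));

// Uniform scaling by 2 and a translation by (1, 0, 0), both as 4x4 matrices.
const S = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]];
const T = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]];
const p = [1, 0, 0, 1];

// Read right-to-left: T S p scales first, then translates.
const scaleThenTranslate = apply(multiply(T, S), p); // [3, 0, 0, 1]
// S T p translates first, then scales.
const translateThenScale = apply(multiply(S, T), p); // [4, 0, 0, 1]
// Applying the matrices one at a time gives the same result as the product.
const stepByStep = apply(T, apply(S, p));            // [3, 0, 0, 1]
```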

Pen and paper example

Given an arbitrary shape centered at the point $\vec{p} = (p_x,p_y)$ shown below, we would like to scale it by some factor $s$ along the axis shown in dashed (but no scaling in the direction perpendicular to the axis). Determine the transformation matrix that performs this scaling.

Solution

We will let $\mathbf{S}$ represent the scaling along the direction of the axis, $\mathbf{T}$ the translation from the point $\vec{p}$ to the origin and $\mathbf{R}$ the rotation by an angle $-\theta$:

$$ \mathbf{S} = \left[ \begin{array}{ccc} s & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right], \quad \mathbf{R} = \left[ \begin{array}{ccc} \phantom{-} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{array} \right], \quad \mathbf{T} = \left[ \begin{array}{ccc} 1 & 0 & -p_x \\ 0 & 1 & -p_y \\ 0 & 0 & 1 \end{array} \right]. $$

The full transformation is given by first translating to the origin, then rotating by $-\theta$ so that the scaling axis lines up with the x-axis, then scaling, then rotating back by $\theta$ and finally translating back to $\vec{p}$. This is expressed as $\mathbf{M} = \mathbf{T}^{-1}\mathbf{R}^{-1} \mathbf{S}\mathbf{R}\mathbf{T}$.
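The result $\mathbf{M} = \mathbf{T}^{-1}\mathbf{R}^{-1}\mathbf{S}\mathbf{R}\mathbf{T}$ can be sanity-checked numerically. The sketch below assumes that $\theta$ is the angle between the dashed axis and the x-axis; the function name `scaleAlongAxis` is hypothetical.

```javascript
const multiply = (A, B) =>
  A.map((row) => B[0].map((_, j) => row.reduce((sum, a, k) => sum + a * B[k][j], 0)));
const apply = (M, x) => M.map((row) => row.reduce((sum, m, k) => sum + m * x[k], 0));

// Scale by s along an axis through (px, py) at angle theta, using 3x3 homogeneous matrices in 2d.
function scaleAlongAxis(px, py, theta, s) {
  const c = Math.cos(theta);
  const sn = Math.sin(theta);
  const T = [[1, 0, -px], [0, 1, -py], [0, 0, 1]];    // move p to the origin
  const R = [[c, sn, 0], [-sn, c, 0], [0, 0, 1]];     // rotate by -theta
  const S = [[s, 0, 0], [0, 1, 0], [0, 0, 1]];        // scale along x only
  const Rinv = [[c, -sn, 0], [sn, c, 0], [0, 0, 1]];  // transpose of R
  const Tinv = [[1, 0, px], [0, 1, py], [0, 0, 1]];   // move back to p
  return multiply(Tinv, multiply(Rinv, multiply(S, multiply(R, T))));
}

const theta = Math.PI / 4;
const M = scaleAlongAxis(1, 1, theta, 2);

// The center stays put, a point one unit along the axis moves to two units along it,
// and a point one unit perpendicular to the axis does not move.
const center = apply(M, [1, 1, 1]);
const along = apply(M, [1 + Math.cos(theta), 1 + Math.sin(theta), 1]);
const across = apply(M, [1 - Math.sin(theta), 1 + Math.cos(theta), 1]);
```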

Programming example

Let's now practice with a programming example. We would like to animate the rotation of the Earth. At any particular time in the animation, we are given the angle of rotation theta. We also know the center of the earth (center) and the point on the surface (point) we want to transform. Please complete the transform function so that it returns the transformed version of point: a rotation by an angle theta about the y-axis, with center as the center of rotation.
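One possible way to write such a function, without any library, is sketched here. The argument order `(point, center, theta)` is an assumption, and the body is just the compound transformation "translate center to the origin, rotate about y, translate back" written out by hand.

```javascript
// Rotate `point` by `theta` (radians) about the y-axis through `center`.
// Both point and center are 3-element arrays; the signature is hypothetical.
function transform(point, center, theta) {
  const c = Math.cos(theta);
  const s = Math.sin(theta);

  // Step 1: translate so that the center of rotation is at the origin.
  const x = point[0] - center[0];
  const y = point[1] - center[1];
  const z = point[2] - center[2];

  // Step 2: rotate about the y-axis (x* = x cos + z sin, z* = -x sin + z cos).
  // Step 3: translate back to the center.
  return [
    center[0] + c * x + s * z,
    center[1] + y,
    center[2] - s * x + c * z,
  ];
}

// A quarter turn takes the point one unit in +x from the center to one unit in -z from it.
const rotated = transform([2, 0, 0], [1, 0, 0], Math.PI / 2); // approximately (1, 0, -1)
```

The same result could be obtained by building the three 4x4 matrices and multiplying them, which is closer to how a model matrix would be assembled in practice.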

The Normal Matrix: transforming normals on a model.

The transformations discussed above are applicable to points and tangent vectors. However, normal vectors behave a bit differently: even after they are transformed, they must remain perpendicular to our surfaces. Unfortunately, applying the transformations above to normals may not respect that (rotations and uniform scalings are fine, but a non-uniform scaling, for example, is not). To account for this, and ensure our normal vectors are always perpendicular to our surfaces, we need to calculate a new matrix $\mathbf{N}$ to transform normals, which we will call the normal matrix.

In order to derive this, we know two things: (1) points and vectors are transformed by some transformation $\mathbf{M}$ and (2) normal vectors are perpendicular to the surface. This means that the dot product between the tangent ($\vec{t}$) and normal ($\vec{n}$) should always be zero, before and after the transformation:

$$ \vec{n}^* \cdot \vec{t}^* = (\mathbf{N}\vec{n}) \cdot (\mathbf{M}\vec{t}) = 0 \quad (\mathrm{perpendicular}). $$

We then have

$$ (\mathbf{N}\vec{n}) \cdot (\mathbf{M}\vec{t}) = (\mathbf{N}\vec{n})^T (\mathbf{M}\vec{t}) = \vec{n}^T \underbrace{\mathbf{N}^T \mathbf{M}}_{\mathbf{I}} \vec{t} = 0. $$

Since $\vec{t}$ and $\vec{n}$ are perpendicular before the transformation ($\vec{n}^T\vec{t} = 0$), the expression above is guaranteed to vanish if the product in the middle is the identity matrix:

$$ \mathbf{N}^T \mathbf{M} = \mathbf{I} \quad \rightarrow \quad \mathbf{N} = (\mathbf{M}^{-1})^T. $$

In other words, normal vectors are always transformed by the inverse-transpose of whatever transformation you are performing on your surface points.
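The effect is easy to see with a non-uniform scaling. The sketch below computes the inverse-transpose of a 3x3 matrix as its cofactor matrix divided by the determinant; the names `normalMatrix`, `apply3` and `dot` are assumptions for illustration, and the matrix is assumed to be invertible.

```javascript
// Inverse-transpose of a 3x3 matrix: N = cofactor(M) / det(M).
function normalMatrix(M) {
  const [[a, b, c], [d, e, f], [g, h, i]] = M;
  const cof = [
    [e * i - f * h, f * g - d * i, d * h - e * g],
    [c * h - b * i, a * i - c * g, b * g - a * h],
    [b * f - c * e, c * d - a * f, a * e - b * d],
  ];
  const det = a * cof[0][0] + b * cof[0][1] + c * cof[0][2];
  return cof.map((row) => row.map((x) => x / det));
}

const apply3 = (M, p) => M.map((row) => row[0] * p[0] + row[1] * p[1] + row[2] * p[2]);
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

// Stretch by 2 along x only.
const M = [[2, 0, 0], [0, 1, 0], [0, 0, 1]];
const tangent = [1, 1, 0];
const normal = [1, -1, 0]; // perpendicular to the tangent before the transformation

const tStar = apply3(M, tangent);                         // [2, 1, 0]
const wrong = dot(apply3(M, normal), tStar);              // 3: M n is no longer perpendicular
const right = dot(apply3(normalMatrix(M), normal), tStar); // 0: N n stays perpendicular
```

The transformed normal is generally not unit length anymore, so it usually needs to be normalized again before it is used for shading.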