Our main goal today is to start using a new rendering technique called rasterization. The main difference between rasterization and what we have been doing so far with ray tracing is that ray tracing is a pixel-first approach to rendering, whereas rasterization is object-first. Recall that, in ray tracing, we start by sending a ray through a pixel, see what it hits (if anything), and then calculate the pixel color using some lighting model. Here are the steps we had in Lecture 3:
1. set up your image and place your observer.
2. for each pixel in your image:
a. create a ray originating at your camera position that passes through this pixel.
b. determine the closest object in your scene intersected by the ray.
c. determine the color of the pixel, based on the intersection.
Rasterization, on the other hand, starts with the objects in our scene, projects them onto our screen, determines which pixels our projected object covers and then applies some lighting model. Here is what these steps look like:
1. set up view and place your observer.
2. for each triangle in your models:
a. project triangle into viewing space.
b. determine which pixels this triangle covers.
c. for each pixel covered by the triangle:
i. determine the depth of this object fragment.
ii. if (depth < minimum depth of this pixel):
A. determine the color of this pixel.
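To make these steps concrete, here is a small JavaScript sketch of the loop (this is not how a GPU or our labs implement it, but the structure is the same). It assumes step 2a has already happened, so each triangle arrives with its vertices in pixel coordinates plus a depth value and a flat RGB color; the data layout (tri.vertices, tri.rgb) is made up for illustration.
// sketch of steps 2b-2c: coverage via barycentric coordinates + a depth buffer
// assumes counter-clockwise, non-degenerate triangles
function edgeFunction(a, b, p) {
  // twice the signed area of triangle (a, b, p)
  return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]);
}

function rasterize(triangles, width, height) {
  const color = new Uint8ClampedArray(width * height * 4);       // RGBA image
  const depth = new Float32Array(width * height).fill(Infinity); // depth buffer

  for (const tri of triangles) {
    const [a, b, c] = tri.vertices; // each vertex is [x, y, depth] in pixel coordinates
    const area = edgeFunction(a, b, c);

    // only visit pixels in the triangle's bounding box (clamped to the image)
    const xmin = Math.max(0, Math.floor(Math.min(a[0], b[0], c[0])));
    const xmax = Math.min(width - 1, Math.ceil(Math.max(a[0], b[0], c[0])));
    const ymin = Math.max(0, Math.floor(Math.min(a[1], b[1], c[1])));
    const ymax = Math.min(height - 1, Math.ceil(Math.max(a[1], b[1], c[1])));

    for (let y = ymin; y <= ymax; y++) {
      for (let x = xmin; x <= xmax; x++) {
        const p = [x + 0.5, y + 0.5]; // pixel center
        // barycentric coordinates of the pixel center with respect to the triangle
        const u = edgeFunction(b, c, p) / area;
        const v = edgeFunction(c, a, p) / area;
        const w = 1 - u - v;
        if (u < 0 || v < 0 || w < 0) continue; // step 2b: pixel not covered

        // step 2c-i: interpolate the depth of this fragment from the vertices
        const z = u * a[2] + v * b[2] + w * c[2];
        const k = y * width + x;
        if (z < depth[k]) { // step 2c-ii: depth test
          depth[k] = z;
          // step 2c-ii-A: determine the color (a flat color here; a real
          // rasterizer would run a lighting model / fragment shader)
          color.set([tri.rgb[0], tri.rgb[1], tri.rgb[2], 255], 4 * k);
        }
      }
    }
  }
  return { color, depth };
}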
For us, the newest part of this process is the projection of the model triangles (or other geometric shapes like lines or quads). Everything else, such as setting up a view and shading our models, is similar to what we did in ray tracing.
In practice, modern graphics APIs define the steps above as a sequence of stages and provide us with ways to inject our own code into these stages. The code we inject is called a shader. We'll issue commands to the API that will (1) write the necessary data to memory and (2) invoke the rasterization pipeline in order to actually draw our scenes. Before proceeding, it's important for us to understand some of the terminology of these stages: vertex processing (where vertices are transformed and projected), rasterization (where each primitive is converted into the pixel-sized fragments it covers), and fragment processing (where each fragment is assigned a color).
This pipeline is shown below where the gray boxes show the three main stages and the white boxes show which data is an input/output to each stage.
The first stage involves projecting the vertices of our geometric primitives to the screen, as shown by the dashed lines below (some of the dashed projection lines are omitted for clarity). In the example below, there are two primitives: one blue triangle and one red square. The rasterization process will also keep track of the depth of each fragment so that we can correctly render the blue triangle in front of the red square below.
After the vertex data is projected to the screen, the rasterization process determines which pixels are covered by our geometric primitives. Let's just focus on the blue triangle for now. Remember barycentric coordinates? They give us a convenient coverage test: a pixel center is inside the triangle when all three of its barycentric coordinates are between 0 and 1.
As mentioned earlier, rasterizers also keep track of the depth of each fragment so that we can correctly render objects in order. To do so, we need to talk about the math behind projection, as well as the full vertex transformation pipeline.
Our ultimate goal is to transform the surface points of our objects onto the screen. For models represented with a mesh, this involves transforming the vertices onto the screen. To do so, we'll break up the full transformation into 4 stages. The first stage involves placing the model in the scene and will feel familiar from Lab 4. However, the last three stages are essentially the reverse of what we did in ray tracing. Remember that, in ray tracing, we were trying to express pixel coordinates in terms of how the scene is defined (the world space). In rasterization, we need to express our models in screen space, which first involves a transformation to the frame of reference of our camera.
Before going into the four stages below, I want to make one more point about homogeneous coordinates. Previously, we had seen that homogeneous coordinates are a convenient way to express all our transformations as a single 4x4 matrix. Position vectors are represented in homogeneous coordinates by setting the fourth coordinate to 1, whereas direction vectors have a fourth coordinate of 0.
The property I want to highlight right now is the idea of homogenization. This refers to the process of dividing the first three coordinates by the fourth one. The following two points should be understood as representing the same Cartesian point using homogeneous coordinates, and we will thus interpret them as being equivalent: $(x, y, z, 1)$ and $(wx, wy, wz, w)$ for any nonzero $w$.
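For example (with arbitrary numbers), homogenizing the point $(2, 4, 6, 2)$ gives
$$
\left(\frac{2}{2}, \frac{4}{2}, \frac{6}{2}\right) = (1, 2, 3),
$$
so $(2, 4, 6, 2)$ and $(1, 2, 3, 1)$ both represent the Cartesian point $(1, 2, 3)$.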
When we imported a mesh (e.g. from a .obj file), the model vertices are defined in object space. Similar to how we rotated the bird in Lab 04, we can transform the model to place it into the world space of our scene. This is the global 3d coordinate system where all objects in the scene are laid out. We had previously called this the model matrix, which we will do again. So the first transformation is from object space to world space: the model matrix.
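For example, a model matrix that shrinks a mesh, spins it about the $y$-axis, and then places it in the world could be built with glMatrix like this (the values are arbitrary, just for illustration):
const { mat4 } = glMatrix; // assumes the glMatrix script has been loaded
// object space -> world space; read the transformations from right-to-left:
// scale first, then rotate, then translate
const M = mat4.create();
mat4.translate(M, M, [2, 0, -5]);  // place the object in the world
mat4.rotateY(M, M, Math.PI / 4);   // rotate 45 degrees about the y-axis
mat4.scale(M, M, [0.5, 0.5, 0.5]); // shrink to half size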
As mentioned earlier, our ray tracers required us to express pixel coordinates in world space, and we set up a change-of-basis matrix (with a translation by the eye $\vec{e} = (e_x, e_y, e_z)$) to do that (using the glMatrix targetTo function). Our camera matrix (in Lecture 4) was a combination of this change-of-basis rotation and the translation to the eye.
Our goal is a bit different now. We want to express the world coordinates of a model surface point in the frame of reference of the camera. This view matrix is therefore the inverse of the camera matrix. Since the change-of-basis portion is a pure rotation, its inverse is simply its transpose, and the inverse of the translation is a translation by $-\vec{e}$. This can also be interpreted as first translating to the eye and then rotating to align with the camera axis vectors.
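As a sanity check with glMatrix (the eye, target and up values below are placeholders): inverting the camera matrix from mat4.targetTo should give the same result as mat4.lookAt, up to floating-point rounding.
const { mat4 } = glMatrix; // assumes the glMatrix script has been loaded
const eye = [0, 1, 3];    // placeholder camera position
const target = [0, 0, 0]; // placeholder point to look at
const up = [0, 1, 0];
// camera matrix: camera space -> world space (what we used in ray tracing)
const camera = mat4.targetTo(mat4.create(), eye, target, up);
// view matrix: world space -> camera space, the inverse of the camera matrix
const V = mat4.invert(mat4.create(), camera);
// mat4.lookAt builds this inverse directly, so V_lookAt should equal V
const V_lookAt = mat4.lookAt(mat4.create(), eye, target, up);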
So we have our models in the frame of reference of the camera. Now our goal is to project these points to our image plane. We also want this projection to work with our 4x4 transformation matrix framework. Specifically, we will use a perspective projection, where the center of projection is the origin of the camera coordinate system (the eye), and the plane on which we are projecting is our image plane, which is a distance $d$ from the eye along the $-z$ axis.
The coordinates of the projected point $(x_p, y_p)$ can be found with similar triangles: the eye, the camera-space point $(x, y, z)$, and its projection all lie on the same line. Therefore,
$$
x_p = -\frac{d}{z}x, \quad y_p = -\frac{d}{z}y.
$$
Oh no, how are we going to express this using a transformation matrix?? The division by $z$ is not a linear operation, so it cannot appear directly in the entries of a 4x4 matrix.
Aha! If we always remember to homogenize our points after transforming them, then we get the projection we want: make the fourth row of the matrix $(0, 0, -1, 0)$ so that the fourth coordinate of the transformed point is $-z$, and the homogenization step performs the division for us. In the exercise below, try returning the projection matrix we just derived as a mat4. The aspect ratio of the canvas is 1 in case you need it, and the field-of-view (FOV) is set to 90 degrees. Returning undefined will signal the exercise to use a more complete projection matrix (see below).
What do you notice about the result? While the mechanics of the projection work, we're projecting EVERYTHING to the same plane. This means we can't tell what's in front of what.
Instead of projecting everything to a viewing (image) plane, we will use a viewing volume as outlined in light gray below. We will now call the image plane the near plane (hence the $n$ subscript in the near-plane distance $d_n$), and we will bound the volume with a far plane at a distance $d_f$. Our goal is to map everything inside this viewing volume to a cube spanning $[-1, 1]$ in each direction, in which the projected $x$ and $y$ coordinates become
$$
x_p = -\frac{2d_n}{w z} x = \frac{1}{a\cdot\tan(\frac{1}{2}\alpha)}\left(-\frac{x}{z}\right), \quad y_p = -\frac{2d_n}{h z} y = \frac{1}{\tan(\frac{1}{2}\alpha)}\left(-\frac{y}{z}\right)
$$
since the near plane has width $w = 2 d_n\, a \tan(\frac{1}{2}\alpha)$ and height $h = 2 d_n \tan(\frac{1}{2}\alpha)$, where $a$ is the aspect ratio and $\alpha$ is the field-of-view. Note that this means we plan to homogenize by $-z$, so the fourth row of our projection matrix will be $(0, 0, -1, 0)$. For the projected depth, writing the last two entries of the third row as coefficients $\alpha$ and $\beta$ (reusing those symbols), we have
$$
z_p = \frac{1}{-z}\left(\alpha z + \beta\right) = -\alpha -\frac{\beta}{z},
$$
we can apply the fact that we want $z_p = -1$ at the near plane ($z = -d_n$) and $z_p = 1$ at the far plane ($z = -d_f$), which gives
$$
\alpha = \frac{d_n + d_f}{d_n - d_f}, \quad \beta = \frac{2d_nd_f}{d_n - d_f}.
$$
Note that both of these values are negative since the near plane is closer than the far plane ($d_n < d_f$). Also note that this normalized viewing volume is defined with a left-handed coordinate system ($z_p$ increases into the screen, away from the viewer).
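To check the derivation, here is a small sketch (not part of the lab code) that builds this matrix with glMatrix and compares it to mat4.perspective; the field-of-view, aspect ratio and near/far distances are arbitrary example values.
const { mat4 } = glMatrix; // assumes the glMatrix script has been loaded
const fov = Math.PI / 2; // 90 degree field-of-view (example value)
const aspect = 1.0;
const dn = 0.1;   // near plane distance (example value)
const df = 100.0; // far plane distance (example value)
const t = Math.tan(0.5 * fov);
const A = (dn + df) / (dn - df);       // the depth coefficient alpha derived above
const B = (2.0 * dn * df) / (dn - df); // the depth coefficient beta derived above
// glMatrix stores matrices in column-major order: each group of four values is a column
const P = mat4.fromValues(
  1 / (aspect * t), 0, 0, 0,
  0, 1 / t, 0, 0,
  0, 0, A, -1, // the -1 makes the fourth coordinate -z (what we homogenize by)
  0, 0, B, 0
);
// glMatrix builds the same matrix for us:
const P_reference = mat4.perspective(mat4.create(), fov, aspect, dn, df);
console.log(P, P_reference); // identical up to floating-point rounding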
The last step is to transform our near plane (in view space), which has corners at $(-1, -1)$ and $(1, 1)$, to screen space (the HTML canvas), which has its origin at the top-left corner (with $y$ pointing downwards) and is measured in pixels.
glMatrix and nomenclature.
I strongly recommend using glMatrix as much as possible to avoid bugs involved with creating the matrices above, specifically mat4.lookAt (see implementation here) and mat4.perspective (see implementation here), both of which are further documented here. Please go back to the first exercise and try returning the result of mat4.perspective using a FOV of 90 degrees, an aspect ratio of 1, and the near and far plane distances given in the exercise.
Some intermediate matrices are also used frequently, so it's good to know about some naming conventions: the model-view matrix is the view matrix times the model matrix, and the model-view-projection (MVP) matrix is the projection matrix times the model-view matrix.
Remember to read transformations from right-to-left!
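For example, a model-view-projection matrix might be assembled like this (a sketch with placeholder values for the model transform and camera):
const { mat4 } = glMatrix;
// placeholder model, view and projection matrices
const M = mat4.fromTranslation(mat4.create(), [0, 0, -5]);
const V = mat4.lookAt(mat4.create(), [0, 2, 5], [0, 0, -5], [0, 1, 0]);
const P = mat4.perspective(mat4.create(), Math.PI / 4, 1.0, 0.1, 100.0);
// read right-to-left: points are transformed by M first, then V, then P
const MV = mat4.multiply(mat4.create(), V, M);   // model-view matrix
const MVP = mat4.multiply(mat4.create(), P, MV); // model-view-projection matrix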
WebGL - the rasterization API we will use.
WebGL (Web Graphics Library) provides an API to rasterize our models using the GPU. This makes it very fast and will allow us to develop some more interactive applications. We won't cover everything in WebGL, and we'll mostly focus on writing shaders in our course. It will still be important to understand how a WebGL application is built from start to finish, and I'll provide a lot of starter code for doing this. In the coming weeks, we'll also discuss how to upload data to the GPU and issue rendering calls to WebGL.
One of the central concepts in WebGL is the idea of a context. Just like the "2d" context we used to assign pixel colors in an HTML canvas, we can retrieve the webgl context using:
let gl = canvas.getContext('webgl'); // the gl object is a WebGL context.
You can also pass "webgl2" to access the WebGL2 API. The context has several functions which will allow us to write data to the GPU, create shader programs, and issue rendering calls.
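As a preview of the kind of calls the context provides, creating a shader program from two GLSL source strings typically looks something like the sketch below (vertexSource and fragmentSource are assumed to hold the shader source code; this is only a sketch, and the starter code will handle these details for you).
// sketch: compile and link a shader program from two GLSL source strings
function createProgram(gl, vertexSource, fragmentSource) {
  const compile = (type, source) => {
    const shader = gl.createShader(type);
    gl.shaderSource(shader, source);
    gl.compileShader(shader);
    if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS))
      throw new Error(gl.getShaderInfoLog(shader));
    return shader;
  };
  const program = gl.createProgram();
  gl.attachShader(program, compile(gl.VERTEX_SHADER, vertexSource));
  gl.attachShader(program, compile(gl.FRAGMENT_SHADER, fragmentSource));
  gl.linkProgram(program);
  if (!gl.getProgramParameter(program, gl.LINK_STATUS))
    throw new Error(gl.getProgramInfoLog(program));
  return program;
}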
As mentioned earlier, WebGL offers us the ability to inject our own programs into the graphics pipeline to (1) process vertex data and (2) process fragments. These two steps are done using shaders, which are essentially programs that run on the GPU. Shaders are written in specific languages - for WebGL, they are written in the OpenGL Shading Language (GLSL). The syntax of GLSL is similar to C, and there are some custom types built into the language to help us with linear algebra. Actually, a lot of the way we manipulate vectors and matrices in GLSL will feel more natural than the way we've been doing it in JavaScript with glMatrix, primarily because of the way operators are overloaded.
For example, the following is valid GLSL to manipulate 3d vectors and 3x3 matrices:
vec3 u = vec3(1, 2, 3);
vec3 v = vec3(4, 5, 6);
vec3 u_normalized = normalize(u);
float u_length = length(u);
vec3 u_scaled = 2.0 * u; // 2 * u will result in a compiler error (2 is an integer, but 2.0 is a float)
vec3 u_plus_v = u + v;
vec3 u_minus_v = u - v;
float u_dot_v = dot(u, v);
vec3 u_cross_v = cross(u, v);
vec3 u_times_v_componentwise = u * v;
vec3 reflected = reflect(u, v); // special built-in function to reflect the first vector about the second (which should be normalized)
vec3 p = vec3(1, 2, 3); // some position vector
vec4 p_homogeneous = vec4(p, 1.0); // homogeneous representation of p
float u_x = u.x; // or u[0], also u.r
vec2 u_xy = u.xy;
mat3 A = mat3(1, 0, 0, 0, 1, 0, 0, 0, 1); // 3x3 identity, can also use mat3(1.0)
mat3 A_inverse = inverse(A);
mat3 A_transpose = transpose(A);
vec3 A_times_u = A * u;
Each shader should be understood as a mini program with an entry point at the main() function. You can write additional functions to assist in your implementation, just like you would in C. Variables can also be declared globally so the entire shader can use them. In the case of the vertex shader, think of the input as some vertex data. At the very least, a vertex shader must write to a special variable called gl_Position (a vec4). In the case of a fragment shader, the input is a fragment, and the required output is the color we want to assign to the fragment, which is assigned in a special, reserved variable called gl_FragColor. Note that gl_FragColor is also a vec4 where the first three components correspond to the RGB values (between 0-1). The fourth component of gl_FragColor controls the transparency (0 for transparent, 1 for opaque).
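To make this concrete, here is a minimal vertex/fragment shader pair, written as JavaScript strings the way they would be embedded in a WebGL application (the attribute and uniform names are made up for illustration):
// minimal vertex shader: transform each vertex by a model-view-projection matrix
const vertexSource = `
attribute vec3 a_Position;          // hypothetical per-vertex position attribute
uniform mat4 u_ModelViewProjection; // hypothetical MVP uniform
void main() {
  gl_Position = u_ModelViewProjection * vec4(a_Position, 1.0);
}
`;

// minimal fragment shader: color every fragment opaque red
const fragmentSource = `
precision mediump float; // fragment shaders need a default float precision in WebGL1
void main() {
  gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); // RGB = red, alpha = 1 (opaque)
}
`;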
To practice with GLSL, let's write a ray tracer in a fragment shader! This is done by rendering a full-screen quad that covers the entire near plane, with corners at $(\pm 1, \pm 1)$ in normalized device coordinates: every pixel then corresponds to a fragment of the quad, and the fragment shader can cast a ray through it, just like our JavaScript ray tracers did.
// exercise:
// return color from ambient + diffuse reflection
// see lightPosition (vec3) above
float t = -B - sqrt(disc);              // ray parameter of the closer sphere intersection
vec3 p = eye + t * r;                   // intersection point on the sphere
vec3 n = normalize(p - center);         // outward unit normal at p
vec3 l = normalize(lightPosition - p);  // unit vector from p towards the light
vec3 cl = vec3(1);                      // white light color
vec3 Id = km * cl * max(0., dot(n, l)); // diffuse (Lambertian) term
return km * ca + Id;                    // ambient + diffuse
In the coming weeks, it will be very important to always ask yourself which frame of reference you are in when doing a particular calculation. The data you pass from a vertex shader to a fragment shader might be in world space, camera space or projection space. Lighting calculations will typically be done in camera space, so vertices should be transformed by the model-view matrix and the normals should be transformed by the inverse-transpose of the model-view matrix before doing a lighting calculation.
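On the JavaScript side, glMatrix can prepare both of these matrices; here is a sketch (the model and view matrices are placeholders). mat3.normalFromMat4 computes the inverse-transpose of the upper-left 3x3 of a 4x4 matrix, which is exactly the normal matrix described above.
const { mat3, mat4 } = glMatrix;
// placeholder model and view matrices
const model = mat4.create();
const view = mat4.lookAt(mat4.create(), [0, 0, 2], [0, 0, 0], [0, 1, 0]);
const modelView = mat4.multiply(mat4.create(), view, model);        // transforms positions
const normalMatrix = mat3.normalFromMat4(mat3.create(), modelView); // transforms normals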
WebGL does the viewport (screen) transformation for you, so gl_Position (a required output of a vertex shader) should be the vertex position transformed by the MVP (model-view-projection) matrix.
Also, please remember what you are/are not allowed to do in each programming language we are using, specifically when it comes to linear algebra. GLSL has built-in types and functions for vectors and matrices, with operator overloading to make expressions more intuitive (and more like how it is written mathematically). JavaScript (with glMatrix) does not.