Class 21

Applications: Image processing

Objectives for today

  • Implement nested loops to perform “2-D” computations
  • Utilize an existing helper class for building a program
  • Implement some common image processing techniques

Download the following files:

  • middimage.py: image processing library we will use.
  • class21.py: starter file for today where you will write code.
  • puppy-front.jpg: a picture of my dog (Leila) from 2020.
  • puppy-side.jpg: another picture of Leila we’ll use for testing the blurring technique.
  • rose.jpg: a picture of a rose we’ll use for the sepia technique.
  • giraffes.jpg: a picture of giraffes we’ll use for edge detection.

Images as a 2-D structure

We can think of images as a “2-D” structure, i.e., the image has a width and a height, and each pixel has a position described by a specific row and column. For our purposes, let’s assume that (0, 0) is the top-left corner of the image. We could imagine a 3×3 image as having the following structure. The pixel in the exact middle would have row and column indices of 1 and, in this example, a value of (190, 177, 168). The values are 3-tuples representing the red, green, and blue (RGB) color components of that pixel (each component is in the range [0, 255]). An all-black pixel would have the value (0, 0, 0), an all-white pixel (255, 255, 255), a red pixel (255, 0, 0), etc.

Indices   Col 0            Col 1             Col 2
Row 0     (108, 85, 71)    (149, 131, 121)   (210, 201, 196)
Row 1     (106, 87, 73)    (190, 177, 168)   (220, 215, 211)
Row 2     (103, 87, 74)    (208, 199, 190)   (223, 219, 216)
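As a minimal sketch of this structure, using plain Python lists and tuples rather than the middimage classes introduced below, the 3×3 example can be written as a list of rows in which each pixel is an (R, G, B) tuple. The variable names here are only for illustration.

```python
# The 3x3 example image as a list of rows; each pixel is an (R, G, B) tuple
image = [
    [(108, 85, 71), (149, 131, 121), (210, 201, 196)],
    [(106, 87, 73), (190, 177, 168), (220, 215, 211)],
    [(103, 87, 74), (208, 199, 190), (223, 219, 216)],
]

# Index by row first, then column; (0, 0) is the top-left corner
middle = image[1][1]
red, green, blue = middle
print(middle)  # (190, 177, 168)
```

Indexing as image[row][col] matches the (row, column) convention used throughout these notes.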

Today we will store and manipulate images as 2-D structures using the Pillow Python imaging library. To run today’s code you will need to have Pillow installed. It should have been installed when you did the Python setup on the first day of the semester, but if you cannot run import PIL in the shell (Pillow is imported under the name PIL), exit() to a Terminal (or go directly to a Terminal by clicking Terminal -> New Terminal) and type:

python -m pip install pillow

Instead of directly interfacing with Pillow, we want to focus on writing nested for loops to process pixels in an image. We’ll use our own module called middimage to do this. The middimage module provides two classes: Pixel and Image. The Pixel class simply stores the RGB components corresponding to a particular pixel and provides some overloaded operators for conveniently processing images:

  1. + (via __add__) to obtain a new Pixel with components that are the sum of the components of two Pixels.
  2. - (via __sub__) to obtain a new Pixel with components that are the components of the first Pixel minus the second one.
  3. * (via __mul__ and __rmul__) to obtain a new Pixel with components multiplied by some number. The number can be on the left (performed by __rmul__) or on the right (performed by __mul__).
  4. ** (via __pow__) to obtain a new Pixel with components raised to some power.
  5. / (via __truediv__) to obtain a new Pixel with components divided by some number. This function is useful for averaging.
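To illustrate how operator overloading of this kind works in Python, here is a minimal sketch. SketchPixel is a hypothetical stand-in written for this example; it is not the middimage implementation, which may differ (for instance, in how it rounds or clamps component values).

```python
class SketchPixel:
    """Hypothetical stand-in for a pixel class; NOT the middimage implementation."""

    def __init__(self, red, green, blue):
        self.red, self.green, self.blue = red, green, blue

    def __add__(self, other):
        # pixel + pixel: add component by component
        return SketchPixel(self.red + other.red, self.green + other.green, self.blue + other.blue)

    def __sub__(self, other):
        # pixel - pixel: subtract component by component
        return SketchPixel(self.red - other.red, self.green - other.green, self.blue - other.blue)

    def __mul__(self, number):
        # pixel * number: scale every component
        return SketchPixel(self.red * number, self.green * number, self.blue * number)

    # number * pixel: Python falls back to __rmul__ when the left operand is a plain number
    __rmul__ = __mul__

    def __pow__(self, power):
        # pixel ** power: raise every component to a power
        return SketchPixel(self.red ** power, self.green ** power, self.blue ** power)

    def __truediv__(self, number):
        # pixel / number: divide every component (useful for averaging)
        return SketchPixel(self.red / number, self.green / number, self.blue / number)

    def as_tuple(self):
        return (self.red, self.green, self.blue)


p = SketchPixel(10, 20, 30)
q = SketchPixel(1, 2, 3)
print((p + q).as_tuple())        # (11, 22, 33)
print((2 * p).as_tuple())        # (20, 40, 60)
print(((p + q) / 2).as_tuple())  # (5.5, 11.0, 16.5)
```

This sketch does no rounding or clamping, so results can be floats or fall outside 0-255.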

The Image class provides methods to load an image from either a URL or a local file, or to create an empty image initialized with a desired width and height. There are also methods for getting and setting pixels at specific rows and columns. For example, the following code loads an image from a URL (a picture of my dog when she was a puppy), and then “red shifts” one pixel by adding 100 to the red component.

from middimage import Image

img = Image("https://philipclaude.github.io/csci146f25/classes/puppy-front.jpg")
# Retrieve Pixel at row 100, column 120
pix = img.get_pixel(100, 120)
# "red shift" pixel by adding 100 to red component
pix.red += 100  
# Modify image by overwriting pixel at row 100, column 120 with new value
img.set_pixel(100, 120, pix)
img.show()

The result does not look much different (we changed just one pixel), but if we applied the same transformation to all the pixels, we should see something like:

Original image “Red-shifted” image

Transforming 2-D structures with nested loops

A common task is to perform an operation with or to each pixel. We will need a loop, and since the number of pixels is known at the start of the loop, a for loop is a natural choice. Performing an operation on each pixel can be implemented by performing the operation for all possible combinations of row and column positions, i.e., (0, 0), (0, 1) … (2, 2) in the 3×3 example above. Iterating over all combinations of multiple sequences (in this case, the row and column indices) is readily implemented with nested loops. For example, the following code prints every combination of row and column indices for a 3×4 structure:

ROWS = 3
COLS = 4

for row in range(ROWS):
    for col in range(COLS):
        print("Row:", row, "Col:", col, "Linear index:", row * COLS + col)

Row: 0 Col: 0 Linear index: 0
Row: 0 Col: 1 Linear index: 1
Row: 0 Col: 2 Linear index: 2
Row: 0 Col: 3 Linear index: 3
Row: 1 Col: 0 Linear index: 4
Row: 1 Col: 1 Linear index: 5
Row: 1 Col: 2 Linear index: 6
Row: 1 Col: 3 Linear index: 7
Row: 2 Col: 0 Linear index: 8
Row: 2 Col: 1 Linear index: 9
Row: 2 Col: 2 Linear index: 10
Row: 2 Col: 3 Linear index: 11

Here the outer loop iterates over all rows, while the inner loop iterates over all columns. Since the loops are nested, we only advance to the next row (the next iteration of the outer loop) after we have iterated through all columns for that row. In the output we should observe all combinations of row and column indices. We also see a common pattern for computing a “linear” index, that is, the index we would get if we traversed all the elements in row order. This is a common way of iterating through a “2-D” structure stored in a list or other “1-D” structure. The loop below does the reverse mapping, i.e., translating from a linear index to the associated row and column, and prints the same values as above.

ROWS = 3
COLS = 4

for i in range(ROWS * COLS):
    row = i // COLS
    col = i % COLS
    print("Row:", row, "Col:", col, "Linear index:", i)

Row: 0 Col: 0 Linear index: 0
Row: 0 Col: 1 Linear index: 1
Row: 0 Col: 2 Linear index: 2
Row: 0 Col: 3 Linear index: 3
Row: 1 Col: 0 Linear index: 4
Row: 1 Col: 1 Linear index: 5
Row: 1 Col: 2 Linear index: 6
Row: 1 Col: 3 Linear index: 7
Row: 2 Col: 0 Linear index: 8
Row: 2 Col: 1 Linear index: 9
Row: 2 Col: 2 Linear index: 10
Row: 2 Col: 3 Linear index: 11
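As a small sketch of why the linear index is useful, the example below stores a 3×4 grid in a flat list and reads an element back by row and column. The names flat and get are only for illustration. The built-in divmod computes the same quotient and remainder pair as i // COLS and i % COLS in one step.

```python
ROWS = 3
COLS = 4

# A 2-D grid stored in a flat (1-D) list in row order
flat = list(range(ROWS * COLS))

def get(row, col):
    """Return the element at (row, col) using the linear index."""
    return flat[row * COLS + col]

# divmod returns (i // COLS, i % COLS), i.e., the (row, col) of linear index i
row, col = divmod(7, COLS)
print(row, col)       # 1 3
print(get(row, col))  # 7
```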

Finishing Red shift

Using the nested loop structure above, we can readily “red-shift” the entire image by applying the transformation above to every pixel, not just one. We will use similar nested loops, but the ranges are determined by the height and width of the image (not by constants).

def red_shift(in_file, out_file):
    """Red shift image, saving result to local file

    Args:
        in_file: String with URL or filename to original image
        out_file: String filename with image extension to save modified image
    """
    # Load image
    img = Image(in_file)
    out = Image(img.get_size())

    # Iterate over all pixels, shifting red component by 100
    for row in range(img.get_height()):
        for col in range(img.get_width()):
            pix = img.get_pixel(row, col)
            out.set_pixel(row, col, Pixel(pix.red + 100, pix.green, pix.blue))
    
    # Save output image to local file
    out.save_image(out_file)

Could we have done this with NumPy? Yes! The Pillow library doesn’t directly expose the image as a NumPy array, but it does enable us to easily construct such an array, e.g.,

import numpy as np
import PIL.Image

image_array = np.array(PIL.Image.open("puppy-front.jpg"))
image_array.shape, image_array.dtype

((320, 240, 3), dtype('uint8'))

Here we have loaded the image as a 320×240×3 array of 8-bit unsigned integers (i.e., each color component of each pixel is represented as 8 bits, or a single byte). The 320×240 represents the size of the image (320 rows, 240 columns). The “3” dimension represents the individual color components, i.e., red, green, and blue. In this context we could think of red shift as image_array[i, j, k] += 100 where k is 0 and i and j range over all valid row and column indices. We can perform this with NumPy as

image_array[:,:,0] = np.clip(image_array[:,:,0].astype(np.uint32) + 100, 0, 255)

and the resulting image is just as we expect. But there is a subtlety, indicated by the use of the clip function. An 8-bit unsigned integer can only represent the numbers 0-255 (the maximum is \(2^8 - 1\)). If adding 100 would result in a value larger than 255, the result “wraps around” (equivalent to (val + 100) % 256). To prevent that “wrap around” we convert to a 32-bit integer (which has a larger range) and then clamp the resulting values to be within 0-255 (via np.clip). PIL handled this for us previously, but when working with “raw” arrays, we are responsible for these details.
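The difference between wrapping and clamping can be seen with ordinary Python integers. The sketch below uses two hypothetical helpers, wrap_uint8 and clamp_uint8, that mimic the two behaviors for a single value; they are written for this example and are not part of NumPy or Pillow.

```python
def wrap_uint8(value):
    """Mimic 8-bit unsigned overflow: keep only the low 8 bits."""
    return value % 256

def clamp_uint8(value):
    """Clamp to the representable range 0-255 (like np.clip(value, 0, 255))."""
    return max(0, min(255, value))

red = 200
print(wrap_uint8(red + 100))   # 44, a bright red would turn dark
print(clamp_uint8(red + 100))  # 255, the value saturates instead
print(clamp_uint8(100 + 100))  # 200, in-range values are unchanged
```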

Inverting

Inverting an image consists of assigning each component to 255 minus the original component value.

def invert(in_file, out_file):
    """Inverts the image, saving result to local file

    Args:
        in_file: String with URL to original image
        out_file: String filename with image extension to save modified image
    """
    # Load image
    img = Image(in_file)
    out = Image(img.get_size())

    white = Pixel(255, 255, 255)
    for row in range(img.get_height()):
        for col in range(img.get_width()):
            pix = img.get_pixel(row, col)
            new_pix = white - pix
            out.set_pixel(row, col, new_pix)
    
    # Save output image to local file
    out.save_image(out_file)

Original image Inverted

Sepia

Sepia is a very common photo filter which mimics the effect of an old chemical process that left black and white images yellowish in appearance. We can achieve this effect by computing new \((R^*, G^*, B^*)\) values from the original \((R, G, B)\) values of each pixel as follows (from here):

\[ \begin{align} R^* &= 0.393 R + 0.769 G + 0.189 B \\ G^* &= 0.349 R + 0.686 G + 0.168 B \\ B^* &= 0.272 R + 0.534 G + 0.131 B \end{align} \]

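def sepia(in_file, out_file):
    """Apply sepia filter to image, saving result to local file

    Args:
        in_file: String with URL or filename to original image
        out_file: String filename with image extension to save modified image
    """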
    # Load image
    img = Image(in_file)
    out = Image(img.get_size())

    for row in range(img.get_height()):
        for col in range(img.get_width()):
            pix = img.get_pixel(row, col)
            r, g, b = pix.red, pix.green, pix.blue
            new_r = int(0.393 * r + 0.769 * g + 0.189 * b)
            new_g = int(0.349 * r + 0.686 * g + 0.168 * b)
            new_b = int(0.272 * r + 0.534 * g + 0.131 * b)
            out.set_pixel(row, col, Pixel(new_r, new_g, new_b))

    # Save output image to local file
    out.save_image(out_file)
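Note that the red and green coefficients each sum to more than 1, so bright pixels can produce values above 255. Whether middimage clamps such values is not shown in these notes, so the sketch below clamps explicitly. It applies the formulas to a single (R, G, B) triple; sepia_pixel is a hypothetical helper written for this example.

```python
def sepia_pixel(r, g, b):
    """Apply the sepia formulas to one (R, G, B) triple, clamping each result to 255."""
    new_r = min(255, int(0.393 * r + 0.769 * g + 0.189 * b))
    new_g = min(255, int(0.349 * r + 0.686 * g + 0.168 * b))
    new_b = min(255, int(0.272 * r + 0.534 * g + 0.131 * b))
    return (new_r, new_g, new_b)

print(sepia_pixel(100, 50, 25))    # (82, 73, 57)
print(sepia_pixel(255, 255, 255))  # (255, 255, 238)
```

For a white pixel, the unclamped red and green values would be about 344 and 306, which is why the clamp matters.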

Original image Sepia

Mirroring

We can also apply a “mirror” effect such that the left half of the image is reflected across a line down the center of the image.

def mirror(in_file, out_file):
    """Mirror left portion of image to right, saving result to local file

    Args:
        in_file: String with URL to original image
        out_file: String filename with image extension to save modified image
    """
    # Load image
    img = Image(in_file)
    out = Image(img.get_size())

    for row in range(img.get_height()):
        # Copy the left half (plus the middle column when the width is odd)
        for col in range((img.get_width() + 1) // 2):
            pix = img.get_pixel(row, col)
            out.set_pixel(row, col, pix)
            out.set_pixel(row, img.get_width() - 1 - col, pix)
    
    # Save modified image to local file
    out.save_image(out_file)
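The column arithmetic is easier to check on a single row. The sketch below mirrors a plain list of numbers standing in for one row of pixels; mirror_row is a hypothetical helper written for this example. The source columns stop at the middle so that no left-half value is overwritten by a column from the right half.

```python
def mirror_row(row):
    """Return a copy of row with its left half reflected onto the right half."""
    width = len(row)
    out = list(row)
    # Only the left half (plus the middle element for odd widths) is a source
    for col in range((width + 1) // 2):
        out[width - 1 - col] = row[col]
    return out

print(mirror_row([1, 2, 3, 4]))     # [1, 2, 2, 1]
print(mirror_row([1, 2, 3, 4, 5]))  # [1, 2, 3, 2, 1]
```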

Original image Mirrored

Nesting more loops!

To create a blurring effect, we can average a window of pixels (here, in the horizontal direction). This means we’ll have another nested for loop, this time over a range of horizontal pixels. For a pixel at row row and column col, and given an input window (the number of pixels to average), try to average window pixels along the same row, centered on (row, col). Specifically, if window were 5, each output pixel would be the average of itself, the 2 pixels immediately to its left, and the 2 pixels immediately to its right.

def blur(in_file, out_file, window = 8):
    """Apply horizontal blur filter to image, saving result to local file

    Args:
        in_file: String with URL to original image
        out_file: String filename with image extension to save modified image
        window: number of pixels in horizontal direction in which to average
    """
    # Load image from URL
    img = Image(in_file)
    out = Image(img.get_size())

    for row in range(img.get_height()):
        for col in range(img.get_width()):
            pix = Pixel(0, 0, 0)
            n = 0
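            # Offsets centered on col, e.g., -2..2 for window = 5
            # (an even window averages window + 1 pixels)
            for idx in range(-(window // 2), window // 2 + 1):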
                sample = col + idx
                if 0 <= sample < img.get_width():
                    pix = pix + img.get_pixel(row, sample)
                    n += 1
            out.set_pixel(row, col, pix / n)
    
    # Save output image to local file
    out.save_image(out_file)
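To see the windowed average in isolation, here is a minimal sketch that blurs one row of plain numbers instead of pixels. blur_row is a hypothetical helper written for this example. Near the edges the window is cut off, so those positions average fewer values.

```python
def blur_row(row, window):
    """Average each element with its neighbors in a window centered on it."""
    half = window // 2
    out = []
    for col in range(len(row)):
        total = 0
        n = 0
        # Offsets -half..half around col; skip samples outside the row
        for idx in range(-half, half + 1):
            sample = col + idx
            if 0 <= sample < len(row):
                total += row[sample]
                n += 1
        out.append(total / n)
    return out

print(blur_row([0, 10, 20, 30, 40], 3))  # [5.0, 10.0, 20.0, 30.0, 35.0]
```

The first and last outputs average only two values because one neighbor falls outside the row.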

Original image Image with horizontal blur using window = 33

Edge detection

Another useful image processing technique is edge detection, which is a building block for object detection. It can be used in medical imaging, fingerprint reading, and vehicle detection (source).

We can detect edges by first transforming the image to a grayscale image and then applying a filter to detect strong changes in the pixels. The grayscale image can be computed by assigning each pixel the average of its (R, G, B) values. For example, if some pixel in the original image is (144, 32, 82), then the average is \((144 + 32 + 82) / 3 = 86\) and the grayscale pixel would have a color of \((86, 86, 86)\).

Next, imagine we want to process an interior pixel at row \(i\) and column \(j\). We restrict ourselves to interior pixels because the stencil we will implement reaches into neighboring pixels, so we need to stay within the bounds of the image. Denote the color of pixel \((i,j)\) as \(p_{i,j}\). We need to compute:

\[ \begin{align} a_{i,j} &= (p_{i-1,j+1} + 2p_{i,j+1} + p_{i+1,j+1}) - (p_{i-1,j-1} + 2p_{i,j-1} + p_{i+1,j-1}) \\ b_{i,j} &= (p_{i-1,j-1} + 2p_{i-1,j} + p_{i-1,j+1}) - (p_{i+1,j-1} + 2p_{i+1,j} + p_{i+1,j+1}) \\ d_{i,j} &= \sqrt{a_{i,j}^2 + b_{i,j}^2} \end{align} \]

The output pixel would be the grayscale color \((R,G,B) = (d_{i,j}, d_{i,j}, d_{i,j})\). Since the Pixel class of middimage overloads various mathematical operators, you can apply the formulas directly using the pixels surrounding pixel \((i,j)\).

Here are some examples:

Original image Edge detection

Original image Edge detection

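def edges(in_file, out_file):
    """Detect edges in image, saving result to local file

    Args:
        in_file: String with URL or filename to original image
        out_file: String filename with image extension to save modified image
    """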
    # Load image
    img = Image(in_file)
    out = Image(img.get_size())

    # First convert the image to grayscale
    for row in range(img.get_height()):
        for col in range(img.get_width()):
            p = img.get_pixel(row, col)
            avg = (p.green + p.blue + p.red) // 3
            img.set_pixel(row, col, Pixel(avg, avg, avg))

    # Apply the stencil to interior pixels only, so every neighbor is in bounds
    for row in range(1, img.get_height() - 1):
        for col in range(1, img.get_width() - 1):
            aij = img.get_pixel(row - 1, col + 1) + 2 * img.get_pixel(row, col + 1) + img.get_pixel(row + 1, col + 1)
            aij = aij - img.get_pixel(row - 1, col - 1) - 2 * img.get_pixel(row, col - 1) - img.get_pixel(row + 1, col - 1)
            bij = img.get_pixel(row - 1, col - 1) + 2 * img.get_pixel(row - 1, col) + img.get_pixel(row - 1, col + 1)
            bij = bij - img.get_pixel(row + 1, col - 1) - 2 * img.get_pixel(row + 1, col) - img.get_pixel(row + 1, col + 1)
            dij = (aij ** 2 + bij ** 2) ** 0.5
            out.set_pixel(row, col, dij)

    # Save output image to local file
    out.save_image(out_file)
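The formulas for \(a_{i,j}\), \(b_{i,j}\), and \(d_{i,j}\) (a stencil commonly known as the Sobel operator) can be checked by hand on a tiny grayscale grid. The sketch below works on a 2-D list of plain numbers rather than Pixel objects; edge_strength is a hypothetical helper written for this example. Note that \(d_{i,j}\) can exceed 255, and how middimage handles such values is not shown in these notes.

```python
import math

def edge_strength(gray, i, j):
    """Compute d_ij for interior pixel (i, j) of a 2-D list of gray values."""
    # a: right column minus left column (responds to vertical edges)
    a = (gray[i - 1][j + 1] + 2 * gray[i][j + 1] + gray[i + 1][j + 1]) \
        - (gray[i - 1][j - 1] + 2 * gray[i][j - 1] + gray[i + 1][j - 1])
    # b: top row minus bottom row (responds to horizontal edges)
    b = (gray[i - 1][j - 1] + 2 * gray[i - 1][j] + gray[i - 1][j + 1]) \
        - (gray[i + 1][j - 1] + 2 * gray[i + 1][j] + gray[i + 1][j + 1])
    return math.sqrt(a ** 2 + b ** 2)

# Dark on the left, bright on the right: a strong vertical edge
vertical_edge = [
    [0, 0, 255],
    [0, 0, 255],
    [0, 0, 255],
]
# A uniform region has no edge
uniform = [
    [50, 50, 50],
    [50, 50, 50],
    [50, 50, 50],
]

print(edge_strength(vertical_edge, 1, 1))  # 1020.0
print(edge_strength(uniform, 1, 1))        # 0.0
```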