An Idiot, a digital camera, and a PC +

My name is Seung Chan Lim, a clueless noob trying to make sense of the subject of computational photography by re-teaching the material learned in class....

Homogeneous coordinates: more warping

posted by dJsLiM on Monday, October 10, 2005 ::
click for class reference material ::

Now we all had fun with those 2×2 matrices in the previous entry, right? ;) (boo....) Before you think to yourself that that's the end of matrices, lemme warn you that this entry will involve even more. *SIGH* But, I promise you I won't turn this into a math blog!! =) In all seriousness, I assure you it's nothing too complicated.

So! As simple and nice as those 2×2 matrices were, the math geeks were not satisfied. Why not, you may ask. Well, the problem is

that although we can do all the mirroring, shearing, and rotating till the sun goes down, we cannot, for the love of God, move the damn image from one position to another. I mean... We defined image warping as a general notion of moving pixels around, yet we can't do simple horizontal/vertical sliding around of images?! That's just wrong, no?? Well, so there's a trick to doing this, and it's called homoegeneous coordinates.

Although I won't go into details on what exactly homogeneous coordinates are, or in what other context they're used, I'll at least say that as far as we're concerned, it's just a kludge that will allow us to represent image translation (sliding it horizontally and vertically) in matrix form.

In real simple terms, what we're going to do is add the number "1" as a new entry into our coordinate space like so:

[x
 y
 1]

Now, what that does is it forces us to turn our transformation matrices into 3×3 matrices, just to be able to get the matrices to multiply together. So if we had a 3×3 matrix like this one:

[x'  =  [1 0 tx  [x
 y'  =   0 1 ty   y
 1]  =   0 0 1 ]  1]

and solved the equation, like so:

x' = 1 × x + 0 × y + tx * 1
y' = 0 × x + 1 × y + ty * 1
1 = 0 × x + 0 × y + 1 * 1

You'll end up with:

x' = x + tx
y' = y + ty

Boo yah! You've got yourself a translation matrix that lets you add arbitrary scalar values to the x and y coordinates. =)

For a more in-depth intro to homogeneous coordinates, check this paper titled introduction to homogeneous coordinates. Oh, and when they mention some mumbo jumbo about a Euclidean plane, those pedantic fools math scholars are basically referring to a plane that's 2 dimensional and consists only of real numbers.

So now you can take all the 2×2 matrices we've shown in the previous entry, and turn them into 3×3 matrices that use homogeneous coordinates

Translate

[x'  =  [1 0 tx  [x
 y'  =   0 1 ty   y
 1]  =   0 0 1 ]  1]

Scale

[x'  =  [sx 0  0  [x
 y'  =   0  sy  0   y
 1]  =   0  0   1 ]  1]

Rotate

[x'  =  [cosθ -sinθ 0  [x
 y'  =   sinθ cosθ  0   y
 1]  =   0    0     1 ] 1]

Shear

[x'  =  [1  shx 0  [x
 y'  =   shy 1 0   y
 1]  =   0    0 1 ]  1]

Alright, now before we dive further into our newly found interest in 3×3 matrices, let's first make some notes about the propertis of the linear transformations discussed in the previous entry.

  • orgin maps to orgin, meaning that th pixel found at the origin (0,0) in the original image can be found at the origin (0,) in the transformed image
  • lines map to lines
  • parallel lines remain parallel
  • ratios are preserved
  • closed under composition - meaning you can combine them together to yield a single 2x2 transformation matrix

With that said, let's look at transformations that involve both linear and non-linear transformations. Let's take affine, for example.

Affine
[x'  =  [a b c  [x
 y'  =   d e f   y
 w]  =   0 0 1]  w]

This one is a combo of translation and transformation and it violates one of the properties of a linear transformation, which is that its

  • origin does not necessarily map to origin

Next we have projective transformation which takes the following form:

Projective

[x'  =  [a b c  [x
 y'  =   d e f   y
 w]  =   g h i]  w]

it's basically affine plus projective warps, and it violates 3 of the properties found in a linear transformation since

  • origin doesn't necessarily map to origin
  • paralle lines do not ncessarily remain parallel
  • ratios are not preserved

One thing to note is that since these matrices are closed under composition, we can multiply them together to get a single matrix that will carry out the entire series of transformations. Do keep in mind, however, that order is important.

So, armed with any of the above transformation matrices, we can iterate through all the pixels on a given image and find out where it would end up in the transformed image. This is known as forward warping. Watch out, though, cuz it can get interesting when the x, y values come out to be a fraction (i.e. in between two pixels). In this case we can use a technique called splatting.

Then there is also inverse warping where you iterate through all the pixels on the transformed image and multiply it by the inverse of the transformation matrix to get the pixel that

it would have originated from to find the brightness value to use. Of course, we can, again, run into cases where the originating pixel lands onn a non integer x, y value in which case we can use various interpolation techniques such as nearest neighbor, bilinear, bicubic, gaussian, etc... to pick a nearby pixel.

In general, inverse warping is more commonly used given that it makes sure that all pixels on the transformed image is covered. In a foward warp, you may run into cases where not all pixesl on the transformed image is accounted for. That means that you might end up with holes in your transformed image due to the fact that no brightness value has been picked for that pixel. Inverse warping isn't perfect, either. The problem in this case is that you have to assume that the inverse of the transformation matrix can be found. Unfortunately, this is not always the case.

Alright, so tune in next time for an interesting adventure into the world of image morphing!

Entering Warp Zone!

posted by dJsLiM on Saturday, October 01, 2005 ::
click for class reference material ::

In the previous entry we looked at image processing, which involved changing the brightness values of the pixels. This time let's look at image warping, which entails moving a pixel from one position to another.

Simply put, warping is the process taken to warp (DUH!) a pixel from its original x, y position to a new x, y position. In math talk, this is referred to as changing the domain of an image. The domain

basically denotes the range in which the values of x and y for the pixels inside an image reside within. For example, let's say the domain in which the pixels were laid out in the original image ranged from 0 to 10 in the x axis and 0 to 20 in the y. If we were to move this image by +5 pixels in the x-axis and +10 pixels in the y, then x in the warped image will end up spanning 5 to 15 and y, 10 to 20.

While we're in math land, let's venture in a lil further and visit some linear algebra concepts. Don't worry, it's nothing terribly complicated. =) it's just that warping can be represented quite nicely using a simple series of matrix operations since we're dealing with numbers in a discrete domain (remember that digital images are brightness values sampled at discrete intervals in space). I'll tell ya, math geeks definitely get off on the elegance of simple matrix operations. ;)

There are a few different types of image warps. Most common ones include translation, rotation, aspect, affine, and perspective. These are called parametric or global warping. I'll define that term in just a second, but first let's talk matrix.

The act of warping an image is also referred to as a transformation, and we call the function that takes an image to produce a new image, the transformation function. Given that, we can now represent this relationship as follows.


p' = T(p)

where p represents points on the original image and p' represents points on the transformed image.

If we were to dig one level deeper and break out the point into the x and y components and depict it as a 2 x 1 matrix, we'd get the following:

[x

 y]

We can then define the previous relationship again as a matrix equation:

[x'       [x
     =  M
 y']       y]

where M is the matrix that defines the transformation function T.

What this equation means is that for each and every pixel at x and y that resides within the original image p, we'll multiply it by the transformation matrix M to get new values of x and y, x' and y', that are on the new image. The fact that the transformation gets applied uniformly across all pixels is why this type of warping is called parametric/global warping

You're dying for some more matrices, right?! Soooo... Let's start with scaling. Scaling is an operation that resizes an image to make it bigger or smaller. What this really means is that the x and y components are multiplied by a scalar value so that the distance between two

neighboring x values and two neighboring y values are increased, ending up with an image that has an onverall dimension that is either bigger or smaller. In a uniform scaling operation you multiply the x and the y component by the same scalar, and in a non-uniform scaling operation you multiply them independently with different scalars. This simple multiplication can be expressed in matrix form as follows:

[x'    [a 0    [x
     =       *
 y']    0 b]    y]

where a and b are the scalars to be used to multiply x and y by respectively.

Let's quickly refresh our memories on how we multiply matrices together. You take the first row and match it up with the first column to get

x' = a × x + 0 × y then you take the second row and match it up with the first column to get

y' = 0 × x + b × y

Next we have 2D Rotation which entails moving a pixel at position x and y to point x', y' so that the vectors from the origin to the two points make a certain angle, say θ. I won't go into the good ol' highschool trig you had painfully memorized only to quickly forget after that dreaded IB exam, but the matrix equation comes out to take the following form.

[x'    [cos(θ) - sin(θ)   [x
     =
 y']    sin(θ)   cos(θ)]   y]

Then we have 2D shear which is represnted as follows:

[x'    [1  shx [x
     =
 y']    shy 1]  y]

And 2D mirroring about the Y axis

[x'    [-1  0  [x
     =
 y']     0  1]  y]

and about (0, 0)

[x'    [-1  0 [x
     =
 y']     0 -1]  y]

Well, I think that's enough matrices for you to chew on for the night. In the next entry things get more interesting as we look at some of the more combinatorial warping. ;) Stay tuned!