Tuesday 21 April 2009

Adventures in 3D: Part IX - A Bit Of Perspective

Stick with it, 3D fans, we're getting there.

One thing we cheated at way back in Part I was perspective. Until now, everything has been drawn using parallel projection. That is, there is no perspective: everything appears the same size regardless of how far away it is. That works fine when you're looking at a single object, where the difference in distance between the front and the back of the object is small enough to be negligible in terms of how your brain perceives the image, but when you start adding objects into the background of the scene, it's a problem.

Thankfully, perspective is very simple to do. We're actually going to do this twice. For a first pass, we'll do the simplest possible thing, which is to do the calculations explicitly, with a hardcoded viewpoint. Hopefully your alarm bells will be ringing at the sight of the word "hardcoded", so then we'll look at the more proper solution, which involves our old friend, the matrix.

So, first solution. We already have a project() method in the Triangle class, which is used to convert the 3D model's x,y,z (double) coordinates into the screen's x,y (integer) coordinates. Remember that perspective does not affect the 3D model in any way, everything stays where it is. Perspective is simply an effect of projection, so this is exactly where we need to be doing the perspective calculations. And what does "perspective" actually mean for our projection? Think about a wireframe cube rendered in 3D with perspective. The back face of the box, which is at a greater Z distance, will appear smaller than the front face - the left and right sides of the back face have smaller x values (assume the x and y axes are through the centre of the box), and the top and bottom sides of the back face have smaller y values. So it's clearly an adjustment of X and Y coordinates as a function of Z, we just need to figure out what that adjustment is. Time for a diagram.



The vertical line in the middle represents the screen onto which the model is projected - the camera C is at some distance z' behind that screen, and the point P is at a distance z beyond that screen, and distance y above the axis. Drawing a line from C to P represents our line of sight, and you can see that it intersects the screen at a distance y' above the axis. Our job is to figure out the distance y'. Cast your mind back to school maths classes, and the idea of similar triangles. The theory of similar triangles says that two triangles with the same angles will have sides that are in proportion. Therefore, y'/z' = y/(z+z'), which rearranged slightly gives the equation

            z'
y' = y * --------
          z + z'

From that, you can see that as z tends towards zero (i.e. the object moves nearer the plane of projection), the second term becomes z'/z', which is 1, and so y' = y. Working out z is fairly easy: we just need to remember that it's the distance from the screen to the object, which, if we decide the screen is somewhere other than z=0, is not the same as the z coordinate in world space. In other words, z = z_world - z_screen. We also need z', which is a fairly arbitrary number representing the focal length. The lower this number, the more pronounced the perspective effect.
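To make the formula concrete, here is a minimal sketch that evaluates y' = y * z' / (z + z') for a few depths. The class and method names (PerspectiveFormulaSketch, projectY) are made up for illustration and are not part of the code in this series.

```java
// Hypothetical helper, not part of the series' code: evaluates the
// similar-triangles formula y' = y * z' / (z + z').
class PerspectiveFormulaSketch {

    // y: height of the point above the axis
    // z: distance from the screen to the point
    // focalLength: distance from the camera to the screen (z' in the diagram)
    static double projectY(double y, double z, double focalLength) {
        return y * focalLength / (z + focalLength);
    }

    public static void main(String[] args) {
        double focalLength = 300;
        // A point 100 units above the axis, pushed progressively further back.
        for (double z : new double[] {0, 100, 300, 900}) {
            System.out.println("z=" + z + " -> y'=" + projectY(100, z, focalLength));
        }
    }
}
```

With a focal length of 300, the point lands at 100, 75, 50 and 25 as z goes through 0, 100, 300 and 900. Dropping the focal length to 100 moves the z=300 case from 50 down to 25, which is the "more pronounced" effect mentioned above.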

So, let's stick that into some code. We define a viewpoint that represents the position of the viewer, and a focal length (z' from the diagram), in this case determined pretty much by trial and error - this value gives a decent sense of depth without looking unrealistic. As the focal length is fixed, and the viewpoint will potentially move, we calculate the position of our "screen" as being the position of the camera plus the focal length. Then, the relative z distance is calculated (z in the diagram), being the distance from the screen to the object. Finally, we use those values to calculate the perspective correction as defined above.


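private Point viewpoint = new Point(0,0,-300);
double focalLength = 300;

public void project() {
    // The screen sits one focal length in front of the camera.
    double zScreen = viewpoint.z + focalLength;
    for (int i = 0; i < 3; i++) {
        // Distance from the screen to this vertex (z in the diagram).
        double zDistance = z[i] - zScreen;
        // Scale factor z' / (z + z') from the similar-triangles formula.
        double perspective = focalLength / (focalLength + zDistance);
        xPoints[i] = (int) (x[i] * perspective);
        yPoints[i] = (int) (y[i] * perspective);
    }
}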

If you spin the scene objects now, you should get some sense of perspective. If you can't really see anything, you may want to lower the focal length value so the effect is more pronounced.

That's all well and good, but there's another way to achieve the same effect, and it's going to set us up a bit better for getting the camera moving around. We're going to use a matrix to perform the same sort of maths.

Last time we used a matrix it was for rotation, and it was a 3x3 matrix which acted on a 3x1 column matrix (the point). Now we're going to use a transformation matrix to translate points, both in 3D and 2D. Recall that if we're working with a 3x3 matrix on a point x,y,z, then matrix multiplication means the output for each coordinate is of the form Ax + By + Cz. However, in the case of translation, we need to add or subtract a constant that is not a function of position. This may sound familiar, for this is what an affine transformation does, and we've already been happily using one to move the origin into the centre of the screen. To do affine transforms with a matrix, we need to introduce homogeneous coordinates. There is, I'm sure, lots of complicated geometry that can be used to describe homogeneous coordinates - see the Wikipedia page on homogeneous coordinates for starters - but really you can just think of it as a hack to allow transformations of the form Ax + By + Cz + D. You do two things: extend the matrix to 4x4, with a 4th column containing the constants to add to each coordinate and a 4th row of 0 0 0 1, and add a 4th row with the value 1 to the vector matrix. Easy. Here's an example:

|1 0 0  30| |x|    |x + 30|
|0 1 0  10| |y|    |y + 10|
|0 0 1 -10| |z| => |z - 10|
|0 0 0   1| |1|    |  1   |
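The same multiplication can be sketched in a few lines of Java. This is only an illustration: plain arrays stand in for the Matrix class used in this series, and the names HomogeneousTranslationSketch, apply and translation are hypothetical.

```java
import java.util.Arrays;

// Illustrative only: plain arrays stand in for the series' Matrix class.
class HomogeneousTranslationSketch {

    // Multiplies a 4x4 matrix by a 4-element column vector.
    static double[] apply(double[][] m, double[] v) {
        double[] out = new double[4];
        for (int row = 0; row < 4; row++) {
            for (int col = 0; col < 4; col++) {
                out[row] += m[row][col] * v[col];
            }
        }
        return out;
    }

    // Builds a translation matrix with the offsets in the 4th column.
    static double[][] translation(double dx, double dy, double dz) {
        return new double[][] {
            {1, 0, 0, dx},
            {0, 1, 0, dy},
            {0, 0, 1, dz},
            {0, 0, 0, 1}
        };
    }

    public static void main(String[] args) {
        double[] point = {5, 6, 7, 1}; // x, y, z plus the homogeneous 1
        double[] moved = apply(translation(30, 10, -10), point);
        System.out.println(Arrays.toString(moved)); // [35.0, 16.0, -3.0, 1.0]
    }
}
```

Because the 4th element of the point is 1, each row of the product picks up its constant from the 4th column unchanged, which is exactly the "+ D" we were after.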

Hopefully you can see how this can start getting us towards the idea of a moveable camera. The translation coordinates in the 4th column will come from the position of the camera, and the result will be to move the world coordinates to coordinates relative to the camera. We did the same thing in our first method in calculating (focalLength + zDistance), albeit only for the z axis. You can see that with the matrix method, we can very easily take the x and y coordinates into account as well.

Let's add some code. I create a new TransformMatrix class, and simply have a static method getWorldToCamera(Point view) that, given a camera position, will return a matrix of the form:

|1 0 0 -view.x|
|0 1 0 -view.y|
|0 0 1 -view.z|
|0 0 0 1 |

That code is

public class TransformMatrix extends Matrix {

    private TransformMatrix(double[][] data) {
        super(data);
    }

    public static TransformMatrix getWorldToCamera(Point view) {
        return new TransformMatrix(new double[][] {
            {1, 0, 0, -view.x},
            {0, 1, 0, -view.y},
            {0, 0, 1, -view.z},
            {0, 0, 0, 1}
        });
    }
}


Note that the view coordinates are negated. If the camera is at z=10, and a world point is at z=20, the point will be 10 units from the camera, i.e. z = z_world - z_camera. We'll pass in a matrix to the project() method to use for the transform from world to camera (don't forget to make that change in the Primitive interface too). For now you can just pick a camera position and hardcode it in the call to project(). When we get round to moving the camera, that matrix will be recalculated each time.

So how does this help us with perspective? It doesn't yet. We also need to factor in that focalLength. In our world-to-camera transform, we're going to end up with a z-coordinate, z', that is the distance from the camera to the point. (Careful: this is not the z' from the diagram, which was the focal length.) In the first effort above, we had zDistance, which was the distance from the screen to the point, and focalLength, which was the distance from camera to screen. That means that:

z' = focalLength + zDistance

How very handy. The perspective calculation is now:

double perspective = focalLength / z';

We can express that in a matrix multiplication as well. The trick here is to use the homogeneous coordinate (normally called w) to store the perspective factor (wp), and then it's a simple case of applying that to x' and y'. Just one other thing we have to think about: as we're multiplying matrices, we need to express the perspective as a multiplication of z' rather than a division by it, so we simply turn it upside down and store z'/focalLength. Then, instead of multiplying x' and y' by the factor, we divide by it.

That means the perspective calculation can be applied as a matrix, although in our simple case it's nothing more than a way of dividing z' by the focal length. The benefit of using the matrix is that you could potentially encode other operations in there in future to apply different effects. Here's what the matrix looks like, and the result of applying that to homogeneous coordinates:

| 1  0   0   0 | | x' |    |  x'  |
| 0  1   0   0 | | y' |    |  y'  |
| 0  0   1   0 | | z' | => |  z'  |
| 0  0  1/f  0 | | w' |    | z'/f | => wp

Let's recap:
  • Given a point x,y,z, we add the homogeneous coordinate (which is just a 1) to give a vector matrix x,y,z,w.

  • The world-to-camera transform matrix is applied to give the coordinates x',y',z',w', which are the coordinates of the point relative to the camera position, and where w' is still just a 1.

  • The perspective matrix is applied to calculate wp, which is the perspective correction factor.

  • Divide x' and y' by wp to give the final x and y coordinates.
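The four steps above can be sketched end to end, alongside the explicit calculation from the first half of this post, to check that the two agree. This is a sketch under assumptions: plain arrays stand in for the Matrix and TransformMatrix classes, every name in it is hypothetical, and the explicit version only matches when the camera sits at x=0, y=0, as it did in the first method.

```java
// Illustrative only: arrays stand in for the series' Matrix classes,
// and all names here are hypothetical.
class PerspectivePipelineSketch {

    // Multiplies a 4x4 matrix by a 4-element column vector.
    static double[] apply(double[][] m, double[] v) {
        double[] out = new double[4];
        for (int row = 0; row < 4; row++) {
            for (int col = 0; col < 4; col++) {
                out[row] += m[row][col] * v[col];
            }
        }
        return out;
    }

    // Translation by the negated camera position.
    static double[][] worldToCamera(double cx, double cy, double cz) {
        return new double[][] {
            {1, 0, 0, -cx}, {0, 1, 0, -cy}, {0, 0, 1, -cz}, {0, 0, 0, 1}
        };
    }

    // Leaves x', y', z' alone and writes z'/f into the 4th element.
    static double[][] perspective(double f) {
        return new double[][] {
            {1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 1 / f, 0}
        };
    }

    // The four recap steps; returns {screenX, screenY} before the int cast.
    static double[] project(double x, double y, double z, double[] camera, double f) {
        double[] homogeneous = {x, y, z, 1};                            // step 1
        double[] relative = apply(
            worldToCamera(camera[0], camera[1], camera[2]), homogeneous); // step 2
        double[] withWp = apply(perspective(f), relative);              // step 3
        double wp = withWp[3];
        return new double[] {withWp[0] / wp, withWp[1] / wp};           // step 4
    }

    // The explicit first method, assuming the camera is at x=0, y=0.
    static double[] projectExplicit(double x, double y, double z, double cameraZ, double f) {
        double zScreen = cameraZ + f;
        double zDistance = z - zScreen;
        double perspective = f / (f + zDistance);
        return new double[] {x * perspective, y * perspective};
    }

    public static void main(String[] args) {
        double[] camera = {0, 0, -300};
        double[] viaMatrix = project(100, 50, 300, camera, 300);
        double[] viaFormula = projectExplicit(100, 50, 300, -300, 300);
        System.out.println(viaMatrix[0] + ", " + viaMatrix[1]);
        System.out.println(viaFormula[0] + ", " + viaFormula[1]);
    }
}
```

For a camera at z=-300, a focal length of 300 and a point at (100, 50, 300), z' is 600 and wp is 2, so both routes put the point at approximately (50, 25).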


Sounds slightly complicated, but it's really not doing anything more than we've already done. Again, the benefit is in being able to encode other transformations in the matrices, which should come in useful shortly.

In code, it's straightforward. We'll add a new method getPerspective(double focalLength) to the TransformMatrix class to return a matrix that divides z' by the focalLength:

public static TransformMatrix getPerspective(double focalLength) {
    return new TransformMatrix(new double[][] {
        {1, 0, 0, 0},
        {0, 1, 0, 0},
        {0, 0, 1, 0},
        {0, 0, 1/focalLength, 0}
    });
}

Then in the Triangle class:

public void project(TransformMatrix worldToCamera) {

    for (int i = 0; i < 3; i++) {
        Matrix point = new Matrix(new double[][] { {x[i]}, {y[i]}, {z[i]}, {1}});
        Matrix result = worldToCamera.times(point);

        Matrix finalPoints = TransformMatrix.getPerspective(FOCALLENGTH).times(result);

        xPoints[i] = (int) (finalPoints.get(0,0) / finalPoints.get(3,0));
        yPoints[i] = (int) (finalPoints.get(1,0) / finalPoints.get(3,0));
    }
}

Of course, result is just an intermediate, and the perspective matrix never changes given a fixed focal length, so if you're the sort of coder who hates to see waste, you can store the perspective matrix in the Triangle class, and do the whole lot in one go:

private final TransformMatrix persMatrix = TransformMatrix.getPerspective(focalLength);

public void project(TransformMatrix worldToCamera) {
    ...
    Matrix finalPoints = persMatrix.times(worldToCamera.times(point));
    ...
}

Download the source and see for yourself.

One final thing for this episode - I promise. If you move your camera to a position that means objects go behind the camera, you'll see things go a bit pear-shaped because we're trying to render objects that should not be in the view. So there needs to be some sort of check to ensure polygons that are behind the camera are not drawn. Detecting them is easy enough: any object which has a negative z' (remember, z' is relative to the camera) should not be drawn. Acting on that is slightly tricky, because we need to tell the draw() method about it. I'm going to hack it for now, and use an instance variable boolean draw = true;. So then in project(), we do the check:

if (finalPoints.get(2,0) < 0) {
    draw = false;
    return;
}

and in draw():


public void draw(Graphics2D graphics) {
    if (!draw) {
        draw = true;
        return;
    }
    ...
}

Note that we reset the draw variable once we've decided not to draw the polygon, so that it can be considered for drawing again in the next frame.
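To see the flag's lifecycle over two frames, here is a minimal sketch with everything but the flag stripped out. DrawFlagSketch is a hypothetical stand-in for Triangle: project() takes the camera-relative z' values directly, and draw() returns a boolean saying whether the triangle would have been painted, neither of which is how the real methods are declared.

```java
// Hypothetical stand-in for Triangle, reduced to the visibility flag.
class DrawFlagSketch {
    private boolean draw = true;

    // cameraRelativeZ holds z' for each of the three vertices.
    void project(double[] cameraRelativeZ) {
        for (double z : cameraRelativeZ) {
            if (z < 0) {      // this vertex is behind the camera
                draw = false;
                return;
            }
        }
    }

    // Returns true if the triangle would have been painted this frame.
    boolean draw() {
        if (!draw) {
            draw = true;      // reset so the next frame gets a fresh check
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        DrawFlagSketch triangle = new DrawFlagSketch();

        // Frame 1: one vertex has slipped behind the camera.
        triangle.project(new double[] {50, -10, 80});
        System.out.println(triangle.draw()); // false

        // Frame 2: the camera has moved and all vertices are in front again.
        triangle.project(new double[] {50, 10, 80});
        System.out.println(triangle.draw()); // true
    }
}
```

Without the reset in draw(), the second frame would also be skipped, because nothing else ever sets the flag back to true.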

That's a fair slice of stuff for what was really quite a simple bit of functionality. Next time we'll start getting that camera moving, and also think a bit more seriously about that last point.
