This article describes the theory and implementation of a 3d-viewing transformation for an arbitrary view of a 3d-scene (3d-world). The reader should be familiar with some basic concepts of 3d-graphics like perspective projection and should also have some knowledge of vector math and matrix operations. The shown viewing transformation (VT) is a simplified version of a general VT (sometimes also called a normalizing transformation) that I developed for this article. For an excellent introduction to 3d-viewing look at 'Computer graphics, principles and practice' by Foley et al. [1], which most of you should own already.

Some words about the style of this article: the described theory of 3d-viewing is just math in principle, and could therefore be handled in a purely mathematical manner. But in my opinion it is more useful to explain the concept with words and not only with formulas.


3d-viewing, as it is described here, is simply the projection of a 3d-scene onto a 2d-plane, which represents the screen of your monitor. 3d-viewing of this kind is needed if you want to view 3d-scenes on some 2d-display.

You all know the simplest concept of 3d-viewing, where the objects of the 3d-world are viewed from a fixed direction (usually the observer is located on the Z-axis) and perspective projection is used to map the 3d-scene onto the screen. This method is quite limited, because you have to simulate the movement of the observer through transformations of the objects in the scene.

The Virtual Camera

The positions of the objects in the 3d-world are specified in the so-called World Coordinate System (WCS). In this article all coordinate systems (CS) are left-handed orthogonal coordinate systems:

                Y ^                          (figure 1)
                  |
                  -------> X

I prefer a left-handed over a right-handed CS for one reason: the Z-axis is pointing to infinity (that is spatial infinity in this case), and infinity has to be positive, NOT negative (maybe a philosophical problem ;). So, always use your left hand in the 3d-finger game: thumb pointing to the X-axis, forefinger to the Y-axis, and middle finger to the positive infinity of the Z-axis (and don't dislocate your fingers).

To specify an arbitrary view of the scene we introduce the concept of the Virtual Camera which is the observer of the scene:

                    VUP(v)                   (figure 2)
                URES          VPN(n)
                        /   VRES
                    /        VRI(u)
            VP -/------------
               /-D      .
             /   .

The Virtual Camera basically consists of two main components (figure 2):

i) The so-called View Plane (VP), sometimes also called Projection Plane, a 3-dimensional rectangular plane of width URES and height VRES. The position and orientation of the VP in the WCS is specified by a left-handed CS, called the Viewer Coordinate System (VCS). The origin of the VCS is located at the center of the View Plane. This vector, which I call CW, for Center of Window, is given in WCS-coordinates, and represents the position of the viewer. The 3 orthogonal axes of the VCS, called u, v and n respectively, are defined by three unit vectors, called VRI (View Right Vector), VUP (View Up Vector) and VPN (View Plane Normal) respectively. The orientation of the VP is entirely controlled by the orientation of these three vectors.

ii) As the scene is projected onto the VP using perspective projection, one needs a center of projection, which I call FOCUS in my implementation. The FOCUS is located on the n-axis of the VCS at distance D from the origin CW. The coordinates of FOCUS in VCS-coordinates are (0,0,-D). If you need the position of FOCUS in WCS-coordinates you would simply calculate FOCUS|wcs = CW - D * VPN (where FOCUS|wcs means the vector FOCUS specified in WCS-coordinates).
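The formula FOCUS|wcs = CW - D * VPN can be sketched in C++ like this (a minimal sketch; the names Vec3 and focusInWCS are mine, not from the source):

```cpp
#include <cassert>

// Minimal 3d-vector type (name is mine, not the article's implementation).
struct Vec3 { double x, y, z; };

// FOCUS|wcs = CW - D * VPN : the center of projection lies D units
// behind the View Plane center CW, along the (unit) View Plane Normal.
Vec3 focusInWCS(const Vec3& cw, const Vec3& vpn, double d)
{
    return { cw.x - d * vpn.x, cw.y - d * vpn.y, cw.z - d * vpn.z };
}
```

With CW=(0,0,0), VPN=(0,0,1) and D=5 this yields (0,0,-5), matching FOCUS=(0,0,-D) in VCS-coordinates.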

VP and FOCUS together form the View Volume, a semi-infinite four-sided pyramid with its apex at FOCUS (figure 2). The 3d-scene is projected onto the VP with FOCUS as the center of projection. The View Volume is just the part of the scene that is visible to the observer.

The introduced method using VP and FOCUS allows us to specify an arbitrary view of the 3d-scene and is perfectly suited for operations like camera rolls, panning etc. Moving around the scene is just translating CW, a camera roll can be accomplished by rotating the VCS-vectors around the n-axis (VPN) and so on. Just one more word for better understanding: the vector CW is in fact the position of the observer in the 3d-scene.
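Such a camera roll can be sketched like this (names and sign convention are mine, not the source's VCamera): since VRI, VUP and VPN are orthonormal, rotating around the n-axis only mixes VRI and VUP, while VPN stays fixed:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Roll the camera by 'angle' radians around VPN: VRI and VUP rotate
// in the plane they span, VPN is unchanged. Assumes VRI and VUP are
// orthonormal; the sign convention of 'angle' is arbitrary here.
void cameraRoll(Vec3& vri, Vec3& vup, double angle)
{
    const double c = std::cos(angle), s = std::sin(angle);
    const Vec3 u = vri, v = vup;
    vri = {  c * u.x + s * v.x,  c * u.y + s * v.y,  c * u.z + s * v.z };
    vup = { -s * u.x + c * v.x, -s * u.y + c * v.y, -s * u.z + c * v.z };
}
```

Rolling by 90 degrees turns VRI=(1,0,0), VUP=(0,1,0) into approximately (0,1,0) and (-1,0,0).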

example 1) We set

VRI = (1,0,0) and FOCUS = (0,0,-d) in VCS-coordinates,
VUP = (0,1,0) and CW    = (0,0,0)  in WCS-coordinates
VPN = (0,0,1)

Further we set URES=320, VRES=200. Ok, right now you should get the point and remember the good old days, where you coded rotating ducks and stuff like that. ;) This case results in the easiest form of 3d-viewing, where the observer is located on the negative Z-axis and the VP lies in the XY-plane. The VP directly corresponds to the physical screen in this example. The View Volume in this example has the Z-axis as its center-line. The slopes of its bounding faces are VRES/(2*d) for the top bounding face and URES/(2*d) for the right one.

One important property of the introduced viewing transformation is that the bounding faces will have slope 1 afterwards, making 3d-clipping simple and efficient (in this context I have to mention clipping in homogeneous coordinates; the interested reader should look at [1]). A View Volume with these properties is called a Standard View Volume, and the vectors inside can be projected directly with the perspective projection formulas. (I use the term Standard View Volume for every View Volume with its apex at the origin, the Z-axis as the center-line and slope 1 for the bounding planes. A Normalized View Volume would be a Standard View Volume uniformly scaled so that the back-clipping-plane lies at z=1.)

                    Y ^                      (figure 3)
                      |                     The standard view volume
                      |              /  Z
                      |          /
                  VUP ^
         (-d,-d,0)            VPN
            VP      O-------->--------->
                    /       VRI       X
               /        . (d,d,0)
             /   .
         FOCUS (at -d on Z-axis)

The values given above are a special case, showing that the concept of the Virtual Camera covers almost all possible tasks of 3d-viewing (at least the most general version does) and is therefore a very flexible and fast solution.

The Viewing Transformation

Now that the Virtual Camera specifies the view of the 3d-scene, the task of the viewing transformation (VT) is to transform the View Volume of the Virtual Camera (which is located somewhere in the 3d-scene) into the Standard View Volume (figure 3). This viewing transformation can be split up into 4 separate transformations:

1) Translate CW (Center of the View Plane) to the origin of the WCS. The translation is of course T1 = T(-cw.x, -cw.y, -cw.z), where

                | 1 0 0 x |
    T(x,y,z) =  | 0 1 0 y |
                | 0 0 1 z |
                | 0 0 0 1 |

In the implementation I'm using only 3x4 matrices, because the 4th row is always [0 0 0 1] for the needed transformations. Vectors are treated as column vectors and multiplied on the right of the matrix. A translation of a point (px,py,pz) by T(x,y,z) would for example be:

     | 1 0 0 x |   | px |     | px+x |
     | 0 1 0 y |   | py |  =  | py+y |
     | 0 0 1 z |   | pz |     | pz+z |
     | 0 0 0 1 |   | 1  |     | 1    |
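The 3x4 convention can be sketched like this (the type and function names are mine): the implicit fourth row [0 0 0 1] never needs to be stored, and the fourth column is simply added as the translation:

```cpp
#include <cassert>

struct Vec3 { double x, y, z; };

// 3x4 matrix: m[r][0..2] is the linear part, m[r][3] the translation.
typedef double Mat34[3][4];

// Transform a point: 9 multiplications and 9 additions.
Vec3 transformPoint(const Mat34 m, const Vec3& p)
{
    return {
        m[0][0]*p.x + m[0][1]*p.y + m[0][2]*p.z + m[0][3],
        m[1][0]*p.x + m[1][1]*p.y + m[1][2]*p.z + m[1][3],
        m[2][0]*p.x + m[2][1]*p.y + m[2][2]*p.z + m[2][3]
    };
}
```

Transforming (4,5,6) by T(1,2,3) gives (5,7,9), as in the example above.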

2) CW is now at the origin, but the VP is not yet coplanar with the XY-plane. To accomplish this, the vectors VRI, VUP and VPN are rotated to lie on the X, Y and Z-axis respectively. This is the only part of the viewing transformation that is a bit more tricky: the matrix for rotating three orthogonal unit vectors onto the principal axes of the standard coordinate system (of the same orientation, that is left-handed in this case) is just the matrix having the three vectors as its rows! In our case the rotation matrix would be

         | vri.x  vri.y  vri.z  0 |   which rotates the VCS-axes to lie
    R =  | vup.x  vup.y  vup.z  0 |   on the principal axes.
         | vpn.x  vpn.y  vpn.z  0 |
         |     0      0      0  1 |

Using R as the rotation matrix works for the following reasons:

R is an orthogonal matrix: its rows contain unit vectors whose pairwise dot products are zero (they form an orthonormal coordinate system). Because of this fact

       R * R^T = E   ( R^T is the transpose of R )

is valid (with E as the identity matrix). And the columns of E are just the results of multiplying R with the vectors VRI, VUP and VPN, so R rotates them onto the principal axes (E has the vectors of the principal axes as its rows). You get the point? If not, you could also imagine the rotation matrix for rotating a vector around the Z-axis; then take the rows of that matrix as vectors and rotate them with the matrix itself, use some trigonometry and you will notice something (or use Maple/Mathematica instead). If you want a perfect understanding of the above you should read some good math book.
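The argument can be checked numerically (names below are mine). Applying the rotation part of R to a vector computes the dot products of the rows with that vector, so R applied to VRI must give (1,0,0) for any orthonormal triple:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

static double dot(const Vec3& a, const Vec3& b)
{
    return a.x*b.x + a.y*b.y + a.z*b.z;
}

// Apply the rotation part of R (rows vri, vup, vpn) to a vector p:
// each result component is the dot product of one row with p.
Vec3 rotate(const Vec3& vri, const Vec3& vup, const Vec3& vpn, const Vec3& p)
{
    return { dot(vri, p), dot(vup, p), dot(vpn, p) };
}
```

For example, with the orthonormal triple vri=(1/sqrt(2), 1/sqrt(2), 0), vup=(-1/sqrt(2), 1/sqrt(2), 0), vpn=(0,0,1), rotate(..., vri) comes out as (1,0,0) up to rounding.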

3) Now we want the FOCUS to lie at the origin (CW is still at the origin), so we use T2 = T(0,0,-foc.z) = T(0,0,d), since FOCUS is at (0,0,-d) in VCS-coordinates and has to be translated by +d along Z.

4) After the transformations of 1)-3) the View Volume has the right orientation and the FOCUS the right position for perspective projection, but the slope of the bounding faces is not 1! Until now the View Volume looks like this (viewed from the right side, i.e. from the positive X-axis onto the YZ-plane):

          ^ Y               .
                     .       y=VRES/2           (figure 4)
                   .      . 
                 .     .    
               .    .       
             .   .          
           .  .             
           .       D        
      FOCUS .                |CW     Z
      at O     .             |
                  .          |(CW.z = D)
                     .       |
                        .    |
                           . | y=-VRES/2

You remember that (URES,VRES) were the dimensions of the rectangular VP. For example, to make the top bounding face have unit slope we have to scale by 2*D/VRES in the Y-direction (the resulting face is also shown in figure 4). The corresponding scale factor in the X-direction would be 2*D/URES. After this scaling the VP would be a square with dimension 2*D. So the perspective projection would calculate values between -D and D in both dimensions, which would have to be rescaled to screen dimensions again. To simplify this, we uniformly scale the View Volume by the factor 1/D. Now the VP-square has edge length 2 and the perspective projection will result in values from -1 to 1 for both the X and Y coordinates. So multiplying by half the screen resolution will directly result in screen coordinates (after adding the midpoint of the screen). The proposed solution seems to me to be one of the fastest.

The scaling matrix for step 4) is now:

                                                    | 2/URES   0     0   0 |
    S1 = S( 2*D/URES * 1/D, 2*D/VRES * 1/D, 1/D) =  |   0    2/VRES  0   0 |
                                                    |   0      0    1/D  0 |
                                                    |   0      0     0   1 |
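A quick numeric check of S1 (a sketch; the function name is mine): after steps 1)-3) the top-right corner of the VP lies at (URES/2, VRES/2, D), and S1 must map it to (1,1,1):

```cpp
#include <cassert>
#include <cmath>

// Apply S1 = S(2/URES, 2/VRES, 1/D) to a point (x,y,z) in place.
void applyS1(double ures, double vres, double d,
             double& x, double& y, double& z)
{
    x *= 2.0 / ures;
    y *= 2.0 / vres;
    z *= 1.0 / d;
}
```

With URES=320, VRES=200 and D=5, the corner (160,100,5) indeed scales to (1,1,1).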

The final View Transformation Matrix is the product of these four matrices:

     MATRIX = S1*T2*R*T1

And view-transforming a vector (x,y,z) is done by multiplying it with MATRIX. This transformation takes 9 multiplications and 9 additions, and the matrix has to be calculated only once for all vectors in the scene. So the advantage of the shown viewing transformation is not only the flexibility of the Virtual Camera concept but also speed.
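The whole composition can be sketched end-to-end (all names are mine, not the source's VCamera; a sketch, not the article's implementation). Two known images serve as a sanity check: CW must map to (0,0,1) and FOCUS to the origin:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
typedef double Mat34[3][4];

// c = a * b for 3x4 matrices with implicit fourth row [0 0 0 1].
void mul(const Mat34 a, const Mat34 b, Mat34 c)
{
    for (int r = 0; r < 3; ++r)
        for (int k = 0; k < 4; ++k) {
            c[r][k] = a[r][0]*b[0][k] + a[r][1]*b[1][k] + a[r][2]*b[2][k];
            if (k == 3) c[r][k] += a[r][3];  // b's implicit (0,0,0,1) column
        }
}

Vec3 apply(const Mat34 m, const Vec3& p)
{
    return { m[0][0]*p.x + m[0][1]*p.y + m[0][2]*p.z + m[0][3],
             m[1][0]*p.x + m[1][1]*p.y + m[1][2]*p.z + m[1][3],
             m[2][0]*p.x + m[2][1]*p.y + m[2][2]*p.z + m[2][3] };
}

// Build MATRIX = S1 * T2 * R * T1 from the camera parameters.
void buildViewMatrix(const Vec3& cw, const Vec3& vri, const Vec3& vup,
                     const Vec3& vpn, double d, double ures, double vres,
                     Mat34 out)
{
    Mat34 t1 = { {1,0,0,-cw.x}, {0,1,0,-cw.y}, {0,0,1,-cw.z} };   // step 1
    Mat34 r  = { {vri.x,vri.y,vri.z,0},                            // step 2
                 {vup.x,vup.y,vup.z,0},
                 {vpn.x,vpn.y,vpn.z,0} };
    Mat34 t2 = { {1,0,0,0}, {0,1,0,0}, {0,0,1,d} };                // step 3
    Mat34 s1 = { {2.0/ures,0,0,0}, {0,2.0/vres,0,0}, {0,0,1.0/d,0} }; // step 4
    Mat34 rt1, t2rt1;
    mul(r,  t1,    rt1);    // R * T1
    mul(t2, rt1,   t2rt1);  // T2 * R * T1
    mul(s1, t2rt1, out);    // S1 * T2 * R * T1
}
```

For example, with CW=(2,3,4), the standard axes, d=5, URES=320 and VRES=200, the matrix maps CW to (0,0,1) and FOCUS|wcs = (2,3,-1) to (0,0,0).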

Perspective Projection

After applying the viewing transformation to a vector of the scene (for example a vertex) we can directly use the transformed coordinates for perspective projection. If (x,y,z) is the transformed vector, we can calculate the projected 2d-coordinates by:

          D * x              D * y
    xP = -------  and  yP = -------
            z                  z

(You might know these formulas as xP=(D*x)/(z+D), yP=(D*y)/(z+D), which are correct if the center of projection is at (0,0,-D) on the z-axis. Note that after the uniform 1/D scaling of step 4 the projection plane lies at z=1, so here D=1 and the formulas reduce to xP=x/z, yP=y/z.)

Vectors inside the View Volume are projected onto coordinates in the range [-1..1]. Mapping them to screen coordinates is now just a matter of multiplying by half the screen resolution (the code for this is located in the class Render in the implementation).

If you are using float vectors, the perspective transformation could be calculated like this (pseudo code ;):

      xres2 = ScreenXResolution/2;
      yres2 = ScreenYResolution/2;

      for all vectors to be view-transformed {
        get the view-transformed vector (x,y,z)
        div = 1/z;              // projection plane lies at z=1 after step 4
        xP  = (x * div * xres2) + xres2;
        yP  = (y * div * yres2) + yres2;
      }

2d-clipping can be done as usual, but 3d-clipping can be done more efficiently in the Standard View Volume with bounding faces of slope 1. To check, for example, if a vector is located above the View Volume one just has to compare whether the Y-coordinate of the vector is greater than the Z-coordinate, that's all. Besides the 4 bounding planes of slope 1, there are also 2 more bounding planes (even bounding squares in this case): the front and the back-clipping-plane. The back-clipping-plane could be used for clipping far away objects together with fading object intensities in z-direction. Front and back-clipping-planes have the same orientation as the VP but are located at different Z-coordinates. A side view of the Standard View Volume would look something like this:
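These slope-1 comparisons can be sketched as a point-inside test (the function name and the front/back parameters f and b are mine):

```cpp
#include <cassert>

struct Vec3 { double x, y, z; };

// True if p lies inside the Standard View Volume, clipped by the
// front plane z = f and the back plane z = b. The four side faces
// have slope 1, so each check is a plain coordinate comparison.
bool insideStdViewVolume(const Vec3& p, double f, double b)
{
    return p.z >= f && p.z <= b
        && p.y <=  p.z && p.y >= -p.z    // below top face, above bottom face
        && p.x <=  p.z && p.x >= -p.z;   // left of right face, right of left face
}
```

For example, (0.5, 0.5, 1) lies inside, while (0, 2, 1) is rejected by the top face because y > z.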

        Y ^                      .                 .
                           .                      |       (figure 5)
                  y=z=D                           |
            y=z=F .                              
          ------------------------- ... ----------->
            .                                      Z
                           .                      |

The back-clipping-plane normally lies at much higher z-coordinates than D or F.

There is a lot more to do with the Standard View Volume, like fast 3d-clipping. One could also distort the View Volume into a cube, which would make the 3d-clipping even easier (clipping in homogeneous coordinates is a possibility, too). But the size of this article is limited (as is my time) and so I can't explain everything (maybe in a future article).

Some words on the implementation

The source is intended to demonstrate just the viewing transformation, and therefore includes neither complex nor fast rendering (in fact I used the Watcom graphics library, which you should never use if you want fast graphics ;). The only important part of the source is inside the class VCamera, which is an implementation of the Virtual Camera. It has some primitive functions for manipulating the camera and of course the function for calculating the viewing transformation. The rest of the program consists of the class Object3d, which loads and preprocesses 3ds-asc-objects, and the class Render, which renders the object using the library functions. Inside the Render-class the viewing transformation is applied to the vertices, and the perspective projection and simple clipping are done. I've also included some simple movement for the virtual camera (roll, pitch, bank etc.), but you should use a more complex method of movement control in your engine, like a quaternion/spline driven movement control. The viewing transformation just takes the vectors of the VCS as input; how you move them is up to you! For further information read the file readme.txt or the comments inside 3dview.cpp.

Some words on the source style: I'm often using references instead of pointers, because a) it looks better and b) it offers some advantages (one example is setting up references to other objects inside the constructor of an object). The source isn't optimized much, because it's just tutorial source (that's also the reason for the comment overkill). For one small example of optimization read the text optimize.txt. One more thing you might notice: the 3d-vector datatype is implemented as a class, but the 3x4 matrix as a typedef + structure. I've done this just to show the two possibilities (which are equally fast).

Closing words

This article was intended to explain a solution to 3d-viewing that is fast AND flexible. I hope that my article was somehow useful to you, and enables you to include real 3d-viewing in your engine. This article could at least be a starting point for doing this. If you are new to 3d-graphics or not that experienced in vector math this article could have caused you some pain, but stay calm, and read it again. Or take a look at the source, and you will notice that the actual viewing transformation calculation is a quite simple and compact function. If you are an experienced 3d-coder, you knew all of this already (and are sitting in front of your screen with that typical arrogant coder smile). But even in this case it may have helped to improve your skills, because you had to transfer my notation to the one you know (for example FOCUS instead of PRP and so on). ;)

Some references

 [1] 'Computer graphics, principles and practice'
     Foley et al., Addison Wesley, ISBN 0-201-12110-7
     (It has the imo best introduction to 3D-viewing)

 [2] 'Computer graphics, second edition'
     Donald Hearn and Pauline Baker, Prentice Hall, ISBN 0-13-159690-X
     (This book covers many fields, compact but detailed enough)

 [3] '3D computer graphics' by Alan Watt, Addison Wesley, ISBN 0-201-63186-5
     (I clearly prefer [1] over this one)

 [4] 'Ray shooting, Depth orders and Hidden Surface Removal'
     Mark De Berg, Lecture Notes in Computer Science, Springer Verlag
     ISBN 3-540-57020-9
     (This book is pure hardcore! If you are interested in advanced topics, you
     should take a look at this.)

- tryx/xography