Real Time Ray Tracing
Hiho, Thomas Ludwig (lycium of Quixoft, who? :) here.
I'm not really expertly qualified to write such an article (being only 17, I'm not expertly qualified to do anything except cause trouble :), but I'll make sure that it's as concise and informative as possible. I consider this article to be a work in progress, so if you'd like to know anything about real time ray tracing, don't hesitate to ask. I'm not 100% clear on some subjects (e.g. CSG and torus intersections), but I'll do my best to find out about them and perhaps write something on the subject one day. I'm very interested in this subject myself, so whenever I learn something cool, I'll post something about it here. If anyone knows a lot about ray tracing, but is too lazy to write a tutorial, TEACH ME, and I'll do it :)
Please note that I assume you've implemented a basic ray tracer before, and wish to speed it up. If you haven't done this, you can check out some tutorials on Flipcode about it. I really recommend Tom Hammersley's introduction to ray tracing if you haven't learnt about this yet (for those of you who, like myself, don't have access to a decent scholarly library containing anything other than history and fiction books). If there's any demand (by email, hint hint :), I'll write a tutorial on introductory ray tracing (and don't do what I do: assume that someone else will email me, even though you want the doc. If everyone did this, it wouldn't happen :).
Enough babble, on with the tutorial.
Real Time Ray Tracing
When Whitted invented ray tracing (or at least generalised it into the rendering method it is today), it took quite a while before the first real time implementations appeared. This is mostly because of the inherent computational complexity of the algorithm, but can also be attributed to the slow machines of the time. Already we can see two fundamental reasons why ray tracing (even today) is not nearly as popular as normal polygon-based rendering methods:
1.) Algorithmically, it's much slower than polygon-based rendering methods.
2.) It requires much more raw CPU power.
Although these problems are very similar, they are in fact separate. You can only optimise the ray tracing pipeline so much before the only ways left to speed it up are to:
a.) Throw more processing power at it.
b.) Cheat :)
Before we get to the various methods of improving ray tracing speed, we have to accept that we will have to sacrifice scene complexity in order to achieve decent frame rates (even on modern CPUs). This means that we will normally be dealing with less than 16 objects in the scene, and will have very few light sources (normally two).
We can do little about the second fundamental problem I listed above (high CPU requirements), since that's on the user's side. What we can do is take FULL advantage of the CPU power that we do have, for example using the MMX, SSE and 3Dnow! instructions present on many of today's modern CPUs. The problem with this is that it involves a lot of low-level code specific to only one part of the rendering pipeline, which in turn is specific to one type of processor. This basically means it's a LOT of work, so it really should be left as the last optimisation step.
Examples of real time ray tracing
There are actually very few demos and intros which showcase real time ray tracing, but the most notable ones are:
"Rubicon" and "Rubicon 2" by Suburban Creations
While it may not seem like it, Spot's ray tracing (in "I feel like I could") is really very amazing when you consider the geometric complexity of the models it shadow checks (note that it isn't insane enough to trace the water's implicit surface, thankfully :).
Heaven seven is amazing in that it looks a lot better than it is :) It doesn't shadow check the CSG objects, and it has only one light source (seemingly; I expect to get some flames from the wonderful Exceed bunch on this :) active at any one time, yet it still looks massively impressive.
Rubicon and Rubicon 2 are examples of highly optimised ray tracing, without cheating. While this looks really good, it's still too slow to be considered real time (especially by the high demoscene standards). I've exchanged many emails with the intro's coder (crossbone of Suburban Creations, Eberhard Grummt, hiho mate! :) and I can assure you that this is probably the most algorithmically optimised ray tracer ever. In spite of this, it's still slow because it doesn't employ some of the cheats that Heaven seven and the Fresnel series do. What I gather from this is that you really have to cheat in order to get things running very smoothly. These intros are primarily optimised for quality (test your CPU and run the high quality version of Rubicon 1 :).
Fresnel and Fresnel 2 go beyond cheating and actually use it as a feature: hardware-accelerated ray tracing. I know this sounds impossible, but it really is possible, and you stand to gain enormous performance improvements if you do (not to mention bilinear filtering, perspective correct texturing, sub pixel accuracy, the lot). These really are technical masterpieces, for reasons aside from the real time ray tracing (Wavelet texture compression [the textures aren't generated, they're actually hand drawn and compressed to ~2k each!], another SERIOUSLY neglected topic ripe for tutoring :). I've also exchanged some emails with Shiva of Kolor (the coder, respect!), and this is another really optimised ray tracer, and probably the best example for real time ray tracing to date. Minor flaw: the sub-sampling (cheat, feature, whatever) is done incorrectly at the edges of objects on the screen, producing a somewhat ugly effect. But at the lowest subdivision resolution, it's hardly noticeable.
You can grab all these demos from www.pouet.net (and are really encouraged to do so :).
So without further ado, let's see what we can do to improve the algorithmic efficiency of our ray tracer.
Many ray tracers use spatial subdivision techniques (e.g. octrees, BSPs, KD-trees, etc) to speed up ray tracing, the idea being to reduce the number of intersection tests performed. While this is an absolute must for large (more than ~500 objects) scenes, with small scenes such as the ones we'll be using it's just overhead. This is a hotly debated subject, and this is just my own reasoning: I haven't performed any official tests to verify it, but I'm fairly confident that it holds. If anyone has other results from tests, please enlighten me :)
So while spatial subdivision is good for large scenes (to the point where, in really large scenes with high resolution output, scene complexity is an almost negligible performance factor, making it the future for rendering [this is my hypothesis, don't flame me please!] in years to come), it's really useless for the real time ray tracing we'll be doing for the next two years or so. Who knows after that?
So the next thing we can do falls into two sub-categories:
1.) Improve the efficiency of all intersection tests.
2.) Exploit spatial and temporal coherence to avoid redundant work.
The solution to the first problem is very straightforward: just go through all your algorithms and equations, and make sure they're absolutely optimal. This normally involves special-casing equations for certain objects (e.g. not using a general quadric intersector to intersect a sphere) and factorising equations to minimise the number of operations required to perform the intersection. Also, sometimes you only need to know whether an intersection occurs, and don't really care about exactly where; that case can be really well optimised.
Now let me list some examples of exploiting spatial and temporal coherence.
If you trace your screen in horizontal strips from left to right, top to bottom (the order most frame buffers are stored in) and need to perform a shadow check for each intersection, you'll notice that if one point is in shadow, the next point will almost certainly be too. So instead of doing a shadow check against each object in the list sequentially (and according to Murphy's law, the object that obstructs the light will be last on the list :), keep a pointer to the last obstructing object in the light's structure and test it first. A cheat (it should be listed under the cheats section, but it's applicable here) would be to only do a shadow check every other pixel, and if one is in shadow, assume the next pixel is too (it's hardly noticeable) and skip its shadow check.
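To make the "last occluder" cache concrete, here's a minimal C++ sketch. All the names (Vec3, Sphere, Light, lastOccluder) are mine for illustration, not from any particular tracer:

```cpp
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 mul(Vec3 a, double k) { return {a.x * k, a.y * k, a.z * k}; }
static double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

struct Sphere { Vec3 centre; double radius; };

// True if sphere s blocks the segment from point p to the light position l.
static bool blocks(const Sphere& s, Vec3 p, Vec3 l) {
    Vec3 d = sub(l, p);
    double t = dot(sub(s.centre, p), d) / dot(d, d);  // closest approach along segment
    if (t <= 0.0 || t >= 1.0) return false;           // occluder must lie between them
    Vec3 diff = sub(add(p, mul(d, t)), s.centre);
    return dot(diff, diff) < s.radius * s.radius;
}

struct Light {
    Vec3 pos;
    const Sphere* lastOccluder = nullptr;   // the shadow cache

    bool inShadow(Vec3 p, const std::vector<Sphere>& scene) {
        // The object that shadowed the previous pixel very likely shadows this one too.
        if (lastOccluder && blocks(*lastOccluder, p, pos)) return true;
        for (const Sphere& s : scene) {
            if (&s == lastOccluder) continue;        // already tested above
            if (blocks(s, p, pos)) { lastOccluder = &s; return true; }
        }
        lastOccluder = nullptr;                      // lit: drop the stale cache entry
        return false;
    }
};
```

On coherent scanlines the cached test succeeds almost every time, so the per-object loop is skipped entirely.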
If you're ray tracing implicit surfaces and do a piecewise approximation of the surface by interpolating along the grid with a quadratic or cubic interpolation scheme, you can take advantage of two facts: you're tracing along a planar surface (the view plane), and the points you trace are equally spaced (unless you're using stochastic sampling, which you shouldn't and wouldn't be doing in real time :). Together these make the intersection fast enough for real time purposes (you just need some initialisation per scan line for each block of the implicit surface's grid).
This is intentionally vague because I wanted to include a reference to the technique, while making it brief because I don't expect to see this in real time for another few years :)
Aside from improving algorithms and using the current machine more effectively via specialised instruction sets, you can also improve speed greatly by just using your head a little. If you see a part of your code that's taking a particularly long time to execute, ask yourself if maybe there isn't a way that this could either be removed, precalculated, simulated or otherwise avoided.
Some notable speedup tricks include:
First hit optimisation
If any of the values you calculate during the course of rendering involve the origin of the ray and some other value(s) that remain constant for each frame (sphere radii, object positions etc), you can precalculate those values, store them as private data members of the object's data structure, and just reuse them whenever you need to calculate stuff from primary rays (the rays spawned from the camera or eye). This only works for primary rays, because reflected and refracted rays obviously do not propagate from the camera's position, which means you'll have to write a specialised intersection routine for primary rays (or, as you'll see later, for rays with any fixed origin).
This is a real must for any real time ray tracer, as primary rays constitute a large percentage (relatively, compared to "normal" ray tracing) of the total rays fired, since the scene complexity and difficulty (erm, for want of a better adjective describing the number of reflective and refractive surfaces :) are much lower for this type of scene.
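Here's what that might look like for a sphere, assuming a fixed eye point. The cached member names (oc, c) and prepareFrame() are illustrative, not standard:

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
static double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

struct Sphere {
    Vec3 centre;
    double radius;
    Vec3 oc;      // eye - centre, cached once per frame
    double c;     // dot(oc, oc) - radius^2, also constant per frame

    void prepareFrame(Vec3 eye) {
        oc = { eye.x - centre.x, eye.y - centre.y, eye.z - centre.z };
        c = dot(oc, oc) - radius * radius;
    }

    // With oc and c cached, a primary ray costs one dot product, one
    // multiply-subtract, and (only on a hit) one square root.
    double intersectPrimary(Vec3 dir) const {      // dir must be normalised
        double b = dot(oc, dir);
        double disc = b * b - c;
        if (disc < 0.0) return -1.0;               // miss
        double t = -b - std::sqrt(disc);
        return t > 0.0 ? t : -1.0;                 // nearest positive hit, or miss
    }
};
```

The same prepareFrame() trick works for any fixed ray origin: call it with the intersection point before firing a batch of shadow rays, for instance.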
Shadow check optimisation
Shadow checks are very expensive when you have lots of objects in the scene (n objects means up to n shadow tests PER INTERSECTION, per light). But since all these shadow rays propagate from a fixed origin (the point of intersection), you can precalculate the same kind of values as mentioned above for the shadow rays' origin. This should save you an enormous amount of time.
Again, this really is a must, because it will greatly reduce the speed impact that additional objects incur when being added to the scene.
Render at 1/2 or 1/4 resolution and interpolate
This can be seen in Rubicon 1 and 2. Though it isn't really a trick, it's hardly a kosher way of improving rendering performance either :) You needn't interpolate in regular squares; I've heard of many triangular interpolation schemes (such as the one used in Spinning Kids' "I feel like I could" demo) which look considerably better than standard square-based interpolation. I'm not sure how large the speed hit is when interpolating along triangles rather than squares, but I'm sure there is one (it's just simpler to interpolate along squares). This raises the question of whether those cycles should be used to improve the interpolation, or to reduce the amount of interpolation and stick with squares. Tricky, but I'm sure it wouldn't go that far... some concrete performance tests would be great, but there aren't exactly vast pools of real time ray tracing info on the net :)
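For the square-based variant, the upscale itself is just a bilinear interpolation. A sketch for one grey channel (a real tracer would do this per colour channel, ideally in MMX); the function name is mine:

```cpp
#include <algorithm>
#include <vector>

// Take a half-resolution buffer (w x h) and bilinearly upsample it to 2w x 2h.
std::vector<float> upsample2x(const std::vector<float>& src, int w, int h) {
    std::vector<float> dst(size_t(w) * 2 * size_t(h) * 2);
    for (int y = 0; y < h * 2; ++y) {
        for (int x = 0; x < w * 2; ++x) {
            float sx = x * 0.5f, sy = y * 0.5f;          // position in source pixels
            int x0 = int(sx), y0 = int(sy);
            int x1 = std::min(x0 + 1, w - 1), y1 = std::min(y0 + 1, h - 1);
            float fx = sx - x0, fy = sy - y0;            // blend weights
            float top = src[y0*w + x0] * (1 - fx) + src[y0*w + x1] * fx;
            float bot = src[y1*w + x0] * (1 - fx) + src[y1*w + x1] * fx;
            dst[size_t(y) * (w * 2) + x] = top * (1 - fy) + bot * fy;
        }
    }
    return dst;
}
```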
Pre-calculate per-frame constants
As mentioned in the introduction, some values related to intersections are constant for the entire frame (that is, are independent of ray orientation), and can thus be pre-calculated. Vector dot products are really good pre-calculation fodder: loading the floating point values over the slow system bus (I'm using a Celeron, so I try to avoid my 83.3MHz system bus and lean on the high clock speed instead :), multiplying them, summing them, then storing the result in a floating point register is no walk in the park. Loading a single precalculated float from RAM, by comparison, is hardly straining, and it saves you tons of CPU work.
Reciprocals are also very good to precalculate if you divide by per-frame constants a lot.
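A trivial example of the reciprocal trick (the names screenU and invWidth are made up for illustration):

```cpp
#include <vector>

// One divide per frame instead of one per pixel: hoist the reciprocal of a
// per-frame constant out of the inner loop and multiply instead.
std::vector<float> screenU(int width) {
    std::vector<float> u(width);
    const float invWidth = 1.0f / float(width);  // precalculated once
    for (int x = 0; x < width; ++x)
        u[x] = x * invWidth;                     // multiply instead of x / width
    return u;
}
```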
Polygon mesh rendering
Have you got a nice 4k triangle cow model you'd like to import into a ray tracing scene (hi Cuban! :)? Apart from using a bounding box (and object-BSP), there are very few straight ways to improve the intersection time required for such an object. Solution: don't ray trace it. Just fill a z-buffer with the object as it would appear transformed to the eye's view, along with the intersection distance parameter ("t" as it is often called) and any other info you need (colour, transparency coefficients, normals [expensive memory-wise], anything really), using standard scan line algorithms. So you can just ray trace right over that z-buffer, and everything will be perfect.
Problems start when you need the thing to reflect. Now, if it's a convex object (it at no point mirrors itself), you can just skip the entire mesh in the reflection's intersection tests, and get REALLY cheap reflection calculation for the object! If the object is non-convex, then you'd better make sure that you have a BSP calculated for the object, and that not much of it reflects itself, otherwise things WILL get painful :) I've never seen or heard of this anywhere, it's just a crazy idea I came up with while I was in the shower some day, so I don't know how useful it'd be.
Shadow intersections aren't nice for this either. Make sure you have a BSP and a bounding box (NOT axis-aligned, you want a TIGHT fit, even at the cost of having an arbitrarily orientated bounding box) for this.
In general, polygon meshes aren't used in real time ray tracing, but with a lot of tricks and hacking, it should be possible.
Cast shadow flags
You can really bump up scene efficiency if you know beforehand which objects won't obstruct light cast onto other objects. This is really simple to implement (even when designing the scene), and really helps, so it's a must.
Unfortunately it's really difficult to automatically generate these flags (by checking all points on every surface on all time steps in the animation sequence), so the person creating the scene must set it.
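The flag logic itself is tiny. Here's a sketch with the geometry boiled down to an interval test along the shadow ray so the flag stands out; the struct and field names are mine:

```cpp
#include <vector>

// A per-object "casts shadows" flag, set at scene-design time as described above.
struct SceneObject {
    double tMin, tMax;   // stand-in for a real ray/object intersection interval
    bool castsShadow;    // authored flag: can this object ever occlude light?
};

// Shadow test that skips flagged non-occluders outright.
bool occluded(double t, const std::vector<SceneObject>& scene) {
    for (const SceneObject& o : scene) {
        if (!o.castsShadow) continue;            // no intersection test at all
        if (t >= o.tMin && t <= o.tMax) return true;
    }
    return false;
}
```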
Using variable precision iterative root finders
Normally when you use iterative root finders (for quintic equations and upwards), you hardcode the number of iterations (i.e. the accuracy) of the root finder. You could instead have some code which increments or decrements this based on the current vs. target frame rate. A slight problem might occur if the machine (or current scene :) is really slow and the reduced iteration count returns bad roots which cause the program to crash. Perhaps the best solution is to just exclude the higher order objects based on the result of a speed test when the demo starts. This also goes for Taylor series evaluators for superquadrics, although those really munch CPU power and I doubt they'll be done in real time for a while.
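A sketch of the adjustable-precision idea using a Newton-Raphson root finder; the function names and the frame-rate feedback rule are made up for illustration:

```cpp
#include <cmath>
#include <functional>

// Newton-Raphson limited to a per-frame iteration budget.
double findRoot(const std::function<double(double)>& f,
                const std::function<double(double)>& df,
                double x, int maxIters) {
    for (int i = 0; i < maxIters; ++i) {
        double d = df(x);
        if (std::fabs(d) < 1e-12) break;   // derivative too flat to divide by
        x -= f(x) / d;
    }
    return x;
}

// Nudge the budget towards the target frame rate, clamped to sane bounds.
int adjustBudget(int iters, double fps, double targetFps) {
    if (fps < targetFps && iters > 2)  return iters - 1;  // too slow: trade accuracy
    if (fps > targetFps && iters < 32) return iters + 1;  // headroom: buy it back
    return iters;
}
```

Each frame you'd call adjustBudget() once and pass the result as maxIters to every root find.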
Avoid OOP overhead
As wonderfully suited to ray tracing as OOP might be, it still adds massive overhead (my recent measurements with VTune really scared me) to the typical (classic?) implementation, where every object inherits from a base class defining several virtual functions which must be implemented in every subclass.
Doing ray tracing without OOP feels really dirty and "hacked", but it's certainly faster: I'm working on separate implementations of my ray tracer for demos (i.e. real time) and for making production-quality images and videos (for my final Matric project, which will render movies on a 28 node Duron 800MHz 128MB LAN :).
At the core you'd just keep a list of the various objects by type (e.g. an array of CSphere, an array of CMesh, etc) along with the number of objects of each type. The virtual function calls for each pixel add up VERY quickly (ObjectNum intersection tests for reflections, then 2x or 3x ObjectNum shadow checks); this is a very serious overhead which I failed to realise until recently, and it's probably the most fundamental decision to be made when setting out to code your RTRT.
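A sketch of the per-type-list layout (everything here is illustrative; the point is that the intersection calls are direct and inlinable rather than virtual dispatches through base-class pointers):

```cpp
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };
static double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }

struct Sphere { Vec3 centre; double radius; };
struct Plane  { Vec3 normal; double d; };        // points p with dot(normal, p) = d

static double hitSphere(const Sphere& s, Vec3 o, Vec3 dir) {
    Vec3 oc = sub(o, s.centre);
    double b = dot(oc, dir), c = dot(oc, oc) - s.radius*s.radius;
    double disc = b*b - c;
    if (disc < 0.0) return -1.0;
    double t = -b - std::sqrt(disc);
    return t > 0.0 ? t : -1.0;
}
static double hitPlane(const Plane& p, Vec3 o, Vec3 dir) {
    double denom = dot(p.normal, dir);
    if (std::fabs(denom) < 1e-12) return -1.0;   // ray parallel to plane
    double t = (p.d - dot(p.normal, o)) / denom;
    return t > 0.0 ? t : -1.0;
}

struct Scene {
    std::vector<Sphere> spheres;   // one tight, homogeneous array per type
    std::vector<Plane>  planes;

    double nearestHit(Vec3 o, Vec3 dir) const {
        double best = 1e30;                      // 1e30 means "missed everything"
        for (const Sphere& s : spheres) {        // direct calls: no vtable lookups
            double t = hitSphere(s, o, dir);
            if (t > 0.0 && t < best) best = t;
        }
        for (const Plane& p : planes) {
            double t = hitPlane(p, o, dir);
            if (t > 0.0 && t < best) best = t;
        }
        return best;
    }
};
```

The per-type loops also keep the data contiguous in memory, which is kinder to the cache than chasing base-class pointers.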
Don't use subdivision methods
For normal scenes (at the time of writing, Moore's law be damned! :), spatial subdivision doesn't yield enough of a performance increase to amortise the overhead incurred by using it.
Cheats
Cheating or faking often produces the biggest speed improvements of all, usually at some expense to technical satisfaction and image quality :) Since the demoscene lives and breathes on doing the impossible (by not actually doing it; Phong shading via env-mapping springs to mind), this is a very popular thing to do to speed up a slow ray tracer :)
Sub-sampling
This has to be the most popular cheat around, and is basically the reason why I'm writing this article. I think this is such a neat cheat that it really deserves to be implemented in all ray tracers everywhere. It actually has the potential to IMPROVE your image quality if you do it right.
Basically, what you do is define a grid of points at a resolution much lower than your screen resolution (usually 8x8 or 16x16 blocks). At each point on the grid, you do your normal ray tracing calculations, but with a twist: you don't actually draw anything. You just store the intersected object's ID, the U and V texture co-ordinates for that intersection, the additive colour at that point (specular lighting + reflected colour + refracted colour) and the diffuse shade at that point (just the simple Lambertian shading co-efficient). Now, all you have to do is interpolate the texture along the square, shade it by the interpolated diffuse co-efficient, and add the interpolated additive colour. That's it! Well, almost; there are some points to note:
Firstly, what if the object IDs aren't the same on all four corners of the square? Well, there are two things you can do at this point:
1.) The first (and much faster) method is to evaluate the complete colour (as you'd render to the frame buffer normally) for all the corners and just interpolate using standard Gouraud interpolation (or some quadratic interpolation, it's up to you really :) Fresnel 1 and 2 by Kolor use this method, and I'd like to thank Shiva of Kolor for explaining this one to me.
2.) The second (and slower, but much better looking) method is to subdivide and re-trace the square until all the subdivided squares either have the same object IDs or are one pixel in area (making sure to only trace the 5 new points :). Obviously for high resolutions, the latter case can be quite slow, but still not as slow as normal ray tracing! You can actually make this method faster than the above method by increasing the initial interpolated block size (24x24 or even 32x32), but after you really turn up the block size this starts to look bad, especially around sharp shadows. That particular artifact can be fixed by adding the shadow's ID to the object's ID (making sure that there aren't multiple combinations of object IDs and shadow IDs that can produce the same ID; thanks Dman for that tip!). Large blocks also lead us to the second problem of this method. Heaven seven and just about all other real time ray tracers use this method (notable exceptions being Rubicon 1 and 2).
The second problem with this method is that really small objects (smaller than your block size) that are completely contained within the block and not touching any of the corners will be missed. This really isn't an issue with real time ray tracing, since you really can't afford to have such small objects :)
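Putting the pieces together, here's a toy sketch of the whole scheme: the corners of 8x8 blocks are traced, same-ID blocks are interpolated, and mixed blocks fall back to per-pixel tracing. tracePixel() is a stand-in that "hits" a screen-space disc; a real version would fire a primary ray, store IDs/UVs/shading as described above, and cache the corner grid instead of re-tracing shared corners:

```cpp
#include <cmath>
#include <vector>

const int W = 65, H = 65, STEP = 8;              // 65 so grid corners land on the edges

struct Sample { int id; float shade; };

// Fake trace: object 1 is a disc centred on screen, object 0 is background.
Sample tracePixel(int x, int y) {
    float dx = float(x) - 32.0f, dy = float(y) - 32.0f;
    float r = std::sqrt(dx*dx + dy*dy);
    if (r < 20.0f) return {1, 1.0f - r / 20.0f}; // fake Lambert falloff
    return {0, 0.0f};
}

std::vector<float> renderSubsampled() {
    std::vector<float> img(W * H);
    for (int by = 0; by + STEP < H; by += STEP) {
        for (int bx = 0; bx + STEP < W; bx += STEP) {
            Sample c00 = tracePixel(bx, by),        c10 = tracePixel(bx + STEP, by);
            Sample c01 = tracePixel(bx, by + STEP), c11 = tracePixel(bx + STEP, by + STEP);
            bool flat = c00.id == c10.id && c00.id == c01.id && c00.id == c11.id;
            for (int y = by; y <= by + STEP; ++y) {
                for (int x = bx; x <= bx + STEP; ++x) {
                    if (flat) {                              // all corners agree: bilerp
                        float fx = float(x - bx) / STEP, fy = float(y - by) / STEP;
                        float top = c00.shade * (1 - fx) + c10.shade * fx;
                        float bot = c01.shade * (1 - fx) + c11.shade * fx;
                        img[y*W + x] = top * (1 - fy) + bot * fy;
                    } else {                                 // mixed block: trace it all
                        img[y*W + x] = tracePixel(x, y).shade;
                    }
                }
            }
        }
    }
    return img;
}
```

Tracing every pixel of a mixed block is the brute-force fallback; the recursive subdivision of method 2 above refines only where the IDs disagree.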
So by all means, implement this. Since this interpolation is remarkably akin to what 3D cards have been evolving to do for ages (hint :), you can be smart like the Kolor bunch and use OpenGL or DirectX to draw your quads. This way, you not only massively improve your rendering speed, but you also get bilinear filtering, sub-pixel accuracy, high quality RGB Gouraud shading, blending, everything for free. Now that is really something, and should be standard with every real time ray tracing demo :) If you still decide to use software to do the interpolation, I'd highly recommend using at least MMX for this (RGB Gouraud isn't nice without MMX), especially if you use bilinear filtering in software (madness, rather turn up your detail settings and object count :)
Frame interpolation
I did some tests long ago with non-linear frame interpolation, and found that if the scene doesn't change too much from frame to frame (read: SLOW camera flybys :), this looks really good (like an effect :) and can save you tons of time. What you do is make sure you've rendered at least one frame ahead, and interpolate between the current, previous and next frames using a cubic interpolator (b-splines work well too, but are far too computationally expensive). Just make sure that you write your interpolator in MMX :)
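As a per-pixel, per-channel sketch, here's a simple non-linear blend through three frame values. I've used a quadratic Lagrange blend through previous/current/next as a simplified stand-in for the cubic interpolator mentioned above:

```cpp
// Blend between already-rendered frames: t = 0 gives the current frame,
// t = 1 the next; the previous frame bends the curve away from linear.
float frameBlend(float prev, float cur, float next, float t) {
    return prev * 0.5f * t * (t - 1.0f)
         + cur  * (1.0f - t * t)
         + next * 0.5f * t * (t + 1.0f);
}
```

With frame values lying on a straight line, the blend reduces to linear interpolation, so it never looks worse than the naive version.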
Another simple trick would be to render TV-style: every other scanline, and just interpolate (preferably non-linearly, it really makes a difference) between the known and unknown scan lines. Tip: for interpolation speed (cache-wise), render vertical rather than horizontal strips.
It's apparently quite possible to fake quartic objects such as tori by using CSGed quadric objects. Gamma2 by mfx supposedly does this, but I haven't really confirmed it, and I didn't notice that it's faked, so it just goes to show how effective this can be (along with really strong blurring filters... I couldn't see much anyway :). Although not related to speeding up ray tracing, it's a cool hack if you can do it correctly.
Conclusion and links
Well, this document is still far from finished, but I'm really eager to get the first version out the door. Maybe I'll get some feedback, which will definitely result in updates. There isn't very much info on the net for ray tracing newbies who don't have a PhD in mathematics, English and computer software engineering, which is a pity.
Most of the documents I've read on high-level subjects such as ray tracing, radiosity, wavelets etc. are normally "encoded" by big shot professors so that only other big shot professors can read and understand them (e.g.: "image" = "discretely bounded uni-polar planar integer function", "blur" = "high frequency band cut-off homogeneous scalar convolution in the spatial domain", etc... I should collect quotes like these, they're so damn funny), and I'm only now beginning to get used to their enigmatic "dialect".
Anyway, rant mode off. If you have any straight English ray tracing info (particularly that which is applicable to real time ray tracing), I'd really appreciate it if you could just drop me a line. In fact, email me anyway :)
Here are some links I've gathered on ray tracing; you'll find that many of them have tips on speeding the process up. Just be sure to ignore the spatial subdivision stuff :)
Ray tracing tutorials on Polygone:
Tom Hammersley's ray tracing doc:
Intersection library (lots of code):
RTRT realm (good source of demos):
Greets and credits
I'd like to thank all the friendly people who have helped me learn about real time ray tracing, especially my then-mentor crossbone of Suburban Creations, for answering all of my really stupid and incessant questions. Respect! :) Other real greets (I'm probably forgetting 99% of them, but here goes):
Quartz of Kolor (Massively helpful with everything :)