3D Coding TAT - Sorting Out The Performance Problem
Whether you use the incredibly slow BUBBLE method or the faster RADIX or QUICKSORT methods there is a lot of work to be done for each render cycle, and the greater the number of polygons the greater the amount of sorting. As ALL forms of sorting involve lots of memory accesses this can be very costly in terms of CPU time. Techniques such as BSP (Binary Space Partitioning) can be used to cut down on the depth-sorting workload, but unless your 3d engine is completely deserted you will still need to sort the objects and inhabitants into their order of distance.
As far as I know BSP only works with stationary polygons (or at least polygons which don't move in relation to one another). This isn't much good for moving things, is it?
1: Delta Sorting
One solution is what I call "Delta" sorting: using the previously sorted list as the starting point for the next sort. This can be extremely effective because objects tend to move only small amounts between frames, so only a few items need to be re-sorted into order. In the very worst case of a 180° turn the entire list needs to be resorted, but this tends to happen very rarely, and when it does you could simply reverse the entire list and then apply a small amount of sorting. Using the change in viewing/rotation angles might be one way of estimating the amount of re-sorting which needs to be done. You could also keep the sort list updated by checking each object whenever it moves. If an object stays still then it's a fair assumption that it doesn't change its position within the sort list, so it won't need sorting.
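As a rough sketch of the idea: a plain insertion sort does almost no work when the list is already nearly in order, which is exactly the situation when last frame's order is reused as the starting point. The bare array of distances here is my own illustration, not part of any particular engine.

```c
/* Delta-sort sketch: insertion sort over last frame's (almost sorted)
 * order.  When nothing much has moved, the inner loop exits at once
 * for nearly every item, so a whole pass costs close to O(n). */
static void delta_sort(float dist[], int count)
{
    for (int i = 1; i < count; i++) {
        float key = dist[i];
        int   j   = i - 1;
        /* Shift nearer items right until 'key' is in place.
         * Sorting is descending: farthest first (painter's order). */
        while (j >= 0 && dist[j] < key) {
            dist[j + 1] = dist[j];
            j--;
        }
        dist[j + 1] = key;
    }
}
```

Each frame you would run this over the previous frame's order; the worst case (everything reversed) degenerates to a full O(n²) sort, which is why reversing the list first after a 180° turn is worthwhile.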
I have mentioned elsewhere in this document "group" algorithms (hierarchical data structures). These are very efficient ways of organizing data because they allow a quick scale to be imposed on a large amount of data, and this can help us speed up 3d objects and sorting. Imagine having 3 (crude) monster 3d models, each with 2 arms, 2 legs, 1 head, 1 torso and 2 weapons. That's a total of 24 parts, and if we use a cube for each part then suddenly we have 144 polygons, and that's BEFORE we add the background, add more monsters or define decent 3d models. We need to sort these 144 polygons into some form of order before we can begin to draw them.
But why break each monster down into polygons? Instead, sort each monster's world position and then sort their body parts individually. This is in effect the same as a radix sort algorithm, where the most significant digits are sorted first and then the less significant ones. Of course this 'sibling sort' (where objects are only sorted within their own level) may cause glitches when one monster is close to another and the nearest one's arm should really be behind the other's.
If you do want proper depth-sorting for close monsters or objects, so that tangled arms and legs are properly ordered, then I would use another 'delta' trick. Instead of starting at the very first item in the sort list and searching for the place to insert a polygon (or whatever), or searching from the central point in either direction, why not try searching from the last inserted item? This should help speed things up a lot because we will be sorting fairly local groups of items such as monsters or furniture etc... where their coordinates only occupy a mere fraction of the entire sort range (i.e. all of a monster's body parts are located within a short distance of its main torso).
0 .............................. max    <- sort range
    1:[a-f]    2:[g-l]      3:[m-w]  4:[x-z]
In the above diagram we need to sort 4 objects, whose 26 items are labelled a to z:
Each object has a varying number of items, from 3 to 11, and each of these 26 items must be inserted somewhere in our sort list. As I said before, a 'group' approach can be useful for some problems, and it can help with this one. We take one object at a time and insert each of its items into the correctly sorted place in the list. As the items that make up each object tend to be very close to each other, we can use the first item's place as the point to start searching from. In the case of object 2 we would insert item 'g' and then search from that place for all the other items (h, i, j, k and l). As you can hopefully see, this should be much faster than beginning at a fixed position (like the start or the middle).
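The search-from-the-last-insertion trick might look something like this on a doubly linked sort list (the Node layout and function names are my own sketch, not a fixed design):

```c
#include <stddef.h>

/* One entry in the depth-sorted list (descending: farthest first). */
typedef struct Node {
    float        dist;          /* sort key: distance from the viewer */
    struct Node *prev, *next;
} Node;

/* Insert 'item' into the sorted list, starting the search from 'hint'
 * (e.g. the previously inserted item of the same object) rather than
 * the list head.  Returns the inserted node so the caller can use it
 * as the hint for the next item. */
static Node *insert_from_hint(Node **head, Node *hint, Node *item)
{
    if (!*head) {                               /* empty list */
        item->prev = item->next = NULL;
        *head = item;
        return item;
    }
    Node *p = hint ? hint : *head;
    /* Walk backwards while the new item is farther than p... */
    while (p->prev && item->dist > p->dist)
        p = p->prev;
    /* ...or forwards while the following node is still farther. */
    while (p->next && item->dist <= p->next->dist)
        p = p->next;

    if (item->dist > p->dist) {                 /* new head: before p */
        item->prev = NULL;
        item->next = p;
        p->prev    = item;
        *head      = item;
    } else {                                    /* insert after p */
        item->next = p->next;
        item->prev = p;
        if (p->next) p->next->prev = item;
        p->next = item;
    }
    return item;
}
```

Because all of an object's items sit within a short distance of each other, each insertion after the first only walks a handful of links instead of scanning from the head.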
The above method can be made a little clearer by thinking of sorting in a hierarchical way: we sort the world position of each object and use those positions as the starting places for sorting each object's items.
One advantage over the radix sort is that it is very easy to implement this kind of 'delta-sort' on linked-lists; we just keep a pointer to our last modified position in the chain. Whereas radix sort (I think) needs to resort the entire list every time and must be able to randomly access the sorting list. Linked-lists lend themselves very easily to local transitions (moving forwards or backwards one element at a time).
Also remember that if we sort all the objects' world positions into order first, before we sort each one's items, then we will be inserting objects in a mostly sequential way, so the previous insertion point is a handy place to start searching from.
2: Minimal access - Index tables
It should be obvious that sorting and the reordering of data structures must involve as few memory accesses as possible. One way to rearrange chunks of data quickly is to use an 'index' or 'pointer' table: only ever sort the entries in that table, and never swap big areas of memory around. Only the entries in the table are modified, not the data structures to which they point. This does have a few drawbacks:
1. A large enough table must be allocated for the maximum number of items to sort.
2. You must read a pointer from the table before each sort field can be compared.
3. Insertion & deletion take far more time than with a linked-list method.
Bubblesort:
  flag = 0
  src = table[1]
  for n = 1 to Number_Of_Entries-1
    dst = table[n+1]
    if dst[distance] > src[distance] then
      table[n]   = dst          ; swap the two pointers (farthest first)
      table[n+1] = src
      flag = 1                  ; remember that a swap occurred
    else
      src = dst                 ; no swap, advance to the next entry
    endif
  next n
  if flag = 1 then goto Bubblesort
The above bubble sort is SLOW, but it hopefully shows how the index 'table' of pointers works. The good thing about this method is that each data structure can be of varying sizes as long as the [distance] field remains at a constant displacement.
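In C the same pointer-table idea can simply be handed to the standard library's qsort: the comparator dereferences the table entries, and only the pointers themselves (a dword each) ever get moved. The Thing struct and its field names are illustrative assumptions.

```c
#include <stdlib.h>

/* A renderable "thing": the sorter only cares about 'distance'; the
 * structure behind it can be any size at all. */
typedef struct {
    float distance;
    /* ... vertices, colours, etc. would follow ... */
} Thing;

/* qsort comparator over Thing POINTERS: farthest first, so the table
 * ends up in painter's-algorithm order without a single struct copy. */
static int cmp_far_to_near(const void *pa, const void *pb)
{
    const Thing *a = *(const Thing *const *)pa;
    const Thing *b = *(const Thing *const *)pb;
    if (a->distance < b->distance) return  1;   /* b is farther: b first */
    if (a->distance > b->distance) return -1;
    return 0;
}
```

Usage: fill a `Thing *table[]` with pointers and call `qsort(table, n, sizeof table[0], cmp_far_to_near);` then draw in table order.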
Another way is to use linked-lists to chain together all the objects to be sorted. For collision detection and movement this can be useful because we can quickly discover neighbouring objects. A disadvantage of linked-lists is that binary searches or other random-access techniques cannot be used, because a list is a serial structure (to find the Nth link you must follow N elements). The index method is good for this reason: to find the Nth element we just index into the table. Also, on 80x86 CPUs the index table caches much better, as only a single dword is needed for each entry and the entries are contiguous in memory, which is just how the CPU likes it. The linked-list method is slower for locating a particular element in the chain because the CPU must jump about through memory, and this doesn't cache very well.
3: Squatting Objects
Another technique which can be employed to help speed up sorting is to place tags on the map to indicate where objects or creatures are. If parts of the 3d environment do not change position in relation to one another (99% don't) then as we rotate, project and sort the environment we are sorting the object/creature tags too. This can give rise to an extra performance gain in the way of clipping. If a creature or object is in a room and the room is totally clipped (i.e. hidden from our view) then we can reject both at the same time without the need to check the object/creature itself. We can associate places with their inhabitants and vice versa. In effect we have reduced objects and any movable items to mere vertices. Once these position tags have been processed like the rest of the background we can decide whether an item needs to be processed any further; otherwise we can reject the 3d model and all its vertices and polygons.
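A minimal sketch of the room-tag rejection, assuming a simple occupant list per room (the Room layout and limits are invented for illustration):

```c
#define MAX_OCCUPANTS 8

/* A room remembers which objects are "squatting" in it, so when the
 * clipper throws the room away its occupants go with it for free. */
typedef struct {
    int visible;                    /* set by the clipper this frame   */
    int occupants[MAX_OCCUPANTS];   /* indices of objects in this room */
    int occupant_count;
} Room;

/* Collect the indices of objects that survive room-level rejection.
 * Objects in clipped rooms are never even looked at. */
static int collect_visible(const Room *rooms, int nrooms, int *out)
{
    int n = 0;
    for (int r = 0; r < nrooms; r++) {
        if (!rooms[r].visible)
            continue;               /* whole room clipped: skip everyone */
        for (int i = 0; i < rooms[r].occupant_count; i++)
            out[n++] = rooms[r].occupants[i];
    }
    return n;
}
```

Only the survivors' full 3d models (vertices, polygons and all) need rotating, projecting and sorting.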
Another reason for tagging which items are in certain locations is collision detection and movement. Together with the sort list we can confine our collision detection to neighbouring items rather than checking each item against every other one. For example, if we had 100 items and we needed to check for item-to-item collisions then it would mean a vast 4950 checks!!! (99+98+97+...+2+1). If we confine our checks to neighbouring items then only 99 are needed. As our sorted list of objects/creatures is based on distance this isn't too difficult. And if we only check objects within the same or neighbouring rooms (checking for closed doors etc.) and we only check items which HAVE moved, then this can save even more processing time.
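The 99-versus-4950 saving can be sketched like this, using the distance-sorted list and a deliberately crude 1D "overlap" test standing in for a real collision check (the function and its radius parameter are my own illustration):

```c
#include <math.h>

/* With items kept sorted by distance, pairwise collision candidates
 * can be confined to adjacent entries in the list: count-1 checks
 * instead of count*(count-1)/2.  The 1D distance-overlap test below
 * is a stand-in for a proper bounding-volume check. */
static int count_neighbour_collisions(const float dist[], int count,
                                      float radius)
{
    int hits = 0;
    for (int i = 0; i + 1 < count; i++)     /* 99 checks for 100 items */
        if (fabsf(dist[i] - dist[i + 1]) < radius * 2.0f)
            hits++;
    return hits;
}
```

Anything that fails even this cheap neighbour test cannot be colliding, so the expensive full test only runs on the few pairs that pass.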