Quick Sort, A Fast Sorting Algorithm

Introduction_____

The quick sort algorithm, presented by C.A.R. Hoare, is used to sort an array and is recognized as one of the better sort algorithms. The other great sorting algorithm is radix sort, but it's used for small or medium-sized data. Almost every coder has implemented radix sort for his 3D-engine.

Quick sort is the best algorithm for big data, ASCII strings... Yep! Wait a moment, I need to sort strings! Then, quick sort is what you need.

This algorithm can be used for different things, but in this article I'll focus on the application that it has in BWT. You don't know yet what that is? If you want to know, visit some links from my homepage:

However, I'll present you the basic algorithm, and how to extend it so that it fits your data. If anyone has a better implementation or algorithm, let me know.

Here I'll present C code, which isn't mine at all, and Asm code for PC, under Protected Mode. (You know, those big pointers.) And that's mine.

The theory behind the algorithm_____

What do we need? To sort a given set of data. And what do we primarily want? Speed. The main idea of qsort is "Split and you'll win", so we just treat the data every time in a smaller range. We get a number, and then compare both numbers from the right and the left, avoiding the numbers that are 'sorted', i.e. are greater or lower respectively than our first number. Then for the split the only thing we have to do is to call the qsort again, but this time with a smaller range, and so on till we have all the array, the data set, sorted. That's why qsort is recursive, it calls itself.

Qsort in C_____

Here is the code in C:

``` /* Just some data */
unsigned char Vektor[];                   //the declaration of the array

/* The quick sort itself */
void QuickSort(int left, int right)  //if we have 6 elements, then (0,5)
{
int i;        //the pivot from the left
int j;        //the pivot from the right
int x;        //the number used to compare

i = left;     //initalise both pivots
j = right;
x = Vektor[left+right / 2];   //and the number, with the byte in the
do            //the main loop                  |-> half of the range
{
while ( Vektor[i] < x && i < right )
i++;   //we avoid the numbers that are lower than x
while ( x < Vektor[j] && j > left )
j--;    //the same, but with the greater numbers
if (i <= j)         //if we are still in a valid range
{
SwapNumbers(i,j);       //just swap both bytes
i++;                    //and update both pointers
j--;
}
}while (i <= j);              //repeat if still in valid range

if (left < j) QuickSort(left,j);
//continue scanning in the left range
if (i < right) QuickSort(i,right);
//continue scanning in the right range
}                    //of course only if there's still data to scan

/* The function to swap two elements in the array */
void swap (a,b)
{
{
unsigned char aux;   //help for swapping them
aux=Vektor[a];       //I think you don't need comments here
Vektor[a]=Vektor[b];
Vektor[b]=aux;
}```

For checking it, just call it with quicksort(0,ARRAY_MAX-1); OK? Now go to the debug mode and understand it.

Qsort in Asm_____

Here is where I do the real job.

First we must see if we have enough registers for ALL the data. We will have good luck if we have enough. Here it is how we'll use them:

``` ESI - Base pointer, pointing to the start of the vector (the data to sort)
EAX - Index pointer, i
EBX - Index pointer, j
ECX - X, the data that was in the half of the vector
EDX - free, for making comparisons and as 'aux'
EBP - left
EDI - right```

We can't use ESP 'coz we'll need it for the push'n'pops. Remember that this function is recursive, so we need to allocate the memory 'dynamically'. We cannot (and we shouldn't) use variables (like j, i, x...) for it. Well, let's start with the code:

``` qsort:

mov             eax,ebp       ;i=left initialise them
mov             ebx,edi       ;j=right
mov             ecx,eax       ;Get the X
add             ecx,ebx
shr             ecx,1
mov             cl,[esi+ecx]  ;x WARNING: it's a byte.```

So now we have initialised the data and can start with the loop:

``` @@qs_inner:

;-> while ( Vektor[i] < x && i < right ) i++;

@@qs_while1:

mov             dl,[esi+eax]  ;vector[i]
cmp             dl,cl         ;vector[i] < x
jae short       @@qs_no_1     ;jump if it's not true
cmp             eax,edi       ;i<right
jge short       @@qs_no_1     ;WARNING: The dwords are SIGNED

;-> once there, we have to execute the body of the while
inc             eax           ;WARNING, add to EAX the
;difference from one element
;and the other
jmp short       @@qs_while1   ;we are simulating a while

@@qs_no_1:```

This was the first while. Note again that the dwords are signed. I needed two hours of debugging to notice that. In fact I was searching the bug in the rest of the function. ;-(

``` @@qs_while2:

;->while ( x < Vektor[j] && j > left ) j--;

mov             dl,[esi+ebx]  ;vector[j]
cmp             cl,dl         ;x < vector[j]
jae short       @@qs_no_2     ;as the other
cmp             ebx,ebp       ;j > left
jle short       @@qs_no_2
dec             ebx           ;--j
jmp short       @@qs_while2
@@qs_no_2:```

We have half of the code. And now the swap part comes:

```        ;-> if (i <= j) swap()
cmp             eax,ebx         ;if (i <= j)
jg short        @@qs_no_if1
;-> swap them
mov             dl,[esi+eax]    ;vector[i]
push            dx
mov             dl,[esi+ebx]    ;vector[i]=vector[j]
mov             [esi+eax],dl    ;vector[i]
pop             dx
mov             [esi+ebx],dl    ;vector[j]=vector[i]
;-> i++ j--
inc             eax             ;update both pointers,
dec             ebx             ;remember that we comp. bytes
@@qs_no_if1:
cmp             eax,ebx         ;let's repeat everything
jle short       @@qs_inner      ;if needed```

It's not my intention to explain the whole algorithm, but note that we swap the registers when i=j. But then, you may ask, we are swapping the same element? Yes, we do that 'coz then the pointers begin updated, and we don't fall into an infinite loop. Now we must see if we have to recursive recall qsort, with different ranges:

```        ;-> if(left < j) qsort(left,j)  new range, the left one

cmp             ebp,ebx        ;(left<j)
jge short       @@qs_no_if2
;'-> qsort(left,j)
push            eax            ;save ONLY the registers
push            ebp            ;that we'll need
push            edi
mov             edi,ebx        ;right=j left=left
call            qsort
pop             edi
pop             ebp
pop             eax
@@qs_no_if2:

;'-> if(i < right) qsort(i,right)
;new range, the right one
cmp             eax,edi        ;(i < right)
jge short       @@qs_no_if3
;we don't really need to push any register
mov             ebp,eax        ;left=i right=right
call            qsort
@@qs_no_if3:
ret                            ;the end of the qsort```

Ok, you've understood everything? I hope so, 'coz it's not very difficult.

Extending it_____

For example if you need to extend it to ints, the only thing you have to change is the way the comparations are done. Or if you have more complicated data structures, the sort algorithm is the same. Say we have this data:

` <int><pointer>`

When comparing we'll compare the <int>, and when needed to swap, we'll copy <int> and <pointer>. This is an idea for extending the qsort.

Sorting strings_____

Now let's do something we'll need, sorting strings. But we'll not directly sort strings, we'll sort pointers. The actual comparison will be done with the strings themselves, but we'll swap the pointers. As you see this is totally BWT focused. Did you ever have a look at the C qsort? It calls a function of your choice and returns a value depending if it was greater, lower... This is the way we'll do it, a function that compares strings.

Ok, now the values to compare will be strings (we'll now do the function) and the element to swap will be the pointers.

The conversion is easy, or at least it seems so. When incrementing i:

` AFTER: inc eax  NOW: add eax,4`

That is 'coz the difference beetwen one element and another is 4 bytes. Remember that we are using PMode pointers (dd).

When copying:

` AFTER: mov dl,[esi+eax]  NOW: mov edx,[esi+eax]`

And a warning about a bug I had (if your qsort hangs, then first check that): When choosing the middle element, we must first divide (shr) by 4 (2) and then multiply (shl) by 2 (1). Note that we CAN'T do shr 1!!

Example:

``` 0Bh - shr2 -> 02h - shl1 -> 04h        ;Ok
0Bh - shr1 -> 05h                      ;WRONG```

Remember that the value x (ESC) will now store a POINTER. We can't compare it directly. Once you have implemented these little things, we can do the string comparison. We MUST scan the whole string if needed, we MUST stop at the end of the string, and we MUST return some value that tells when the string is greater, lower or equal. We must use the registers in a way that we don't have to push a lot of them and it can be easily called from qsort. Let's think...

``` ECX - pointer to the first string
EDX - pointer to the second string
EAX - used for the comparations
EBX - counter, so we'll neither increment ECX nor EDX```

Then the code starts:

``` qs_comp_strings:

push            ax ebx     ;Push what we'll use
mov             ebx,-1     ;to properly enter the loop```

This is when we initialise the registers. Now the main loop:

``` @@qscs_more:
inc             ebx                  ;next byte in the string
cmp             ebx,qscs_string_max  ;Don't comp. beyond the end
je short        @@qscs_equal         ;if it's equal, then exit```

I have to say something: qscs_string_max is a VARIABLE, and it must be equal to string_lenght+1.

```        mov             al,[ecx+ebx]        ;get byte from the first
cmp             al,[edx+ebx]         ;compare with the second
je short        @@qscs_more          ;while equal repeat```

Once there, they are different, so we check the flags and return.

```        ja short        @@qscs_above
jb short        @@qscs_below```

Quite easy, isn't it? If it doesn't jump, they are equal:

``` @@qscs_equal:
xor             edx,edx            ;they are equal, return 0
jmp short       @@qscs_end```

And if they aren't equal:

``` @@qscs_above:
mov             edx,1          ;if ECX is greater, then 1
jmp short       @@qscs_end

@@qscs_below:
mov             edx,-1         ;if ECX is lower, then -1

@@qscs_end:
pop             ebx ax
ret```

This is the string comparison function, quite easy. Now let's use it. I'll give you an example, and you will do the rest. I think I have put too much Asm code here. For example in the first comparison.

``` - Vektor[i] < x
- mov     dl,[esi+eax]
cmp     dl,cl
jae short  @@qs_no_1```

We do:

```        mov             edx,[esi+eax]    ;the pointer from [i]
call            qs_comp_strings ;ecx already has the pointer
cmp             edx,1           ;X must be above, so it should return 1
jne short       @@qs_no_1       ;if not, then jump out of the while```

OK? Then do your (own?) implementation.

Using it with BWT_____

You have 90% of the comparison function. Now you only have to care about keeping track of what was the number of the pointer first. I recommend you that you have in memory:

` - pointer1 (dd) - position pointer1 (dd) - pointer2 - position pointer2`

So when swapping, you just swap two consecutive dwords, and nothing else. Remember that in BWT, you don't do 'length' strings, you only have pointers. But what happens when you rotate them?

``` String: abc
Rotated: bca cab```

Beyond the position of 'c' there's nothing else, so you may do two things:

``` 1- Substract the string lenght from the pointer: (string rotated: cab)
You read 'c'
Then you are in the end, substract 3 from the pointer
Read 'a'
Read 'b'```

``` 2- Write the file to memory twice:
Real string in memory: abcabc
Read 'c'
End, no problem, the pointer is in the 'start' of the string
Read 'a'
Read 'b'```

I had the second idea while thinking about qsort. It has some drawbacks. Well it has only ONE drawback: You will need twice the mem for the input file. I think this is not a real problem. For example, bzip has 500k as the maximum block size, so we'll only need 500k more, but nothing else, the rest of the data structures remains unchanged. And now the good things: It will make sorting FASTER, and implementing it will be easier. So, and having in mind that sorting a lot of strings is slow, I think it's a good solution. I'll use that in my own implementation of BWT.

Closing words_____

Now you should implement it. Well, in fact you only have to cut'n'paste the code, but I know you will try to understand the code - at least I hope so.

Now I must implement BWT. In the three last days, I've learned BWT, RLE, Mtf and qsort. I've implemented all of them but BWT. Now I'm tired of writing. So I better code more.

Don't forget to visit my home page for more articles and links at:
http://www.geocities.com/SiliconValley/Byte/6789/

If you have any doubt, email me at dario_phong@mixmail.com.