ASM Implementation of Blasting the Bounds

Yesterday I read Adok's article in Hugi 27 about how to use bigger integers than supported in a compiler. IMO his implementation was very far from the optimal. First, I'd make it use the full bytes, rather than the way he's doing it, storing a decimal char in each byte. His way is a waste of space and make's the implementation quite slow. There's a huge difference using the string "18446744073709551615" (20 digits/bytes) than the bytes 0xFFFFFFFFFFFFFFFF (8 bytes). Then we can process the number in dwords (on a 32-bit machine) or qwords (on a 64-bit machine). That means that either 1 or 2 passes is made on the number, instead of 20. Second, I'd limit the number to a fixed number of bits, for example a 128-bit number. That means an easier implementation and faster code. You must know how many bits you're going to use, if you're not programming a scientific calculator. Third, I'd implement it in ASM. That means a more elegant implementation and faster code.

IMO, 64-bit numbers are enough for most things done, so here goes:

Implementation of adding two 64-bit numbers, a and b, to each other:

eax contains low dword of a, ebx the high dword
ecx contains low dword of b, edx the high dword

eax contains low dword of output, ebx the high dword

That's it. Similarly, subtracting two numbers:
(same inputs and outputs)

SUB eax, ecx
SBB ebx, edx

Multiplying two 64-bit numbers:

esi contains low dword of a, edi the high dword
ebx contains low dword of b, ecx the high dword
x and y are variables

MOV eax, esi
MUL ebx
MOV [x], eax
MOV [y], edx

MOV eax, edi
MUL ebx

MOV eax, esi
MUL ecx

x contains low dword, y high dword

Theory of this: This is all elementary school math. Suppose we want to multiply two two-digit numbers:
59 * 28 = 9 * 8 + 5 * 8 * 10 + 9 * 2 * 10 + 5 * 2 * 100
The last product can be excluded, because it will not affect the final two-digit result.
59 * 28 = 9 * 8 + 5 * 8 * 10 + 9 * 2 * 10
The same principle can be used on 64-bit numbers, one digit becoming one dword.

Dividing a 64-bit number by another 64-bit number:

This is really tricky. You can't do it the same way as in elementary school, because in elementary school math you'll have to handle more than one digit as one number. I couldn't make my own implementation, so I searched for and found some code on the 'net:

```;
; _ulldiv divides two unsigned long long numbers, the dividend and
; the divisor resulting in a quotient and a remainder.
;
; input:
;   edx:eax = dividend
;   ecx:ebx = divisor
;
; output:
;   edx:eax = quotient of division of dividend by divisor
;   ecx:ebx = remainder of division of dividend by divisor
;
; destroys:
;   eflags
;

PROC    _ulldiv NEAR

test    ecx, ecx         ; divisor > 2^32-1 ?
jnz     \$big_divisor     ; yes, divisor > 32^32-1
cmp     edx, ebx         ; only one division needed ? (ecx = 0)
jb      \$one_div         ; yes, one division sufficient
mov     ecx, eax         ; save dividend-lo in ecx
mov     eax, edx         ; get dividend-hi
xor     edx, edx         ; zero extend it into edx:eax
div     ebx              ; quotient-hi in eax
xchg    eax, ecx         ; ecx = quotient-hi, eax =dividend-lo

\$one_div:

div     ebx              ; eax = quotient-lo
mov     ebx, edx         ; ebx = remainder-lo
mov     edx, ecx         ; edx = quotient-hi(quotient in edx:eax)
xor     ecx, ecx         ; ecx = remainder-hi (rem. in ecx:ebx)
ret

\$big_divisor:

push    esi              ; save temp
push    edi              ;  variables
push    edx              ; save
push    eax              ;  dividend
mov     esi, ebx         ; divisor now in
mov     edi, ecx         ;  edi:ebx and ecx:esi
shr     edx, 1           ; shift both
rcr     eax, 1           ;  divisor and
ror     edi, 1           ;   and dividend
rcr     ebx, 1           ;    right by 1 bit
bsr     ecx, ecx         ; ecx = number of remaining shifts
shrd    ebx, edi, CL     ; scale down divisor and
shrd    eax, edx, CL     ;   dividend such that divisor
shr     edx, CL          ;    less than 2^32 (i.e. fits in ebx)
rol     edi, 1           ; restore original divisor (edi:esi)
div     ebx              ; compute quotient
pop     ebx              ; get dividend lo-word
mov     ecx, eax         ; save quotient
imul    edi, eax         ; quotient * divisor hi-word (low only)
mul     esi              ; quotient * divisor lo-word
add     edx, edi         ; edx:eax = quotient * divisor
sub     ebx, eax         ; dividend-lo - (quot.*divisor)-lo
mov     eax, ecx         ; get quotient
pop     ecx              ; restore dividend hi-word
sbb     ecx, edx         ; subtract divisor * quot. from dividend
sbb     edx, edx         ; 0 if remainder > 0, else FFFFFFFFh
and     esi, edx         ; nothing to add
and     edi, edx         ;  back if remainder positive
add     ebx, esi         ; correct remaider
adc     ecx, edi         ;  and quotient if