The Hidden Power of BCD Instructions

St0Ne

While programming a bit with x86 ASM, I discovered that AAx instructions are just great. Yes, I'm talking about those weird BCD (Binary Coded Decimal) instructions made for programmers who can't count in hex...

Most if not all Intel processors can count in decimal, storing integers between 0 and 9 in four-bit units, and performing add, sub, mul and div with them. In many books or asm tutorials, those instructions are considered 'inefficient' or 'completely useless'.

Let's take a look at them:
AAA : ASCII Adjust after Addition
AAS : ASCII Adjust after Subtraction
AAM : ASCII Adjust after Multiplication
AAD : ASCII Adjust before Division

Although they are second rank instructions, Intel reserved four opcodes (actually six if you include DAA and DAS) and one flag (auxilliary carry) for them... that deserves some attention.

Unfortunately, none of these instructions is correctly documented by Intel! You may be already aware of that, but I'll explain what those instructions do exactly.

Then, we will look at what can be done with those pretty nice instructions, especially when doing size optimization...

AAA - AAS

These instructions 'ASCII adjust AL after addition/subtraction, using AF to determine whether there was a carry'.

Let me explain what this AF flag is. AF stands for Auxiliary Carry. The AF flag is set when an arithmetic operation generates a carry or a borrow out of bit 3 of the result.

For example 5+2=7: AF is cleared.
However, 9+8=17=11h, which is greater than 0Fh: AF is set.

According to Intel documentation, AAA/AAS are only useful after an ADD/SUB instruction; actually we can use AAA and AAS after any arithmetic instruction (they all modify AF), including INC/DEC and CMP.

So, let's take a look at what AAA does. Here is the pseudo-code equivalent (from Intel):

        IF ((AL AND 0FH) > 9) OR (AF = 1)
        THEN
                AL = (AL + 6) AND 0FH;
                AH = (AH + 1);
                AF = 1;
                CF = 1;
        ELSE
                AF = 0;
                CF = 0;
        FI;

That's true for a typical Intel documentation compliant use like this one:

        mov     al,6
        add     al,9    ;al=15=0Fh
        aaa             ;al=21=15h  =>  it's in decimal!

But what happens if you write something like:

        mov     ax,00F6h
        add     al,9    ;ax=255=00FFh
        aaa             ;ax=507=0205h

If you stick to Intel documentation, we should have ax=0105h at the end, not 0205h! Well, the Intel pseudo-code is wrong in that case. I think it was the way the 8086 worked, but newer processors use this pseudo-code:

        IF ((AL AND 0FH) > 9) OR (AF = 1)
        THEN
                AX = (AX + 0106h);
                AF = 1;
                CF = 1;
        ELSE
                AF = 0;
                CF = 0;
        FI;
        AL = AL AND 0FH;

For AAS, it's exactly the same problem, you just have to replace the '+' with a '-'.

What about the flags? AF and CF are set as described above. OF stays unchanged, SF is cleared.

ZF is set if AX=0, PF is set depending on AL (pretty logical, isn't it?).

AAD - AAM

These are the most interesting instructions and are described by Intel as 'ASCII adjust AX before division/after multiplication'.

They take 'x' as a byte argument. It used to be undocumented, but it's now in Intel's documentation.

AAD : AL = AH*x + AL AH = 0
AAM : AH = AL/x AL = AL mod x

What Intel doesn't say is that all flags (SF, ZF, PF, OF, CF, AF) are set depending on the result, exactly as with any other arithmetic instruction.

For AAM, the flags reflect the result of the 'mod' operation (AL mod x). It means that SF, ZF and PF are set according to AL, and that OF, CF and AF are cleared.

For AAD, the flags reflect the result of the '+' operation (AH*x)+AL.

So, when using 8 bit values, AAD and AAM are good alternatives to MUL and DIV: shorter code and opportunity to use flags.

Used with parameters like 0 or 1, they are often useful short ways to do ADD, CMP and MOV combinations:

           +----+-------+-----------------------+
           | AH |  AL   | Equivalence           |
   +-------+----+-------+-----------------------+
   | AAD 0 | 0  |  AL   | mov ah,0 ; cmp al,0   |
   | AAM 0 |    |       | Division by zero      |
   | AAD 1 | 0  | AL+AH | add al,ah ; mov ah,0  |
   | AAM 1 | AL |   0   | mov ah,al ; and al,0  |
   +-------+----+-------+-----------------------+

Simple Examples

So, what can we do with those AAx instructions? Here are some simple examples.

1. Test the value in AL

To test whether AL is 0 or 1, one single-byte instruction is enough:

        aaa
        jz      @zero

2. How to test bounds

If you want to test AL against two values, you would do something like:

        cmp     al,MIN
        jz      @bound_reached
        cmp     al,MAX
        jz      @bound_reached

Using AAM, it's shorter :

        sub     al,MIN
        aam     MAX-MIN
        jz      @bound_reached

3. Discrete homotethie AL=AL+k[AL/x]

The combination of AAM and AAD is perfect in this case: after those two instructions, AL is increased by K times the integer part of (AL/x).

        aam     X
        aad     X+K

It's often a nice way to replace a CMP/Jcc/ADD instruction block.

For example, if you want to change the case of an ASCII character in [0-9][A-Z][a-z], you could write:

        aam     40h
        aad     60h     ;to lower case

        aam     60h
        aad     40h     ;to upper case

Complex Example

Well, if you are still not convinced by the AAx instructions, here is an example that might be very useful.

Imagine we know AL is one of those ten (randomly chosen) values: 01h, 25h, 39h, B3h, 78h, C4h, 6Bh, A6h, 12h, and F0h. We want to test in which set it is: A={25h, 6Bh, 78h, B3h, C4h} or B={01h, 12h, 39h, A6h, F0h}?

Of course, we could compare AL with each value in A:

           cmp     al,25h
           jz      @set_A
           cmp     al,6Bh
           jz      @set_A
           cmp     al,78h
           jz      @set_A
           cmp     al,0B3h
           jz      @set_A
           cmp     al,0C4h
           jz      @set_A
   @set_B: ...
           ...
   @set_A: ...
           ...

But we can also do this using AAD and AAM. We also need a flag set by the AAD instruction with a probability of about 1/2: both SF and PF are suitable.

So, we just need a dummy little C program to try all 256*256*4 possibilities and find one solution of the following form:

        aam     X
        aad     Y
        jcc     @set_A

with jcc being jp, jnp, js or jns.

This program looks for solutions using the parity flag or the sign flag...

        #include <stdio.h>
        #define byte unsigned int
        #define NR 10

        byte al,ah,x,y,k,found;
        //Values array
        byte set[NR]={0x01,0x25,0x39,0xB3,0x78,0xC4,0x6B,0xA6,0x12,0xF0};
        //Results array : 0 for set_A, 1 for set_B
        byte res[NR]={1,   0,   1,   0,   0,   0,   0,   1,   1,   1};
        //P flag array
        byte p[NR]  ={0,   0,   0,   0,   0,   0,   0,   0,   0,   0};
        //S flag array
        byte s[NR]  ={0,   0,   0,   0,   0,   0,   0,   0,   0,   0};

        main () {
          for (x=1; x<256; x++) {
            for (y=0; y<256; y++) {
              for (k=0; k<NR; k++) {
                p[k]=0;
                al = set[k] % x;
                ah = set[k] / x;
                al = (al+ah*y)%256;
                ah = 0;
                if (al<0x80) s[k]=0; else s[k]=1;
                while (al) { p[k]+=al%2; al>>=1; }
                p[k]%=2;
              }
              found=1;
              for (k=1; k<NR; k++) {
                if ((p[k]^res[k])!=(p[0]^res[0])) found=0;
              }
              if (found) {
                printf ("X=%d Y=%d ",x,y);
                if (p[0]^res[0]) printf ("using JNP\n");
                else printf ("using JP\n");
              }
              found=1;
              for (k=1; k<NR; k++) {
                if ((s[k]^res[k])!=(s[0]^res[0])) found=0;
              }
              if (found) {
                printf ("X=%d Y=%d ",x,y);
                if (s[0]^res[0]) printf ("using JNS\n");
                else printf ("using JS\n");
              }
            }
          }
        }

Here we get 37 solutions; this one for example:

        aam     0Dh
        aad     11h
        jp      @set_A

You want to verify ? Here we go:

        AH:AL   00:01 00:25 00:39 00:B3 00:78 00:C4 00:6B 00:A6 00:12 00:F0
        aam     00:01 02:0B 04:05 0D:0A 09:03 0F:01 08:03 0C:0A 01:05 12:06
        aad     00:01 00:2D 00:49 00:E7 00:9C 00:00 00:8B 00:D6 00:16 00:38
        PF      1     0     1     0     0     0     0     1     1     1

The weakest point is that it doesn't help making the code clear and easy to understand... or is this a nice feature?

Conclusion

I hope it's now obvious that AAx instructions are powerful. Of course, AAM and AAD are slow, and if you need speed, in most cases you have to forget it. But they are actually very short macro instructions: 2 bytes for a MUL-ADD, that's fantastic!

Credits: http://www.x86.org/secrets/opcodes.htm

For any comments, suggestions or questions, contact me.

st0ne . almeras@via.ecp.fr