Line Drawing in Mode-X

Introduction

This article will hopefully explain how to speed up your line-drawing code in the mysterious "MODE-X". Perhaps the mystery is why anyone is still using this old mode... after all, everyone has a super fast SVGA cards with VESA 2 support, right? Well, no.

Perhaps you don't or can't use a VESA driver for some reason, maybe you are coding a nice, small intro or demo, or you are just learning about 80x86 Assembly language.

A few years ago after finding out about Mode-X from a book (back then I didn't have access to the Net) I found some instruction timings for the various chips, and to my horror, on a 80486 an OUT instruction could be 10 times (or more) slower than a single MOV instruction - this was nearly twice as long as the rest of the line drawing code. =(

Clearly there was a really BIG need to minimize the OUT instructions to zero, or as close to this as possible.

Warning

This is NOT a tutorial on either the Bresenham scan-converting algorithm for lines, or how the MODE-X works (and how to set it up). There are plenty of other documents and code libraries to help with both of these, so please look elsewhere before reading this article, it will then make some sense.

This just describes an idea I had to speed up the line-drawing and help to remove as many OUT instructions as possible.

If you don't know what "majorinc", "minorinc" or Mode-X means then please do not read this article because you probably won't understand it.

What's the problem?

Well, drawing a line (or "scan-converting") is a kind of interpolation. You start with two sets of co-ordinates (x1,y1) and (x2,y2) which are the two end points of the line. Plotting these points is the easy bit, it's the plotting of all the pixels between them which is the difficult part. What we need to do is to interpolate the X,Y co-ordinates and plot pixels.

In Mode-X horizontal pixels are not stored in a linear way, they are split up in different bitplanes. To access a bitplane the MAP-MASK register must be used (port 3C4 hex, index 2) and this requires an OUT instruction. So to move left or right along the X axis means a change of bitplane, and hence an OUT instruction.

Moving vertically (up or down on the Y axis) doesn't require a change of bitplane, so the MAP-MASK register doesn't need to be updated.

Because the (x1,y1) and (x2,y2) co-ordinates of our line can have any value (within the screen limits, of course the gradient of the line which we need to plot can have any value. As we plot using a Bresenham like method we move along either the X or the Y axis one pixel at a time based upon the descision variable 'd'. In affect we don't know when we will move along the X axis until we come to plot the pixel there, and so we have no real way of knowing when a bitplane step (and an OUT instruction) will occur.

A typical Mode-X code fragment

This code assumes Real/V86-mode is being used, X is the major axis and drawing takes place from left-to-right and from top-to-bottom.

All the various pixel calculations (screen pixel address and bitplane mask) have already been done, so too has all the Bresenham stuff (inc1 and inc2). The only thing remaining to do is to prepare the Sequence-Controller to make writing values to the Map-Mask register easier.

E.g.

              mov     dx, 3C4h        ; Sequence Controller port
              mov     al, 2           ; index 02 = Map-Mask register
              out     dx, al          ; make it ready, for OUT 3C5h,x

So now whenever we wish to write a value to the Map-Mask register, all we need to do it to output the byte to port 3C5 hex.

E.g.

              AL = byte to output (our bitplane/pixel mask)

              mov     dx, 3C5h
              out     dx, al

Here is a snippet of code from a normal Mode-X line routine. It's not really important that you understand it fully, it's the OUT instruction which is the important bit.

              [DS:DI] --> screen pixel address
              AH = pixel color
              BP = decision variable
              AL = bitplane mask for MAP-MASK register
              DX = 3C5 hex (Sequence controller, Map-Mask ready)

 xloop:       out     dx, al          ; select bitplane (MAP-MASK)
              mov     [di], ah        ; write the pixel
              sub     bp, inc1        ; d - inc1
              jg      short notyet    ; major step only?
              add     bp, inc2        ; d + inc2
              add     di, 80          ; step down 1 line (Y+1)
  notyet:
              rol     al, 1           ; rotate the map-mask
              adc     di, 0           ; advance DI+1 every 4th pixel

As you can see we are using an OUT instruction for EVERY iteration of the line drawing loop, so if the line was 300 pixels long we would need 300 OUT instructions.

Improvements

As we don't know when we will advance along the X-axis until we actually come to draw each pixel, this makes it tricky to optimize the number of OUT instructions.

The idea I had was to divide the line drawing into two stages. More memory access is needed, but you only need 4 OUT instructions no matter how long the line actually is.

The Bresenham-like algorithm tells us when we need to update the X axis coordinate (cross into the next bitplane), it also tells us the address of each pixel, and these two things are all we need. The two stage process works like this:

  1) Store the address of each pixel into one of four stacks.
     (The stack is determined by the "X MOD 4", i.e. the bitplane.)

  2) For each of the four stacks select the bitplane, then plot a pixel
     at every address stored in the stack.

Basically each stack is a list of pixel addresses for a particular bitplane.

Now after we have these four stacks we can use a single OUT instruction for each and then plot pixels at every address in each stack.

The first stage could look something like this:

This would be for a line where X length > Y length and we are drawing the line from top to bottom and left to right.

NOTE:
These stacks expand downwards in memory (just like SS:SP). this method was chosen to keep the format of all four stacks identical. Of course you could use expand up for the first three stacks (BP, DI, SI) but NOT for SP!!

              [ds:bp] --> stack 0     (bitplane 0)
              [ds:di] --> stack 1     (bitplane 1)
              [ds:si] --> stack 2     (bitplane 2)
              [ds:sp] --> stack 3     (bitplane 3)

              [??:bx] --> pixel address
              dx = decision variable
              cx = loop counter
 plane0:
              sub     bp, 2
              mov     [bp], bx        ; push BX onto stack 0
              dec     cx              ; loop count - 1
              jz      done            ; done?
              sub     dx, minorinc
              jg      plane1          ; no change in Y axis?
              add     dx, majorinc
              add     bx, 80          ; Y+1 (down 1 line)
 plane1:
              sub     di, 2
              mov     [di], bx        ; push BX onto stack 1
              dec     cx              ; loop count - 1
              jz      done            ; done ?
              sub     dx, minorinc
              jg      plane2          ; no change in Y axis?
              add     dx, majorinc
              add     bx, 80          ; Y+1 (down 1 line)
 plane2:
              sub     si, 2
              mov     [si], bx        ; push BX onto stack 2
              dec     cx              ; loop count - 1
              jz      done            ; done?
              sub     dx, minorinc
              jg      plane3          ; no change in Y axis?
              add     dx, majorinc
              add     bx, 80          ; Y+1 (down 1 line)
 plane3:
              push    bx              ; push BX onto stack 3
              dec     cx              ; loop count - 1
              jz      done            ; done?
              sub     dx, minorinc
              jg      plane4          ; no change in Y axis?
              add     dx, majorinc
              add     bx, 80          ; Y+1 (down 1 line)
 plane4:
              inc     bx              ; X+1 (advance by 1 after 4th pixel)
              jmp     plane0
 done:

Now these four stacks BP, DI, SI, SP have the pixel addresses of the entire line. Each stack has a variable number of addresses stored on them. As each stack contains only the pixel addresses for that particular bitplane we can use perform just 1 OUT instruction and then plot pixel using each stack.

              AL = pixel color to plot
              [DS:CX] --> stack 0
              [ES:0000] --> Video memory

              sub     cx, bp         ; <-- calculate how many pixel addresses
              shr     cx, 1          ;     have been stored on stack 0.
              jz      empty0         ; none?

              mov     dx, 03C4h      ; <-- sequence controller port
              mov     ax, 0102h
              out     dx, ax         ; <-- write to bitplane 0 only

 plot0:
              mov     bx, [bp]
              add     bp, 2          ; pop pixel addr from stack 0
              mov     ES:[bx], al    ; plot the pixel.
              loop    plot0
 empty0:

The above will plot ALL the pixels for bitplane 0 using stack 0. This should be repeated for the other three stacks, SI, DI and finally SP.

The code to perform the final stack, SP, looks like this:

              AL = pixel color to plot
              [DS:CX] --> stack 3
              [ES:0000] --> Video memory

              sub     cx, sp         ; <-- calculate how many pixel addresses
              shr     cx, 1          ;     have been stored on stack 3
              jz      empty3         ; none?

              mov     dx, 03C4h      ; <-- sequence controller port
              mov     ax, 0802h
              out     dx, ax         ; <-- write to bitplane 3 only

 plot3:
              pop     bx             ; <-- we can use POP for SP
              mov     ES:[bx], al    ; plot the pixel.
              loop    plot3
 empty3:

Closing Words

I hope you understood the above. The code is probably a little difficult but the basic idea is very simple, group ALL the pixels for a particular bitplane into a list (a stack). This means we only ever have to use a maximum of 4 OUT instructions even for a 1024 pixel long line rather than upto 1024 (for the worst case).

Of course you would not use "LOOP", but "DEC CX + JNZ" instead. As we already know how many pixel-addresses have been stored in a particular stack we could unroll the plotting loop and plot four or eight pixels per loop. This would help speed up things a little more.

Also you should try to use registers to hold the "majorinc" and "minorinc" values instead of memory locations.

I am sure this method of grouping bitplane plots together could be employed in other graphical routines.

Enjoy.

Regards

TAD #:o)