A Simple (And Very Fast) 2-D LZP Variation

TAD

Introduction

During the Hugi size-optimization compo #7 I tried quite a number of different compression ideas, and towards the end I came up with this variation on the normal LZP algorithm (which I posted to the e-groups mailing list).

Anyway, here is the algorithm again with a little more code and documentation for those people who are interested, or new to Hugi.

I will not explain the LZP algorithm here, but instead focus on my very simple and very fast algo. For newbies this article should be easy enough to follow: code your own version of this, understand the basics of prediction, and then progress to the slightly more detailed LZP without too much effort.

Overview

It is nothing more than a byte predictor using a 256x256 byte table. (I didn't say it was difficult, did I?)

The normal LZP method creates a prediction based upon the previous N bytes. Normally 2 or 3 bytes are hashed together (using XOR, for example) and then a table of address-pointers is indexed using the hash value. The number of matching bytes is encoded together with a single-bit prefix to denote a correct prediction. Usually a 4-bit count field is used to encode the number of matching bytes, which means we can encode between 1 and 16 bytes in just 5 bits.
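
(For the curious, here is a rough C sketch of that classic prediction step. The 2-byte XOR hash and all of the names are purely illustrative; real LZP implementations differ in the details, and this assumes pos >= 2.)

  unsigned char data[65536];      /* the input stream                 */
  long          last_pos[65536];  /* last position seen for each hash */

  long lzp_match_pos(long pos)    /* assumes pos >= 2 */
  {
      /* Hash the previous two bytes together to index the table. */
      unsigned hash  = (data[pos-1] ^ ((unsigned)data[pos-2] << 4)) & 0xFFFF;
      long     match = last_pos[hash];  /* where this context last occurred */

      last_pos[hash] = pos;             /* remember the current position */
      return match;                     /* the caller counts how many bytes of
                                           data[match..] equal data[pos..] and
                                           encodes that in the count field   */
  }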

Well, this variation uses only a single bit flag for each byte; NO match-length count field is encoded. The match is always a single byte, no more and no less. This means that a correct prediction saves 7 bits per byte, whereas an incorrect prediction 'saves' -1 bits (i.e. the byte is encoded using 9 bits instead of 8).

Scoreboard.

Here is a table of scores so you can compare the good and bad results between this algo. and the standard LZP with a 4-bit match count field.

        Matching Bytes           Bit saving            Bits per byte
        --------------           ----------            -------------
            0                        -1                     9
            1                        +7                     1
            2                       +14                     1
            3                       +21                     1
            4                       +28                     1
            5                       +35                     1
            6                       +42                     1
            7                       +49                     1
            8                       +56                     1
            9                       +63                     1
           10                       +70                     1
           11                       +77                     1
           12                       +84                     1
           13                       +91                     1
           14                       +98                     1
           15                      +105                     1
           16                      +112                     1

Here are the scores for an LZP with a 4-bit match-count field.

        Matching Bytes           Bit saving            Bits per byte
        --------------           ----------            -------------
            0                        -1                     9
            1                        +3                     5
            2                       +11                     2.5
            3                       +19                     1.66666..
            4                       +27                     1.25
            5                       +35                     1
            6                       +43                     0.83333..
            7                       +51                     0.71428
            8                       +59                     0.625
            9                       +67                     0.55555..
           10                       +75                     0.5
           11                       +83                     0.45454..
           12                       +91                     0.41666..
           13                       +99                     0.38461..
           14                      +107                     0.35714..
           15                      +115                     0.33333..
           16                      +123                     0.3125

It is interesting to note that it takes LZP a match of 6 bytes before it can beat the 1-bit-per-byte saving of my variation. This is quite important for difficult files which contain only a few small matches.

On the down side, if long runs of bytes frequently occur then the normal LZP would win.

The Simple Algo.

Here is some pseudo-code for the 2-d LZP variation. It takes a 320x200 image and, using a 256x256 array, attempts to predict the current byte. If successful it outputs a single '1' bit, otherwise it outputs a '0' bit followed by a raw/literal byte.

  Byte predictions(256,256)

              pos = 0

              While pos < ImageLength

 1)               If Pos >= 1 then Alpha = Image[pos-1]
                              else Alpha = 0

 2)               If Pos >= 320 then Beta = Image[pos-320]
                                else Beta = 0

 3)               Predict = predictions(Alpha,Beta)
                  CurrByte = Image[pos]
 4)               predictions(Alpha,Beta) = CurrByte

 5)               If Predict = CurrByte then Output_Bit(1)
 6)                                     else Output_Bit(0)
                                             Output_Byte(CurrByte)

                  Pos = Pos + 1

              Wend

Here are some comments about the six marked stages in the above algo.

1. Fetch the first dimension of the predictor (i.e. the byte from (x-1,y) ).
2. Fetch the second dimension (i.e. the byte from (x,y-1) ).
3. Using the two dimensions (alpha,beta) predict the current byte.
4. Update the prediction table.
5. If the prediction was correct output a '1' bit.
6. Otherwise output a '0' bit followed by the real current byte.

As you can hopefully see, a 16-bit hash value is created from the two surrounding pixels (x-1,y) and (x,y-1). A 'normal' LZP algorithm would simply use the last N bytes (in this case (x-1,y) and (x-2,y) ).
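
(Before the assembly, here is the same loop as a hedged C sketch, a direct transcription of the pseudo-code above. The buffer names and the way the flag bits and literals are stored are my own illustrative choices; note that the static arrays start zeroed, which is the implicit initial state of the prediction table.)

  #define WIDTH  320
  #define HEIGHT 200

  static unsigned char image[WIDTH * HEIGHT];         /* image to compress  */
  static unsigned char predictions[65536];            /* 256x256 predictor  */
  static unsigned char bitstream[WIDTH * HEIGHT / 8]; /* one flag bit/byte  */
  static unsigned char literals[WIDTH * HEIGHT];      /* raw/literal bytes  */
  static long          nlit;                          /* literal count      */

  static void compress(long length)
  {
      long pos;
      for (pos = 0; pos < length; pos++) {
          unsigned char alpha   = (pos >= 1)     ? image[pos - 1]     : 0;
          unsigned char beta    = (pos >= WIDTH) ? image[pos - WIDTH] : 0;
          unsigned      hash    = alpha | ((unsigned)beta << 8);
          unsigned char predict = predictions[hash];
          unsigned char curr    = image[pos];

          predictions[hash] = curr;                      /* update table  */

          if (predict == curr)
              bitstream[pos >> 3] |= 1 << (pos & 7);     /* Output_Bit(1) */
          else
              literals[nlit++] = curr;                   /* flag stays 0  */
      }
  }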

Some 80386 code.

The following code assumes a 320x200 (VGA mode 13h) image is being compressed (and remember this is FLAT-mode code, but you could adapt it for Real/V86 mode).

               [es:edi] --> byte stream buffer (for raw/literal bytes)
               (ecx) = ImageLength

               .DATA?
 Image          db 64000 dup (?)        ; The image to be compressed
 predictions    db 65536 dup (?)        ; 256x256 prediction array
 bitstream      dd (64000/32) dup (?)   ; space for 64000 bits (one per pixel)

               .CODE
               sub ebx, ebx             ; EBX=Pos=0
 again:
               sub edx, edx             ; DL=Alpha=0,   DH=Beta=0
               cmp ebx, 1
               jb  short less1          ; is Pos < 1 ?
               mov dl, Image[ebx-1]     ; else Alpha = Image[pos-1]
 less1:
               cmp ebx, 320
               jb  short less320        ; is Pos < 320 ?
               mov dh, Image[ebx-320]   ; else Beta = Image[pos-320]
 less320:
               mov ah, predictions[edx] ; AH=Predict=predictions(Alpha,Beta)
               mov al, Image[ebx]       ; AL=CurrByte
               mov predictions[edx], al ; Update prediction array

               bts [bitstream], ebx     ; Output_Bit(1) - assume match
               cmp al, ah
               jz short correct         ; correct prediction?
               btr [bitstream], ebx     ; else Output_Bit(0)
               stosb                    ;      Output_byte(AL)
 correct:
               inc ebx                  ; Pos + 1
               cmp ebx, ecx
               jnz short again          ; loop while Pos < ImageLength
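
The depacker isn't shown above, but it is just the mirror image of the packing loop. Here is a C sketch using the same illustrative buffers as the C transcription earlier; the only real requirement is that the prediction table starts in the same (all-zero) state on both sides.

  static void decompress(long length)
  {
      long pos, n = 0;
      for (pos = 0; pos < length; pos++) {
          unsigned char alpha = (pos >= 1)     ? image[pos - 1]     : 0;
          unsigned char beta  = (pos >= WIDTH) ? image[pos - WIDTH] : 0;
          unsigned      hash  = alpha | ((unsigned)beta << 8);
          unsigned char curr;

          if (bitstream[pos >> 3] & (1 << (pos & 7)))
              curr = predictions[hash];      /* correct prediction      */
          else
              curr = literals[n++];          /* take a raw/literal byte */

          predictions[hash] = curr;          /* same update rule        */
          image[pos] = curr;                 /* rebuild the image       */
      }
  }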

So What?

The algorithm is interesting for two reasons.

1. It is VERY fast and VERY small.

2. It only takes up 64Kb of memory for the 65536-element prediction table. The 'normal' LZP would require 131072 or 262144 bytes (65536 two- or four-byte pointers).

The Bad News.

Like I said before, the compression rate isn't that great, but the speed of operation is.

As soon as I manage to get my "grubby mitts" on a decent set of test files to compress I will write a short article for the next Hugi comparing this algo. against the normal LZP one.

Possibly the worst thing about this algo. is that it ALWAYS outputs a prefix bit for every byte in the input data-stream. The normal LZP does not, because it can use null pointers to mark hash-values that haven't been encountered before. In the worst possible case every byte here is encoded using 9 bits instead of 8.

This can be avoided in two ways. We could initialise the 256x256 prediction array with the least frequent (or an unused) byte value in the data-stream; if this value is then predicted we skip the prefix bit and just output the raw/literal byte. Or we could use a bitmap of 65536 flag bits to indicate whether a hash-value has been encountered before; if it hasn't, no prefix bit is needed. But of course this requires some more memory...
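
Here is a hedged C sketch of the second (bitmap) idea, bolted onto the earlier C transcription. The 'seen' bitmap and the separate 'bit' counter are my own illustrative additions; note that the flag-bit count no longer equals the byte position, so the depacker must mirror this logic exactly.

  static unsigned char seen[65536 / 8];  /* has (Alpha,Beta) occurred yet? */

  static void compress_v2(long length)
  {
      long pos, bit = 0;                 /* flag bits no longer track pos */
      nlit = 0;
      for (pos = 0; pos < length; pos++) {
          unsigned char alpha = (pos >= 1)     ? image[pos - 1]     : 0;
          unsigned char beta  = (pos >= WIDTH) ? image[pos - WIDTH] : 0;
          unsigned      hash  = alpha | ((unsigned)beta << 8);
          unsigned char curr  = image[pos];

          if (!(seen[hash >> 3] & (1 << (hash & 7)))) {
              seen[hash >> 3] |= 1 << (hash & 7);
              literals[nlit++] = curr;   /* new context: literal only,   */
          } else {                       /* no prefix bit is needed      */
              if (predictions[hash] == curr)
                  bitstream[bit >> 3] |= 1 << (bit & 7);
              else
                  literals[nlit++] = curr;
              bit++;                     /* prefix bit for seen contexts */
          }
          predictions[hash] = curr;
      }
  }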

Improvements

Here's one fairly useless bit of information about the LZP algo in general.

Whenever you get a mis-prediction (i.e. a 0-length match) you know that the raw/literal byte is one of 255 possible values, rather than one of 256. In short, a raw/literal byte can NOT be the predicted byte; if it were, you would have encoded it as a correct prediction.

This could be useful for EOF or some other purpose. All you need to do is encode a raw/literal byte which is identical to the predicted byte. Then in your depacker compare each raw/literal byte against the predicted byte; if you get a match then you know it's a special byte like EOF.
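
Here is a tiny decoder-side sketch of the trick (illustrative only; it would slot into the literal branch of the depacker sketch above).

  /* Returns 0 if the literal is the sentinel (EOF etc.), 1 otherwise. */
  static int read_literal(unsigned char predict, unsigned char *out, long *n)
  {
      unsigned char lit = literals[(*n)++];
      if (lit == predict)
          return 0;       /* special code: stop, or switch parameters */
      *out = lit;         /* an ordinary raw/literal byte             */
      return 1;
  }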

You can of course use it to change the LZP parameters during decompression in a similar way to the LZW algorithm (flush dictionary). For example you may wish to switch the hashing function for one which better suits a particular part of the data-stream being encoded.

Closing Words

Ah well, this algo is pretty poor compared to other schemes, but maybe someone else can find a good use for it. I think the 2-d method 'should' give better results for graphics because it doesn't rely solely on the previous N bytes of the current scan-line, but also takes the pixels vertically above into account. If an image has many vertical patterns then this should beat a scheme which uses only horizontal predictions.

But hey, I could be wrong (I usually am).

Happy packing...

Regards

TAD #:o)