Image Compression: BTC (Block Truncation Code)

1. Introduction

This doc explains the BTC scheme for compressing image data. Recently I was speaking with Biki and Aslan in #coders channel. We were talking about digital image compression. Biki and Aslan (as other coders) told me they were looking for docs explaining algorithms to compress image data... and they found NOTHING... I'm a lover of image compression algorithms and I know a lot of them... so I promised these coders to help with these algos. I think this DOC will be helpful for coders... I hope :)

2. BTC

First of all, BTC is a (new?) image compression scheme that was proposed by Delp and Mitchell, whose perfomance is comparable to the DTC, but which has a much simpler hardware implmentation. In Block Truncation Code, an image is, firstly, segmented into NxN blocks of pixels. Well, in the most cases the size is 4x4, but we can choose another size, like 8x8... Sencondly, a two level output is choosen for every block and bitmap is also encoded using 1 bit for every pixel. The main idea is that the two levels are selected independently for every NxN block. So, encoding the block is a simple binarization process because every pixel of the block is truncated to 1 or 0 binary representation choosing the reconstruction level(the two level output).

Example: encoding 16 bytes (4x4 pixels block) of an image, we should compute an 'statistics' of this block (the two level output) and then, we should binarize this block with our two level output assigning a 0 or 1 binary... the results (if we're working with 256 grays levels) compressed:

2 bytes indicating the two levels output
2 bytes indicating the encoded bitmap (2bytes = 8+8bits = 16bits)

Decoding is very simple because we only place the appropiate reconstruction level for every bit indicating 0 or 1.

Consider the following 4x4 block of pixels:

     | 79 80 75 77 |
 X = | 77 79 79 79 |      <--- This is our original image
     | 72 72 79 80 |
     | 77 79 79 87 |

Think that our "quantizer" computes the mean of the block and the two levels output ('a' and 'b'). The mean is 78 (in the block). So we can choose the two levels output by pure inspection (looking at the original image). To be realistic, the two levels are also computed using an algorithm. Look the next block and tell me if it is representative... Note that 1=a and 0=b.

     | 1 1 0 0 |
 Y = | 0 1 1 1 |          <--- This is our 'encoded' image
     | 0 0 1 1 |
     | 0 1 1 1 |

... where 1 is indicating that the pixel is greater than the mean and 0 indicates that it's below the mean.

Finally, take a look the reconstructed block:

     | 80 80 73 73 |
 X'= | 73 80 80 80 |      <--- This is our 'decoded' image
     | 73 73 80 80 |
     | 73 80 80 80 |

These are the statistics of the block (this was really computed):

 Mean       = 78
 Variance   = 3.351772
 Q          = 10
 Level 1(a) = 73
 Level 2(b) = 80

As I said before, we have 256 possibles values (grays), 256=2^8... this means we have 8 bits to represent every pixel... this tells us that the total bit rate is: (8+8+16)/16 = 2.0 bits / pixel ==> 2.00 bpp

And it also tells us that a 64k bitmap can be encoded to 16k only... yup :) ... but you should note that this basic BTC will introduce what I call artifacts (something similar to noise, but different), but don't worry because the high frequencies of the image (SHARP) will be encoded with high fidelity. Other algorithms I know typically blur sharp edges. Well, it's time to speak about Quantizer Design... let's go!

2.1. Quantizer Design

In our example we made the thresholding process (to choose 2 levels output) by pure inspection. The quantizer we want to made MUST preserve the 1st and 2nd moments of a particular block minimizing the mean-squared reconstruction error. In fact, there are A LOT of ways to make a quantizer.

Most BTC algos I know use 'moment preserving' quantizers. This means these quantizers preserve a number of moments in the block (i.e: the mean and the variance, desviation...). These quantizers are designed for finding the most important parameters in the image, the most important visual information... ya know: the MEAN. There is no problem to use the MEDIAN instead of the MEAN... this would be another quantizer that, probably, will work better than the algo. that uses the MEAN.

Well, at this point a basic MATH will be used. I hope you dont worry for this... in fact, REAL PROGRAMMERS AREN'T AFRAID OF MATH. So here are the equations to preserve the MEAN and VARIANCE of a particular nxn block of a image:

 m = n^2
 X1, X2, X3, ..., Xm       are the pixel values

        -          m
        X  = (1/m) SUM Xi
                   i=1

        --         m
        Xý = (1/m) SUM Xi^2
                   i=1

            -      /-\ ^2
       å^2 = X^2  - ( X )
             |       \ /    <----- This is the MEAN, and THEN SQUARE
             |
             À------------------- First SQUARE, then MEAN

Now, i'll define the threshold value (Xt) to binarize later:

                              |  a, if Xi < Xth
                        Xi =  |
                              |  b, if Xi >= Xth

                        where i= 1, 2, ..., m

To preserve the firsts moments, where q is the number of Xi's greater than or equal (>=) to the threshold (the mean)... in other words:

                         -
                        mX  =  (m-q)a + qb

                         --
                        mX^2 =  (m-q)a^2 + qb^2

... and solving this system for 'a' and 'b' we have: (note that sqrt = square root)

                               -
                          a  = X - SUM sqrt(q/(m-q))

                               -
                          b  = X + SUM sqrt((m-q)/q)

Now, we must send the nxn bitmap saying which pixels to reconstruct with the 'a' outlevel and which with the 'b' outlevel and also we have to include some info saying the two levels output: 'a' and 'b'.

At this point, we send 'a' and 'b' directly using 8 bits for everyone (if we're using 256 grays levels). As I said before, you'll obtain a 4:1 factor compression (2.00 bpp) because 8 bits are used for 'a' and 'b' and also 16 bits for the bitmap = 8+8+16 = 32 bits = 4 bytes.

A 64K image will be compressed (ALWAYS) to 16K only.

Ok, this is ALL you need to make a simple BTC encoder... it will work well, but... continue reading and you'll learn a lot MUCH MORE about BTC fascinating scheme for digital compression image. :)

2.2. Designing A Fast Quantizer

I think that currently you have noticed that squaring and square root operations are required to implement the quantizer... and I think that you know that these kind of operations are slow (...and you would use the FPU... and this is NOT what one desires).

An approach that avoids these operations is what is called as AMBTC, which stands for Absolute Moment Block Truncation Code. The quantizer is designed to preserve ABSOULUTE moments. I'll give you the equs:

                        m = nxn = n^2

                        -          m
                        X =  (1/m) SUM  Xi
                                   i=1

                                  m          -
                        à = (1/m) SUM  |Xi - X |
                        

                            -
                        a = X - (mà)/2(m-q)


                            -
                        b = X + (mà)/2q

Remember that q is the number of Xi's greater than or equal (>=) to the threshold (the mean).

3. Increasing The Performance Of The BTC Algo.

We'll see some non standard BTC scheme variations... these variations, of course, will increase the perfomance of our encoder (in bpp), but also will increases the costs and the complexity of our algorithm, naturally. A very simple technique to reduce the bit rate is to use fewer than 8 bits for the 2 level output (or the mean and the variance)... for example using only 6 bits for each. This will result lower quality in the reconstructed image.

3.1. Bitmap Omission

This is a very simple way to reduce bit rate... this is based on the fact that in some images certain regions (blocks) exist, where the variance is very little (of no meaning for our eyes)... these regions are zones of the images where is not a considerable color change. In other words, in some blocks of the image, 'a' and 'b' are very approximated (variance is little) and, for this reason, there is NO need to send 'a' and 'b' together, we should only send 'a' or 'b' with the block bitmap encoded. These techniques will work very well with images with large uniform zones.

3.2. VQ For The Bitmap

VQ stands for Vector Quantization, and this is another digital image compression scheme. I assume you know this scheme, if you dont know email me, maybe i'll write another DOC explaining this scheme.

This method will consider the bitmap as a m-dimensional vector. I remind you that m=NxN. For example for 4x4 blocks, we have a 16-dimensional vector. It will contain 16 values (0 or 1): b0, b1, b2, b3, ..., b15. Well, this technique is based in considering that, in fact, when we're encoding the bitmap (the 16 bit vector encoded) we should note that in the whole image, these bitmaps will be repeated (or very approximated). This means you CAN GENERATE a CODEBOOK with a training set of bitmap vectors. Using this method you'll achieve 1.5 bpp, which I think is a good bit rate. I think this method (combined with my series calculus algorithm) is better than the others schemes i'll comment here.

J&I proposed me to send the 'a' and 'b' of each block of the whole image first and then send the bitmaps of the whole image because doing it we would make a simple Huffman encoding for the bitmaps and the bit rate could be reduced a lot. Think that bitmaps could be repeated A LOT in the encoding process:

 52,64, 23,54, 52,64,  23,67,  23,54.................
               ^               ^
               repeated        repeated

Now, I'll propose a simple BTC combined with VQ to make the text more easy to understand it.

Example: we're compressing a 64K image. We want to use BTC combined with VQ, so we start encoding the bitmap and also finding 'a' 'b' and when we have encoded the bitmap, then we'll compare this encoded bitmap with all of our bitmaps that are in our particular codebook (containing a 256 m-vectors, 8 bits). Finally, we'll select the BEST aproximation (usually this will be a 8 bits index indicating the bitmap in codebook). We'll send 'a' 'b' and the 8 bit index.

Got the idea? Very easy, I think.

3.3. Joint Quantization

I'll NOT explain extensively this method... but, take a look: this is based on the fact that exact value of the mean isn't important in a high variance area (when variance varies a lot), but it's important when variance does NOT vary. In other words, when variance is LITTLE we have a couple of possible values for the mean, but when variance is GREATER we have only a few values for the mean. Using this technique, the mean and the variance is jointly quantized using only 10 bits for both. Got the idea?

I dislike this method.... using it you'll obtain a bit rate of 1.63 bpp.

4. ABS-BTC

ABS-BTC stands for Adaptive Block Size - Block Truncation Code and, how the name says, this scheme consists on considering the block size when we're encoding the bitmap... coz fixing the block size to 4x4, for example, will work well... but it's also certain that some specific images could compress much more than using a fixed nxn for the entire bitmap. As I said before, the High Freq. zones in the image (sharp) will require more info encoding (precission) and the Low freq. zones in the images can be encoded which lower precission.

This means we can vary the size of the block depending on what we're encoding.

These are the steps to encode using ABS-BTC:

Segment the image in 4x4 blocks.

Compute 'a' and 'b'.

Now, every block will be classified according with the 'a' and 'b' levels output. We'll use 3 mode operations:

Mode 1

Th1, where Th1 is a threshold if

                        |a-b| <= Th1

We'll skip the bitmap and we send only the mean of the 4x4 block. Mode used in Low Freq. zones in the image.

Mode 2

Th2, Th3, where Th2 and Th3 are both thresholds but different if:

                        Th1 < |a-b| < Th2

... AND...

                   ---                           ---
                   |                               |
                   |    SUM |Xi-a| +   SUM|Xi-b|   |
                   |         -                -    |
                   |  zz Xi< X       zz Xi >= X    |  <= Th3
                   |                               |
                   ---                           ---

where zz = for all the possible values.

Then we'll send 'a', 'b' and the bitmap.

Mode 3

We'll select this mode when Mode 1 and Mode 3 are NOT satisfying. We'll segment the 4x4 block to 2x2 block, then we'll send 'a', 'b' and the bitmap for the two segmented blocks. (Used in the High Freq. zones in the image, we need precission.)

If your encoded bitmap is:

1101000100101110 = 16 bits = 2 bytes

and you selected mode 1 then:

0101000100101110 = 16 bits = 2 bytes
(01 = 1 - field)

... and NO considerable artifacts were introduced (we changed only a bit).

5. Some Basic Code?

I'll write some basic code for help you to get the basic idea... so don't expect so much. I'll write the standard BTC. I'm assuming we have read a file, and have stored it in a bitmap pointer with reserved mem. I also assuming our bitmap is 320x200 (64k). I put C language here coz many people know it, but I prefer Assembler, of coz ;)

 unsigned char *bitmap;
 unsigned int x,y,mean,a,b,q;
 float temp,a,b,sig;

 btc()
 {
     mean=0;
     es1=0;
     es2=0;

     for(y=0;y<4;y++){
         for(x=0;x<4;x++){
             mean+=bitmap[(y*320)+x];
             es1=es1+bitmap[(y*320)+x];
             es2=es2+pow(bitmap[(y*320)+x],2);
         }
     }

     mean=mean/16;
     es1=es1/16;
     es2=es2/16;
     sig=sqrt(es2-pow(es1,2));

     /* Compute q */
     q=0;
     for(y=0;y<4;y++){
         for(x=0;x<4;x++){
         if(bitmap[(y*320)+x]>=es1){ q++; }
         }
     }

     /* compute a and b output levels */
     if(q!=0){
         if(q!=16){ temp=sqrt(q/(float)(16-q)); }
         else { temp=4; }
         temp=temp*sig;
         a=(int)(es1-temp);
         temp=sqrt((float)(16-q)/q);
         temp=temp*sig;
         b=(int)(es1+temp);
     }

         /* Now, you have the 'a' and the 'b' output levels.
            So I leave for you to send the 'a' and the 'b' and also
            the encoded bitmap (2 bytes) to another buffer (or the
            same buffer, bitmap, if you want).
         */
 }

6. Last Nodes

Well, at this moment I'm very tired...
If you have a doubt or want to ask something, then email me:

email: dave@bglinks.com or dave_es@yahoo.com

If more than 10 coders email me answering for VQ or other known / unknown digital image compression schemes, I promise to release more DOCS... ...but, please, dont ask me about my series calculus algo. for compressing images coz for this moments it's a bit secret (and, of course, currently I'm working in it). I'm a regular on #coders, so maybe you'll find me in that channel for talking about it.

7. Greets

Special Greetings go to: Gandalf, Shock, Gyr, Kalms, Ravian, Altair, Yoghurt, Legend, Biki, Aslan, DjBlaze, Maza, Funktion, HarriH, ...

... and, of course, greets for all regular people in #coders and #trax channels.

P.S.: Hey J&I! When We start a REAL project (a game, for example :). X-Static: I hope you'll make some cool tunes and graphics for next Euskal Party. ;)

- DAVE/PhyMoSys