Move To Front: A Transformation Algorithm
MTF is a method used for transforming data and making some bytes more common, more redundant. You'll see how this works. Why should you learn this algorithm? Because it can help you, if you want to make a compressor, or at least it will let you see compression from another point of view.
Move To Front Encoder_____
The theory is easy. You have an array of 256 byte entries, initalised, so every byte has its own entry in the table. (00h->array, 01h->array,...) Every time you get a byte, the MTF encoder outputs it as the position in this table, not as its real value. Then just after that, you put that byte in the position 0 of the array (array) but first move the others bytes to the right (array->array, array->array,...) so you always have all the possible bytes in the array.
Actually this is not a compression method, it's a transformation method. It will make the more often repeated bytes have low values, so it will be easy to compress, for a Huffman method (instead of putting the symbol you put a little code based on the probability that the symbol has, the more the probability, the fewer bits it uses).
Good things about MTF:
1- It's fast, very fast.
2- It needs little memory usage. Only 256 bytes, and the current byte. (So ONLY 257 bytes.)
3- It will make the data more redundant, so Huffman, Shannon-Fano or an artihmetic encoding will compress it better.
4- It's easy to implement. I implemented it in two hours. (In Asm!)
5- It will be one of the methods to use if you combine it with BWT.
Bad things about MTF:
1- It doesn't compress the data.
2- It will make lz parsers TOTALLY fail, so don't try to compress it once transformed, because there will not be the data that lz compressors can compress.
I think MTF is good enough to implement it. If you think so too, continue reading.
The first thing to do is to have an array of 256-bytes entries, something like this: unsigned char bytes_order;
Once you have it, you have to initialise it. The code is very simple:
//Do I really have to explain that?
Then you start with the main loop, get a byte, and search for it in the array:
//scan the whole array, no need of end condition
//till we find it
//then, break the for
Once you are here, you output the byte, and then update the array, moving all the bytes till the position of our match, and then putting the byte in the order 0:
//scan from the position of the byte to 0
//move all of them to the right
//update the order 0
This is the end of the main loop, repeated as often as the file has bytes. This is so easy that I recommend you to implement it in Asm. I will not put my Asm code here, 'coz C is enough good for examples.
Well, once you have done the encoder, the decoder is very easy, even easier than the encoder. Like in the encoder, initialize the table, then the main loop, get a byte, this is the position on the table, and get the real byte from the table. Remember that position is the byte from the compressed file.
//get the byte
Now update the table, like the encoder does:
//the position to start
There's no need of doing anything else, MTF is already implemented.
As I said, this is not an standalone scheme, this MUST be combined with other schemes to compress the data. Good candidates are: BWT, Huffman, Shannon-Fano, or arithmetic coding. If anyone knows who may be another candidate, let me know. If you want to get info on them, just follow some links from my homepage.
Well, here a very easy transformation method has been presented. If you want more articles about lossless compression (maybe in the future lossy) visit my home page at http://www.geocities.com/SiliconValley/Byte/6789/. Here you can find a lot of useful compression related links.
If you still have any doubt or want to share experiences about compression, email me at: firstname.lastname@example.org
- DARiO PhONG, Barcelona 14/02/1999