Compression: Current State of the Art, and the Future

Written by Dario Phong

Do you like compression? I do. Maybe you've read some of my articles; if not, do so (as long as you want to learn about compression). Perhaps you have coded something compression-related yourself, and I'm sure that every day you use a program which directly or indirectly uses compression. Maybe an archiver (like PKZIP, ARJ or RAR) or an image editor which surely supports a lot of file formats (most of which use compression). MPEG and JPEG are standards today, but what is the state of the art? And what will happen in the future?

One can try to predict, of course. If you know something about compression (or even if you don't), you'll know that no prediction has a 100% probability of coming true, so maybe I'm wrong about something. But here are my predictions, along with some facts from the compression world.

Archivers: everyone has used one of them. Most of them (I could say almost 90%) use one of the LZ variants, usually in the form of LZ77 or LZW. It all started with ARC, which used LZW (as far as I remember); PKZIP and ARJ both use LZ77 variants. This is what gets used, mainly because these algorithms are fast and, when finely tuned, give a good compression ratio. However, they are far from the state of the art. PKZIP reaches about 2.6 bpb (bits per byte: an 8K file compresses down to around 2,600 bytes) while running at high speed, whereas the current state of the art is PPM, at about 1.9 bpb (PPMZ, presented by Charles Bloom), which is considerably slower. To compress around 3 MB, WinZIP takes 11 seconds, while PPM may take up to 90 seconds. That's why PPM isn't widely used in archivers: the end user wants speed.
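
To make the LZ idea concrete, here is a minimal greedy LZ77 sketch (the names, window size and output format are mine for illustration, not taken from any real packer): it searches a sliding window for the longest match with the bytes at the current position and emits an (offset, length) pair, or a literal byte when no worthwhile match exists. Real packers add hash-based match finding, lazy evaluation and entropy coding on top of this.

  #include <stdio.h>
  #include <string.h>

  #define WINDOW  4096   /* how far back we search for matches */
  #define MIN_LEN 3      /* shorter matches aren't worth a pair */

  static void lz77_compress(const unsigned char *in, int len)
  {
      int pos = 0;
      while (pos < len) {
          int best_len = 0, best_off = 0;
          int start = (pos > WINDOW) ? pos - WINDOW : 0;
          for (int i = start; i < pos; i++) {   /* brute-force search */
              int l = 0;
              while (pos + l < len && in[i + l] == in[pos + l])
                  l++;                          /* may overlap pos: fine for LZ77 */
              if (l > best_len) { best_len = l; best_off = pos - i; }
          }
          if (best_len >= MIN_LEN) {
              printf("(match off=%d len=%d)\n", best_off, best_len);
              pos += best_len;
          } else {
              printf("(literal '%c')\n", in[pos]);
              pos++;
          }
      }
  }

  int main(void)
  {
      const char *s = "abcabcabcabc";
      lz77_compress((const unsigned char *)s, (int)strlen(s));
      return 0;   /* prints 3 literals, then one overlapping match of length 9 */
  }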

I thought the LZP family would start to be used; however, that's not the case. Instead, programmers have chosen BWT, the Burrows-Wheeler Transform (as in bzip), and some variants (like szip). These are fast and achieve around 2.2 bpb while taking about as much time as WinZIP does. Unless LZP is further improved (I'm working on it), the archivers of the near future will use BWT. However, once CPUs are faster and have more memory (a future that is also near), some archivers will start using PPM, I'm sure of it.
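
To give an idea of what BWT actually does, here is a deliberately naive sketch of the forward transform (real implementations use suffix sorting; this slow rotation sort is only for illustration): sort all cyclic rotations of the block and output the last column, which groups bytes with similar contexts together so that a simple second-stage coder can squeeze them.

  #include <stdio.h>
  #include <stdlib.h>

  static const unsigned char *buf;   /* block being transformed */
  static int blklen;

  /* Compare two cyclic rotations of buf, given by their start index. */
  static int rot_cmp(const void *a, const void *b)
  {
      int i = *(const int *)a, j = *(const int *)b;
      for (int k = 0; k < blklen; k++) {
          int d = buf[(i + k) % blklen] - buf[(j + k) % blklen];
          if (d) return d;
      }
      return 0;
  }

  /* Writes the BWT of in[0..len-1] to out; returns the position of the
     original rotation, which the decoder needs to invert the transform. */
  static int bwt(const unsigned char *in, unsigned char *out, int len)
  {
      int *idx = malloc(len * sizeof *idx), primary = -1;
      buf = in; blklen = len;
      for (int i = 0; i < len; i++) idx[i] = i;
      qsort(idx, len, sizeof *idx, rot_cmp);
      for (int i = 0; i < len; i++) {
          out[i] = in[(idx[i] + len - 1) % len];   /* last column */
          if (idx[i] == 0) primary = i;
      }
      free(idx);
      return primary;
  }

  int main(void)
  {
      unsigned char out[7];
      int p = bwt((const unsigned char *)"banana", out, 6);
      out[6] = 0;
      printf("BWT: %s, primary index %d\n", out, p);   /* nnbaaa, 3 */
      return 0;
  }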

Compression of images: most of the lossless standards to date have been rather bad from the compression point of view. RLE, as used in PCX and BMP, is very inefficient for most images. GIF and TIFF use LZW, but how can they use a patented algorithm? (The story here is a little bit long, so I'll tell it another time.) Newer formats like PNG seem to be better: PNG uses prediction based on a pixel's neighbours. One of the state-of-the-art lossless image compressors is CALIC.
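
As an example of this neighbour prediction, here is the Paeth predictor, one of the filters actually defined in the PNG specification: each byte is predicted from its left, above and upper-left neighbours, and only the difference between the real value and the prediction is stored.

  #include <stdlib.h>   /* abs() */

  /* Paeth predictor (PNG filter type 4). a = left, b = above,
     c = upper-left. Returns whichever neighbour is closest to the
     initial estimate a + b - c. */
  unsigned char paeth(unsigned char a, unsigned char b, unsigned char c)
  {
      int p  = a + b - c;
      int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
      if (pa <= pb && pa <= pc) return a;
      if (pb <= pc) return b;
      return c;
  }

The encoder stores pixel minus paeth(a, b, c); in smooth image areas these differences cluster around zero and compress very well.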

On the other hand, of course, we have lossy algorithms, and even this field isn't fully explored yet. ISO created the JPEG standard, which is a DCT-based (discrete cosine transform) process, closely related to the FFT (fast Fourier transform): it transforms a function (a block of pixels) from the spatial domain to the frequency domain, which makes it easier to compress. Usually one quantizes the data (divides it, to make it fit in fewer bits) and then entropy codes it (assigning fewer bits to the more common values).
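
As a sketch of that first step, here is a naive 1-D 8-point DCT-II (JPEG applies the DCT along the rows and then the columns of each 8x8 block; a real codec uses a fast factored version instead of this direct formula):

  #include <math.h>

  #define N 8

  /* Naive 1-D DCT-II. out[0] is the block's average level (the DC
     coefficient); higher u means finer detail, which the quantizer
     later discards more aggressively. */
  void dct8(const double in[N], double out[N])
  {
      const double pi = 3.14159265358979323846;
      for (int u = 0; u < N; u++) {
          double cu  = (u == 0) ? sqrt(0.5) : 1.0;
          double sum = 0.0;
          for (int x = 0; x < N; x++)
              sum += in[x] * cos((2 * x + 1) * u * pi / (2.0 * N));
          out[u] = 0.5 * cu * sum;   /* 0.5 = sqrt(2/N) for N = 8 */
      }
  }

Quantization is then just a division of each out[u] by a table entry, rounded; the many near-zero high-frequency coefficients that result are exactly what the entropy coder feeds on.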

But the DCT isn't the best way of analysing data. Wavelets are superior in almost every respect; they are the state of the art and will be used in future standards such as JPEG 2000. They are also fast.
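
The simplest wavelet of all is the Haar wavelet, and one decomposition level fits in a few lines (longer filters are used in practice, but the principle is the same): the signal is split into averages and differences, and smooth regions yield near-zero differences that quantize away to almost nothing.

  /* One level of the Haar wavelet transform on n samples (n even).
     First half of out = averages (a coarse, half-size version of the
     signal), second half = differences (the detail needed to restore it). */
  void haar_step(const double *in, double *out, int n)
  {
      int half = n / 2;
      for (int i = 0; i < half; i++) {
          out[i]        = (in[2*i] + in[2*i + 1]) / 2.0;
          out[half + i] = (in[2*i] - in[2*i + 1]) / 2.0;
      }
  }

Applying the same step again to the averages gives the multi-resolution pyramid wavelets are known for; in two dimensions you simply run it over the rows and then the columns.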

Of course there's something else in lossy image compression: fractal compression. Or: how to try to get rich with a single algorithm. Every form of fractal compression was patented, so it's rarely used. The compression ratios are quite amazing (and so is the time needed for compressing). You can compress a 64K still image (a real-life photo) down to 2K with good quality, far better than JPEG at such ratios.

Sound compression: the current standard (the most widely used one) is mp3, which is based on the properties of audio and on how we hear it; it mainly tries to discard what doesn't really matter. Another scheme based on this idea is u-law. Other forms of sound compression are based on prediction, like ADPCM or LPC (linear predictive coding). Unfortunately I'm not an expert in sound compression, so I can't give you more information. (Maybe Decoy will finally finish his pages about audio compression.)
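
Still, the prediction idea itself is easy to show. Here is a minimal sketch in the spirit of ADPCM, using only the trivial "previous sample" predictor and leaving out the adaptive quantizer a real ADPCM codec has: store the prediction error instead of the sample, because for most audio the error is small and compresses much better.

  /* Encode: keep only the difference between each sample and the
     prediction (here simply the previous sample). Decode reverses
     this exactly, so the scheme is lossless as written. */
  void predict_encode(const short *pcm, int *residual, int n)
  {
      short prev = 0;
      for (int i = 0; i < n; i++) {
          residual[i] = pcm[i] - prev;
          prev = pcm[i];
      }
  }

  void predict_decode(const int *residual, short *pcm, int n)
  {
      short prev = 0;
      for (int i = 0; i < n; i++) {
          pcm[i] = (short)(prev + residual[i]);
          prev = pcm[i];
      }
  }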

This is what you can find out there. If after reading this you are interested in compression programming, you should visit my homepage, where you can find more information than you need (at least for now). Good packers should be developed by demo sceners and for demo sceners (like 624). For demos, perhaps an exe packer would be interesting. However, a lot of questions arise: for Windows? For DOS? I think the best thing would be to write some code (in C or perhaps assembler) and a packer, and let the coders put them in their intros and depack all the data at run time, so we can have more demos on our HDs. This is just an idea; maybe some day it will become code, but that is up to you.

I hope you've got an overview of compression and had some fun. I think it's better to give an overview than my opinion about mp3, which nobody cares about anyway. See you in my next article! (And I hope to read some from you.)

- Dario Phong, Barcelona 06/07/1999