An Introduction to PPM Encoders



In this article I'm going to attempt to introduce you to the PPM family of encoders, used in data compression. The PPM family are very powerful, and one of them, a variant called PPMZ is state of the art at the moment, meaning that it achieves the best compression rates over all on the Calgary Corpus (which is a set of test files for data compressors). PPM has lots of different implementations called PPMA, PPMB, PPMC, PPMD...

The only thing you really need to know to understand this article is roughly how arithmetic encoding works, reading my article on terminology would give you enough information.

(PPM, Prediction by Partial Matching)

Introduction to PPM

Under an arithmetic encoding model the number of bits assigned to a symbol is -log2(P(x)) (or very close to it) where x is a particular symbol, and P(x) is the probability of that symbol.

We can see that under a non-order-0 context, the probability of a symbol occuring can rise dramatically, with respect to this context. Here is a classic example of how non-order-0 contexts increase compressibility, the probability of 'u' occuring in a section of english text may be quite low, however, given the context of 'q', the probability increases enormously, thus an arithmetic encoding technique would achieve better compression.

Take this example, if we had an order-0 encoder, one which assumes no relationship between the symbols, and we had "aaaaaaaaaa", the encoder would only gradually increase 'a''s probability, if it started at 1/256 it would work its way up to 10/266, which is still pretty puny. A way to encode the stream in much fewer bits is to define higher order contexts, if we had an order-1 context, it would start with all probabilities of the symbols under this context as 0. When we read the first two 'a's it would set the probability that 'a' follows 'a' to 1/2, and then after the next one 2/3, and then after the next one 3/4 and continue like this. Now we can encode the symbols using standard arithmetic encoding in much fewer bits. In practice we use higher context orders than 0 and 1, usually about order-3 is the maximum but we don't only have an order-3 context we also store the order-2 and order-1 contexts too.

Context Orders and Escaping

Continuing from above, PPM tries to estimate the probability of the next symbol using the highest order context first, but what if the symbol has never occured before and hence has a probability of 0. PPM escapes to a lower context order, then tries the same process and escapes again until it can encode the symbol. The context orders stored by PPM are order-n (the highest order) down to order-(-1). The order-n context can fail if the symbol has never occured before under this context, similarly with the order-(n - 1) context (if n >= 1), the order-0 context fails if the symbol has never occured before, the ordern context can never fail because it holds a predefined, probability for each symbol agreed upon by encoder and decoder.

Above I said PPM "escapes" to a lower order, meaning it emits a special arithmetic code called an escape code which tells the decoder it's trying a lower order context. This escape code has a certain probability that it will be emitted, and that is called the escape probability, and is one of the problems with PPM encoding, to estimate the escape probability.

Finishing up

Just to use an example, normally an arithmetic encoder compresses data just relying on the frequency or probability of symbols, however using contexts for arithmetic encoding results in better compression and is called PPM.

Here's an example:

"After he said it, he heard the bang"

A normal arithmetic encoder (a two-pass order-0 one), would see that there are 5 'e's in the text, and assign them a probability of 5/34, and expresses it in -log2(5/34) bits.

A PPM encoder sees the 'e's but it also sees the 'h's before them, and sees that 'h' is followed by 'e' with very high probability, 4/5, and expresses 'e' in -log2(4/5) bits. It has done better than the arithmetic encoder :)


There probably wasn't enough information here to implement PPM, and if you did so using only the above information you would get a pretty bad implementation, since I have left out topics like exclusion and escape probability estimation.

paradox / vivid