When Fibonacci And Huffman Met

This article is aimed at a programmer who has implemented Huffman and wants it to be bug-free.

Fibonacci series

I'll only explain the basics of it, if you want to know how to make an efficient implementation of it, look for an article about it at my h-page (section "others"). Fibonacci series are defined in the following way:

- Fib(0)=0

- Fib(1)=1

- Fib(n)=Fib(n-2)+Fib(n-1)

So, if we calculate till fib(7), we get: 0,1,1,2,3,5,8,13

Fibonacci and Huffman

The problem comes when we have a set of probabilities which is a Fibonacci series. Look at these probabilities:

Symbol Probability a 1/19 b 2/19 c 3/19 d 5/19 e 8/19

These probabilities match Fib(6), as the 0 probability is not used. This will lead to the following huffman tree (where n means node):

Root / \ e n / \ d n / \ c n / \ b a

With the following codes:

Symbol Probability Code a 1/19 0000 b 2/19 0001 c 3/19 001 d 5/19 01 e 8/19 1

This kind of tree is called lopsided, or unbalanced. As we can see the maximum length is alphabetsize-1, where alphabetsize=5 (alphabetsize is the number of different symbols). And here is the problem, we only have five symbols, but if we had 256 symbols our maximum code could be 255! And usually we design our codecs to work with codes of a length of 16 bits or even 32 bits, but never 255 bits.

Solution

Most implementations use 16 bit codes - no need of more - so I'll deal with how to avoid the problem with 16 bit codes. The maximum code length is - obviously - 16 bit, so we'll have a problem if we have a set of probabilities till fib(17), but for getting probabilities till fib(x) we need to get y bytes, where y is the sum of all fib(i) for i=0 to i=x. With mathematical symbols:

So for getting the Fibonacci probabilities for fib(6), we'll need to get at least 0+1+1+2+3+5+8=20 bytes. And for fib(16+1) we'll need 4180. What do we have to do? We'll change the probabilities so that they don't lead to an unbalanced tree by dividing the probabilities by a factor. I chose 2 because this can be done by simply shifting right by one bit.

First check the probabilities. If all the probabilities added make 4180, change the probabilities by dividing each probability by 2. But be careful, no probability should go below 1, or you'd not assign any code to that probability. Once it's less than 4180 you can construct the tree.

My implementation

In fact, all this theory is ok, but this wasn't how I did it. As I wanted the maximum length of a code to be 16, I kept a flag. If any code went above 16, I set this flag to 1. When finished with the codes, I checked if the flag was set. If it was, I divided all the probabilities by 2. Remember to store the original probabilities apart (so they aren't changed while building the tree) and to overwrite the old probabilities with the new ones so that you don't enter an infinite loop in case the first division can't make your maximum code length 16.

Closing words

In the bonus pack of this Hugi issue there's also a little C program which prints fib(x) and also computes

Well, if your huffman code ever hanged and you didn't know why, this is probably the solution for it. If you've implemented huffman you should also do Canonical Huffman.