Faster Putbits

Dario Phong / PhyMosys

Almost three months have passed since the first version of this article, the time goes by quickly... but this is not the point of this article. Recently, I've been implementing a lot of things, and amongst these is a better putbits. It started when I was talking to Michael Schindler and he told me: "Why do you still loop over the bits? You should write them at once." It was then that I started to think about this, did some pseudo code, implemented it, and it worked perfectly as well as faster. I'll not give you the code straight away - you should be able to write it yourself. However, if you achieve a good implementation, email me, and let the ppl have a look at it. The idea is that instead of writing bit by bit, we'll perform an "or" and in this way put all the bits. First a bit order should be defined:

|0|1|2|3|4|5|6|8|

Where 0 is the first bit that a getbits must read. If we have a code like 01b it will be inserted that way: 01b, and when properly shifted:

01xxxxxx

Once you call it, there are different cases:

1. Needed_byte == bits_to_put
2. Needed_byte > bits_to_put
3. Needed_byte < bits_to_put

OK? Let's see then:

1. In the first case very everything is perfect, you just "or" the output with the input, and output the byte. Let's say we have 1101b, and we have to put 0011b, first you shift to the left (input_length times): 11010000b, and then "or" it, yielding 11010011b. This byte is ready for output.

2. The second case isn't a real problem. We just have to do the same than the first case, but we don't output the byte. We had 11b, and we have to put 101b. Shift to the left: 11000b, and "or": 11101b.

3. The third case IS the problem because we can't just put the low part of it, and then write in another. Instead, we have to put the HIGH part. (Remember, from right to left.) So if we have the code 111111b, and we have to write 0100b, it can't be 11111100b because the decoder will read first the 00 bits of the code, it must be 11111101b, and then in the next byte 000b.

A pseudo-code for this putbits will look like that:

 Init:

       Empty_bits=8
       Outputbyte=0

Then the putbits:

 putbits:

      empty_bits==bits_to_put?
      Yes
           shl Outputbyte bits_to_put (times)
           or Outputbyte inputcode
           output byte
           Empty_bits=8
           Outputbyte=0
           exit
      No
           Continue
      empty_bits>bits_to_put?
      Yes
           shl Outputbyte bits_to_put (times)
           or Outputbyte inputcode
           Empty_bits-=bits_to_put
           exit
      No
           Continue (else we have the 'worst' case )
      temp=inputcode   (first write the high part)
      shr temp (bits_to_put-empty_bits)
      shl Outputbyte empty_bits   (times)
      or Outputbyte temp
      Output byte   (now the low part)
      Outputbyte = inputcode  (high part will be finally shifted)
      empty_bits = 8-(bits_to_put-empty_bits)  (the bits to write)
      exit

If we were short of space, we could call putbits again instead of writing at the end. But in terms of performance, it's not a good idea. It would be better to compute (bits_to_put-empty_bits) just once to speed things up. Of course you also need an end_putbits, which should flush the buffer of bits and bytes, simply write the remaining bits. Note that any implementation may be f***ed up, if the bits which you don't have to write are 1. If we have to write the value 11b with 2 bits, and instead of 11b we pass it 1111b, it will mess up, because we do the "or"s with previous ands. This is not a real problem, though, unless we pass wrong bits.

Test of faster putbits

And here it is a test, so you can see that it works:

      empty_bits=3
      Outputbyte=11111b
      bits_to_put=5
      inputcode=01011b  ;code to write

 ------ We have to take the last case ----

      temp=inputcode
      temp=01011b

      shr temp (bits_to_put-empty_bits)
      shr 01011b (5-3=2)  =>  010b

      shl Outputbyte (bits_to_put-empty_bits)(times)
      shl 11111b 3  =>  11111000b

      or Outputbyte temp
      or 11111000b 010b  =>  11111010b

      Output byte.
      put it in a buffer, increment pointer...

      Outputbyte = inputcode
      Outputbyte = 01011b    ;the 010 will be finally shifted

      empty_bits = 8-(bits_to_put-empty_bits)
      empty_bits = (8-2=6)

      exit.

I just showed the last case, but the other two are easier than this. If you do a good implementation, I'd be glad to see it.

Getbits for faster putbits

The way the getbits should get the value is the following: (4 bits value) 01234567 ;encoded byte 01234567 ;output value The body of the loop of the pseudo code will look like this:

       shl inputbyte 1 (to the carry)
       shl outputbyte 1
       or outputbyte carry (using adc)
       repeat till we have all the bits

This loops over the bits, however a decoder is fast, so we don't really need to make a fast getbits. Of course if you do a fast getbits, send it to me, but remember to do it without looping over the bits.

Closing words

Hope you've understood this article. You can also find it in HTML format at my h-page where you'll find my new articles and new versions of the old articles, too. If you need to contact me, try dario_phong@mixmail.com. See you in the next article.

- Dario Phong, Barcelona 16/5/1999