Ector's Crash Course in 32-bit color Math
Ector of Medieval
This document is a quick introduction to 32-bit color math and blending. Many of the code fragments will be unoptimized for clarity. Also, the inline assembler code is written for MSVC++.
I've noticed that a lot of new coders have trouble understanding how to manipulate colors. As 32-bit is the easiest mode, that's the one I'll cover here. For others, like 15 (why use that?) and 16, it's often just a matter of changing the bitmasks and shifts.
Anyway, let's start with some theory. You all know, I hope, that the 32-bit color format is composed of four bytes per pixel: ARGB (alpha, red, green, blue). (Due to Intel endian-ness they are actually stored in the reverse order, but when you're interpreting the colors as 32-bit ints that doesn't matter). The alpha byte is most often discarded, unless you're doing per-pixel alpha blending (which is cool).
You know how bitmasking works, dontcha? Great, skip to the next section. Otherwise read on, because this is essential to understanding what's going on.
Hex and Bitmasking
First of all, you need to understand the hexadecimal numbering system. Counting in hex is just like in the normal decimal system, but there are 16 digits instead of 10. The extra 6 digits needed are usually expressed as A, B, C, D, E and F. A=10 dec, and F=15 dec.
In code, you can write hexadecimal numbers by prefixing them with 0x. Like this:
int temp = 0x1F;
temp will equal 31, which is 1F hex (1 * 16 + 15).
As 16 * 16 = 256, and a byte is composed of 8 bits and thus can have 256 different values (2^8=256), two hexadecimal digits represent one byte. This means that to get the blue component out of a 32-bit color value (an int), we can do this:
blue = color & 0xFF;
If you're a real coding newbie, you may ask: What is that & doing there? Well, it combines numbers with a binary AND. 01001001 AND 11000011 = 01000001. That is, where both bits are set, the destination bit is set, otherwise it's not.
As you may have figured out, a hex F = 1111 in binary. That's why we can mask out the blue part like I did above. You see:
            A        R        G        B
   color  = 00000000 10010110 00101010 11011001
AND mask  = 00000000 00000000 00000000 11111111   // 0xFF == 0x000000FF
   result = 00000000 00000000 00000000 11011001
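Only the blue part is left in the result! To get the green part, use 0xFF00, and to get the red byte use 0xFF0000. The masked value still sits at its original bit position, so shift it right by 8 (green) or 16 (red) if you want a plain 0..255 number.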
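Here is a minimal sketch of the masking described above, wrapped into small functions. The names get_blue, get_green, get_red and get_alpha are made up for this example, and unsigned 32-bit types are assumed so the top (alpha) byte behaves predictably when shifted.

```c
#include <stdint.h>

/* Hypothetical helpers: mask one channel out of a packed ARGB pixel,
   then shift it down so the result is a plain 0..255 value. */
static uint32_t get_blue(uint32_t color)  { return  color & 0x000000FFu; }
static uint32_t get_green(uint32_t color) { return (color & 0x0000FF00u) >> 8; }
static uint32_t get_red(uint32_t color)   { return (color & 0x00FF0000u) >> 16; }
static uint32_t get_alpha(uint32_t color) { return (color & 0xFF000000u) >> 24; }
```

Feeding it the color from the diagram (0x00962AD9) should give red 0x96, green 0x2A and blue 0xD9.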
There's another binary operator which is quite useful. It's called OR. I use it all the time to combine the color components back together (you can also often use adds, but I think ORs are more logical to use here). This code swaps the red and blue components of a color:
int r=(color & 0xff) << 16;
int g=(color & 0xff00);
int b=(color & 0xff0000) >> 16;
color = r | g | b;
Why does this work? you may ask. Check it out (I'll leave out the blue in this example):
   red    = 10111010 00000000 00000000
OR green  = 00000000 11010110 00000000
   result = 10111010 11010110 00000000
If either of the bits is set, the result bit will be set too, otherwise it won't be. As long as the values you combine have no set bits in common (like the separated color components here), you can replace the | with a + and it will do the same thing. If any bits overlap, the + will carry into the next bit and the results differ. I think the | is more logical to use. I don't think there is any speed difference between the two, though.
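The OR-combining can be sketched as two small functions. pack_rgb and swap_outer_channels are hypothetical names for this example; the second one exchanges the lowest and highest color bytes (blue and red in the ARGB layout) and drops alpha, just like the fragment above.

```c
#include <stdint.h>

/* Hypothetical helper: build a pixel from three 0..255 channels with OR.
   The channels occupy different bits, so | and + would give the same result. */
static uint32_t pack_rgb(uint32_t r, uint32_t g, uint32_t b)
{
    return (r << 16) | (g << 8) | b;
}

/* Exchange bits 0-7 (blue) with bits 16-23 (red); green stays put.
   The alpha byte is masked away. */
static uint32_t swap_outer_channels(uint32_t color)
{
    uint32_t new_red  = (color & 0x0000FFu) << 16; /* old blue moves up  */
    uint32_t green    =  color & 0x00FF00u;        /* untouched          */
    uint32_t new_blue = (color & 0xFF0000u) >> 16; /* old red moves down */
    return new_red | green | new_blue;
}
```

Note that | and + only agree because the fields don't overlap: 0x80 | 0x80 is still 0x80, while 0x80 + 0x80 is 0x100.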
Color Addition
The simplest color operation is possibly the color add. What we do is just add all color components of both colors (c1 and c2) and then saturate them (make sure they don't go over 255). This is a bad, slow but easy-to-understand way of doing it:
int b = (c1&0xff) + (c2&0xff); //split and add
int g = (c1&0xff00) + (c2&0xff00);
int r = (c1&0xff0000) + (c2&0xff0000);
if (b>0xff) b=0xff; //saturate
if (g>0xff00) g=0xff00;
if (r>0xff0000) r=0xff0000;
color = b | g | r; //combine them back
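Easy, wasn't it? There are other ways to do this. The easiest is perhaps to use the MMX instruction PADDUSB, which adds eight packed unsigned bytes with saturation in one go, so a single instruction handles every component of two pixels.
Remember to put an emms (empty MMX state) at the end of a block of code using MMX, before doing any float operations.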
Color addition is useful for stuff like particle systems (draw your particles additively), but can of course be used for many other things. If you want your particle system to look like smoke, you can try using subtractive blending, which is essentially the same thing. Use minus instead of plus, and clip r/g/b to 0 instead of 255. For the MMX implementation, simply replace the paddusb with a psubusb.
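As a rough sketch, additive and subtractive blending can sit side by side as plain C functions. This is not the MMX version: it only does, for the three color bytes of one pixel, what paddusb and psubusb do per byte. The names add_saturate and sub_saturate are invented for the example, and alpha is simply dropped.

```c
#include <stdint.h>

/* Add two pixels channel by channel, clamping each channel at 255. */
static uint32_t add_saturate(uint32_t c1, uint32_t c2)
{
    uint32_t b = (c1 & 0x0000FFu) + (c2 & 0x0000FFu);
    uint32_t g = (c1 & 0x00FF00u) + (c2 & 0x00FF00u);
    uint32_t r = (c1 & 0xFF0000u) + (c2 & 0xFF0000u);
    if (b > 0x0000FFu) b = 0x0000FFu;
    if (g > 0x00FF00u) g = 0x00FF00u;
    if (r > 0xFF0000u) r = 0xFF0000u;
    return r | g | b;
}

/* Subtract c2 from c1 channel by channel, clamping each channel at 0.
   Comparing before subtracting avoids unsigned wrap-around. */
static uint32_t sub_saturate(uint32_t c1, uint32_t c2)
{
    uint32_t b1 = c1 & 0x0000FFu, b2 = c2 & 0x0000FFu;
    uint32_t g1 = c1 & 0x00FF00u, g2 = c2 & 0x00FF00u;
    uint32_t r1 = c1 & 0xFF0000u, r2 = c2 & 0xFF0000u;
    uint32_t b = (b1 > b2) ? b1 - b2 : 0;
    uint32_t g = (g1 > g2) ? g1 - g2 : 0;
    uint32_t r = (r1 > r2) ? r1 - r2 : 0;
    return r | g | b;
}
```

In a particle system you would call something like add_saturate(background, particle) for glowing sparks and sub_saturate(background, particle) for smoke.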
Color Multiplication
Color multiplication isn't very complicated either. We just have to know that if we multiply two numbers between 0 and 255, we'll get a number in the range 0..65025 (255 * 255). Strictly we'd have to divide the result by 255 to get back into the 0..255 range, but dividing by 256 is close enough (white times white comes out as 254). And as division is slow, we'll use shifts. You know that x/256 equals x >> 8, right? Good.
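int b = ((c1 & 0xff) * (c2 & 0xff)) >> 8;
//shift green and red down before multiplying: multiplied in place, the products don't fit in 32 bits
int g = (((c1 >> 8) & 0xff) * ((c2 >> 8) & 0xff)) & 0xff00; //high byte of the product is already in the green position
int r = ((((c1 >> 16) & 0xff) * ((c2 >> 16) & 0xff)) << 8) & 0xff0000; //move the high byte of the product up to red
color = b | g | r;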
We have to AND R and G at the end to make sure no stray bits mess up the lower color components.
This multiplication code destroys whatever is in the alpha byte, so if you want to keep it you'll have to save it before multiplying, and then OR it back into the resulting color.
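A sketch of the multiply with the alpha byte saved and ORed back in could look like this. It assumes you want to keep c1's alpha (that choice is mine, not something the text prescribes), and mul_keep_alpha is a made-up name. Each channel is shifted down before multiplying so every product fits comfortably in 32 bits.

```c
#include <stdint.h>

/* Multiply two pixels channel by channel (result = a * b / 256 per channel)
   and carry c1's alpha byte over unchanged. */
static uint32_t mul_keep_alpha(uint32_t c1, uint32_t c2)
{
    uint32_t a = c1 & 0xFF000000u;                        /* save alpha first */
    uint32_t b = ((c1 & 0xFFu) * (c2 & 0xFFu)) >> 8;
    /* product is 16 bits wide; its high byte already sits in bits 8-15 */
    uint32_t g = (((c1 >> 8) & 0xFFu) * ((c2 >> 8) & 0xFFu)) & 0x00FF00u;
    /* move the product's high byte up to bits 16-23 */
    uint32_t r = ((((c1 >> 16) & 0xFFu) * ((c2 >> 16) & 0xFFu)) << 8) & 0xFF0000u;
    return a | r | g | b;                                 /* OR alpha back in */
}
```

Multiplying a white pixel with alpha 0x80 by the color 0x804020 gives 0x807F3F1F: every channel ends up one step below the "ideal" value because of the divide-by-256 shortcut.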
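Most often, we'll only want to multiply a color with a single 0-255 value. As it turns out, we can then handle the red and blue values in one multiplication. There is an empty green byte between them, and a byte times a byte never needs more than 16 bits, so the blue product can't spill into red: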
int g = (((c1 & 0xff00) * val) >> 8) & 0xff00;
int br = (((c1 & 0xff00ff) * val) >> 8) & 0xff00ff;
color = g | br;
Nice, huh? Two imuls for a color multiplication! And we're not even using MMX yet!
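Wrapped into a function, the two-multiply trick might look like the sketch below. scale_color is a hypothetical name, and unsigned types are assumed because the red product reaches into the top bits of the 32-bit word.

```c
#include <stdint.h>

/* Scale all three color channels of c by val/256 (val in 0..255)
   using only two multiplications. Alpha is dropped. */
static uint32_t scale_color(uint32_t c, uint32_t val)
{
    /* green on its own */
    uint32_t g  = (((c & 0x00FF00u) * val) >> 8) & 0x00FF00u;
    /* red and blue together: the empty byte between them absorbs
       the low half of the blue product */
    uint32_t rb = (((c & 0xFF00FFu) * val) >> 8) & 0xFF00FFu;
    return g | rb;
}
```

scale_color(0x804020, 128) halves every channel and gives 0x402010. Because of the shift by 8, val = 255 doesn't quite return the original color: white comes back as 0xFEFEFE.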
Alpha Blending
This is what you've all been waiting for...
As you may know, the formula for alpha blending goes like this (alpha=0 to 255, where 255 gives you all c1 and 0 gives you all c2):
color = (c1 * alpha + c2 * (255 - alpha)) / 255;
"Two multiplications per component? That'll be 6 for the whole blend!" you may think. But not so. It turns out that this equation can simply be rewritten into:
color = c2 + ((c1 - c2) * alpha) / 255;
And, because this is almost the same thing as color multiplication, we can do alpha blending too in 2 muls/pixel! Nice! Take a look:
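//c1, c2, alpha, rb and g should be unsigned 32-bit ints: the subtraction is allowed to wrap around
//only the (c1 - c2) * alpha part is divided by 256, then c2 is added on top
rb = (c2 & 0xFF00FF) + ((((c1 & 0xFF00FF) - (c2 & 0xFF00FF)) * alpha) >> 8);
g = (c2 & 0xFF00) + ((((c1 & 0xFF00) - (c2 & 0xFF00)) * alpha) >> 8);
rb &= 0xFF00FF; //throw away the stray bits between the components
g &= 0xFF00;
return rb | g;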
Yes, c1 and c2 are ANDed by 0xFF00FF and 0xFF00 two times each, but I am quite sure most modern compilers will optimize that slight inefficiency away. Otherwise you can precalc those values. (or even better, turn the whole formula into ASM, preferably MMX).
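For reference, here is a self-contained sketch of a two-multiply blend with the masked values precalculated. alpha_blend is a made-up name, alpha = 255 is assumed to mean "(almost) all c1", and everything is unsigned so the wrap-around in the subtraction is well defined. Only the (c1 - c2) * alpha term is shifted down, then c2 is added and the stray bits are masked off.

```c
#include <stdint.h>

/* Blend c1 over c2: result = c2 + (c1 - c2) * alpha / 256 per channel.
   alpha is 0..255. Alpha bytes of the inputs are dropped. */
static uint32_t alpha_blend(uint32_t c1, uint32_t c2, uint32_t alpha)
{
    uint32_t rb1 = c1 & 0xFF00FFu, rb2 = c2 & 0xFF00FFu; /* precalculated masks */
    uint32_t g1  = c1 & 0x00FF00u, g2  = c2 & 0x00FF00u;

    /* The difference may "go negative" and wrap around; after adding the
       c2 part back and masking, the bits we keep are still correct. */
    uint32_t rb = (rb2 + (((rb1 - rb2) * alpha) >> 8)) & 0xFF00FFu;
    uint32_t g  = (g2  + (((g1  - g2 ) * alpha) >> 8)) & 0x00FF00u;
    return rb | g;
}
```

With alpha = 0 you get c2 back exactly. With alpha = 128, white over black and black over white both come out as 0x7F7F7F.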
50% Alpha Blending
This special case of alpha blending can be made very fast, if we can afford a one-bit precision loss (which we 99% of the time definitely CAN, we're in 32-bit dude). Look:
color = ((c1>>1) & 0x7f7f7f) + ((c2>>1) & 0x7f7f7f);
It can also be written as:
color = ((c1 & 0xFEFEFE) + (c2 & 0xFEFEFE))>>1;
which should be slightly faster.
I'll leave it as a little exercise for you to try to figure out why and how these work. With a little thinking, you should be able to convert the first one into 25%, 75%, and other blends built from steps of 100% / (2^n).
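If you'd rather check your answer than work it out, here is one possible sketch (skip it if you want to do the exercise yourself). The function names are invented. blend_quarter builds 75% of c2 as 50% + 25%, which is just one way of doing it.

```c
#include <stdint.h>

/* 50/50 blend, variant 1: halve each pixel, mask off the bit that
   leaked in from the neighboring channel, then add. */
static uint32_t blend_half_a(uint32_t c1, uint32_t c2)
{
    return ((c1 >> 1) & 0x7F7F7Fu) + ((c2 >> 1) & 0x7F7F7Fu);
}

/* 50/50 blend, variant 2: clear the lowest bit of each channel so the
   carry out of one channel has a free bit to land in, add, then halve. */
static uint32_t blend_half_b(uint32_t c1, uint32_t c2)
{
    return ((c1 & 0xFEFEFEu) + (c2 & 0xFEFEFEu)) >> 1;
}

/* 25% of c1 plus 75% of c2, where 75% = 50% + 25%.
   Each channel sums to at most 63 + 127 + 63 = 253, so nothing overflows. */
static uint32_t blend_quarter(uint32_t c1, uint32_t c2)
{
    return ((c1 >> 2) & 0x3F3F3Fu)
         + ((c2 >> 1) & 0x7F7F7Fu)
         + ((c2 >> 2) & 0x3F3F3Fu);
}
```

Both 50% variants turn 0x204060 and 0x406080 into 0x305070.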
Most of these color operations will benefit greatly from the use of MMX, but that's not what this tutorial mainly is about. Check out Rawhed's MMX tutor in Hugi 19 instead.
-Ector of Medieval (Ector^Mdl)