An Introduction to CD Error Correction

TAD

Introduction

This article was inspired by a repeated television programme I saw again the other day. It was interesting because it did not only show the mechanical and manufacturing technology behind CD creation, but it explained the encoding of sound information and how CD players deal with 1000s of data errors.

Why do we need error correction?

This was nicely demonstrated by Philips (the inventors of CD technology) by cutting 4 slots into a CD and then attempting to play it!

The sound quality was really amazing. Even with 4 huge slots cut into the CD, through which a large coin could be easily pushed, there were no audible problems, no hiss, no jumps, no noise!

All this was due to the error correction technology. A newly pressed CD, with its giga-bits of data pressed on it, may have 1,000s or 10,000s of errors. When you consider the depth of the "pits" on the CD surface is half the wavelength of the scanning laser light then you can appreshiate how difficult it would be trying to press a CD without a single error on it. Imagine having a 16-bit sample where the MSB bit is corrupt. Attempting to play this sample would result in very poor quality.

The programme showed a CD player with its error correction turned off with a new, clean CD (without the 4 slots cut into it). The manufacturing errors along with human fingerprints and other marks on the CD surface resulted in lots of noise and hiss. It sounded like a radio which isn't tuned into the station correctly. The error-correction was clearly THE most important component in CD technology, without CD-ROM and CD-players would be totally useless.

An overview of CD encoding.

There are 3 stages for the encoding process.

1. ATD (Analogue to Digital) converter
2. Error correction encoder
3. Modulation encoder

The first stage samples an audio signal and turns it into a stereo 16-bit sample (exactly what most/all sound cards do these days).

The second stage introduces some extra bits into the sample. These are used for the error correction system.

The third stage modulates the data stream to even out the number of 0s and 1s in the data stream. This data stream is written onto the CD.

The reverse happens when decoding.

1. Modulation decoder
2. Error correction decoder
3. D2A (Digital to Analogue) converter

One bit error correction.

Here is a simple scheme which encodes a 4-bit symbol into a 7-bit symbol.

The symbols labelled A,B,C,D are the 4 bits we wish to encode and e,f,g are the extra 3 bits used for the error correction code.

ABCD efg
xxxx yyy

The extra 3 bits are calculated like this:

e = (A xor B xor D)
f = (A xor C xor D)
g = (B xor C xor D)

To understand this a little more, draw a VENN diagram with 3 overlapping circles. Here is a really bad ASCII, but you can use your imagination and pretend that the below mess looks like 3 circles.

                                   ┌-----------------------┐
                                   │                       │
                ┌------------------┼----┐                  │
                │                  │    │            f     │
                │    g             │ C  │                  │
                │        ┌---------┼----┼--------┐         │
                │        │         │    │        │         │
                │        │         │ D  │   A    │         │
                │        │    B    │    │        │         │
                │        │         └----┼--------┼----------
                │        │              │        │
                └--------┼---------------        │
                         │                 e     │
                         │                       │
                         └------------------------

Place your 4 bits of data in areas A,B,C,D then apply the following rule for each of the 3 outer areas (e,f,g).

There should be an even number of 1s in each circle (okay, rectangle).

Let's use an example code of 1011 binary to demonstrate this.

                                   ┌-----------------------┐
                                   │                       │
                ┌------------------┼----┐                  │
                │                  │    │            f     │
                │    g             │ 1  │                  │
                │        ┌---------┼----┼--------┐         │
                │        │         │    │        │         │
                │        │         │ 1  │   1    │         │
                │        │    0    │    │        │         │
                │        │         └----┼--------┼----------
                │        │              │        │
                └--------┼---------------        │
                         │                 e     │
                         │                       │
                         └------------------------

The bottom circle contains 0,1,1, which is an EVEN number of 1s, so e=0.

The left circle contains 0,1,1, which is an EVEN number of 1s, so g=0

The right circle contains 1,1,1, which is an ODD number of 1s, so f=1

Now the final diagram looks like this:

                                   ┌-----------------------┐
                                   │                       │
                ┌------------------┼----┐                  │
                │                  │    │            1     │
                │    0             │ 1  │                  │
                │        ┌---------┼----┼--------┐         │
                │        │         │    │        │         │
                │        │         │ 1  │   1    │         │
                │        │    0    │    │        │         │
                │        │         └----┼--------┼----------
                │        │              │        │
                └--------┼---------------        │
                         │                 0     │
                         │                       │
                         └------------------------

Correcting a 1-bit error.

Now suppose that area C has an error, the diagram now looks like this:

                                   ┌-----------------------┐
                                   │                       │
                ┌------------------┼----┐                  │
                │                  │err │          f=1     │
                │  g=0             │ C=0│                  │
                │        ┌---------┼----┼--------┐         │
                │        │         │    │        │         │
                │        │         │ D=1│ A=1    │         │
                │        │  B=0    │    │        │         │
                │        │         └----┼--------┼----------
                │        │              │        │
                └--------┼---------------        │
                         │                 e=0   │
                         │                       │
                         └------------------------

By applying the same rule of having an EVEN number of 1s in each circle, we can check for, then correct any errors.

   Bottom circle B,D,A,e = EVEN                   - ok
   Left circle   g,C,D,B = ODD                    - ** ERROR **
   Right circle  C,f,A,D = ODD                    - ** ERROR **

By using the properties of the overlapping VENN diagram we can locate which bit is incorrect. In this case it is where the left and right circles overlap which is symbol C.

We can correct the error simply by inverting symbol C, making it = 1.

The important thing to remember is that the 4 data bits (A,B,C,D) are covered by 2 or 3 overlapping circles. Even if one the 3 extra bits (e,f,g) is wrong we can still detect if the data is valid by applying the EVEN rule.

NOTE: This method will only correct ONE error!

Finite Fields

The arithmetic used on CD is finite fields, this is basically normal arithmetic but is MOD 13, so only the numbers 0,1,2.... 12 are used.

ADDITION

   ┌-----┬----┬----┬----┬----┬----┬----┬----┬----┬----┬----┬----┬----┬----┐
   │  +  │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │ 12 │
   ├-----┼----┼----┼----┼----┼----┼----┼----┼----┼----┼----┼----┼----┼----┤
   │ 0   │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │ 12 │
   │ 1   │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │ 12 │  0 │
   │ 2   │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │ 12 │  0 │  1 │
   │ 3   │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │ 12 │  0 │  1 │  2 │
   │ 4   │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │ 12 │  0 │  1 │  2 │  3 │
   │ 5   │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │ 12 │  0 │  1 │  2 │  3 │  4 │
   │ 6   │  6 │  7 │  8 │  9 │ 10 │ 11 │ 12 │  0 │  1 │  2 │  3 │  4 │  5 │
   │ 7   │  7 │  8 │  9 │ 10 │ 11 │ 12 │  0 │  1 │  2 │  3 │  4 │  5 │  6 │
   │ 8   │  8 │  9 │ 10 │ 11 │ 12 │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │
   │ 9   │  9 │ 10 │ 11 │ 12 │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │
   │10   │ 10 │ 11 │ 12 │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │
   │11   │ 11 │ 12 │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │
   │12   │ 12 │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
   └-----┴----┴----┴----┴----┴----┴----┴----┴----┴----┴----┴----┴----┴-----

You perform the normal addition and then perform MOD 13.

For example 8 + 11 = 19 but after MOD 13 = 6, so 8 + 11 = 6. And 10 + 7 = 4.

MULTIPLICATION

   ┌-----┬----┬----┬----┬----┬----┬----┬----┬----┬----┬----┬----┬----┬----┐
   │  *  │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │ 12 │
   ├-----┼----┼----┼----┼----┼----┼----┼----┼----┼----┼----┼----┼----┼----┤
   │ 0   │  0 │  0 │  0 │  0 │  0 │  0 │  0 │  0 │  0 │  0 │  0 │  0 │  0 │
   │ 1   │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │ 12 │
   │ 2   │  0 │  2 │  4 │  6 │  8 │ 10 │ 12 │  1 │  3 │  5 │  7 │  9 │ 11 │
   │ 3   │  0 │  3 │  6 │  9 │ 12 │  2 │  5 │  8 │ 11 │  1 │  4 │  7 │ 10 │
   │ 4   │  0 │  4 │  8 │ 12 │  3 │  7 │ 11 │  2 │  6 │ 10 │  1 │  5 │  9 │
   │ 5   │  0 │  5 │ 10 │  2 │  7 │ 12 │  4 │  9 │  1 │  6 │ 11 │  3 │  8 │
   │ 6   │  0 │  6 │ 12 │  5 │ 11 │  4 │ 10 │  3 │  9 │  2 │  8 │  1 │  7 │
   │ 7   │  0 │  7 │  1 │  8 │  2 │  9 │  3 │ 10 │  4 │ 11 │  5 │ 12 │  6 │
   │ 8   │  0 │  8 │  3 │ 11 │  6 │  1 │  9 │  4 │ 12 │  7 │  2 │ 10 │  5 │
   │ 9   │  0 │  9 │  5 │  1 │ 10 │  6 │  2 │ 11 │  7 │  3 │ 12 │  8 │  4 │
   │10   │  0 │ 10 │  7 │  4 │  1 │ 11 │  8 │  5 │  2 │ 12 │  9 │  6 │  3 │
   │11   │  0 │ 11 │  9 │  7 │  5 │  3 │  1 │ 12 │ 10 │  8 │  6 │  4 │  2 │
   │12   │  0 │ 12 │ 11 │ 10 │  9 │  8 │  7 │  6 │  5 │  4 │  3 │  2 │  1 │
   └-----┴----┴----┴----┴----┴----┴----┴----┴----┴----┴----┴----┴----┴-----

Again you take the remainder of MOD 13 as your answer.

For example, 7 * 2 = 1, or 6 * 8 = 9

The nice thing about this field is that you can divide by any non-zero number. If you look down each column you should notice that every number (0..12) only appears once, and likewise across each row every number only appears once.

For example, 9 * 4 = 10 and 10 / 4 = 9

The Reed/Solomen code.

This example describes the code using a 13 element field.

      F   = { 0,1,2,3 .... 11,12 }
       13

So the field has 13 elements numbered from 0 to 12. The first 11 will be our actual data that we wish to encode and elements 11 and 12 will be used as two extra symbols for the error correction.

We take the 11 data elements (our audio data):

{ a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10 }

and convert them into 13 data elements:

   { c0, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12 }
    └-------------------┬----------------------- └---┬-----
                        │                            │
             the audio data to encode           extra symbols

There are two rules that the c field must follow:

1. The sum c0 + c1 + c2 ..... + c11 + c12 = 0
2. The sum of c1 + 2*c2 + 3*c3 + 4*c4 .... 11*c11 + 12*c12 = 0

The first rule is just the total sum of all the 13 elements. The second rule multiplys each element by its position within the field, so the symbol at position 5 is multiplied by 5, 6 by 6, 7 by 7 etc...

The two rules above determine the actual values of the c11 and c12 symbols.

Finding and Fixing an error

Suppose there is an error in the field of magnitude 'e'. Now performing the first rule we get:

c0 + c1 + c2 + c3 + c4 + c5 + c6 + c7 + c8 + c9 + c10 + c11 + c12 = e

(Note that we should get a value of 0 as the total sum.)

Now suppose that the error occured at position 'i' in the field. When the second rule is applied we get:

c1 + 2*c2 + 3*c3 + 4*c4 ... + 11*c11 + 12*c12 = i*e

So by using those two rules we can calculate both 'e' and 'i*e'. And we can perform a non-zero division using the finite field (MOD 13 thing).

i = i*e / e

At this point we have both the magnitude and the position of the error.

An example error field

Say we have our original data field c with the values:

field c = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }

but we recieve the field r:

field r = { 1, 1, 1, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1 }

We need to find the error 'e' = 7 and position 'i' = 4

After performing the first rule of adding up all the elements we get:

r0 + r1 + r2 + r3 + r4 + r5 + r6 + r7 + r8 + r9 + r10 + r11 + r12 = 7

Then the second rule gives us this:

   r1 + 2*r2 + 3*r3 + 4*r4 ... 11*r11 + 12*r12  =  2 / 7
                                                = 28 / 7  (NOTE: MOD 13 !!!)
                                                = 4

Now we have found both 'e' and 'i' using the MOD 13 arithmetic, so we can correct it by doing this:

ci = ri - e

Basically we just subtract the error magnitude 'e' from the element at position 'i' in the field.

Closing Words

I hope someone out there has found this interesting. As soon as the programme is repeated again, or I come across some more detailed information concerning the burst-errors and delay lines in the encoding/decoding I will write another short article.

I did have a wacky idea concerning error-correction and compression where you could change a difficult symbol to produce a more 'match friendly' phraze. But this seems such a daft idea and would be probably very slow that maybe I should forget the entire thing and end the article now.

Regards

TAD #:o)