
Finite Fields, RS Codes and RS RAID

Finite Fields and RS Codes

You do not need to know much about finite fields to understand RS codes implemented in digital logic.

Finite fields exist with p^m elements, where p is any prime number and m is a positive integer. For RS codes implemented in digital logic, we are only interested in finite fields where p = 2, so the number of elements is 2^m = 2, 4, 8, 16, etc. Each element is a set of m bits.

Every finite field has at least one primitive element, and most finite fields have many primitive elements. In some finite fields, all the elements except 0 and 1 are primitive. The following table gives some examples of the number of primitive elements in finite fields of various sizes (the count is Euler’s totient φ(2^m - 1)):

Field      Nonzero elements    Primitive elements
GF(16)     15                  8
GF(32)     31                  30
GF(64)     63                  36
GF(128)    127                 126
GF(256)    255                 128

It is customary to call any primitive element “alpha” or α. When you multiply a primitive element by itself repeatedly, it generates every nonzero element in the field.
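As a minimal sketch of this, the loop below generates GF(16) as powers of α. The choice of primitive polynomial (x^4 + x + 1) and the representation of α as the bit pattern 0010 are illustrative assumptions, not specifics taken from this page.

```python
PRIM = 0b10011  # x^4 + x + 1, an assumed primitive polynomial for GF(2^4)

def gf_mul(a, b):
    """Multiply two 4-bit field elements, reducing modulo PRIM."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:      # degree reached 4: subtract (XOR) the modulus
            a ^= PRIM
    return p

alpha = 0b0010               # the element "x" itself is primitive here
powers, e = [], 1
for i in range(15):
    powers.append(e)         # e == alpha**i
    e = gf_mul(e, alpha)

# alpha generates every nonzero element before cycling back to 1
assert sorted(powers) == list(range(1, 16))
assert gf_mul(powers[-1], alpha) == 1    # alpha**15 == alpha**0 == 1
```

Repeating this with a non-primitive element (for example α^3, which has order 5) would cycle back to 1 early instead of covering all 15 nonzero elements.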

Here’s an example of three finite fields with 16 4-bit elements.
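The example table itself did not survive extraction. As a hedged illustration of what such a table contains, the sketch below tabulates the powers of α in GF(16) under each of the two degree-4 primitive polynomials over GF(2); fields with the same 16 elements can differ in exactly this way, in how the powers of α map onto bit patterns.

```python
def make_exp_table(prim):
    """exp[i] == alpha**i in GF(16) built from the primitive polynomial prim."""
    exp, e = [], 1
    for _ in range(15):
        exp.append(e)
        e <<= 1              # multiply by alpha = x
        if e & 0b10000:      # reduce modulo the chosen polynomial
            e ^= prim
    return exp

tbl_a = make_exp_table(0b10011)  # x^4 + x + 1
tbl_b = make_exp_table(0b11001)  # x^4 + x^3 + 1

for i, (a, b) in enumerate(zip(tbl_a, tbl_b)):
    print(f"alpha^{i:2d} = {a:04b} or {b:04b}")

# each table lists every nonzero 4-bit pattern exactly once,
# but the power-to-pattern assignment differs between the two fields
assert sorted(tbl_a) == sorted(tbl_b) == list(range(1, 16))
assert tbl_a != tbl_b
```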

In order for an error-correcting code to correct t symbol errors, 2t consecutive powers of α must be roots of the generator polynomial, g(x). For Reed-Solomon codes,

g(x) = (x - α^b)(x - α^(b+1)) ··· (x - α^(b+2t-1))

where b is the first root power, which is normally 0 (because it simplifies the decoder somewhat), so the roots are α^0, α^1, ... , α^(2t-1). From now on, assume b = 0.
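As a sketch, the code below builds g(x) for t = 2 with roots α^0 through α^3, in an assumed GF(16) defined by x^4 + x + 1. Note that subtraction equals addition (XOR) in fields of characteristic 2, so (x - α^i) is computed as (x + α^i).

```python
PRIM = 0b10011  # assumed primitive polynomial x^4 + x + 1 for GF(16)

def gf_mul(a, b):
    """Multiply two GF(16) elements, reducing modulo PRIM."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def poly_eval(poly, x):
    """Horner evaluation; poly lists coefficients highest degree first."""
    acc = 0
    for c in poly:
        acc = gf_mul(acc, x) ^ c
    return acc

t, alpha = 2, 2
g = [1]                          # start from the constant polynomial 1
for i in range(2 * t):           # multiply in (x - alpha^i) == (x + alpha^i)
    root = gf_pow(alpha, i)
    shifted = g + [0]                            # g(x) * x
    scaled = [0] + [gf_mul(c, root) for c in g]  # g(x) * alpha^i
    g = [u ^ v for u, v in zip(shifted, scaled)]

assert len(g) == 2 * t + 1       # degree 2t: 2t redundant symbols
assert all(poly_eval(g, gf_pow(alpha, i)) == 0 for i in range(2 * t))
assert poly_eval(g, gf_pow(alpha, 2 * t)) != 0   # alpha^4 is not a root
```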

RS encoders input a set of data symbols and output a set of codeword symbols. If you consider the output symbols to be coefficients of a codeword polynomial, c(x), then every codeword polynomial is a multiple of g(x), so the roots of g(x) are also roots of c(x).
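A hedged sketch of that property: multiplying any message polynomial by g(x) yields a polynomial that vanishes at all 2t roots of g(x). The field (GF(16) with x^4 + x + 1), t = 2, and the message values are illustrative assumptions; practical encoders normally use a systematic variant, which is not shown here.

```python
PRIM = 0b10011  # assumed primitive polynomial for GF(16)

def gf_mul(a, b):
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def poly_mul(f, h):
    """Multiply two polynomials over GF(16), coefficients highest degree first."""
    out = [0] * (len(f) + len(h) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(h):
            out[i + j] ^= gf_mul(a, b)
    return out

def poly_eval(poly, x):
    acc = 0
    for c in poly:
        acc = gf_mul(acc, x) ^ c
    return acc

t, alpha = 2, 2
g = [1]
for i in range(2 * t):           # g(x) with roots alpha^0 .. alpha^(2t-1)
    g = poly_mul(g, [1, gf_pow(alpha, i)])

m = [1, 0, 5, 9]                 # an arbitrary 4-symbol message
c = poly_mul(m, g)               # codeword: a multiple of g(x)

# every root of g(x) is a root of c(x)
assert all(poly_eval(c, gf_pow(alpha, i)) == 0 for i in range(2 * t))
```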

Since α^0, α^1, ... , α^(2t-1) are roots of c(x),

c(α^i) = c_(N-1)·(α^i)^(N-1) + c_(N-2)·(α^i)^(N-2) + ... + c_1·α^i + c_0 = 0  for i = 0, 1, ... , 2t-1

                               - OR -

c_(N-1)·(α^(N-1))^i + c_(N-2)·(α^(N-2))^i + ... + c_1·(α^1)^i + c_0·(α^0)^i = 0  for i = 0, 1, ... , 2t-1

Now, let’s call the elements α^(N-1), α^(N-2), ... , α^0 in the second form above the locations, L_j, of the codeword or received word symbols/coefficients, as illustrated below, where R(x) is the received word/polynomial, possibly with errors added, and S_i is the syndrome. L_(N-1), L_(N-2), L_(N-3), ... , L_0 are constants.

S_i = R(α^i) = r_(N-1)·L_(N-1)^i + r_(N-2)·L_(N-2)^i + ... + r_1·L_1^i + r_0·L_0^i  for i = 0, 1, ... , 2t-1

R(x) = c(x) + E(x)

where E(x) is the error polynomial or error pattern.

Since, as we’ve seen above, the syndrome of c(x) is 0, the syndrome of R(x) depends only on the error pattern: S_i = R(α^i) = c(α^i) + E(α^i) = E(α^i).
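This can be checked numerically. In the sketch below (same assumed GF(16) with x^4 + x + 1, t = 2, arbitrary message and error values), the syndromes of the codeword are all zero, and the syndromes of the received word equal those of the error pattern alone:

```python
PRIM = 0b10011  # assumed primitive polynomial for GF(16)

def gf_mul(a, b):
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def poly_mul(f, h):
    out = [0] * (len(f) + len(h) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(h):
            out[i + j] ^= gf_mul(a, b)
    return out

def poly_eval(poly, x):          # coefficients highest degree first
    acc = 0
    for c in poly:
        acc = gf_mul(acc, x) ^ c
    return acc

t, alpha = 2, 2
g = [1]
for i in range(2 * t):
    g = poly_mul(g, [1, gf_pow(alpha, i)])
c = poly_mul([1, 0, 5, 9], g)    # codeword from an arbitrary message

E = [0] * len(c)
E[2], E[5] = 7, 3                # two arbitrary nonzero error values
R = [u ^ v for u, v in zip(c, E)]    # received word R(x) = c(x) + E(x)

S_c = [poly_eval(c, gf_pow(alpha, i)) for i in range(2 * t)]
S_R = [poly_eval(R, gf_pow(alpha, i)) for i in range(2 * t)]
S_E = [poly_eval(E, gf_pow(alpha, i)) for i in range(2 * t)]

assert S_c == [0, 0, 0, 0]       # the codeword contributes nothing
assert S_R == S_E                # the syndrome sees only the error pattern
```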

Suppose the error pattern has nonzero error values of Va and Vb in locations, La and Lb.

Then,
  1. S0 = Va + Vb
  2. S1 = Va·La + Vb·Lb
  3. S2 = Va·La^2 + Vb·Lb^2
  4. S3 = Va·La^3 + Vb·Lb^3
etc.
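The equations above can be verified directly. The sketch below places two assumed error values at assumed positions in a length-15 GF(16) error pattern (field polynomial x^4 + x + 1, also an assumption) and checks the first four syndrome equations:

```python
PRIM = 0b10011  # assumed primitive polynomial for GF(16)

def gf_mul(a, b):
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def poly_eval(poly, x):          # coefficients highest degree first
    acc = 0
    for c in poly:
        acc = gf_mul(acc, x) ^ c
    return acc

N, alpha = 15, 2
ja, jb = 4, 9                    # assumed error positions (coefficient indices)
Va, Vb = 6, 13                   # assumed nonzero error values
La, Lb = gf_pow(alpha, ja), gf_pow(alpha, jb)   # locations L = alpha^position

E = [0] * N                      # E(x) = Va*x^ja + Vb*x^jb
E[N - 1 - ja] = Va
E[N - 1 - jb] = Vb

S = [poly_eval(E, gf_pow(alpha, i)) for i in range(4)]
assert S[0] == Va ^ Vb                             # S0 = Va + Vb
assert S[1] == gf_mul(Va, La) ^ gf_mul(Vb, Lb)     # S1 = Va*La + Vb*Lb
assert S[2] == gf_mul(Va, gf_pow(La, 2)) ^ gf_mul(Vb, gf_pow(Lb, 2))
assert S[3] == gf_mul(Va, gf_pow(La, 3)) ^ gf_mul(Vb, gf_pow(Lb, 3))
```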

With two errors, there are 4 unknowns so we need 4 simultaneous equations to solve for the 4 unknowns. If the locations of two errors are known, then we only need 2 simultaneous equations since there are only 2 unknowns. In general, to correct t errors, 2t redundant symbols are needed. To correct s erasures, s redundant symbols are required, and to simultaneously correct t errors and s erasures, 2t+s redundant symbols are required.

RS codes are “almost perfect,” so only the minimum number of redundant symbols is required.

RS decoders find the unknown error locations and error values. For erasures, the error location is known and only the error value is unknown. It is possible for the error value to be 0 for an erasure, but not for an error.

It took many mathematicians many years to discover practical methods for solving the above simultaneous equations. The good news is that today a number of different practical methods exist that are easily implemented in digital logic. It is next to impossible for non-mathematicians to understand why the methods work, but it is easy for most engineers to understand how to implement the methods. Developers, manufacturers and users do not need to understand why methods work, they just need to be sure the methods work, and mathematicians have proved that the methods will always work.

ECC Tek makes use of Willard Eastman’s Euclideanized version of the Berlekamp-Massey algorithm to find error locations and values.

Most RS and BCH decoders make use of what is called the key equation/congruency. An error locator polynomial, L(x), and an error evaluator polynomial, V(x), are defined, and the key equation is

L(x)·S(x) ≡ V(x) mod x^(2t)

where S(x) = S0 + S1·x + ... + S(2t-1)·x^(2t-1) is the syndrome polynomial.

The B-M algorithm finds the lowest-degree L(x) and V(x), given S(x) (and possibly erasure locations when erasures are being corrected as well as errors), that satisfy the key congruency. “mod x^(2t)” means the highest possible degree of V(x) is 2t-1.

When the algorithm is finished, the roots of L(x) indicate the locations of the errors and a simple formula is used to compute the values of the errors.
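A sketch of that last step, under one common convention (assumed here) in which L(x) = (1 - La·x)(1 - Lb·x), so the roots of L(x) are the inverses of the error locations. A brute-force search over all 15 nonzero elements of an assumed GF(16) (field polynomial x^4 + x + 1) recovers the locations; hardware decoders do the equivalent search with a Chien search circuit.

```python
PRIM = 0b10011  # assumed primitive polynomial for GF(16)

def gf_mul(a, b):
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def poly_eval_low(poly, x):
    """Evaluate poly (lowest-degree coefficient first) at x."""
    acc, xp = 0, 1
    for c in poly:
        acc ^= gf_mul(c, xp)
        xp = gf_mul(xp, x)
    return acc

La, Lb = gf_pow(2, 4), gf_pow(2, 9)     # assumed error locations
# L(x) = (1 - La*x)(1 - Lb*x); minus equals plus in GF(2^m)
Lpoly = [1, La ^ Lb, gf_mul(La, Lb)]

roots = [x for x in range(1, 16) if poly_eval_low(Lpoly, x) == 0]
locations = {gf_pow(r, 14) for r in roots}   # invert: a^-1 == a^14 in GF(16)
assert locations == {La, Lb}
```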

For parallel RS implementations, a parity generator matrix is used for encoding and a parity check matrix for calculating the syndrome. Then the B-M algorithm is executed in a parallelized and pipelined fashion.

For serial RS implementations, operations are performed serially.


Distinction Between Errors and Erasures

As far as ECC Tek’s encoder and decoder designs are concerned, we make the following definitions for errors and erasures. These definitions may not be the same as those found in textbooks or papers on coding theory, where the definition of an erasure may involve an additional erasure character.

For errors, both the location and value of the actual error are unknown, and the error value is not zero.

For erasures, the location of the “potential” error is known, the value is unknown, and the value could be nonzero or zero.


RS RAID Implementing RAID-2

RAID-2 corrects for 1 unknown component failure. The following figure illustrates one row of RS symbols across multiple components. Each RS row symbol is stored in a separate component. Both the location and value of the error are unknown. Since there are 2 unknowns, 2 simultaneous equations are needed to solve for the unknown error location and error value.
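A sketch under the same assumptions used above (GF(16) with x^4 + x + 1, locations L_j = α^j, first root power b = 0; the failed position and error value are arbitrary). With a single error of value V at location L, the two equations are S0 = V and S1 = V·L, so L = S1/S0 and V = S0:

```python
PRIM = 0b10011  # assumed primitive polynomial for GF(16)

def gf_mul(a, b):
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def poly_eval(poly, x):          # coefficients highest degree first
    acc = 0
    for c in poly:
        acc = gf_mul(acc, x) ^ c
    return acc

N, alpha = 15, 2
j, V = 7, 11                     # assumed failed component index and value
E = [0] * N
E[N - 1 - j] = V                 # error pattern E(x) = V * x^j

S0 = poly_eval(E, gf_pow(alpha, 0))   # == V
S1 = poly_eval(E, gf_pow(alpha, 1))   # == V * alpha^j

L = gf_mul(S1, gf_pow(S0, 14))   # divide: a^14 == a^-1 in GF(16)
assert L == gf_pow(alpha, j)     # recovered location
assert S0 == V                   # recovered value
```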



RS RAID Implementing RAID-3, RAID-4, and RAID-5

RAID-3, RAID-4 and RAID-5 correct 1 component failure when it is known which component has failed. As far as RS decoding is concerned, the location of the error is known but the value is unknown. The error location is known and the error value is S0.
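With b = 0, S0 is simply the XOR of all the row’s symbols, which is the familiar RAID parity check. A minimal sketch (the data values and failed index are arbitrary): recompute S0 over the surviving symbols, treating the lost one as zero, and the result is the lost value.

```python
from functools import reduce
from operator import xor

data = [0x3, 0x7, 0xA, 0x1]        # assumed data symbols in one row
parity = reduce(xor, data)         # parity symbol makes the row XOR to 0
row = data + [parity]

lost = 2                           # assumed index of the failed component
survivors = row[:lost] + row[lost + 1:]
S0 = reduce(xor, survivors)        # syndrome with the lost symbol taken as 0
assert S0 == row[lost]             # S0 is exactly the missing value
```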


RS RAID Implementing RAID-6

RAID-6 corrects for 2 component failures when it is known which 2 components have failed. As far as RS decoding is concerned, the locations of the errors are known, but the values are not. The error locations are known and the error values are determined by solving 2 simultaneous equations.
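A sketch of that 2×2 solve, again in an assumed GF(16) (x^4 + x + 1) with arbitrary failed positions and values. Since addition equals subtraction in characteristic 2, the system S0 = Va + Vb, S1 = Va·La + Vb·Lb rearranges to Vb = (S1 + S0·La)/(La + Lb) and Va = S0 + Vb:

```python
PRIM = 0b10011  # assumed primitive polynomial for GF(16)

def gf_mul(a, b):
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    return gf_pow(a, 14)           # a^14 == a^-1 for nonzero a in GF(16)

alpha = 2
ja, jb = 3, 8                      # assumed indices of the two failed components
Va, Vb = 5, 12                     # the lost values (unknown to the decoder)
La, Lb = gf_pow(alpha, ja), gf_pow(alpha, jb)

# syndromes the decoder computes from the surviving symbols:
S0 = Va ^ Vb
S1 = gf_mul(Va, La) ^ gf_mul(Vb, Lb)

# solve S0 = Va + Vb, S1 = Va*La + Vb*Lb for the two unknown values
Vb_hat = gf_mul(S1 ^ gf_mul(S0, La), gf_inv(La ^ Lb))
Va_hat = S0 ^ Vb_hat
assert (Va_hat, Vb_hat) == (Va, Vb)
```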


Scalable RS RAID

RS RAID systems tolerate s component failures (erasures) and t errors in each row as long as 2t + s ≤ R, where R is the number of redundant components.