Iterative Decoding of 2D-RS Codes

Introduction

Within the last 15 years or so, Low-Density Parity Check (LDPC) codes with iterative decoding have become popular and are starting to be implemented in a number of communications and storage devices. There are positive and negative things about implementing LDPC codes with iterative decoding. LDPC codes come close to meeting some theoretical bounds which is good, but they also take a very large number of gates to implement, are difficult to understand and analyze, there are few if any standards, are costly in terms of dollars and time to implement and they tend to have fairly high error floors which is not good.

As mentioned later in this article, according to current coding theory literature, 2D binary BCH codes with iterative decoding have compared favorably with LDPC codes with iterative decoding and have outperformed LDPC codes in some cases which implies that 2D RS codes with iterative decoding may perform even better than 2D binary BCH codes and therefore compete favorably with LDPC codes.

“Iterative decoding” of 2D RS codes means that rows and columns of a 2D array of RS symbols are independent codewords and decoding is accomplished by first decoding columns then decoding rows and repeating column and row decoding a number of times. More and more errors are corrected as column and row decoding is repeated.

The major obstacle in implementing 2D RS codes with iterative decoding is the need to have an ultra-fast RS decoder to decode rows in a 2D array of RS symbols. ECC Tek’s parallel Reed-Solomon (PRS) system overcomes that obstacle.

The advantages of 2D RS codes over LDPC codes are that they are easy to understand and analyze, relatively easy and inexpensive to implement, gate counts are not extremely high, and error floors can be made arbitrarily low by correcting more errors. They also naturally correct clusters or bursts of errors which can occur when there are multiple bits per symbol in communications systems or multiple bits per cell in storage systems.

One of the most compelling reasons to choose 2D codes with iterative decoding over LDPC codes is that the theory of LDPC codes is very complicated and still evolving and some companies employee more than 50 people to develop LDPC systems, many with PhD Degrees. The cost of developing LDPC systems can be millions of dollars. On the other hand, developing 2D-RS systems is relatively inexpensive.

Another important reason to choose 2D codes with iterative decoding over LDPC codes is the time savings. A 2D-RS scheme could be implemented in as little as a few weeks while LDPC schemes would probably take months.

This article compares 2D-RS codes which are decoded using iterative decoding with 2D-binary BCH codes decoded using iterative decoding to show that 2D-RS codes are most-likely superior to 2D-binary BCH codes and that, as shown in a paper about Staircase Codes, 2D-RS codes are probably superior to Low Density Parity Check (LDPC) codes with iterative decoding in some cases and can be used in a wide variety of applications. 2D-RS codes can also be used to create fault-tolerant communications and storage systems with an arbitrarily level of fault-tolerance.

What are called “2D” error-correcting codes in this article are called “product codes” or “2D product codes” in coding theory literature. For product codes, the length of a product codeword is the product of the lengths of the component codewords and the rate of the product code is the product of the rates of the component codes. In this article, a 2D-RS code is a 2D-RS product code and a 2D-binary BCH code is a 2D-binary BCH product code.

It can be confusing to use the term “2D-RS code” because a 2D-RS code contains RS codewords as components. It might be helpful for some readers to think of a 2D-RS code with iterative decoding as a “2D-RS scheme” instead of a 2D-RS code.
Iterative Decoding

Iterative decoding means one thing for 2D-RS and 2D-binary BCH codes and something different for LDPC codes.

For 2D-RS, 2D-binary BCH and LDPC codes, iterative decoders make an initial guess as to what the transmitted codeword was, and, if that guess is not a valid codeword, another hopefully improved guess is made. That process is repeated a number of times. As long as each guess is an improvement from the previous guess, the process will eventually result in a valid codeword.

This article discusses iterative decoding of 2D-RS and 2D-binary BCH codes.

For 2D-RS codes and 2D-binary BCH codes, decoding iterations are performed on rows and columns of a 2D array. Assume columns are decoded first. After the columns are decoded, the rows are decoded. If errors remain after the columns and rows are decoded, the process is repeated until a valid codeword is found or until a specified number of iterations is reached.

Generally speaking, for LDPC codes, iterations are based on calculating probabilities. Within the last decade, modern coding theory has concentrated primarily on iterative decoding of LDPC codes.
Staircase Codes

A paper on Staircase Codes asserts that 2D-binary BCH codes with iterative decoding in some cases outperform LDPC codes. This article suggests that 2D-RS codes will outperform 2D-binary BCH codes, and therefore will outperform LDPC codes in some cases.

Example Codes

Consider implementing a 2D-binary BCH code and a 2D-RS code on a 1000x1000 array of bits as illustrated below.





Consider RS symbols to be 3x3 arrays of bits as illustrated below.




 

Binary BCH codewords are illustrated below.






Comparison of Codes

How does a 2D-binary BCH code that corrects 5 bits per row and 5 bits per column compare with a 2D-RS code that corrects 5 symbols per row and 5 symbols per column?
Amount of Redundancy Required

For the binary BCH code we need an “m” so that m is large enough to identify each bit in a codeword. Since there will be over 1024 bits in each binary BCH codeword, m needs to be 11 bits. The binary BCH codewords require mt = 11x5 = 55 bits of redundancy. (To be accurate, the amount of redundancy required by a binary BCH code is less than or equal to mt, but in most cases it is equal to mt.)
Each RS codeword requires 2t = 2x5 = 10 symbols or 10x3 = 30 bits of redundancy per row or column of bits – 25 bits less than the binary BCH codewords.

The 2D-RS code takes considerably less redundancy than the 2D-binary BCH code if they both have the same t value.



Error Correction Performance

If bit errors fall randomly on a 2D array of bits, once in awhile multiple bit errors will fall in the same RS symbol which makes the 2D-RS code look more attractive.

If errors naturally occur in bursts or clusters as might be the case if the bits in one RS symbol came from one memory cell, the 2D-RS code looks more attractive. (Some current MLC NAND Flash memory chips store 3 bits/cell so each RS symbol in this example could hold the data from 3 cells.)

On the other hand, random bit errors may cause more symbols to be in error in a RS codeword than bits in a binary BCH codeword which makes the 2D-binary BCH code look more attractive and indicates that the RS code may need a larger t value.

With iterative decoding, we have the option of using erasure correction in addition to error correction with the 2D-RS code which may boost the 2D-RS code performance. ECC Tek’s RS decoders will either (1) correctly correct the errors, (2) miscorrect, or (3) fail to decode. It may be advantageous to declare all of the RS symbols erased when the decoder fails to decode. Simulation would need to be done. It appears that the best thing would be to not declare all of the output symbols erased because only a small number of them are most-likely in error. Erasure correction at the input does seem like it would be a good thing because there may be some reliable way to know if input symbols are unreliable and therefore should be declared erasures.

In this example, we can increase the number of symbols corrected by the RS code to t=6, 7, 8 or 9 without using as much redundancy as the binary BCH code with t=5.



Conclusion

It appears that if a 2D-binary BCH code with iterative decoding can outperform a LDPC code in some cases, a 2D-RS code should be able to do even better without requiring any more redundancy.



RS Symbols

We have considered 3x3 RS symbols in this article, but 9-bit RS symbols could also be 1x9 or 9x1. 8-bit RS symbols could be 1x8, 8x1, 2x4, or 4x2. 7-bit symbols could be 1x7 or 7x1. 6-bit symbols could be 1x6, 6x1, 2x3, or 3x2, etc.

Also, RS row symbols do not have to be the same size as RS column symbols as illustrated below.




What 2D-RS Configuration to Use

2D-RS codes appear to be applicable to a wide variety of applications. They can be used with conventional wireless networks, space channels, HDDs, SSDs, MLC NAND Flash devices, optical networks and also special configurations can be used to create fault-tolerant communications and storage systems.

The 2D-RS example in this article is appropriate for use with MLC NAND Flash chips or for use with serial communications channels such as optical or wireless channels. In other words, the example in this article would be for use with serial channels which can be modelled as a binary symmetric channel.

In order to create fault-tolerant memory or communications systems, a configuration such as the one shown above would be more appropriate.


Implementation Complexity

For most serial channels, designing 2D-RS encoders and decoders in digital logic is not extraordinarily difficult. Designing 2D-RS encoders and decoders in digital logic for high-performance, fault-tolerant systems is difficult. ECC Tek put special emphasis on the design of parallelized and pipelined RS encoders and decoders and was granted a a US Patent on those types of designs in 1998 The parallel RS encoders and decoders developed by ECC Tek are probably the fastest anyone has ever designed. They can decode RS rows or columns as illustrated above as fast as they are received so that fault-tolerant systems can be designed with no performance degradation even with failed components.

The parallelized and pipelined RS encoder and decoder designs developed by ECC Tek have been licensed for use in 5 NASA missions, two of which are already in space so they are proven designs and there is almost no risk in implementing them in different configurations.


Correctable Error Patterns

There are an enormous number of correctable error patterns.

Assume up to t errors can be corrected in each row and column. Then all of the bits in t columns or all of the bits in t rows can be in error and are correctable.


Other Implementations

2D-RS Korea
2D-RS USA1
2D-RS USA2


Request

I would like to ask all of those who have the knowledge and skills needed to thorough analyze the performance of 2D-RS codes with iterative decoding to do so, publish their results and add a comment to this page to let us know how we can access those results. Thank you. ECC Tek specializes in the design and licensing of encoders and decoders - not on theoretical analysis.
Significance of Iterative Decoding of 2D-RS Codes

If, in fact, it proves true that 2D-RS codes with iterative decoding generally outperform LDPC codes, that fact will be extraordinarily significant because LDPC codes are currently being implemented in many if not most HDDs, many communications channels and have pretty much been the exclusive topic of interest of coding theorists for about the last 12 years or so.


Iterative Decoding Options


Several decoding options are possible when designing iterative decoders.

As an example, consider a 3-D configuration. Decoding can first be done in the “y” or “height” direction, then in the”x” or “width” direction and finally in the “z” or “depth” direction.

However, as with conventional RAID systems, if one decoding appears to have correctly corrected all of the errors, the remaining decodings do not have to be done. If, in the above example, the “y” decoding appears to have successfully corrected all of the errors, then the “x” and “z” decodings do not have to be done. That’s what is done in conventional RAID systems such as RAID-5.

Most likely when implementing a 2D-RS scheme with iterative decoding, the decoders will be designed so that error correction is not attempted if the decoder detects that more than t errors have occurred because, in that situation, if correction was attempted, the decoder will most likely insert t more errors into the received word which already contains T (T > t) errors in it. Designing RS decoders with this feature takes slightly more gates than designing them without this feature.
Using 2D-RS Codes to Correct Random “bits” in Error

2D-RS naturally correct clusters or bursts of errors, but for those channels that are not bursty and bits are randomly in error, 2D-RS codewords can be scrambled before transmitting or storing them and then unscrambled before decoding. That way, bits from RS symbols will be located in random positions in the transmitted or stored data.