2D-RS Schemes‎ > ‎

2D ECC Concepts

The acronym “RAID” stands for Redundant Array(s) of Independent/Inexpensive Disks/Drives/Devices. The original RAID paper can be downloaded by clicking the following link... Original RAID Paper.

MLC NAND Flash stands for Multi-level Cell NAND Flash. Multi-level Cell NAND Flash memory elements store multiple bits in each memory cell. The MLC idea is essentially the same idea used in communications systems that require the receiver to decide which of multiple possible phases or amplitudes were sent in a small time interval. It is the author’s strong opinion that MLC NAND Flash memories should implement nonbinary error-correcting codes such as a Reed-Solomon (RS) codes so that all of the bits from one cell are in one symbol. The communications industry has been doing that for decades, but the Flash industry has been implementing a scheme that forces the bits from one cell to be in separate records (pages) so that one cell failure can cause multiple binary symbol failures – which seems somewhat illogical and suboptimal. (See Correcting Cell Failures)

Performance of storage systems is measured in IO’s per second (IOPS) or number of input or output transfers per second. Performance is usually a function of record size. Because of the trend to put more and more video and audio on the web, many web server records are very long streams of data so it is important for future storage systems to be able to handle large block transfers.


Purpose of this Article

The primary purpose of this article is to describe at a high level a powerful way of designing ultra‑high‑performance RAID storage systems so that future RAID systems will be able to use fairly unreliable memory elements such as MLC NAND Flash chips with no loss of performance or data when numerous cells and chips fail.

It’s been more than 20 years since the original RAID paper was written, but the concepts involved in designing fault-tolerant storage systems remain confusing. Hopefully this article will clarify some of those confusing concepts.

Most-likely all future fault-tolerant storage systems will use some type of two-dimensional (2D) ECC scheme so it is beneficial to have a basic understanding of the 2D ECC concepts presented in this article.

This article is also an introduction to two other articles on 2D-RS solid state disks (SSDs) and 2D-RS hard disk drives (HDDs).


1D, 2D and 3D Encoded Data

Figure 1 illustrates how redundant data items are appended onto freely chosen data items in 1D, 2D and 3D ECC schemes.

 

Figure 1 Encoded Data for 1D, 2D and 3D ECC Schemes 


There are a total of 27 freely chosen data items illustrated in Figure 1. Figure 1 shows how those 27 data items would be encoded using 1D, 2D or 3D ECC schemes.

What are called “2D” schemes in this article are called “product codes” in coding theory textbooks and papers. They are two different terms for the same thing.


Original RAID Paper

The 2D ECC schemes presented in the original RAID paper were intended for use with storage systems that use HDDs as memory elements. At that time, MLC NAND Flash chips were not being used in mass storage systems.

In conventional RAID storage systems, the columns shown in Figure 1 are usually encoded using a RS error-correcting code inside each HDD, the rows are usually “bits”, and there is normally only one redundant column which is the XOR (modulo 2 sum) of the data bits in each row.


Hardware Encoding and Decoding of 2D Arrays

A block diagram of a storage device that encodes and decodes 2D arrays in hardware is illustrated in Figure 2.

 

Figure 2 Encoding and Decoding of 2D Arrays in Hardware 


When writing, rows are encoded by a row encoder, columns are encoded by multiple column encoders, and the 2D encoded array, as illustrated in Figure 1, is written to a set of memory elements. 

When reading, columns read from the storage elements are decoded by multiple column decoders and rows are decoded by a row decoder.


Conventional RAID Systems

In conventional RAID systems, column encoding and decoding is done inside the HDDs and row encoding and decoding is normally done in the controller software. Row decoding is always bypassed unless column decoding fails indicating that a memory element has failed.

In conventional RAID systems, it is assumed that HDDs which contain ECC are reliable and rarely fail. Without the row encoder and decoder, the hardware block diagram shown in Figure 2 is transformed into the block diagram shown in Figure 3 which is just an array of HDDs.

 

Figure 3 2D ECC Scheme without the Row Encoder and Decoder 


In most conventional RAID systems, multiple memory elements can be written and read simultaneously as long as memory elements do not fail. When memory elements fail, a failure recovery operation must be initiated. Multiple failure recovery operations could be happening simultaneously in the controller software.


Write Overhead Performance Penalty

In conventional RAID systems, in order to overwrite a column of existing data in a 2D encoded array with new data, the existing data in that column and the existing redundant column(s) must first be read. A new redundant column(s) is/are computed, and the new data and redundant column(s) are written. A small amount of computing is necessary to compute the new redundant column(s) from the existing data column, existing redundant column(s) and the new data column. The overhead needed to accomplish the overwriting of an existing data column is referred to as the “write overhead performance penalty”. The write overhead performance penalty increases significantly as the number of redundant columns is increased. For that reason, normally only one redundant column is used in conventional RAID storage systems rather than two redundant columns as illustrated in Figure 1.


Memory Element Failures

When a memory element fails in a conventional RAID system, action must be initiated to replace the failed memory element with a new one, and the data that should be in the new memory element must be reconstructed from the 2D array data stored in the other non-failing memory elements.

Obviously, there is a chance that one or more of the other non-failing memory elements may fail during the rebuilding process and, if that happens, data can be lost with no chance of ever recovering it.

More redundant columns could be added to the 2D encoded array so more memory element failures could be tolerated, but, as mentioned in the preceding section, more redundant columns significantly increases the write overhead performance penalty. There is a balance between level of performance and level of fault-tolerance – when one goes up, the other goes down.

The problem of reconstructing data has become increasingly severe as the capacity of HDDs continues to increase because the time required to rebuild the data on a new HDD has dramatically increased and the probability of other drives failing within the rebuilding time is significant. There is no way to solve this problem using the most widely implemented conventional RAID ideas.


2D-RS ECC Schemes

RS codes can be used on both rows and columns to form a 2D-RS ECC scheme. The resulting 2D-RS ECC scheme is much more powerful than existing 2D ECC schemes because RS codes can correct erasures as well as errors and the decoding of columns and rows can be iteratively repeated. These decoding techniques are not possible using conventional RAID ideas.

In order to take full advantage of the power of RS codes to correct both errors and erasures, RS row and column encoders and decoders are implemented in hardware as illustrated in Figure 2.

The features and benefits of 2D SSDs are described in the following article…2D-RS SSDs

The features and benefits of 2D HDDs are described in the following article…2D-RS HDDs


Conclusions

The ideas presented in the original RAID paper are appropriate for highly reliable, relatively low capacity HDD storage systems but are not appropriate for MLC NAND Flash memory systems. Doing row encoding and decoding in software is not practical when the memory elements are not ultra‑reliable because software is too slow, rebuilding data from failed devices takes too long, and the probability of data loss is too great. High-speed row encoding and decoding in hardware is needed.

Advancements in integrated circuit technology in the last two decades make it possible and practical to implement high-speed parallel RS row and column encoders and decoders in hardware that are capable of correcting errors and erasures caused by multiple cell and device failures on‑the‑fly with no loss of data or performance.

The time has come to implement powerful 2D-RS memory systems. 2D-RS systems will significantly outperform systems based on conventional RAID ideas and performance will not degrade with time or as components fail as long as failed components are periodically replaced.


Interesting Note

The 2D-RS ECC system can be configured to correct t symbol errors and s symbol erasures in each row as long as 2t+s < R where R is the number of symbols of redundancy (number of columns of redundancy). The minimal row configuration with R=1 is exactly equivalent to the most widely implemented conventional RAID scheme.