Binary BCH Design Options

This article discusses how ECC Tek’s binary BCH encoder and decoder designs can be customized to meet each licensee’s unique requirements.

The focus is primarily on the decoder because decoders are much more complex than encoders.

All of the binary BCH designs ECC Tek has licensed have the same basic decoder architecture as illustrated below.


SYN is the Syndrome Generator, DFIFO is the data FIFO, PPU stands for Polynomial Processing Unit, INIT stands for initialization of the error locator polynomial, L(x), EVAL stands for evaluation of L(x), and XOR stands for XORing the error pattern with the received word to correct the errors.

In all of ECC Tek’s binary BCH decoder designs before 9-1-11, all of the above functions are clocked at the same frequency, and the decoder inputs and outputs 8-bit values.

To help optimize a design for the best possible performance at the lowest cost in gate count (or area), this article discusses the use of multiple clocks, different levels of parallelism, and variable data path widths. For example, SYN could have its own clock, SYN_clk, EVAL could have its own clock, EVAL_clk, and so on.

If minimizing gate count is not important and the decoder must input and output m-bit values (m > 8), then the architecture above can be kept and each functional unit can be parallelized to handle m bits at a time. However, increasing the level of parallelism in each functional unit requires a fairly large number of additional gates.



Reducing Gate-Count

To minimize gate count, we must first determine the maximum clock frequency that can be used with the specific circuit technology chosen to implement the design. We then consider selecting various positive clock edges using select signals, as illustrated below.





Alternatively, we can generate clocks using phase-locked loops as follows.



This web page does not discuss clock generation methods in detail, but there are a number of proven ways of generating clocks of various frequencies that are in phase and can be used inside FPGAs or ASICs.
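
The edge-selection approach can be modeled behaviorally as shown below. This is a minimal Python sketch, not ECC Tek's circuit: the 800 MHz fast clock and the divisor values are hypothetical, and a unit that acts only on selected edges is treated as running at the fast clock frequency divided by its divisor.

```python
# Behavioral sketch of deriving slower effective clocks by selecting edges
# of one fast clock. FAST_CLK_HZ and the divisors are hypothetical values.
FAST_CLK_HZ = 800e6


def select_signal(divisor, edges):
    """Select pattern: 1 on every `divisor`-th rising edge of the fast clock."""
    return [1 if e % divisor == 0 else 0 for e in range(edges)]


def effective_clock_hz(divisor, fast_hz=FAST_CLK_HZ):
    """Rate at which a unit updates if it acts only on selected edges."""
    return fast_hz / divisor


# Example: SYN acts on every edge, EVAL on every fourth edge.
syn_select = select_signal(1, 8)
eval_select = select_signal(4, 8)
```

With these assumed values SYN would update at 800 MHz and EVAL at 200 MHz, and both stay in phase because they share the same fast clock.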



SYN Parallelization Level

Each of the decoder functions can be implemented with a variable level of parallelism, p. If the function inputs and outputs one bit at a time, we say the parallelization level is 1. In the above decoder block diagram, the SYN module has a parallelization level of 8.

The lower the parallelization level, the fewer gates are required, but the lower the maximum rate at which data can be input.
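
To illustrate what the parallelization level means for SYN, the sketch below computes the syndromes S_j = r(α^j), j = 1..2t, of a received word r(x) either one bit per clock (p=1) or p bits per clock. It is a minimal Python model, not ECC Tek's design; the small field GF(2^4) with primitive polynomial x^4 + x + 1 is an assumption chosen to keep the example short. The p-bit update folds p bits into each syndrome per clock, so it needs more logic per syndrome than the p=1 update but only about 1/p as many clocks.

```python
# Syndrome generator sketch over GF(2^4), primitive polynomial x^4 + x + 1.
# The field and code size are assumptions chosen to keep the example small.
FIELD_BITS = 4
PRIM_POLY = 0b10011
N = (1 << FIELD_BITS) - 1  # 15 nonzero field elements

# Antilog (EXP) and log (LOG) tables; EXP is doubled to avoid a modulo in gf_mul.
EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
value = 1
for i in range(N):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & (1 << FIELD_BITS):
        value ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    """Multiply two GF(2^4) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes_serial(bits, t):
    """p = 1: one received bit per clock, highest-degree bit first (Horner's rule)."""
    s = [0] * (2 * t)
    for bit in bits:
        for j in range(1, 2 * t + 1):
            s[j - 1] = gf_mul(s[j - 1], EXP[j % N]) ^ bit
    return s


def syndromes_parallel(bits, t, p):
    """p bits per clock: S_j <- S_j * alpha^(j*p) + sum_k b_k * alpha^(j*(p-1-k))."""
    bits = [0] * ((-len(bits)) % p) + list(bits)  # leading zeros leave r(x) unchanged
    s = [0] * (2 * t)
    for c in range(0, len(bits), p):
        chunk = bits[c:c + p]
        for j in range(1, 2 * t + 1):
            acc = gf_mul(s[j - 1], EXP[(j * p) % N])
            for k, bit in enumerate(chunk):
                if bit:
                    acc ^= EXP[(j * (p - 1 - k)) % N]
            s[j - 1] = acc
    return s
```

Both functions evaluate the same r(α^j); they differ only in how many received bits are consumed per clock.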

To minimize the gate count for SYN, we need to determine the lowest level of parallelism that still achieves the required input data rate. We start by synthesizing SYN with a parallelization level of 1 to see what input data rate we can achieve for a specific circuit technology.

Let’s say, for example, that we need to input 1 Gbit/s into SYN. Because a p=1 SYN accepts one bit per clock, this requires a SYN clock of at least 1 GHz. If SYN synthesized with p=1 can run that fast, then we can use the p=1 circuit, which inputs 1 bit at a time.

If, however, we want the decoder to input 16 bits at a time, we can add an input converter circuit that converts each 16-bit value into 16 consecutive 1-bit values, as shown in the following figure.



The above circuit takes far fewer gates than a SYN module with p=16 because the 16-to-1 converter circuit requires very few gates.

In this example, the SYN module would be clocked at a very high frequency: 16 times the rate at which the 16-bit values arrive.
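
A behavioral sketch of the 16-to-1 input converter and of the clock arithmetic behind it follows. The function names are hypothetical, and the converter is assumed to send the most significant bit of each word first; in hardware it would typically be a shift register loaded once every 16 fast-clock cycles.

```python
def serialize_words(words, width=16):
    """Hypothetical 16-to-1 input converter: emit each word MSB first,
    one bit per cycle of the fast SYN clock."""
    for word in words:
        for k in range(width - 1, -1, -1):
            yield (word >> k) & 1


def required_clock_hz(bit_rate_bps, p):
    """Minimum clock for a unit that accepts p bits per clock."""
    return bit_rate_bps / p
```

For the 1 Gbit/s example, a p=1 SYN needs a clock of at least 1 GHz, while the 16-bit input words arrive at only 62.5 MHz (1 Gbit/s divided by 16).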


DFIFO

The DFIFO can be implemented in several ways:
  • As a ping-pong FIFO, where writing and reading alternate between two RAM memories
  • Using a dual-port RAM memory
  • As registers
ECC Tek initially delivers a DFIFO model implemented as registers, but expects customers to replace that model with a more appropriate model of the customer's own choosing.  The initial register model is used for general simulation to verify functionality and to determine what size DFIFO is needed.

Most decoders can be implemented with DFIFOs of various capacities. The DFIFO must be large enough to hold at least one incoming record and need not be larger than what is needed to correct contiguous input records, each with the maximum number of errors. As a specific example, the DFIFO in one of our decoders, which corrects t=12 bits in a block of 512 bytes, can have a capacity anywhere from 512 to 785 bytes. If the capacity of the DFIFO in this example is less than 785 bytes, the decoder may occasionally request that the input be paused.
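
A register-style DFIFO model used for functional simulation might look like the minimal Python sketch below. The class name and the `pause_input` flag are hypothetical; the sketch only shows how a capacity limit turns into a request to pause the input.

```python
from collections import deque


class DataFifo:
    """Behavioral DFIFO sketch: buffers received bytes until their
    corrections are ready, and asks the source to pause when full."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.buf = deque()

    @property
    def pause_input(self):
        # Hypothetical flow-control flag: no room for another byte.
        return len(self.buf) >= self.capacity

    def write(self, byte):
        if self.pause_input:
            raise OverflowError("DFIFO full: input should have been paused")
        self.buf.append(byte)

    def read(self):
        return self.buf.popleft()
```

With a capacity between one record and the worst-case requirement, `pause_input` may be raised when several consecutive records each need the maximum correction time.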

It is easy to vary the parallelization level of the DFIFO because the parallelization level is just the width of the FIFO. Generally speaking, the gate count of the DFIFO does not vary with the parallelization level, since the total number of bits stored stays the same. In the figure shown above, the DFIFO could be connected to input 16-bit values or 1-bit values.


PPU Parallelization Level

The parallelization level of the PPU can vary from 1 to R, where R ≤ 2t. For all of ECC Tek’s designs, when p=1, the PPU operates on one coefficient of each of four polynomials simultaneously. When p=2, it operates on two coefficients of each of the four polynomials simultaneously, and so on. The four polynomials are the error locator polynomial, L(x), the auxiliary error locator polynomial, aL(x), the error evaluator polynomial, V(x), and the auxiliary error evaluator polynomial, aV(x). Operations on the polynomials are modulo x^(2t), so each polynomial has at most 2t coefficients and the highest-degree coefficient has index 2t-1. The gate count increases fairly rapidly as p increases. Unless decoder latency is critically important, the lower the p the better.
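
To illustrate what the PPU computes, the sketch below uses the textbook Berlekamp-Massey algorithm to find L(x) from the 2t syndromes. This is a swapped-in stand-in, not ECC Tek's PPU: it tracks only L(x) and one auxiliary polynomial rather than the four polynomials described above, and it updates a whole polynomial per loop iteration rather than p coefficients per clock. The field GF(2^4) is an assumption chosen for brevity.

```python
# Textbook Berlekamp-Massey over GF(2^4) (primitive polynomial x^4 + x + 1).
FIELD_BITS = 4
PRIM_POLY = 0b10011
N = (1 << FIELD_BITS) - 1

EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
value = 1
for i in range(N):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & (1 << FIELD_BITS):
        value ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[(N - LOG[a]) % N]


def berlekamp_massey(s):
    """Return (L(x), degree) from syndromes s = [S1, ..., S2t].
    Polynomials are coefficient lists, lowest degree first."""
    c, b = [1], [1]      # current and previous connection polynomials
    length, m, b_disc = 0, 1, 1
    for n in range(len(s)):
        # Discrepancy between S_(n+1) and the value predicted by c(x).
        d = s[n]
        for i in range(1, min(length, len(c) - 1) + 1):
            d ^= gf_mul(c[i], s[n - i])
        if d == 0:
            m += 1
            continue
        coef = gf_mul(d, gf_inv(b_disc))
        prev = c[:]
        # c(x) <- c(x) - (d / b_disc) * x^m * b(x); subtraction is XOR in GF(2^4).
        update = [0] * m + [gf_mul(coef, x) for x in b]
        c = c + [0] * (len(update) - len(c))
        for i, x in enumerate(update):
            c[i] ^= x
        if 2 * length <= n:
            length, b, b_disc, m = n + 1 - length, prev, d, 1
        else:
            m += 1
    return c[:length + 1], length
```

For two errors at degrees 3 and 10, the result is L(x) = (1 + α^3 x)(1 + α^10 x), whose roots α^-3 and α^-10 identify the error positions.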

The level of parallelism needed in the PPU depends upon the decoder performance requirement.  For example, most of ECC Tek’s decoders are designed so that data can be input continuously into the decoders with no pausing or gaps between input records even if all of the input records contain the maximum number of bits in error.  Achieving that level of performance may mean p must be greater than 1.

Normally, you would not want to have a PPU parallelization level that is any larger than needed if gate count is important.

In some cases, the decoder must have “minimum latency,” i.e., minimum delay from when the last bit of an input block is received to when the corrected block begins to be output. In those cases, we must increase the parallelization level of the PPU to meet the latency requirement even though doing so requires a lot of gates.

In order to optimize the PPU design to minimize gate count, the PPU should be synthesized with p=1 to determine the maximum frequency a p=1 PPU can be clocked at for the circuit technology being used. One then has to determine what PPU parallelization level is needed to meet the decoder performance requirement.  As p is increased, the maximum frequency the PPU can be clocked at will decrease because the complexity of the PPU increases as p increases.
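
Once synthesis has provided the maximum clock for each candidate p, the search for the smallest adequate p reduces to simple arithmetic. The sketch below is hypothetical in two ways: the cycle model (2t iterations, each sweeping the 2t coefficients p at a time) is an assumption rather than ECC Tek's actual timing, and `fmax_hz` stands for a table of the licensee's synthesis results. The requirement checked is the continuous-input one: the PPU must finish one record before the next record has been received.

```python
import math


def ppu_cycles(t, p):
    """Assumed cycle model: 2t iterations, each updating the 2t
    coefficients (modulo x^(2t)) p coefficients per clock."""
    return 2 * t * math.ceil(2 * t / p)


def min_ppu_parallelism(t, block_bits, bit_rate_bps, fmax_hz):
    """Smallest p for which the PPU finishes within one block-input time.
    fmax_hz(p) returns the maximum clock found by synthesis for that p."""
    block_time_s = block_bits / bit_rate_bps
    for p in range(1, 2 * t + 1):
        if ppu_cycles(t, p) / fmax_hz(p) <= block_time_s:
            return p
    return None  # not achievable even at p = 2t
```

For example, with t = 12, a 4252-bit record (4096 data bits plus 13 × 12 = 156 redundant bits, assuming a code over GF(2^13)) arriving at 1 Gbit/s, and a PPU that synthesizes at 100 MHz for every p, this model gives p = 2; at 200 MHz, p = 1 would suffice.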


Variable or Fixed Delay PPU

The PPU can be designed with a variable delay, so that the time it takes to compute the error locator polynomial, L(x), depends on the number of bits actually in error, or with a fixed delay, so that the time is always the constant worst-case value. Variable-delay decoders require more gates than fixed-delay ones.


Pipelining the PPU

If the parallelization level of the PPU is at its maximum, p=2t, then we also have the option of pipelining the PPU. A pipelined PPU takes many more gates than one that is not pipelined, but pipelining increases the throughput of the PPU. Pipelining is normally not required for binary BCH codes but is used in fully parallelized Reed-Solomon decoders.


INIT Parallelization Level

INIT must be more complex if the decoder is programmable to handle multiple generator polynomials, g(x), and variable block lengths than if the decoder handles only one g(x) and one block length.

Generally speaking, it’s best to use a parallelization level of p=1 in INIT if at all possible; otherwise, INIT gets more complex and requires more gates.

INIT normally is not very complex and does not require much time to initialize L(x).

If needed, INIT can be parallelized.
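
One reason a variable block length makes INIT more complex can be shown with a short sketch. Assuming EVAL tests bit positions from the highest degree, n-1, downward, INIT can pre-scale each coefficient L_i by α^(i·e0), where α^e0 = α^(-(n-1)) is the first evaluation point. Because e0 depends on the block length n, a decoder with a programmable block length must compute it at run time. This is a hypothetical Python sketch over GF(2^4), not ECC Tek's INIT.

```python
# Hypothetical INIT sketch over GF(2^4) (primitive polynomial x^4 + x + 1).
FIELD_BITS = 4
PRIM_POLY = 0b10011
N = (1 << FIELD_BITS) - 1

EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
value = 1
for i in range(N):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & (1 << FIELD_BITS):
        value ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def first_exponent(n):
    """Exponent e0 of the first evaluation point alpha^(-(n-1)) for block length n."""
    return (-(n - 1)) % N


def init_registers(locator, n):
    """Scale L_i by alpha^(i*e0) so the XOR of the registers equals L(alpha^e0)."""
    e0 = first_exponent(n)
    return [gf_mul(c, EXP[(i * e0) % N]) for i, c in enumerate(locator)]
```

For a full-length code (n = 15) the first point is α^1; for a shortened code with n = 10 it becomes α^6, so the same L(x) needs different initial register values.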


EVAL Parallelization Level

EVAL requires a very large number of gates if the parallelization level is high.

EVAL can be implemented with a large number of gates or with a large number of wires.  One implementation may be better for ASICs and the other for FPGAs.

Again, EVAL should be synthesized with p=1 to determine the maximum frequency EVAL can be clocked at for the circuit technology being used. It may be possible to implement a p=1 EVAL unit with a 1-to-n output converter that will convert 1-bit values to n-bit values. As with SYN, this method will significantly reduce gate count because 1-to-n converters take very few gates.
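
A p=1 EVAL unit followed by a 1-to-n output converter can be sketched as below. The sketch uses a Chien search, the usual way to evaluate L(x) at every bit position, and produces one error-pattern bit per clock; a 1 marks a position where L(x) has a root, i.e., a bit in error. The field GF(2^4), the highest-degree-first bit order, and the function names are assumptions for illustration, not ECC Tek's design.

```python
# p = 1 Chien-search EVAL sketch over GF(2^4) plus a 1-to-n output converter.
FIELD_BITS = 4
PRIM_POLY = 0b10011
N = (1 << FIELD_BITS) - 1

EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
value = 1
for i in range(N):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & (1 << FIELD_BITS):
        value ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def chien_serial(locator, n):
    """Yield one error-pattern bit per clock for degrees n-1 down to 0."""
    e0 = (-(n - 1)) % N  # first evaluation point alpha^(-(n-1))
    regs = [gf_mul(c, EXP[(i * e0) % N]) for i, c in enumerate(locator)]
    for _ in range(n):
        total = 0
        for r in regs:
            total ^= r
        yield 1 if total == 0 else 0
        # Step every register to the next point: the L_i term times alpha^i.
        regs = [gf_mul(r, EXP[i % N]) for i, r in enumerate(regs)]


def pack_bits(bits, width):
    """Hypothetical 1-to-n output converter: pack `width` bits MSB first.
    A final partial word is dropped in this sketch."""
    word, count = 0, 0
    for bit in bits:
        word = (word << 1) | bit
        count += 1
        if count == width:
            yield word
            word, count = 0, 0
```

The XOR stage could then combine each packed word with the corresponding n-bit word read from the DFIFO.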


ENCODER Parallelization Level

The same techniques described above can be applied to the ENCODER. A p=1 ENCODER should first be synthesized to determine the maximum frequency it can be clocked at for the circuit technology being used. If the p=1 ENCODER is fast enough, m-to-1 and 1-to-m converters can be added to the input and output to allow the ENCODER to input and output m-bit values.
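
A p=1 encoder is normally a linear feedback shift register that divides by the generator polynomial, taking one message bit per clock. The sketch below is a minimal systematic BCH encoder for the small (15,7) code with t=2 and g(x) = x^8 + x^7 + x^6 + x^4 + 1; these code parameters are an assumption chosen for brevity, and m-to-1 and 1-to-m converters would wrap it as described above.

```python
# Bit-serial (p = 1) systematic BCH encoder sketch for the (15,7), t = 2 code.
G = 0b111010001   # g(x) = x^8 + x^7 + x^6 + x^4 + 1
R = 8             # number of redundant bits = deg g(x)
MASK = (1 << R) - 1


def encode_serial(msg_bits):
    """Shift message bits in highest degree first, one per clock; the
    register ends up holding x^R * m(x) mod g(x), the parity bits."""
    reg = 0
    for bit in msg_bits:
        feedback = bit ^ ((reg >> (R - 1)) & 1)
        reg = (reg << 1) & MASK
        if feedback:
            reg ^= G & MASK
    parity = [(reg >> k) & 1 for k in range(R - 1, -1, -1)]
    return list(msg_bits) + parity
```

Every codeword produced this way is divisible by g(x), which is what the decoder's syndrome check relies on.
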

ECC Tek and Licensee Interaction

All of the optimization techniques discussed on this web page require ECC Tek to interact with the licensee since each licensee must make numerous synthesis runs using their synthesis tools set up for the specific circuit technology they will be using to implement their FPGA or ASIC.


Fixed t or Configurable t

Encoders and decoders can be configured for one fixed t value or made programmable/configurable to handle multiple t values, where t is the number of bits the decoder can correct. The number of redundant bits in a codeword depends on which t value is selected.


Fixed K or Configurable K

K, the number of bits in the data field, can also be fixed or programmable/configurable.
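
The link between t, K, and the number of redundant bits can be made concrete. For a binary BCH code over GF(2^q), the number of redundant bits is at most q·t, and the codeword length (K plus the redundant bits) must not exceed 2^q - 1. (q is used here for the field degree so it is not confused with the m-bit data path width.) The sketch below picks the smallest q that satisfies that bound; the function names are hypothetical, and the result is an upper bound on the redundancy, the exact value being the degree of g(x).

```python
def field_degree(k_bits, t):
    """Smallest q such that a K + q*t bit codeword fits in 2^q - 1 bits."""
    q = 2
    while (1 << q) - 1 < k_bits + q * t:
        q += 1
    return q


def max_redundant_bits(k_bits, t):
    """Upper bound q*t on the number of redundant bits."""
    return field_degree(k_bits, t) * t
```

For example, a 512-byte (4096-bit) data field with t = 12 needs GF(2^13) and at most 156 redundant bits; raising t to 24 raises the bound to 312 bits.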


Decoder Pause Options

Decoders can be designed with or without the option of pausing the output and/or the input.


Abort Option

An abort signal can be designed into the encoder and decoder so that encoder and decoder operations can be immediately aborted, and the encoder and decoder will be returned to their reset states without using the reset signal.


Specific Things Wanted by Specific Customers

Customers often want specific features implemented so that the designs fit in with their own designs. For example, signal names could be changed to match the customer's naming conventions, or signals could be added to flag certain events in the decoding process. One customer wanted a signal indicating that the syndrome was 0. Customers have requested a variety of other such changes.


Why Licensing is Difficult

Licensing designs is difficult because potential licensees normally do not know about all the above options and often do not know precisely what they need or want.