Logarithmic time encoding and decoding of integer error control codes

One of the most important characteristics of all error control codes (ECCs) is the complexity of the encoding/decoding algorithms. Today, there are many ECCs that can correct multiple bit errors, but at the price of high encoding/decoding complexity. Among the rare exceptions are integer ECCs (IECCs), whose serial encoding/decoding algorithms run in O(n) time, where n is the codeword length. In this article, we show that IECCs can be encoded/decoded even faster, that is, that their parallel encoding/decoding algorithms have O(log2 n) time complexity.

In this article, we will show that reliable communication can be achieved in a much simpler way if integer ECCs (IECCs) are used. These codes use integer arithmetic, which brings with it a number of advantages, such as the possibility of efficient implementation on general-purpose processors (GPPs). In one of the previous papers, we showed that IECCs can be serially encoded/decoded in linear time [14]. In this article, we will show that IECCs can be encoded/decoded even faster, that is, that their parallel encoding/decoding algorithms run in logarithmic time. We believe this fact will make them very attractive for potential use in future communication and memory systems.
The organization of this article is as follows: Section 2 deals with the basic concepts of IECCs. The parallel encoding/decoding algorithms for this family of codes are described and evaluated in Sections 3 and 4, while Section 5 concludes the article.

IECCS: CONSTRUCTION AND ERROR CONTROL
In Reference 15 it was pointed out that IECCs share many common features with checksum codes [16]. One of them is that the codeword consists of k data bytes and one check-byte (Figure 1). In the case of IECCs, the check-byte is computed as the sum of the products of the integer values of the data bytes and the coefficients C_i. The syndrome S of the received codeword is then calculated as in Reference 16, that is, as the difference in value between the newly calculated and the received check-byte. Both of these facts are summarized in the following definitions.
Definition 1. Let B_i = Σ_{j=0}^{b−1} a_j⋅2^j be the integer representation of a b-bit byte, where a_j ∈ {0, 1} and 1 ≤ i ≤ k. Then, the code C(b, k, c) is defined as

C(b, k, c) = {(B_1, …, B_k, B_{k+1}) : B_{k+1} = (Σ_{i=1}^{k} C_i⋅B_i) mod (2^b − 1)},   (1)

where c = (C_1, …, C_k) is the coefficient vector and B_{k+1} ∈ Z_{2^b −1} is an integer.
Definition 2. Let X = (B_1, …, B_{k+1}), Y = (B′_1, …, B′_{k+1}) and E = (e_1, …, e_{k+1}), with B′_i = (B_i + e_i) mod (2^b − 1), be the transmitted codeword, the received codeword and the error vector, respectively. Then, the syndrome S of the received codeword is defined as

S = (Σ_{i=1}^{k} C_i⋅B′_i − B′_{k+1}) mod (2^b − 1).   (2)

From (2) it is easy to see that a nonzero value of S indicates the presence of one or more errors within t b-bit bytes (1 ≤ t < k + 1). The decoder will be able to correct these errors if the corresponding IECC is constructed through the following steps.
1. Defining the error type that the code should correct. In essence, we need to define the values of t and e_i. For instance, if we want to construct a class of codes that can correct single errors within one b-bit byte, the values of t and e_i will be t = 1 and e_i = ±2^r, where 0 ≤ r ≤ b−1. On the other hand, if we want to construct a class of codes capable of correcting single errors within two b-bit bytes, the values will be t = 2 and e_i = ±2^r, where 0 ≤ r ≤ b−1 (Table 1).
FIGURE 2  Bit-width of one ST entry for general IECCs

2. Defining the set of correctable syndromes. In the general case, this set, denoted Ω, contains one syndrome for each correctable combination of error values e_i and error locations.

3. Finding the coefficients C_i. For each value of b ≥ 2 it is necessary to perform a computer search to find the coefficients C_i. Although the number of coefficients increases with increasing b, the upper theoretical limit (k_max) cannot, in the general case, be determined analytically (the value of k_max depends on the class of IECCs) (Table 1). Regardless of that fact, the values of the coefficients C_i must be such that all correctable error patterns produce distinct syndromes, that is, such that |Ω| equals the number of correctable error patterns, where |X| denotes the cardinality of X.

4. Selecting the code parameters and generating the syndrome table.
The number of coefficients found determines the number of b-bit bytes that can be protected. By choosing whether or not to use all of the coefficients, we determine the size of the codeword as well as the size of the syndrome table (ST). The ST always has |Ω| entries (one per correctable syndrome) and is generated based on the values of t, b, k, e_i, and C_i. The purpose of each entry is to describe the relationship between a nonzero syndrome, the error locations and the error values (Figure 2).
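As an illustration of step 3, a naive computer search might look like the sketch below for the single-error class (t = 1, e_i = ±2^r). The greedy strategy, the chosen byte width, and the acceptance test are assumptions made for illustration; they are not the published search procedure, and errors in the check-byte itself are not covered here.

```python
# Naive sketch of step 3: greedily collect coefficients C_i such that every
# correctable single bit error (e = +/-2^r) in every protected byte yields a
# distinct nonzero syndrome. Illustrative only; not the published search.

def error_syndromes(coeff, b):
    """Syndromes produced by single bit errors in a byte weighted by coeff."""
    mod = 2**b - 1
    return {(coeff * e) % mod for r in range(b) for e in (2**r, -(2**r))}

def find_coefficients(b, k_wanted):
    mod = 2**b - 1
    used, coeffs = set(), []
    for c in range(1, mod):
        syns = error_syndromes(c, b)
        # accept c only if its 2b syndromes are nonzero, mutually distinct,
        # and disjoint from those of the coefficients already chosen
        if 0 not in syns and len(syns) == 2 * b and syns.isdisjoint(used):
            used |= syns
            coeffs.append(c)
            if len(coeffs) == k_wanted:
                break
    return coeffs

print(find_coefficients(b=5, k_wanted=2))   # -> [1, 3]
```

For b = 5 the search finds the pair (1, 3): their 20 single-bit-error syndromes are all distinct modulo 31, so a decoder can identify both the error value and its location from S alone.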
From the above steps, it is clear that the IECC construction process is independent of the encoding/decoding process. However, for the sake of completeness, it should be pointed out that communication between endpoints starts only once the ST has been generated and stored in the local memories. From then on, for each incoming codeword, the decoder calculates the syndrome S. If its value is equal to zero (S = 0), the decoder assumes that the codeword is error-free. However, if the value of S is nonzero (S ≠ 0), the decoder looks up the ST in order to find the entry whose first b bits match the syndrome S. If such an entry exists, the decoder performs the corresponding error-correcting operations in parallel; otherwise, it declares an uncorrectable error.
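The process just described can be sketched end to end as follows. The parameters b = 5, the coefficient pair (1, 3), and the dictionary-based ST layout are toy assumptions for illustration, not a code construction taken from the references.

```python
# End-to-end sketch: check-byte (encoding), syndrome, ST generation and
# decoding for a toy single-error-correcting IECC. All parameters below
# (b = 5, coefficients (1, 3), dict-based ST) are illustrative assumptions.

B = 5
MOD = 2**B - 1
COEFFS = [1, 3]

def check_byte(data):
    """Check-byte B_{k+1}: weighted sum of the data bytes mod 2^b - 1."""
    return sum(c * x for c, x in zip(COEFFS, data)) % MOD

def syndrome(data, check):
    """Syndrome S: recomputed check-byte minus the received one."""
    return (check_byte(data) - check) % MOD

# ST: nonzero syndrome -> (index of erroneous data byte, error value e = +/-2^r)
ST = {}
for i, c in enumerate(COEFFS):
    for r in range(B):
        for e in (2**r, -(2**r)):
            ST[(c * e) % MOD] = (i, e)

def decode(data, check):
    s = syndrome(data, check)
    if s == 0:
        return list(data)                        # assumed error-free
    if s in ST:
        i, e = ST[s]
        corrected = list(data)
        corrected[i] = (corrected[i] - e) % MOD  # undo the error
        return corrected
    raise ValueError("uncorrectable error")

cb = check_byte([9, 20])                  # transmit ([9, 20], cb)
assert decode([9, 20], cb) == [9, 20]     # S = 0: accepted as error-free
assert decode([9, 22], cb) == [9, 20]     # single bit error corrected via ST
```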

PARALLEL ENCODING AND DECODING OF IECCS
In Reference 14 it was shown that the serial encoding/decoding algorithms for IECCs have linear time complexity. However, the data can also be processed in parallel. The motivation for such an approach lies in the concept of parallel addition of p integers. In particular, if a binary tree structure is used, the addition of p integers can be performed in O(log2 p) time [1] (Figure 3). Using this fact, we can state the following theorems.
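The tree-based reduction can be sketched as follows; this is a sequential simulation of the tree's levels, whereas parallel hardware would perform all additions within one level concurrently.

```python
# Sequential simulation of binary-tree addition: p integers are summed in
# ceil(log2 p) levels; in hardware, each level's pairwise additions run
# concurrently, which is what gives the O(log2 p) parallel time bound.

def tree_sum(values, mod=None):
    level, levels = list(values), 0
    while len(level) > 1:
        nxt = []
        for j in range(0, len(level) - 1, 2):
            s = level[j] + level[j + 1]          # one pairwise addition
            nxt.append(s if mod is None else s % mod)
        if len(level) % 2:                       # odd element passes through
            nxt.append(level[-1])
        level, levels = nxt, levels + 1
    return level[0], levels

total, depth = tree_sum([3, 1, 4, 1, 5, 9, 2, 6])
# depth equals ceil(log2 8) = 3 levels for p = 8 addends
```

Passing a modulus performs the modular addition needed for the check-byte of (1) with the same tree depth.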
Theorem 1. Any (kb + b, kb) IECC can be encoded in parallel in O(log2 n) time.
Proof. Let us analyze expression (1). The first thing we notice is that the check-byte is computed as the sum of k products. Each of these products is calculated independently (Figure 4A), which means that the encoder must perform b⋅log2 b bit operations [20] in order to calculate the product N_i = C_i⋅B_i, where i = 1, 2, …, k. After that, the encoding procedure reduces to the modular addition of k integers using a binary tree with ⌈log2 k⌉ levels. This means that the check-byte B_{k+1} will be computed after ⌈log2 k⌉ additions, where each addition takes b bit operations. Given this and the fact that the codeword has n = (k + 1)⋅b bits, the total number of bit operations is b⋅log2 b + b⋅⌈log2 k⌉, from which it is clear that, for a fixed byte width b, any IECC can be encoded in parallel in logarithmic time. ▫

Theorem 2. Any (kb + b, kb) IECC can be decoded in parallel in O(log2 n) time.
Proof. The decoding process for all IECCs consists of three steps: calculating the syndrome S, looking up the ST and correcting the errors. From (2) we see that performing the first step requires only one operation more than the encoding process. However, if we parallelize all the calculations (Figure 4B), we easily come to the conclusion that the syndrome S will be computed after b⋅log2 b + b⋅⌈log2 (k + 1)⌉ bit operations. If the value of S is nonzero, the decoder will look up the ST to get the error-correction data. Since the ST can be presorted in ascending order (according to the values of S), it is possible to use a binary search algorithm [1]. In that case, the number of table lookups (TLs) will not be greater than ⌊log2 |Ω|⌋ + 2 [14], where Ω denotes the set of correctable syndromes and each TL takes b bit operations (the comparison of two b-bit integers). If we add to this the fact that the last step (error correction) requires b bit operations (t integer additions in parallel) and that the value of |Ω| is never greater than 2^b − 2, every term in the resulting bound is, for a fixed byte width b, either constant or proportional to log2 (k + 1), from which it is clear that any IECC can be decoded in parallel in logarithmic time. ▫
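The binary search over the presorted ST can be sketched as follows; the (syndrome, correction) entries are toy values rather than a real code's table, and Python's bisect module stands in for the hardware comparator loop.

```python
# Binary search over a presorted ST, as used in the proof of Theorem 2:
# with |Omega| entries, the number of comparisons is O(log2 |Omega|).
# The (syndrome, correction) pairs below are toy values, not a real ST.

import bisect

entries = sorted([(3, (1, 1)), (6, (1, 2)), (12, (1, 4)), (27, (0, -4))])
keys = [s for s, _ in entries]           # syndromes in ascending order

def st_lookup(s):
    """Return the correction data for syndrome s, or None if uncorrectable."""
    pos = bisect.bisect_left(keys, s)    # binary search over sorted keys
    if pos < len(keys) and keys[pos] == s:
        return entries[pos][1]
    return None
```

With the four entries above, at most ⌊log2 4⌋ + 2 = 4 lookups are ever needed, matching the TL bound in the proof.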

EVALUATION
In the previous section, we have seen that the complexity of encoding/decoding of IECCs does not depend on the code's strength. This, however, is not the case with standard ECCs. An obvious example are LDPC codes, whose performance depends both on the code type and the decoding algorithm used. This is the reason why it is often stated that algorithms for decoding weaker LDPC codes run in O(n) time [5], while those used for decoding stronger LDPC codes have O(n⋅log2 n) complexity [6,7]. On the other hand, it is known that all LDPC codes can be encoded in O(n) time [4]. As for Polar codes, they can be encoded and decoded in O(n⋅log2 n) and O(L⋅n⋅log2 n) time, respectively, whereby the decoder performance increases with the list size L [8,9]. Unlike LDPC and Polar codes, the encoding/decoding complexity of RS codes grows with the number of check bytes. In particular, if the number of check bytes r is even, RS codes can be encoded and decoded in O(n⋅log2 r) and O(n⋅log2 r + r⋅log2^2 r) time, respectively [10]. The fourth and most complex ECCs are Turbo codes. According to References 11 and 12, these codes can be encoded and decoded in O(n⋅m) and O(n⋅2^m) time, respectively, where m + 1 is the constraint length of the constituent convolutional codes (Table 2).
In addition to having high encoding/decoding complexity, the mentioned codes are very slow when implemented in software. The reason for this lies in the fact that they use finite field (FF) arithmetic, which is entirely different from the integer and floating-point (FP) arithmetic of GPPs. Since the emulation of FF operations requires a large number of instructions [21] (thus slowing down the performance of the processor), some researchers decided to use extremely powerful GPPs and/or graphics processing units (GPUs). However, even this very expensive approach has not proven to be applicable [22-25] in future communication networks (Table 3). Unlike FF-based codes, IECCs are perfectly suited for implementation on 64-bit processors. This feature is related not only to the fact that GPPs have four integer units (IUs) per core, but also to the fact that each IU operates independently of the others (Figure 5) [26]. This means that the proposed encoding/decoding algorithms can be fully implemented if the total number of IUs is not less than k + 1. In that case, the encoder (GPP) would take N_IM + ⌈log2 k⌉⋅N_IA clock cycles to generate the check-byte B_{k+1}, where N_IM and N_IA denote the number of clock cycles needed to perform one integer multiplication and one integer addition, respectively. Starting from the fact that the equalities N_IM = 3 and N_IA = 1 apply to all GPPs [26], we easily come to the conclusion that, for all IECCs, the encoder can process n bits every N_IM + ⌈log2 k⌉⋅N_IA clock cycles. In a similar way, an expression for the decoder's throughput can be derived; in it, N_ST denotes the number of clock cycles that the decoder needs to access the ST (this table must be stored in the local GPP's memory).
If we analyze the above expressions, we notice that the encoding speed increases with increasing clock speed and/or codeword length. On the other hand, the decoding speed depends on four parameters, of which N_ST plays the dominant role (Table 4). This fact points to the conclusion that the ST should always be stored in the L1/L2 cache. If this is not feasible at the outset, the size of the ST should be reduced by shortening the codeword.
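Using the cycle count N_IM + ⌈log2 k⌉⋅N_IA stated above (with N_IM = 3 and N_IA = 1), the theoretical encoder throughput can be computed as below. The 3 GHz clock is a hypothetical example, and sustained pipelining of one codeword per pass is assumed.

```python
# Theoretical encoder throughput from the cycle count N_IM + ceil(log2 k)*N_IA
# given in the text (N_IM = 3, N_IA = 1). The 3 GHz clock is a hypothetical
# example and back-to-back codeword processing is assumed.

import math

def encoder_throughput_gbps(b, k, clock_ghz, n_im=3, n_ia=1):
    cycles = n_im + math.ceil(math.log2(k)) * n_ia   # cycles per check-byte
    n_bits = (k + 1) * b                             # codeword length n
    return n_bits * clock_ghz / cycles               # Gbit/s

# A (1920, 1856) IECC, i.e. k = 29 data bytes of b = 64 bits, at 3 GHz:
print(encoder_throughput_gbps(b=64, k=29, clock_ghz=3.0))   # -> 720.0
```

The example makes the dependence in the text concrete: the throughput grows linearly with the clock frequency and nearly linearly with the codeword length, since the cycle count grows only as ⌈log2 k⌉.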

CONCLUSION
In this article, we have proposed algorithms for parallel encoding/decoding of IECCs. We have shown that the proposed algorithms have logarithmic time complexity and are perfectly suited for implementation on GPPs. Both of these features can be used not only to improve the performance of existing codes, but also to construct new ones with the potential to be used in future communication and memory systems.

FIGURE 3  Illustration of the binary tree addition algorithm
FIGURE 4  Illustration of the parallel algorithm for (A) encoding and (B) syndrome computing

FIGURE 5  Block diagram of an eight-core GPP processing a dataword (codeword)
TABLE 4  Theoretical encoding/decoding throughputs for some 64-bit IECCs implemented on eight-core GPPs

FIGURE 1  The codeword structure for IECCs
TABLE 1  The main characteristics of several classes of IECCs
a Typical number of clock cycles that a processor needs to access the L1/L2/L3 cache [26].