Enhanced BP decoding schemes of polar codes

Correspondence: Si-yu Zhang, Department of Electrical and Computer Engineering, University of Windsor, Windsor, Ontario, Canada. Email: zhang1fs@uwindsor.ca

Abstract
In this paper, two decoding schemes for polar codes based on the belief propagation (BP) algorithm are proposed. The basic idea of the proposed schemes, called “interleaved BP” (I-BP) and “multiple-candidates BP” (M-BP), is to construct multiple candidates with different reliability values from the received signal and to decode each candidate by a BP decoder. Then, the output of the BP decoder that meets the stopping criterion or the maximum likelihood (ML) rule is chosen as the decoded data. Simulation results show that both the proposed polar decoders outperform the one based only on a single conventional BP decoder. In conjunction with each of the proposed schemes, a feedback structure is also proposed to achieve more performance gain. The proposed feedback structure takes as input the output of each BP decoder, and enhances the a posteriori information of reliable bits and flips unreliable bits. Then, the processed information is fed back into its corresponding decoder. Simulation results show that the performance gain of the proposed schemes with this feedback, compared to the ones without the feedback, may be as large as 1 dB at a frame-error rate of $10^{-2}$ on frequency selective channels and 2 dB at a frame-error rate of 0.07 on doubly selective channels.


INTRODUCTION
Polar codes are a class of capacity-achieving codes for binary-input discrete memoryless channels (B-DMC) [1]. Successive cancellation (SC) [1] and belief propagation (BP) [2] are two widely applied decoding algorithms for polar codes. In general, SC is less complex than other decoding methods [3] (e.g. BP or sphere decoding (SD) [4]). However, at finite code lengths, SC exhibits sub-optimal error correction performance with high latency. Therefore, many researchers have attempted to improve SC at the cost of increased complexity [5][6][7][8][9]. Meanwhile, BP is considered an alternative polar decoding method. Although BP still shows sub-optimal performance, it offers higher throughput and lower latency owing to its parallel structure. Nevertheless, only a few recent works [10,11] have developed modified versions of BP polar decoding to achieve better performance. In [10], a post-processing approach was proposed to resolve three common error patterns present in BP, and it achieved a significant performance gain. In [11], a BP-list (BPL) decoding algorithm that outperformed SC-list (SCL) was proposed. Furthermore, a noise-aided BP (N-BP) proposed in [12] improved decoding performance by adding random noise. Besides these modified BP decoders, some schemes based on outer code concatenation or modified polar code constructions were also proposed [13,14]. However, these methods modify the structure of the polar codes.
In this paper, we first propose two enhanced BP decoders to improve the error correction performance of polar codes. The interleaved-BP (I-BP) consists of a group of distinct interleavers, designed according to the decoding factor graphs. The multiple candidates-BP (M-BP) yields several candidates by adding random offsets with different variances. Both I-BP and M-BP generate several estimated codeword candidates that have different soft values. Then, the candidate that meets the stopping criterion [15] and the maximum likelihood (ML) rule is selected. In conjunction with the proposed BP decoders, a feedback structure is introduced. This structure enhances reliable bits and flips unreliable bits based on their log-likelihood ratios (LLRs). If at the end of a round of decoding neither the stopping criterion nor the ML rule is met, the receiver feeds back the processed (enhanced or flipped) a posteriori information of the selected candidate yielded by M-BP or I-BP as the initial messages for the next iteration. By exchanging this information iteratively, the error correction performance of the decoder can be improved at the expense of increased complexity and latency.
The remainder of this paper is organized as follows. Section 2 introduces polar codes and the BP algorithm. In Section 3, we present the I-BP and M-BP schemes along with the proposed feedback structure. In Section 4, simulation results and discussions are provided. Section 5 gives the conclusion and potential future work.

FIGURE 1 The factor graph of a $C(8, 5)$ polar code

POLAR CODES AND BELIEF PROPAGATION DECODING
We consider a $C(N, K)$ polar code of length $N = 2^M$ with $K$ ($K \le N$) free bits. The free bits are carried by the $K$ most reliable channels [16]. These polarized channels are identified by a polar construction scheme, such as Gaussian approximation [17] or the Bhattacharyya parameter [1]. The other $N - K$ frozen bits are set to 0. It is assumed that the values and positions of the frozen bits are known to the transmitter and the receiver in advance. Here, the index sets of frozen and free bits are denoted as $\mathcal{A}$ and $\mathcal{A}^c$, respectively. Given a message vector $u = [u[1], u[2], \ldots, u[N]]$ that consists of free and frozen bits, the corresponding polar codeword $x = [x[1], x[2], \ldots, x[N]]$ can be obtained by the following matrix multiplication:

$$x = u\,G^{\otimes M}. \qquad (1)$$

The matrix $G^{\otimes M}$, which is an $N \times N$ generator matrix, is the $M$th Kronecker power of $G$, and $G$, called the kernel matrix, is given by:

$$G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}. \qquad (2)$$

BP polar decoding can be viewed as a message-passing algorithm over the factor graph of the polar code. For example, Figure 1 illustrates the factor graph for a $C(8, 5)$ polar code, with $u[1]$, $u[3]$, $u[5]$ as frozen bits. The basic decoding element in a BP factor graph is termed a "processing element" (PE), which is shown in Figure 2. A BP decoder has $MN/2$ PEs, where $M = \log_2 N$. Two different LLR-based messages, the right-to-left message $L^t_{i,m}$ and the left-to-right message $R^t_{i,m}$, are transmitted in a PE node, where $i$ ($1 \le i \le N$) and $m$ ($1 \le m \le M$) denote the index and layer of a node, respectively, and $t$ denotes the $t$th iteration.
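The encoding step in Equations (1) and (2) can be illustrated with a minimal pure-Python sketch. The function names (`kron`, `generator_matrix`, `polar_encode`) and the example message are illustrative assumptions, not part of the original paper; the sketch builds $G^{\otimes M}$ explicitly, which is only practical for small $N$.

```python
# Sketch of polar encoding x = u * G^{(x)M} over GF(2), in pure Python.
KERNEL = [[1, 0],
          [1, 1]]  # kernel matrix G from the text


def kron(a, b):
    """Kronecker product of two 0/1 matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def generator_matrix(m):
    """Return the N x N matrix G^{(x)m}, with N = 2**m."""
    g = [[1]]
    for _ in range(m):
        g = kron(g, KERNEL)
    return g


def polar_encode(u):
    """Encode the message vector u (length N = 2**m) as x = u * G^{(x)m} mod 2."""
    n = len(u)
    m = n.bit_length() - 1
    if n < 1 or n != 1 << m:
        raise ValueError("length of u must be a power of two")
    g = generator_matrix(m)
    return [sum(u[j] * g[j][i] for j in range(n)) % 2 for i in range(n)]


# C(8, 5) example: u[1], u[3], u[5] (1-based, as in the text) are frozen to 0.
# The five free-bit values below are arbitrary illustration data.
u = [0, 1, 0, 1, 0, 0, 1, 1]
x = polar_encode(u)
```

Because the kernel satisfies $G^2 = I$ over GF(2), applying `polar_encode` twice returns the original message, which is a convenient sanity check for the construction.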
The leftmost message $R^t_{i,1}$ is set to 0 for information bits ($i \in \mathcal{A}^c$) and to $\infty$ for frozen bits ($i \in \mathcal{A}$). The rightmost message $L^t_{i,M+1}$ is obtained as $\ln \frac{P(y[i] \mid x[i] = 0)}{P(y[i] \mid x[i] = 1)}$, where $y$ denotes the received signal. In each iteration, LLR transmission starts from $m = M + 1$ (rightmost) and $m = 1$ (leftmost), respectively. According to [18], a scaled min-sum (SMS) BP decoder updates the LLRs in each PE iteratively by means of an operator $g(a, b)$, which is a min-sum operation defined as:

$$g(a, b) = \alpha \cdot \mathrm{sign}(a)\,\mathrm{sign}(b)\,\min(|a|, |b|),$$

where $\alpha$ denotes a constant scaling factor. The BP algorithm is run for a fixed number of iterations $T$ or can be stopped as soon as a codeword is obtained. At the $t$th iteration, the hard decision of the message bit $u[i]$, denoted by $\hat{u}[i]$, is obtained from the a posteriori information computed on the left side of the graph at node $(i, 1)$. Similarly, at the same iteration, the hard decision of the code bit $x[i]$, denoted by $\hat{x}[i]$, can be obtained on the right side of the graph at node $(i, M + 1)$. The performance of the BP decoder can be improved by running more iterations [20]. The latency of BP is $O(\log N)$ [2], which is low relative to that of SC, which is $O(N)$. A typical BP decoding procedure is given in Algorithm 1.

ALGORITHM 1 $(N, K)$ BP polar decoding: $(\hat{u}, \hat{x}) = \mathrm{BP}(y, \mathcal{A})$
Initialization:
Initial values for BP using the received vector $y$; the index set of the frozen bits $\mathcal{A}$; the maximum number of iterations $T$ and the scale factor $\alpha$
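The message updates described above can be sketched as follows. The operator `g` is the scaled min-sum operation from the text. The four-port update in `pe_update` is the form commonly used in the BP polar decoding literature and is assumed here; the exact indexing in [18] may differ. The default scaling factor 0.9375, the sign convention at zero, and the rule "decide 0 when $L + R \ge 0$" in `hard_decision` are likewise assumptions made for illustration.

```python
import math


def sign(v):
    """Sign convention assumed here: +1 for v >= 0, -1 otherwise."""
    return -1.0 if v < 0 else 1.0


def g(a, b, alpha=0.9375):
    """Scaled min-sum operation: alpha * sign(a) * sign(b) * min(|a|, |b|)."""
    return alpha * sign(a) * sign(b) * min(abs(a), abs(b))


def pe_update(l_up, l_low, r_up, r_low, alpha=0.9375):
    """One processing-element update (assumed common four-port form).

    l_up, l_low: right-to-left messages entering from stage m + 1
    r_up, r_low: left-to-right messages entering from stage m
    Returns (L_up, L_low) leaving towards stage m and
            (R_up, R_low) leaving towards stage m + 1.
    """
    l_up_out = g(l_up, l_low + r_low, alpha)
    l_low_out = g(r_up, l_up, alpha) + l_low
    r_up_out = g(r_up, l_low + r_low, alpha)
    r_low_out = g(r_up, l_up, alpha) + r_low
    return l_up_out, l_low_out, r_up_out, r_low_out


def hard_decision(l, r):
    """Assumed decision rule: bit 0 if the a posteriori LLR L + R is >= 0."""
    return 0 if l + r >= 0 else 1


# Upper bit frozen (R = infinity): the lower output combines both channel LLRs.
frozen_case = pe_update(2.0, -1.0, math.inf, 0.0, alpha=1.0)
```

With the upper bit frozen to 0, the lower left-going message reduces to the sum of the two incoming channel LLRs, which matches the repetition-code behaviour one would expect from the factor graph.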

ENHANCED BELIEF PROPAGATION DECODING ALGORITHMS
In this section, I-BP is first proposed. Then, M-BP is proposed in the second part. Last, in the third part of this section, the proposed feedback structure is introduced.
From the above observations, we can see that when $y$ is input in a specific order, the output $\hat{u}$ with the same order can be calculated according to the factor graph. This specific input order of $y$ can be achieved by a group of interleavers. A block diagram of the proposed I-BP is given in Figure 3. The received sequence $y$ is first interleaved by a group of interleavers before being sent to a set of BP decoders. An interleaver is a device that operates on a block of $N$ symbols to reorder or permute them, without repeating or omitting any of the symbols in the block [21]. From a mathematical point of view, an interleaver can be considered as a bijective map that maps every vector to a permuted version of itself. Here, we define a set of $J$ distinct interleavers $\{\Pi_j\}$. Each $\Pi_j$ is specified by a mapping function on the index sequence and generates a new vector $y_j$ by permuting the symbols in the input sequence $y$, so that the components of $y_j$ are obtained from the components of $y$ according to (8), for $j = 1, 2, \ldots, J$ and $i = 1, 2, \ldots, N$. According to (8), once $j$ is fixed, the interleaver $\Pi_j$ is uniquely determined. Essentially, the mappings of these interleavers are defined by the relationship of the nodes connected by "⊕" in the factor graph at each stage $m$. Therefore, the number of available interleavers is limited to $M$, thus $J \le M$. We note that if random interleavers are applied, the transmitted bits may not be recoverable with the BP polar decoding graph. For instance, if $y$ with a random order $[2, 1, 8, 7, 4, 3, 6, 5]$ is sent to a BP decoder, the output sequence using the decoding graph will be:

$$\begin{bmatrix}
y[1] \oplus y[2] \oplus y[3] \oplus y[4] \oplus y[5] \oplus y[6] \oplus y[7] \oplus y[8] \\
y[1] \oplus y[3] \oplus y[5] \oplus y[7] \\
y[5] \oplus y[6] \oplus y[7] \oplus y[8] \\
y[5] \oplus y[7] \\
y[3] \oplus y[4] \oplus y[5] \oplus y[6] \\
y[3] \oplus y[5] \\
y[5] \oplus y[6] \\
y[5]
\end{bmatrix}$$

which is not a valid codeword. Therefore, I-BP implements at most $M$ distinct interleaving operations using $\Pi_j$ based on (8).
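The random-order example can be checked symbolically. The sketch below assumes that "the output sequence using the decoding graph" corresponds to applying the transform $G^{\otimes 3}$ over GF(2) to the permuted input; under that assumption it reproduces the XOR combinations listed above. The function name `polar_transform_symbolic` is hypothetical.

```python
def polar_transform_symbolic(inputs):
    """Apply G^{(x)M} over GF(2) to symbolic inputs.

    Each input is a frozenset of y-indices whose XOR it represents, so the
    XOR of two symbolic values is the symmetric difference of their sets.
    """
    v = list(inputs)
    n = len(v)
    stride = 1
    while stride < n:
        for i in range(n):
            if (i & stride) == 0:
                # Upper node of a butterfly: XOR with its partner node.
                v[i] = v[i] ^ v[i + stride]
        stride *= 2
    return v


# Random input order from the text (1-based indices of y).
order = [2, 1, 8, 7, 4, 3, 6, 5]
permuted_input = [frozenset({k}) for k in order]
output = polar_transform_symbolic(permuted_input)

for row in output:
    print(" XOR ".join(f"y[{k}]" for k in sorted(row)))
```

Feeding the natural order `[1, 2, ..., 8]` into the same function gives different combinations (for example, the last entry becomes `y[8]` instead of `y[5]`), which illustrates why an arbitrary permutation does not preserve the relationships encoded in the factor graph.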
Then, parallel BP decoding is run until the stopping criteria are met or $t = T$. Subsequently, deinterleavers $\Pi_j^{-1}$ are required to obtain $\hat{x}_j$ in the correct order. After the decoding and de-interleaving operations, $J$ candidates are obtained. Denote by $S$ the index set of the $J$ candidates. The candidate used as the final output is chosen according to rule (14), which is virtually the ML rule for BPSK modulation.
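The selection step can be sketched as a minimum-distance search. Since the exact form of (14) is not reproduced here, the sketch assumes the usual ML rule for BPSK over an AWGN channel: map each candidate to symbols with $0 \to +1$ and $1 \to -1$, and pick the candidate closest to $y$ in squared Euclidean distance. The names `bpsk`, `distance` and `select_candidate`, and the numerical values, are illustrative.

```python
def bpsk(bits):
    """Assumed BPSK mapping: bit 0 -> +1.0, bit 1 -> -1.0."""
    return [1.0 - 2.0 * b for b in bits]


def distance(y, x_hat):
    """Squared Euclidean distance between y and the BPSK image of x_hat."""
    return sum((y_i - s_i) ** 2 for y_i, s_i in zip(y, bpsk(x_hat)))


def select_candidate(y, candidates):
    """Return the index j of the candidate with the smallest distance to y."""
    return min(range(len(candidates)), key=lambda j: distance(y, candidates[j]))


# Three hypothetical de-interleaved candidates for a length-4 received vector.
y = [0.9, -1.1, 0.2, -0.3]
candidates = [[0, 1, 0, 1], [0, 1, 1, 1], [1, 1, 0, 1]]
best = select_candidate(y, candidates)
```

In a full decoder, only candidates that already satisfy the stopping criterion would typically be passed to this selection, as the surrounding text describes.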
In practice, the interleavers of I-BP can be obtained in advance since the decoding factor graph is pre-determined. We note that by applying the proposed interleavers, the indices of the frozen bits, as well as the structures of the BP decoders, are not changed. However, with different interleavers, the final LLRs of $\hat{u}$ or $\hat{x}$ may change because of the different intermediate computations in the factor graph. These different soft values may generate different estimates $\hat{x}$, which gives us more chances to choose the correct estimate.

Multiple Candidates-BP decoding algorithm
It is known that the initial LLRs, which depend on the received vector $y$, are critical to BP decoding. The proposed M-BP yields a group of different initial values by adding a set of random vectors $d_j = [d_j[1], d_j[2], \ldots, d_j[N]]$ with $1 \le j \le J$, where $J$ denotes the maximum number of candidates. By decoding these different initial vectors $y_j = [y_j[1], y_j[2], \ldots, y_j[N]]$, $J$ candidates $\hat{x}_j$ can be obtained. After generating the $J$ candidates, M-BP also follows (14) to select the final output. An initial vector $y_j$ for M-BP decoding can be expressed in the following way:

$$y_j[i] = y[i] + d_j[i],$$

where $d_j[i]$ denotes a Gaussian variable with zero mean and variance $\sigma_0^2$. Here, we also denote the set of candidates $\hat{x}_j$ as $S$. A very practical problem is how to choose the optimal $\sigma_0^2$, which is used to generate the random vector $d_j$. Parameter optimization with a careful mathematical proof remains an open question. In [12], it was shown that using noise generated with a random variance can improve the decoding performance of BP. In this part, instead of using a random variance $\sigma_0^2$ [12], a fixed value of $\sigma_0^2$ is obtained from experiments. Figure 4 shows that the value of $\sigma_0$ is associated with the value of SNR. It indicates that the impact of $\sigma_0^2$ is small at low SNRs. Hence, in the simulation, we choose $\sigma_0^2 = 0.2$ based on the result when the SNR is 2.5 dB. The block diagram of the proposed M-BP is given in Figure 5. According to the block diagram, the received signal $y$ is first scrambled by a random vector $d_j$. Then, $J$ parallel BP decodings are run. Then, M-BP selects the candidates that satisfy the stopping criteria and outputs the most probable one based on (14). We note that unlike I-BP, the available number of candidates for M-BP is not limited by $M$, which provides flexibility.
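Candidate generation in M-BP amounts to adding independent zero-mean Gaussian offsets to the received vector. A minimal sketch follows, using the value $\sigma_0^2 = 0.2$ quoted in the text as the default; the function name `mbp_candidates` and the `seed` parameter are assumptions introduced for reproducibility of the illustration.

```python
import math
import random


def mbp_candidates(y, num_candidates, sigma0_sq=0.2, seed=None):
    """Return J perturbed copies y_j = y + d_j, with d_j[i] ~ N(0, sigma0_sq)."""
    rng = random.Random(seed)
    sigma0 = math.sqrt(sigma0_sq)  # random.gauss expects a standard deviation
    return [[y_i + rng.gauss(0.0, sigma0) for y_i in y]
            for _ in range(num_candidates)]


# Four candidates for a short, hypothetical received vector.
y = [0.8, -1.2, 1.1, -0.4]
cands = mbp_candidates(y, num_candidates=4, seed=1)
```

Each returned vector would then be passed to its own BP decoder, and the outputs compared with the selection rule (14).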

The M-BP and I-BP with feedback receiving structure
From the previous section, we know that the proposed BP schemes generate multiple candidates, which have different LLRs. In these decoding procedures, some decoded bits remain unchanged even though they are affected by different permutations or offsets, revealing that these bits are reliable. In contrast, some decoded bits have low reliability (measured by LLRs) or oscillate easily during the decoding procedure, suggesting that these bits are unreliable and potentially incorrect. Inspired by these behaviours, in this section, a feedback structure is proposed, which can be cascaded with the proposed BP schemes. The feedback structure freezes reliable bits and flips unreliable bits. Subsequently, these processed bits are fed back to the decoders as the leftmost a priori information $R$ and the a posteriori information $L$ on the rightmost side, to enhance the decoding performance. This information is exchanged iteratively until the output sequence meets the stopping criteria or the iteration count of the feedback structure, $iter$, reaches its maximum $I$. Figure 6 shows a proposed BP (M-BP or I-BP) decoder cascaded with the feedback structure. For all $J$ candidates obtained from the first trial, the $i$th bit is considered reliable if the values $|LLR^{out}_i|$ of all $J$ candidates are larger than a threshold $\beta$, where $\beta$ denotes a constant larger than 0. The calculation of $|LLR^{out}_i|$ is given by Equation (16). Then, the candidate $\tilde{x}_j$ and the corresponding $\tilde{y}_j$ are picked by the candidate selector according to Equation (17). The indices of unreliable bits are stored in a set $id$ based on their $|LLR^{out}_i|$. We note that for the proposed feedback structure, after the first failed trial, instead of using all $J$ initial $y_j$ from the candidate generator as the inputs of the next trials, $\tilde{y}_j$ at the current iteration $iter$ is used as the input to all $J$ BP decoders.
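The reliability test described above can be sketched as follows. The per-candidate values $LLR^{out}_i$ are taken as given inputs, since Equation (16) is not reproduced here. Ordering the unreliable bits by their smallest magnitude across candidates (least reliable first) is an assumption made for illustration; the paper only states that the set $id$ is built from $|LLR^{out}_i|$. The names `reliable_bits` and `unreliable_order` are hypothetical.

```python
def reliable_bits(llr_out, beta):
    """Indices i for which |LLR_out_i| > beta holds in every candidate."""
    n = len(llr_out[0])
    return [i for i in range(n)
            if all(abs(cand[i]) > beta for cand in llr_out)]


def unreliable_order(llr_out, beta):
    """Remaining indices, least reliable first (assumed ordering rule)."""
    reliable = set(reliable_bits(llr_out, beta))
    rest = [i for i in range(len(llr_out[0])) if i not in reliable]
    # Rank by the smallest magnitude observed across the J candidates.
    return sorted(rest, key=lambda i: min(abs(cand[i]) for cand in llr_out))


# Output LLRs of J = 2 hypothetical candidates for N = 4 bits.
llr_out = [[9.0, 1.0, 0.5, -7.0],
           [8.0, 7.0, -0.2, -6.0]]
reliable = reliable_bits(llr_out, beta=5.0)
flip_order = unreliable_order(llr_out, beta=5.0)
```

Here bit 1 is not declared reliable even though one candidate reports a large magnitude, because the rule in the text requires all $J$ candidates to exceed the threshold.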
Then, as was done for $|LLR^{out}_i|$, a similar calculation is conducted on the information of the input sequence $\tilde{y}_j$, which is written as $|LLR^{in}_i|$ and calculated by Equation (18). If $|LLR^{in}_i|$ is larger than a threshold $\beta$, the bit is trustworthy, and the corresponding $|LLR^{in}_i|$ should be enhanced in the next iteration.
After finding the reliable and unreliable bits that need to be processed, $R$ is updated based on the set $id$ and the threshold $\beta$. If the corresponding bits are reliable, they are considered as pre-known. However, if these bits belong to the set $id$, the corresponding messages are flipped by freezing them to the opposite values. We note that in order to take advantage of the $J$ parallel BP decoders, each decoder flips one distinct bit in the set $id$ and enhances all bits that are considered reliable. The update of the message $R^t_{i,1}$ is given by Equation (19), in which $LLR_{max}$ denotes the maximum positive value that can be defined in a decoding procedure, which effectively biases $\hat{u}[i]$ to 0 or 1. Then, the messages $L^t_{i,M+1}$ in the $J$ BP decoders also need to be updated based on the value of $|LLR^{in}_i|$. The LLRs of reliable bits can be considered as pre-known and are then updated according to Equation (20). This feedback structure aims to choose the $\hat{x}$ that meets the stopping criteria and (14). Here, we denote the distance between the received $y$ and the estimated vector $\hat{x}_j$ at iteration $iter$ by $D(\hat{x}_{j,iter})$. The $\hat{x}$ obtained in the current iteration $iter$ may have a larger distance than that in the previous iteration. Therefore, if the current $D(\hat{x}_{j,iter})$ is larger than $D(\hat{x}_{j,iter-1})$, all updated information is dropped, and the information from iteration $iter - 1$ is used in the decoding trial in $iter + 1$. Otherwise, the updated LLRs obtained in iteration $iter$ are used. In this way, more than one incorrect bit may be corrected by the feedback decoding structure. Finally, if the estimated codeword $\hat{x}$ meets the stopping criteria or the iteration $iter$ reaches its maximum $I$, the decoding stops and the estimated codeword is output. The decoding process of the proposed feedback structure with M-BP is given in Algorithm 2.

ALGORITHM 2 Decoding process utilizing the proposed feedback structure
Initialization: x tep = 0, set I = ∅;
Obtain $\hat{x}_j$, $\hat{u}_j$, $L$, and $R$ from an M-BP decoder using $y_j$;
Calculate $LLR^{out}_i$ according to Equation (16) and $LLR^{in}_i$ according to Equation (18);
Sort $LLR^{in}_i$ in descending order to build the set $id$;
Find $j$ according to Equation (17);
else
Obtain $\hat{x}_j$, $\hat{u}_j$ from an M-BP decoder using $\tilde{y}_j$, update $R$ according to Equation (19), and $L$ according to Equation (20);
Find $j$ according to Equation (17);
Decoding finishes; output the $\hat{x}$ that meets Equation (14);
Update $R$ according to Equation (19), and $L$ according to Equation (20);
end if
end while

The proposed feedback structure can be regarded as an extension of SCF decoding [22] to the BP domain. However, unlike SCF, which only flips unreliable bits, the proposed feedback structure also enhances reliable bits. Moreover, since BP is an iterative decoding method, the proposed scheme updates not only the initial LLRs on the rightmost side but also the information on the leftmost side. In addition, unlike SCF, which locates unreliable bits based on the result of only one decoding trial, the proposed method selects potentially reliable and unreliable bits according to the results given by $J$ parallel BP decoders. Furthermore, the proposed feedback structure flips $J$ distinct bits simultaneously, which achieves lower latency.
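The update of the leftmost messages described in this section can be sketched as below. Because Equation (19) is not reproduced here, the rule is an assumed reading of the prose: frozen bits keep an infinite prior, reliable bits are biased to their current decision with $\pm LLR_{max}$, the remaining bits stay at 0, and decoder $j$ additionally forces the $j$th entry of the set $id$ to the opposite value. The function name `feedback_priors`, its parameters, and the value 20.0 for $LLR_{max}$ are hypothetical.

```python
import math


def feedback_priors(u_hat, frozen, reliable, flip_ids, num_decoders,
                    llr_max=20.0):
    """Build one leftmost prior vector R per BP decoder (assumed rule).

    u_hat: hard decisions of the selected candidate (0/1 list)
    frozen: set of frozen-bit indices (0-based)
    reliable: set of indices judged reliable
    flip_ids: unreliable indices, one of which is flipped per decoder
    """
    def bias(bit):
        # A positive LLR favours bit 0, a negative LLR favours bit 1.
        return llr_max if bit == 0 else -llr_max

    base = []
    for i, bit in enumerate(u_hat):
        if i in frozen:
            base.append(math.inf)      # frozen bits are known to be 0
        elif i in reliable:
            base.append(bias(bit))     # enhance the current decision
        else:
            base.append(0.0)           # no prior knowledge
    priors = []
    for j in range(num_decoders):
        r = list(base)
        if j < len(flip_ids):
            i = flip_ids[j]
            r[i] = bias(1 - u_hat[i])  # freeze to the opposite value
        priors.append(r)
    return priors


# J = 2 decoders, N = 4, bit 0 frozen, bits 1 and 3 reliable, bit 2 unreliable.
priors = feedback_priors([0, 0, 0, 1], frozen={0}, reliable={1, 3},
                         flip_ids=[2], num_decoders=2)
```

When fewer unreliable bits than decoders are available, the extra decoders in this sketch simply reuse the enhanced priors without a flip; how the original scheme handles that case is not stated in the text.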

Advantages and benefits
Similar to the conventional BP, the first benefit of I-BP and M-BP is their lower latency compared with SC-based polar decoders. It has been suggested that BP-based decoding enjoys the advantages of high throughput and low latency owing to its parallel hardware implementations [23]. The latency of the I-BP and M-BP decoding methods is $O(\log_2 N)$. In contrast, SC-based decoding has a higher latency, which scales as $O(N)$. The second benefit of the proposed schemes is their simplicity. I-BP and M-BP decoders do not require any modification of the BP decoder itself, unlike some other modified BP decoders [11]. The only additional step takes place on the input sequence, which can be permuted in advance. The third benefit of the proposed methods is their flexibility. On the one hand, to save hardware resources, M-BP or I-BP can be implemented serially using unchanged decoding structures. On the other hand, to improve efficiency, the proposed methods can be run in parallel. Furthermore, BP-based decoders yield soft outputs, which can be used in joint detection and decoding, something that hard-decision-based decoders cannot achieve. We note that the proposed decoding schemes also incur costs, such as high hardware complexity due to the parallel structure, although this issue can be addressed by changing the proposed schemes to sequential structures. Also, the feedback structure increases latency. Nevertheless, simulation results suggest that, compared with the gain from adding the feedback structure, the latency increase is tolerable, especially in moderate and high SNR scenarios.

SIMULATION RESULTS
In this section, simulation results are presented and evaluated. The proposed BP decoding schemes are compared from several aspects. First, the FER performance of the two proposed BP schemes is compared with that of the conventional BP. Then, the FER performance of the proposed BP cascaded with the feedback structure (denoted as "F-BP") is provided and compared with that of the proposed BP without such a structure. Also, to show the advantage of the feedback structure, the F-BP schemes are evaluated in different channels (e.g. fading and AWGN channels). Last, the proposed BP schemes are compared with existing decoders, including BP-List [11] (denoted as "BPL"), noise-aided BP [12] ("N-BP"), and SCL [24]. In our simulations, the coding rate $R$ is 0.5. Perfect synchronization is assumed, and the maximum number of iterations ($T$) is 50 for $N = 512$ polar codes and 40 for $N = 256$ polar codes. Moreover, in the proposed methods, fixed thresholds $\beta = 5$ and $\beta = 3$ are used in AWGN and selective channels, respectively. Figure 7 and Figure 8 compare the FER performance of I-BP, M-BP and the conventional BP with $N = 256$ and $N = 512$, respectively. According to the simulation results, although all proposed decoders improve the FER performance compared with the conventional BP, I-BP exhibits better FER performance than M-BP with an identical number of candidates. Additionally, increasing the number of interleavers or random vectors $d$ benefits the FER performance. Further, compared with the conventional BP, the performance gains become more remarkable at high SNRs. Note that the number of interleavers in I-BP is limited to $M$, which may not be sufficient in some cases. Nevertheless, by combining it with M-BP, more candidates can be generated. Then, in Figure 9, we investigate the FER performance of the proposed BP schemes with and without the feedback structure in AWGN channels.
The results show that the feedback structure can further improve the FER performance. By increasing the maximum iteration $I$, the performance can be enhanced at the cost of increased latency. Moreover, the enhancement with $I$ becomes remarkable in moderate or high SNR scenarios. In summary, the feedback structure helps the proposed BP schemes improve their decoding performance at the cost of increased latency. Nevertheless, Figure 10 suggests that in moderate and high SNR scenarios, BP with the feedback structure exhibits a latency comparable to that of the conventional BP. Further, Figure 11 and Table 2 reveal that when $I$ is 7, in most cases, BP with the proposed structure can obtain $\hat{x}$ correctly within three iterations. For example, when $E_b/N_0$ is 1.5 dB, more than 70% of the decoding procedures finish successfully within three iterations. This behaviour becomes more remarkable when SNRs are high.

FIGURE 9 FER performance between BP decoders with and without the feedback structure
FIGURE 10 Average iterations $iter$ among different BP decoders with the feedback structure
To investigate the proposed BP schemes comprehensively, the FER performance of the feedback structure cascaded with I-BP and M-BP (denoted as "F-IBP" and "F-MBP") is compared in Figure 12, using $J = 4$ in frequency selective channels combined with OFDM. Figure 12 suggests that, as in AWGN channels, the proposed BP with the feedback structure exhibits better FER performance than the BP without such a structure. Besides, compared with M-BP, F-MBP obtains a gain of nearly 1 dB at an FER of $1 \times 10^{-2}$. Hence, the feedback structure exhibits stronger error correction ability in frequency selective channels. Intuitively, we can also expect that increasing $I$ improves the FER performance, though this benefit vanishes gradually as $I$ increases. In addition, Figure 13 provides the FER performance of the proposed BP decoding algorithms combined with OFDM systems in doubly selective channels (DSC) [25]. We note that a least square (LS) channel estimator is employed for simplicity [26]. The LS estimation cannot acquire perfect channel side information (CSI), as suggested by Figure 14. The mean square error (MSE) is used to measure the performance of channel estimators. The corresponding cost function is defined in terms of $\hat{H}$, the estimated CSI in the frequency domain, and of $X$ and $Y$, the transmitted and received sequences in the frequency domain, respectively. The MSE of the LS channel estimator increases when the channel condition degrades, revealing that this channel estimation is not perfect. It is therefore speculated that although all BP schemes undergo performance degradation in DSCs, the BP decoders using the feedback receiving structure are capable of achieving better FER performance than other modified BP decoders. The feedback receiver gains nearly 2 dB at an FER of 0.07, which is more noticeable than the gain in frequency selective channels.
Subsequently, we investigate the sensitivity of the proposed BP to noisy channel estimation. It is known that a feedback system might be unstable in some extreme cases. Here, the noisy channel estimate is defined as $\hat{H} = H + e$, where $H$ denotes the CSI in the frequency domain and $e$ is the estimation error, a vector of complex-valued, independent Gaussian random variables with zero mean and variance $\sigma_h^2$ [27]. Figure 15 shows that the noisy estimation adversely affects the FER performance. However, this figure also reveals that when $\sigma_h^2$ is lower than 0.2, the proposed F-BP decoding achieves relatively stable error correction performance. Even as $\sigma_h^2$ rises gradually, the FER degrades smoothly, suggesting the stability of the proposed structure. Last, the proposed BP decoders as well as their feedback-cascaded versions are compared with some existing polar decoding schemes in AWGN channels. Figure 16 suggests that with the same number of candidates, the proposed BP decoding schemes exhibit better FER performance than existing modified BP decoding schemes. For instance, when $E_b/N_0 = 3.0$ dB, F-BP achieves a gain of almost 0.4 dB over the conventional BP. Then, compared with N-BP, which uses random noise with a random variance $\sigma_0^2$, the proposed M-BP still achieves better performance. Moreover, as a reference, the FER of an SCL decoder with list size 4 is also provided. The proposed BP schemes achieve better performance than SCL decoding, except in the case with $E_b/N_0 = 3.5$ dB.
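The noisy-estimation model $\hat{H} = H + e$ can be sketched as follows. The sketch assumes a circularly symmetric error, so the variance $\sigma_h^2$ is split equally between the real and imaginary parts, and it evaluates the estimation error with the usual empirical mean square error, the average of $|H - \hat{H}|^2$; both choices are assumptions, since the paper's cost function is not reproduced here. The names `noisy_csi` and `mse` are illustrative.

```python
import math
import random


def noisy_csi(h, sigma_h_sq, seed=None):
    """Return H_hat = H + e with complex Gaussian e of total variance sigma_h_sq."""
    rng = random.Random(seed)
    # Assumed circular symmetry: half the variance per real/imaginary part.
    s = math.sqrt(sigma_h_sq / 2.0)
    return [h_k + complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for h_k in h]


def mse(h, h_hat):
    """Empirical mean square error between the true and estimated CSI."""
    return sum(abs(a - b) ** 2 for a, b in zip(h, h_hat)) / len(h)


# Flat hypothetical channel over 20000 subcarriers, error variance 0.2.
h = [complex(1.0, 0.0)] * 20000
h_hat = noisy_csi(h, sigma_h_sq=0.2, seed=7)
error = mse(h, h_hat)
```

Under this model the empirical MSE should be close to $\sigma_h^2$, which gives a simple check that the injected estimation error has the intended strength.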

CONCLUSION
In this paper, two enhanced BP decoding algorithms for polar codes are proposed. These decoding schemes employ specifically designed interleavers or offsets to enhance the FER performance without changing the structure of the BP decoders. Moreover, a feedback structure is proposed on top of the enhanced BP decoders. This structure iteratively exchanges the information obtained during BP decoding. The proposed BP schemes keep most advantages of the conventional BP at the cost of increased complexity. The only modification is the way the permuted input sequences for the BP decoders are generated, which can be done in advance. Moreover, the M-BP and I-BP decoding schemes can be implemented in both serial and parallel ways, thereby allowing an effective tradeoff between cost and efficiency. Last, with the proposed feedback structure, at the cost of increased complexity and latency, the FER performance of BP decoders can be further enhanced, and this gain is more noticeable in fading channels. However, how to optimize the parameter $\sigma_0$ and the threshold $\beta$, and how to decrease the latency of the proposed BP decoding with the feedback structure, require further study.