Asymptotic Characterisation of Regularised Zero-Forcing Receiver for Imperfect and Correlated Massive MIMO Systems with Optimal Power Allocation

In this paper, we present an asymptotic high-dimensional analysis of the regularised zero-forcing (RZF) receiver in terms of its mean squared error (MSE) and bit error rate (BER) when used for the recovery of binary phase shift keying (BPSK) modulated signals in a massive multiple-input multiple-output (MIMO) communication system. We assume that the channel matrix is spatially correlated and not perfectly known, and we use the linear minimum mean squared error (LMMSE) method to estimate it. The asymptotic approximations of the MSE and BER enable us to solve various practical optimisation problems. Under MSE/BER minimisation, we derive 1) the optimal regularisation factor for RZF; 2) the optimal power allocation scheme. Numerical simulations show a close match to the derived asymptotic results even when the problem dimensions are only a few dozen.


I. INTRODUCTION
Since the early works of [1], [2], research on massive multiple-input multiple-output (MIMO) systems has been thriving. The idea of massive MIMO is to use a very large number of antennas at the base station, which offers the desired spatial multiplexing and can reduce the transmitted power [2]. It has therefore been considered a promising technology for achieving the high spectral/energy efficiencies and high data rates required by the fifth generation (5G) and subsequent generations of wireless communication [3].
Channel state information (CSI) plays an important role in attaining the significant benefits of massive MIMO systems and in accurately recovering the transmitted symbols [3]. It is well known that perfect knowledge of the CSI is an ideal scenario that is impossible to obtain. In practice, only imperfect or partial CSI can be acquired, through a process called channel estimation or training. Training refers to sending a known sequence of pilot symbols that is then used to estimate the CSI. After this step, the receiver employs the estimated CSI to detect the corresponding transmitted data symbols.
The overall system performance can be improved by optimising the power allocation between the transmitted pilot and data symbols. Power optimisation problems in MIMO systems have been proposed based on different performance metrics. In [4], [5], the authors derived a power allocation scheme based on minimising the mean squared error (MSE), while minimising the bit error rate (BER) and symbol error rate (SER) was considered in [6]-[8]. Training optimisation based on maximising the channel capacity was addressed in [9]-[11]. In addition, the authors in [12]-[14] provided power allocation strategies based on maximising the sum rates. Training optimisation problems have been considered in a wide range of systems, including traditional MIMO systems [11], single-cell massive MIMO systems [15] and multi-cell multi-user MIMO networks [13], [16], [17]. The above list of references is not exhaustive, since the literature on power allocation optimisation is very rich; we have cited the works most closely related to this paper.
The author is with the Department of Electrical Engineering, College of Engineering, University of Ha'il, P.O. Box 2440, Ha'il, 81441, Saudi Arabia (e-mail: am.alrashdi@uoh.edu.sa).
The power allocation in the aforementioned papers was investigated essentially for uncorrelated channel models. However, in practice, wireless communication systems, including massive MIMO systems, are generally spatially correlated [18]. The power allocation optimisation problem was developed for correlated channels to maximise the sum rates [19], [20], or the spectral efficiency [15], [21]. To the best of our knowledge, power optimisation problems based on MSE or BER minimisation that involve spatial correlation models in massive MIMO systems are largely unexplored.
In this paper, we propose the use of the regularised zero-forcing (RZF) receiver as a low-complexity receiver for a spatially correlated massive MIMO system. We derive novel sharp asymptotic approximations of its MSE and BER performance, using binary phase shift keying (BPSK) signalling for simplicity. These approximations are then used to derive an optimal power allocation scheme between pilot and data symbols. The main technical tool used in our analysis is the recently developed convex Gaussian min-max theorem (cGMT) [22], [23]. The cGMT framework has been used to analyse the error performance of various regression and classification problems under an independent and identically distributed (i.i.d.) assumption on the entries of the channel matrix [23]-[30]. For correlated channel matrices, the cGMT was recently used in [31], [32] to characterise the performance of the box-relaxation and the LASSO detectors, respectively. However, these references assume the ideal case of perfect knowledge of the CSI, which is impossible to obtain in practice, whereas this work deals with the more difficult, and in practice more common, scenario of imperfect CSI.

A. Organisation
The remainder of this paper is structured as follows. Section II describes the system model and the considered RZF receiver. The main asymptotic analysis results are presented in Section III. Section IV presents the numerical simulations used to verify the high accuracy of our results. In addition, Section V illustrates the optimal power allocation scheme derived in this paper. The paper is then concluded in Section VI. Finally, the approach of the proof of the main results is given in Appendix A.

B. Notations
Boldface lower-case letters (e.g., $\mathbf{x}$) represent column vectors, while $x_i$ is the $i$-th entry of $\mathbf{x}$ and $\|\mathbf{x}\|$ represents its $\ell_2$-norm. Matrices are denoted by upper-case letters such as $X$, with $I_n$ being the $n \times n$ identity matrix, while $0_{m\times n}$ is the all-zeros matrix of size $m \times n$. The $(i, j)$ entry of matrix $X$ is denoted as $[X]_{ij}$. $\mathrm{tr}(\cdot)$, $(\cdot)^T$, and $(\cdot)^{-1}$ are the trace, transpose and inverse operators, respectively. $X^{1/2}$ represents the square root of matrix $X$ such that $X = X^{1/2} X^{T/2}$. We use the standard notations $E[\cdot]$ and $P[\cdot]$ to denote the expectation of a random variable and the probability of an event, respectively. The notation $q \sim \mathcal{N}(0, R_q)$ is used to denote that the random vector $q$ is normally distributed with zero mean and covariance matrix $R_q = E[qq^T]$, where $0$ represents the all-zeros vector. We write "$\xrightarrow{P}$" to denote convergence in probability as $n \to \infty$. The notation $Q(x) := \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2}\,du$ denotes the Q-function associated with the standard normal density.
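For numerical work, the Q-function defined above can be evaluated through the complementary error function, using the standard identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$. The following is a minimal sketch; the function name `q_function` is ours and not part of the paper.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P[Z > x] for Z ~ N(0, 1).

    Uses the identity Q(x) = 0.5 * erfc(x / sqrt(2)).
    """
    return 0.5 * math.erfc(x / math.sqrt(2.0))
```

Since $Q(x)$ is decreasing in $x$, minimising a BER expression of the form $Q(\cdot)$ is the same as maximising its argument, a fact used later in the power allocation section.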

II. SYSTEM MODEL AND SIGNAL DETECTION
We consider a flat block-fading massive MIMO system with $n$ transmitters (Tx) and $m$ receivers (Rx). The transmission consists of $T$ symbols that occur in a time interval within which the channel is assumed to be constant. A number $T_t$ of pilot symbols (for channel estimation) occupy the first part of the transmission interval with power $\rho_t$. The remaining part is reserved for transmitting $T_d = T - T_t$ data symbols with power $\rho_d$. Conservation of time and energy then implies that

$$\rho_t T_t + \rho_d T_d = \rho T, \qquad (1)$$

where $\rho$ is the expected average power. Alternatively, we have $\rho_d T_d = \alpha \rho T$, where $\alpha \in (0, 1)$ is the fraction of the total energy allocated to the data, so that $\rho_t T_t = (1 - \alpha)\rho T$ is the energy of the pilots. Fig. 1 illustrates the considered system model. The received signal model for the data transmission phase is given by

$$y = \sqrt{\frac{\rho_d}{n}}\, A x_0 + w,$$

where the following model assumptions hold, unless otherwise stated:
• The MIMO channel matrix is given by [19] $A = R^{1/2} H$. This matrix model is referred to as the receive-correlated Kronecker model [33]. This implies that $A$ has $n$ i.i.d. columns, each with zero mean and covariance matrix $R$.
• $H \in \mathbb{R}^{m\times n}$ is a random matrix which has i.i.d. standard Gaussian entries (with zero mean and unit variance).
• $R \in \mathbb{R}^{m\times m}$ is a positive semi-definite Hermitian matrix, satisfying $\frac{1}{m}\mathrm{tr}(R) = O(1)$. It captures the spatial correlation between the receive antennas and is hence termed the receive-correlation matrix.
• $w \in \mathbb{R}^{m}$ is the noise vector with i.i.d. standard Gaussian entries, i.e., $w \sim \mathcal{N}(0, I_m)$.
• $x_0 \in \mathbb{R}^{n}$ is the signal to be recovered, which is assumed to be a binary phase shift keying (BPSK) signal, i.e., $x_0 \in \{\pm 1\}^n$.
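The system model above can be simulated in a few lines. The sketch below assumes the data-phase model $y = \sqrt{\rho_d/n}\,A x_0 + w$ with $A = R^{1/2}H$, and the energy split $\rho_d T_d = \alpha\rho T$, $\rho_t T_t = (1-\alpha)\rho T$. The function names (`split_power`, `sqrtm_psd`, `simulate_data_phase`) are illustrative choices of ours, and the symmetric square root is only one valid choice of $R^{1/2}$.

```python
import numpy as np


def split_power(rho, T, T_t, alpha):
    """Return (rho_d, rho_t) from rho_d*T_d = alpha*rho*T and rho_t*T_t = (1-alpha)*rho*T."""
    T_d = T - T_t
    rho_d = alpha * rho * T / T_d
    rho_t = (1.0 - alpha) * rho * T / T_t
    return rho_d, rho_t


def sqrtm_psd(R):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(R)
    # Clip tiny negative eigenvalues caused by round-off before taking roots.
    return (eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T


def simulate_data_phase(R, n, rho_d, rng):
    """Draw one realisation (y, A, x0) of the receive-correlated data-phase model."""
    m = R.shape[0]
    H = rng.standard_normal((m, n))        # i.i.d. N(0, 1) entries
    A = sqrtm_psd(R) @ H                   # columns have covariance R
    x0 = rng.choice([-1.0, 1.0], size=n)   # BPSK symbols
    w = rng.standard_normal(m)             # unit-variance noise
    y = np.sqrt(rho_d / n) * (A @ x0) + w
    return y, A, x0
```

For example, `split_power(10.0, 1000, 400, 0.5)` returns the data and pilot powers for $\rho = 10$, $T = 1000$, $T_t = 400$ and an even energy split.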

A. Channel Matrix Estimation
In this paper, we consider the linear minimum mean squared error (LMMSE) estimate $\hat{A}$ of the channel matrix $A$ [34]. It is computed from the received signal of the training phase, $Y_t = \sqrt{\frac{\rho_t}{n}}\, A X_t + W_t \in \mathbb{R}^{m\times T_t}$, where $X_t \in \mathbb{R}^{n\times T_t}$ is the matrix of transmitted orthogonal pilot symbols with $T_t \geq n$, and $W_t \in \mathbb{R}^{m\times T_t}$ is an additive white Gaussian noise (AWGN) matrix with $E[W_t W_t^T] = T_t I_m$. According to [34], [35], the $k$-th column (for all $k \leq n$) of $\hat{A}$ is distributed as $\mathcal{N}(0, R_{\hat{A}})$ with a covariance matrix $R_{\hat{A}}$ that is given by

$$R_{\hat{A}} = R \left( R + \frac{n}{T_t \rho_t} I_m \right)^{-1} R.$$

Note that the pilot energy $T_t \rho_t$ controls the quality of the estimation. In fact, as $T_t \rho_t \to \infty$, $\hat{A} \to A$, which corresponds to the perfect CSI case. By invoking the orthogonality principle of the LMMSE estimator, it can be shown that the $k$-th column of the estimation error matrix $\Delta := \hat{A} - A$ follows the distribution $\mathcal{N}(0, R_{\Delta})$ with the following covariance matrix [34]:

$$R_{\Delta} = R - R_{\hat{A}}.$$

The orthogonality principle of the LMMSE also shows that $\hat{A}$ and $\Delta$ are uncorrelated; since both are Gaussian, they are statistically independent.
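The two covariance matrices of the channel estimate and of the estimation error can be computed directly. The sketch below assumes the standard LMMSE expressions for orthogonal, unit-magnitude pilots, namely $R_{\hat{A}} = R\,(R + \frac{n}{T_t\rho_t} I_m)^{-1} R$ and $R_{\Delta} = R - R_{\hat{A}}$; this pilot normalisation is our assumption, and `lmmse_covariances` is a hypothetical helper name.

```python
import numpy as np


def lmmse_covariances(R, n, T_t, rho_t):
    """Column covariances of the LMMSE estimate (R_hat) and of its error (R_delta).

    Assumes R_hat = R (R + c I)^{-1} R with c = n / (T_t * rho_t).
    """
    m = R.shape[0]
    c = n / (T_t * rho_t)
    # Solve (R + c I) Z = R instead of forming the inverse explicitly.
    R_hat = R @ np.linalg.solve(R + c * np.eye(m), R)
    R_delta = R - R_hat
    return R_hat, R_delta
```

As the pilot energy $T_t\rho_t$ grows, `R_hat` approaches `R` and `R_delta` approaches the zero matrix, matching the perfect CSI limit described above.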

B. Signal Detection: RZF Receiver
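In this work, we consider the regularised zero-forcing (RZF) receiver that solves the following optimisation problem:

$$\hat{x} := \arg\min_{x \in \mathbb{R}^n} \left\| y - \sqrt{\frac{\rho_d}{n}}\, \hat{A} x \right\|^2 + \rho_d \lambda \|x\|^2, \qquad (6)$$

where $\lambda \geq 0$ is the regularisation factor. For this RZF receiver, $\hat{x}$ admits the following closed-form solution:

$$\hat{x} = \left( \tilde{A}^T \tilde{A} + \rho_d \lambda I_n \right)^{-1} \tilde{A}^T y,$$

with $\tilde{A} := \sqrt{\frac{\rho_d}{n}}\, \hat{A}$. The detection is then performed as

$$x^* := \mathrm{sign}(\hat{x}), \qquad (8)$$

where $\mathrm{sign}(\cdot)$ is the sign function, which operates element-wise on vector inputs.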
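The two-step receiver (regularised estimate followed by an element-wise sign) can be sketched as follows. The code assumes that the regulariser enters the objective as $\rho_d\lambda\|x\|^2$, so that the closed form is $\hat{x} = (\tilde{A}^T\tilde{A} + \rho_d\lambda I_n)^{-1}\tilde{A}^T y$ with $\tilde{A} = \sqrt{\rho_d/n}\,\hat{A}$; this scaling is our reading of the model, and `rzf_detect` is an illustrative name.

```python
import numpy as np


def rzf_detect(y, A_hat, rho_d, lam):
    """Return the RZF estimate x_hat and the BPSK decisions sign(x_hat).

    Assumes the objective ||y - A_t x||^2 + rho_d * lam * ||x||^2
    with A_t = sqrt(rho_d / n) * A_hat.
    """
    n = A_hat.shape[1]
    A_t = np.sqrt(rho_d / n) * A_hat
    gram = A_t.T @ A_t + rho_d * lam * np.eye(n)
    # Solve the regularised normal equations rather than inverting the Gram matrix.
    x_hat = np.linalg.solve(gram, A_t.T @ y)
    # np.sign returns 0 for an exactly-zero entry, a measure-zero event here.
    return x_hat, np.sign(x_hat)
```

Setting `lam=0.0` gives the plain zero-forcing receiver, which requires $\tilde{A}^T\tilde{A}$ to be invertible (typically $m \geq n$).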

C. Figures of Merit
To evaluate the performance of the RZF receiver, we consider the following performance metrics:
1) Mean Squared Error (MSE): This measures the performance of the estimation step of the receiver (the first step in (6)) and is defined as
$$\mathrm{MSE} := \frac{1}{n} \|\hat{x} - x_0\|^2.$$
2) Bit Error Rate (BER): This metric is used to evaluate the performance of the second step of the receiver, i.e., the detection step in (8). It is defined as
$$\mathrm{BER} := \frac{1}{n} \sum_{i=1}^{n} 1_{\{x^*_i \neq x_{0,i}\}},$$
where $1_{\{\cdot\}}$ is the indicator function.
Related to the BER is the probability of error, $P_e$, which is defined as the expected value of the BER averaged over the noise, the channel and the constellation. Formally,
$$P_e := E[\mathrm{BER}] = \frac{1}{n} \sum_{i=1}^{n} P\left[x^*_i \neq x_{0,i}\right].$$

III. MAIN RESULTS
In this section, we provide our main results on the asymptotic characterisation of the RZF receiver in terms of its MSE and BER.

A. Technical Assumptions
First, we state some technical assumptions that are required for our analysis. Assumption (1): We assume that the problem dimensions $m$ and $n$ grow to infinity at a fixed ratio, i.e., $m/n \to \zeta$ for some fixed constant $\zeta > 0$.
Assumption (2): We assume that the normalised coherence time, the normalised number of pilot symbols and the normalised number of data symbols are fixed and given as $\tau := T/n$, $\tau_t := T_t/n$ and $\tau_d := T_d/n$, respectively.
Note that under Assumption (2), the covariance matrix of $\hat{A}$ becomes $R_{\hat{A}} = R \left( R + \frac{1}{\tau_t \rho_t} I_m \right)^{-1} R$, and the time/energy conservation equation in (1) becomes $\rho_t \tau_t + \rho_d \tau_d = \rho \tau$. Finally, define the spectral decomposition of $R_{\hat{A}}$ as $R_{\hat{A}} = U \Gamma U^T$, where $U \in \mathbb{R}^{m\times m}$ is an orthogonal matrix, and $\Gamma \in \mathbb{R}^{m\times m}$ is a diagonal matrix with the eigenvalues of $R_{\hat{A}}$ on its main diagonal.

B. RZF Receiver Performance Characterisation
In this subsection, we precisely characterise the high dimensional performance of the RZF receiver. We begin by stating the MSE analysis as given in the next theorem.
Theorem 1 (MSE of RZF). Let x be a minimiser of the RZF problem in (6) for some fixed but unknown BPSK signal x 0 , then for any fixed λ > 0, ζ > 0, and under Assumptions (1) and (2), it holds that where ν * is the unique solution to the following scalar minimax optimisation problem: and γ j is the j th eigenvalue of R A .
Proof. The proof of this theorem is given in Appendix A.
Remark 1. From the first order optimality conditions, i.e., the solutions (ν * , µ * ) can be easily found as: and µ * is the solution to the following fixed-point equation: Remark 2. For R = I, (i.e., no correlation, γ i = 1∀i,) and perfect CSI (∆ = 0 m×n ), we recover the well-known MSE formula of the Zero-Forcing (ZF) receiver (i.e., when λ = 0): Note that the MSE result of Theorem 1 holds for x 0 drawn from any distribution with zero mean and unit variance and not necessarily from a BPSK constellation. However, for BPSK signals, the BER of the RZF receiver is given in the next Theorem.
Theorem 2 (BER of RZF). For ν > 0, and µ > 0, define Then, under the same settings of Theorem 1, it holds that Proof. The proof is relegated to Appendix A.

Remark 3 (Probability of error
Corollary 1 (Optimal regulariser). The optimal regularisation factor that minimises the MSE or BER is given as Proof. Note that the MSE expression depends on λ through ν * only. Hence, the above result can be proven by taking the derivative of ν * with respect to λ. In addition, λ * turned out to be optimal in the BER sense as well. This can be shown by taking the derivative of (21) with respect to λ.

Remark 4.
Under perfect CSI ($\Delta = 0_{m\times n}$), Corollary 1 simplifies to the well-known formula $\lambda_* = 1/\rho_d$, independent of the correlation matrix $R$, which was previously shown in [19], [36] for other optimality metrics such as maximising the sum rate or the SINR. Here, our optimality metrics are the MSE and BER, which were not considered for the correlated channel model before. As mentioned in [36], for large $n$, the RZF receiver is equivalent to the MMSE receiver. For uncorrelated channels ($R = I_m$ and $R_{\Delta} = \sigma_{\Delta}^2 I_m$), the optimal regulariser was derived in [7], and that result is consistent with (23) for $R = I_m$.
In Fig. 2, we use the exponential correlation model for $R$, which is defined as [37]

$$R(r) = \left[ r^{|i-j|^2} \right]_{i,j=1,2,\cdots,m}, \quad r \in [0, 1), \qquad (25)$$

to show the effect of increasing the correlation on the optimal regularisation factor and compare it with the perfect CSI case, i.e., $\lambda_* = 1/\rho_d$. As we can see, for the imperfect CSI scenario, more regularisation is needed due to the channel estimation errors. Furthermore, we observe that as $r$ increases, less regularisation is needed.
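The exponential correlation model is straightforward to construct numerically. The sketch below builds the matrix with entries $[R]_{ij} = r^{|i-j|^2}$, as read from the definition above; the function name `exponential_correlation` is ours.

```python
import numpy as np


def exponential_correlation(m, r):
    """Return the m x m matrix with entries r ** (|i - j| ** 2), for r in [0, 1)."""
    idx = np.arange(m)
    lag_sq = (np.abs(idx[:, None] - idx[None, :]) ** 2).astype(float)
    # r = 0 gives the identity matrix, since 0.0 ** 0.0 evaluates to 1.0.
    return float(r) ** lag_sq
```

With `r = 0` the model reduces to the uncorrelated case $R = I_m$, and the off-diagonal entries grow as `r` approaches 1.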

IV. NUMERICAL RESULTS
To validate our theoretical predictions of the MSE and BER as given by Theorem 1 and Theorem 2, we consider the exponential model given earlier in (25). Fig. 3 shows the MSE/BER curves versus the regularisation factor $\lambda$. For the Monte-Carlo (MC) simulations, we used $\zeta = 1.5$, $n = 400$, $r = 0.4$, $\alpha = 0.5$, $T = 1000$, $T_t = n$, and $\rho = 10$ dB, and the data are averaged over 500 independent Monte-Carlo trials. Both plots show that there is an optimal value of the regulariser, $\lambda_*$, that minimises the MSE/BER, and that this optimal value is the same for the MSE and the BER.
In addition, Fig. 4 and Fig. 5 show the MSE/BER performance of the RZF receiver versus the total average power $\rho$ for different values of the correlation coefficient $r$. We used the same parameter values as in the previous experiment. These figures again show a close match between our analytical expressions and the MC simulations.
Finally, in Fig. 6, we compare the BER performance of the RZF receiver to the conventional zero-forcing (ZF) receiver (i.e., λ = 0) that is widely used in wireless communications literature. From this figure, it can be seen that the RZF receiver clearly outperforms the ZF.
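A Monte-Carlo comparison of this kind can be sketched as follows. This is a simplified illustration and not the exact setup behind Fig. 6: it assumes perfect CSI ($\hat{A} = A$), the exponential correlation entries $r^{|i-j|^2}$, a regulariser scaled as $\rho_d\lambda$, and smaller dimensions than in the paper. All function names and parameter values here are our own choices.

```python
import numpy as np


def rzf_ber(m, n, rho_d, lam, r, trials, seed):
    """Empirical BER of the RZF receiver under perfect CSI (illustrative only)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(m)
    R = float(r) ** (np.abs(idx[:, None] - idx[None, :]) ** 2).astype(float)
    eigvals, eigvecs = np.linalg.eigh(R)
    R_half = (eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T
    errors = 0
    for _ in range(trials):
        A = R_half @ rng.standard_normal((m, n))    # receive-correlated channel
        x0 = rng.choice([-1.0, 1.0], size=n)        # BPSK symbols
        A_t = np.sqrt(rho_d / n) * A
        y = A_t @ x0 + rng.standard_normal(m)
        gram = A_t.T @ A_t + rho_d * lam * np.eye(n)
        x_hat = np.linalg.solve(gram, A_t.T @ y)
        errors += np.count_nonzero(np.sign(x_hat) != x0)
    return errors / (trials * n)


# Same seed for both runs, so both receivers see the same realisations.
ber_zf = rzf_ber(m=96, n=64, rho_d=10.0, lam=0.0, r=0.4, trials=200, seed=0)
ber_rzf = rzf_ber(m=96, n=64, rho_d=10.0, lam=0.1, r=0.4, trials=200, seed=0)
```

Here `lam=0.1` corresponds to the perfect-CSI choice $\lambda_* = 1/\rho_d$ for $\rho_d = 10$; under imperfect CSI the optimal value would instead come from Corollary 1.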

V. POWER ALLOCATION OPTIMISATION
In this section, we use the previous asymptotic approximations of the MSE and BER to find the optimum power allocation between pilot and data symbols that asymptotically minimises the MSE or BER. For fixed $\tau_t$ and $\tau$, the power allocation optimisation problem consists of choosing the powers $\rho_d$ and $\rho_t$ that minimise the asymptotic MSE (or BER) subject to the energy conservation constraint in (1). It can be shown that this problem boils down to optimising only the data power ratio $\alpha$, i.e.,
$$\alpha^{\mathrm{MSE}}_* := \arg\min_{0 < \alpha < 1} \mathrm{MSE}(\lambda_*),$$
where $\mathrm{MSE}(\lambda_*)$ is the asymptotic MSE expression in (14) evaluated at the optimal value of the regulariser $\lambda_*$. Similarly, we have
$$\alpha^{\mathrm{BER}}_* := \arg\min_{0 < \alpha < 1} \mathrm{BER}(\lambda_*),$$
where $\mathrm{BER}(\lambda_*)$ is the asymptotic BER expression in (21), again with the optimal $\lambda_*$. Based on (21), since minimising the Q-function amounts to maximising its argument, $\alpha^{\mathrm{BER}}_*$ can equivalently be found by maximising the argument of the Q-function in (21) over $\alpha$.
For this RZF receiver, finding $\alpha^{\mathrm{MSE}}_*$ or $\alpha^{\mathrm{BER}}_*$ in closed form seems to be a difficult task, but by using a bisection method we can numerically find the optimal power allocation, as shown in Fig. 7 for different values of the correlation coefficient $r$. In [7], for the uncorrelated channel $R = I_m$, it has been shown that $\bar{\alpha}^{\mathrm{MSE}}_* = \bar{\alpha}^{\mathrm{BER}}_* = \bar{\alpha}_*$, where $\bar{\alpha}_*$ has a closed-form expression (see [7, eq. (36)]). From Fig. 7, we can see that even for the correlated case, we still have $\alpha^{\mathrm{MSE}}_* = \alpha^{\mathrm{BER}}_*$, which indicates that optimising the MSE is asymptotically equivalent to optimising the BER. Furthermore, this figure shows that $\bar{\alpha}_*$ is quite a good approximation of $\alpha_*$ for $r \in [0, 0.9]$. This suggests that we can use the optimal $\bar{\alpha}_*$ from the uncorrelated channel model for the correlated channel case with negligible effect on the performance. Similar observations were found in [20].
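The one-dimensional search over $\alpha$ can be sketched as follows. Because the asymptotic MSE/BER expressions are not restated here, the sketch uses a stand-in objective of our own: an effective-SNR proxy for the uncorrelated case $R = I_m$, $\rho_d\,\rho_t\tau_t / (1 + \rho_t\tau_t + \rho_d)$, obtained by treating the estimation error as extra noise under the LMMSE covariance assumed earlier. It is not the paper's objective, and a golden-section search is used in place of the bisection method mentioned above. All names are hypothetical.

```python
import math


def effective_snr(alpha, rho, tau, tau_t):
    """Illustrative proxy objective for R = I (an assumption, not eq. (14) or (21))."""
    tau_d = tau - tau_t
    rho_d = alpha * rho * tau / tau_d          # from rho_d * tau_d = alpha * rho * tau
    pilot_energy = (1.0 - alpha) * rho * tau   # equals rho_t * tau_t
    return rho_d * pilot_energy / (1.0 + pilot_energy + rho_d)


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d   # the maximiser lies in [a, d]
        else:
            a = c   # the maximiser lies in [c, b]
    return 0.5 * (a + b)


# Parameters matching tau = T/n = 2.5 and tau_t = T_t/n = 1 with rho = 10 (10 dB).
alpha_star = golden_section_max(
    lambda a: effective_snr(a, rho=10.0, tau=2.5, tau_t=1.0), 0.0, 1.0
)
```

Replacing `effective_snr` with a function that evaluates the asymptotic MSE (negated) or the argument of the Q-function at $\lambda_*$ would give the search described in the text, provided that objective is unimodal in $\alpha$.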

VI. CONCLUSION
This work sharply characterises the asymptotic behaviour of the RZF receiver in the presence of correlation and uncertainties (in the form of estimation errors) in the channel matrix. In particular, we derived asymptotic expressions for the MSE and BER of the RZF receiver. We then considered a concrete application of our theoretical results to a BPSK-modulated massive MIMO wireless communication system, and optimised its performance by optimally allocating power between pilot and data symbols. The results also enabled us to set the regularisation factor in an optimal way, which was shown to further improve the performance. Numerical results showed close agreement with the derived theoretical expressions even when the dimensions are not very large.
Possible future extensions of this work include: studying more involved modulation schemes (such as PAM, QAM, and PSK), and analysing advanced receivers such as the RZF with a box-constraint. Another interesting future work is to consider the performance of double-sided correlated massive MIMO systems and study their optimal power allocation.

ACKNOWLEDGMENT
The work of Ayed M. Alrashdi is supported by the University of Ha'il, Saudi Arabia.

APPENDIX A APPROACH OF THE PROOF
In this section, we prove the main results of the RZF receiver. We first introduce the main tool used in the analysis, i.e., the cGMT.

A. cGMT Framework
The proof is based on the cGMT framework [22]. Here, we recall the statement of the theorem, and we refer the reader to [22], [23] for the complete technical details. Consider the following two min-max problems, which we refer to, respectively, as the Primal Optimisation (PO) and Auxiliary Optimisation (AO):
$$\Psi(C) := \min_{a \in K_a} \max_{b \in K_b} \; b^T C a + \xi(a, b), \qquad (30a)$$
$$\psi(g_1, g_2) := \min_{a \in K_a} \max_{b \in K_b} \; \|a\|\, g_1^T b - \|b\|\, g_2^T a + \xi(a, b), \qquad (30b)$$
where $C \in \mathbb{R}^{\tilde{m}\times\tilde{n}}$, $g_1 \in \mathbb{R}^{\tilde{m}}$, $g_2 \in \mathbb{R}^{\tilde{n}}$, $K_a \subset \mathbb{R}^{\tilde{n}}$, $K_b \subset \mathbb{R}^{\tilde{m}}$ and $\xi : \mathbb{R}^{\tilde{n}} \times \mathbb{R}^{\tilde{m}} \to \mathbb{R}$. Moreover, the function $\xi$ is assumed to be independent of the matrix $C$. Denote by $a_{\Psi} := a_{\Psi}(C)$ and $a_{\psi} := a_{\psi}(g_1, g_2)$ any optimal minimisers of (30a) and (30b), respectively. Further, let $K_a$, $K_b$ be convex and compact sets, let $\xi(a, b)$ be convex-concave and continuous on $K_a \times K_b$, and let $C$, $g_1$ and $g_2$ all have i.i.d. standard normal entries. Then, the cGMT framework relates the optimiser $a_{\Psi}$ of the PO to the optimal value of the AO, as summarised in the following theorem.
Theorem 3 (cGMT, [22]). Let $K$ be any arbitrary open subset of $K_a$, and $K^c = K_a \setminus K$. Denote by $\psi^{(n)}_{K^c}(g_1, g_2)$ the optimal cost of the optimisation in (30b) when the minimisation over $a$ is constrained to $a \in K^c$. Suppose that there exist constants $\beta$ and $\delta > 0$ such that, in the limit as $\tilde{n} \to +\infty$, it holds with probability approaching one: (i) $\psi^{(n)}(g_1, g_2) \leq \beta + \delta$, and (ii) $\psi^{(n)}_{K^c}(g_1, g_2) \geq \beta + 2\delta$. Then, $\lim_{\tilde{n}\to\infty} P[a_{\Psi} \in K] = 1$.
After introducing the cGMT, we are now in a position to outline the proof of Theorem 1 and Theorem 2. The steps of the proof are given in the next subsections.

B. Deriving the Minimax Optimisation
For convenience, we consider the error vector e := x − x 0 , then the problem in (6) can be reformulated as (32) Without loss of generality, we assume that Then, Next, we note that $\hat{A}$ can be written as $\hat{A} = R_{\hat{A}}^{1/2} B$, with $B$ being a Gaussian matrix with i.i.d. standard entries (zero mean and unit variance) and $R_{\hat{A}}$ being the covariance matrix of $\hat{A}$ as defined before. Thus, we have Since the Gaussian distribution is invariant under orthogonal transformations, and recalling that the spectral decomposition of $R_{\hat{A}}$ is $R_{\hat{A}} = U\Gamma U^T$, we have e = arg min e ρ d n with abuse of notation for $B$ (we reuse $B$ to denote another standard Gaussian matrix). The loss function can be expressed in its dual form through the Fenchel-Legendre conjugate as Then, (36) becomes One technical requirement of the cGMT is the compactness of the feasibility sets. This can be handled according to the approach in [22, Appendix A], by introducing sufficiently large artificial constraint sets $K_e = \{e \in \mathbb{R}^n : \|e\|_2 \leq C_e\}$ and $K_u = \{u \in \mathbb{R}^m : \|u\|_2 \leq C_u\}$ for some sufficiently large constants (independent of $n$) $C_e, C_u > 0$, which will not asymptotically affect the optimisation problem. Then, we obtain The above optimisation problem is now in the desired min-max form of a PO problem of the cGMT. However, we still have correlated entries in the bilinear term, and we have to transform them into a term that involves a standard Gaussian matrix with i.i.d. entries (as required by the cGMT statement).
To do so, redefine Then, after properly normalising Φ (n) by 1 n , it becomes The above optimisation is in a PO form, and its corresponding AO is where g ∈ R m and s ∈ R n are independent random vectors with i.i.d. standard normal entries each. Fixing the norm of the normalised error vector, e √ n , to η := e √ n , and definingē := e e yields Defining u := u √ n gives: where K u is defined in a similar fashion to K u . The optimisation overē can be easily found as follows with a minimiser: Also, note that 1 n x 0 2 P −→ 1. Thus, by applying Lemma 10 in [22], we have where The square root in the last term of the above equation can be written in a variational form using the following identity with Θ = 4λ 2 ρ 2 d + ρ d u 2 . Hence, φ (n) becomes Next, for convenience, let and then, The optimisation over u is straightforward: Then, the AO writes The above optimisation is now over scalar variables only, namely η and χ which is easier to analyse. We will refer to (55) as the Scalar Optimisation Problem (SOP) and study its asymptotic behaviour next.

C. Analysis of the Asymptotic Behaviour of the SOP
First, note that g ∼ N (0, R g ), where Then, using tools from random matrix theory (RMT) such as the Trace Lemma [38], we have Therefore, again, using [22, Lemma 10] Defining µ := χ η , we get Finally, note that η appears everywhere in φ (n) as $\eta^2$ and η > 0, so we can use the change of variable $\nu := \eta^2$ to have

D. Exact Asymptotics of RZF via the cGMT
We are now in a position to study the asymptotic behaviour of the RZF receiver.
MSE Analysis: Let e be the optimal solution to the AO defined as the solution to (44). Let ν * be the optimal solution to (60). For any > 0, define the set: Denoteη as a minimiser of (59). By definition,η = e n , or using the change of variables that we introduced,ν = e 2 n . We have shown in the previous section that φ (n) − φ (n) P −→ 0, and since φ (n) in (60) has a unique minimiser ν * , then, applying Lemma 10 in [22]:ν − ν * P −→ 0, which implies that This proves that e ∈ K with probability approaching 1. Then, applying the cGMT yields that e ∈ K with probability approaching 1 as well. This ends the proof of Theorem 1. BER Analysis: For the BER analysis, we will change the set K in (61) to the set given in (63).
Recall that the optimal solution of the AO in (46) is given as: Also, remember that u * = 1 √ n T −1 g, then, u * 2 = 1 n g T T −2 g T . Then, using the Trace Lemma, we have Or, define S γ (ν, µ) := 1 n tr R g T −2 (66) then, Using the fact thatν − ν * P −→ 0 andμ − µ * P −→ 0, then for all i = 1, 2, · · · , n, we have Hence, using the above expression of e, we have from which we can easily get Therefore, e ∈ K with probability approaching one. Note that the indicator function 1 {ẽi≤−1} is not Lipschitz, so we cannot directly apply the cGMT. However, as discussed in [23,Lemma A.4], this function can be appropriately approximated with Lipschitz functions. Therefore, we can conclude by applying the cGMT that e ∈ K with probability approaching one, which completes the proof of Theorem 2.