Robust smoothing of one-dimensional data with missing and/or outlier values

Penalized least squares (PLS) is a popular data smoothing technique. However, existing PLS smoothing algorithms behave as low-pass filters (LPF), and, hence, they may introduce distortions to bandpass signals. These algorithms also have difficulties associated with the selection of the smoothing parameter and the order of the difference matrix they utilize. A new PLS smoothing algorithm is developed to overcome these limitations. In the proposed algorithm, two difference matrices and two regularisation parameters are utilized. As a result, the proposed algorithm can act as either an LPF or a bandpass filter. The particular shape of the filter is determined by the values of the regularisation parameters and the orders of the two difference matrices. The spectral characteristics of the smooth signal are utilized to derive closed-form expressions for the optimum values of these four parameters. The derived algorithm is further developed to handle the case of having missing and/or outlier samples in the measured data. The proposed algorithms are completely data driven, and they do not require any special optimisation tools. In addition, the proposed algorithms are also shown to produce significantly improved results compared to some existing smoothing techniques.


| INTRODUCTION
Signal denoising/smoothing is one of the most common problems in signal processing. Measured signals (data) are usually contaminated by different kinds of noise and/or interfering signals, for example, instrumentation noise, 50/60 Hz power line interference and baseline drift. In addition, measured signals are also susceptible to missing and/or outlier samples, for example, as a result of instrument malfunction. The case of denoising smooth signals contaminated by additive white Gaussian noise (AWGN) is considered, where a smooth signal is defined as a signal that has continuous derivatives up to some order (typically ≥2) over its entire domain [1]. This problem is referred to in this study as signal smoothing, to distinguish it from the signal denoising problem. See [2] for the distinction between signal denoising and signal smoothing. The possibility of having missing and/or outlier samples is also considered.
Many procedures have been developed in the literature to perform signal smoothing, each of which has its own advantages and disadvantages. The simplest procedure is applying a moving average filter to the data. However, in order to achieve a high degree of noise reduction, a large filter length N has to be utilized, which in turn could distort useful high-frequency components in the desired smooth signal [3]. Another procedure, which is a generalisation to the moving average procedure, is based on fitting each N samples of the noisy signal by a dth order polynomial, where N is odd and ≥d + 1. Using least squares fitting to calculate the coefficients of the fitting polynomial, this procedure is reduced to discrete convolution with a fixed impulse response. This procedure is known as the Savitzky-Golay (SG) filter [4]. See [3][4][5] and the references therein for more details about the SG filtering procedure. The SG filter has the advantage that it does not introduce delays to the smoothed signal, and it can handle short intervals of missing samples [6]. However, the SG filter may not be able to handle large intervals of missing data, and special care has to be applied to the computations at the boundary of the analyzed data [6]. In addition, the two parameters d and N control the shape of the frequency response of the SG filter [3], and hence they have to be carefully selected in order to achieve a desired performance.
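As a concrete illustration of the SG filter, the sketch below uses SciPy's readily available implementation; the test signal, noise level and the choices d = 3 and N = 51 are arbitrary illustrations, not values taken from this article:

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 501)
x = np.sin(2 * np.pi * 5 * t)                  # desired smooth signal
y = x + 0.3 * rng.standard_normal(t.size)      # noisy measurement

# Fit each window of N = 51 samples with a d = 3 order polynomial;
# N must be odd and satisfy N >= d + 1.
x_hat = savgol_filter(y, window_length=51, polyorder=3)
```

As the text notes, the two parameters d and N jointly shape the filter's frequency response, so both must be tuned to the signal at hand.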
Another smoothing approach that has wide range of applications is called the singular spectrum analysis (SSA) [7][8][9]. The SSA is nonparametric, and it makes no prior assumptions about the properties (e.g. linearity, stationarity or Gaussianity) of the analysed data. Therefore, SSA can be utilized with arbitrary statistical processes [8,9]. The SSA technique is based on converting a measured time series into a matrix called the trajectory matrix, which in turn is decomposed using singular value decomposition. The separated singular vectors are then classified into signal and noise subspaces, and the smoothed signal is constructed using the signal subspace. See [7][8][9] for more details about the SSA technique. Two difficulties are associated with the SSA approach: choosing the appropriate dimension of the trajectory matrix and identifying the signal subspace. Some approaches have been developed to overcome the second difficulty [8][9][10], while the first difficulty remains problem dependent. Some guidelines about selecting the size of the trajectory matrix were given in [8].
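The basic SSA pipeline described above (embedding, decomposition, grouping, diagonal averaging) can be sketched in a few lines of NumPy. The window length L, the rank r and the grouping rule below are illustrative assumptions: the signal subspace is identified here by simply keeping the r leading singular components, rather than by a formal subspace-identification test:

```python
import numpy as np

def ssa_smooth(y, L, r):
    """Embed y into an L x K trajectory (Hankel) matrix, keep the r leading
    singular components, and diagonal-average back to a series."""
    T = len(y)
    K = T - L + 1
    X = np.column_stack([y[j:j + L] for j in range(K)])   # trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :r] * s[:r]) @ Vt[:r]                      # signal-subspace part
    # Diagonal averaging (Hankelisation) back to a time series
    x_hat = np.zeros(T)
    counts = np.zeros(T)
    for j in range(K):
        x_hat[j:j + L] += Xr[:, j]
        counts[j:j + L] += 1
    return x_hat / counts
```

A noisy sinusoid, for example, occupies an approximately rank-2 signal subspace, so r = 2 suffices for it; choosing L and r for arbitrary data is exactly the difficulty noted above.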
The empirical mode decomposition (EMD) technique [11] adaptively decomposes any time series into a set of components called intrinsic mode functions (IMFs). Noise-assisted versions called ensemble EMD (EEMD) [12], complementary EEMD (CEEMD) [13] and the CEEMD with adaptive noise (CEEMDAN) [14] have been developed to further improve the decomposition capabilities of the EMD approach. The EMD-based decomposition techniques have been utilized in a wide range of applications, especially in the area of signal denoising/smoothing [11][12][13][14][15][16][17]. In the EMD-based decomposition techniques, the IMFs are extracted in the descending order of their frequency contents. Therefore, the main idea of signal smoothing using EMD is to exclude the lower order IMFs, which most probably belong to the contaminating noise. From the filtering point of view, it was reported in [18] that EMD acts essentially as a dyadic filter bank. Therefore, excluding the lower-order IMFs is equivalent to lowpass filtering, in which the bandwidth (BW) of the filter is dynamically and automatically adjusted based on the input signal. To ensure that no signal components have been erroneously eliminated with the discarded IMFs (as a result of the mode mixing phenomenon associated with the EMD), one of the noise-assisted EEMD approaches has to be utilized. However, the computation cost of EEMD-based approaches is much higher than that of the basic EMD. In addition, residual noise could be introduced in the IMFs as a side effect of the utilized noise-assisted technique [16].
Another popular smoothing approach is the penalized least squares (PLS) regression, which is often credited to a publication by Whitaker [19]. In this approach, a smooth signal is obtained by minimising a cost function that balances two objectives: the data fidelity to the measured signal and the roughness of the smoothed signal. The balance between these two objectives is controlled by a parameter μ called the smoothing parameter, which has to be carefully selected. Based on the utilized measure of roughness, two different classes of algorithms can be distinguished. The first class is known as the smoothing spline [20,21], and in this class the roughness is measured using the square integral of the qth derivative of the smoothed signal. Usually, the value q = 2 is utilized, which results in the well-known smoothing approach called cubic spline. In the second class of algorithms, the roughness of the smoothed signal is measured using the qth order difference matrix [1,6,19,[20][21][22][23][24][25]. In the case of having equally spaced samples with repeated border elements, a smooth signal in this case can be efficiently calculated using the discrete cosine transform (DCT) [1,[25][26][27]. The optimum value of the smoothing parameter μ used in both classes of algorithms is usually calculated using the generalized cross validation (GCV) method [28].
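For reference, the second class of PLS algorithms (Whittaker-type smoothing with a qth order difference penalty) reduces to a single sparse linear solve. A minimal sketch follows; the value μ = 100 used in the test is an arbitrary illustration, whereas in practice the smoothing parameter would be chosen, for example, by GCV:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def qth_diff_matrix(T, q):
    """(T - q) x T sparse qth-order difference matrix D_q
    built by repeated first differences."""
    D = sparse.eye(T, format="csr")
    for _ in range(q):
        n = D.shape[0]
        D1 = sparse.diags([-np.ones(n - 1), np.ones(n - 1)], [0, 1],
                          shape=(n - 1, n), format="csr")
        D = D1 @ D
    return D

def whittaker_smooth(y, mu, q=2):
    """Minimise ||y - x||^2 + mu * ||D_q x||^2, i.e. solve (I + mu D_q' D_q) x = y."""
    T = len(y)
    Dq = qth_diff_matrix(T, q)
    A = sparse.eye(T, format="csc") + mu * (Dq.T @ Dq)
    return spsolve(A, y)
```

Since a constant lies in the null space of D_q for q ≥ 1, constant signals pass through unchanged regardless of μ.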
As will be demonstrated in Section 2.1, the existing PLS smoothing algorithms act solely as low-pass filters (LPF) of the DCT coefficients of the measured noisy data. In addition, both the BW and the transition band △f (from the pass-band to the stop-band) of the LPF are controlled by the smoothing parameter μ and the order q of the difference matrix used with these algorithms. Therefore, the following three limitations associated with the existing PLS smoothing algorithms can be recognized. First, these algorithms usually introduce distortions to bandpass signals, that is, signals whose DCT coefficients are confined in the frequency band f_L ≤ f ≤ f_H, where f_L > 0 is the lower pass-band edge frequency. In addition, the higher the value of f_L, the larger the distortion introduced to the bandpass smoothed signal. The second limitation is associated with the selection of the order q of the difference matrix. The value of q is usually selected independently from the spectral contents of the smooth signal to be restored. This may result in inappropriate values of the BW and △f of the LPF, and hence a distortion could be introduced to the smoothed signal. The third limitation is associated with the selection of the value of the smoothing parameter μ. As stated before, this is usually done using the GCV method. However, computing μ using the GCV increases the computation cost of the smoothing algorithm. In addition, the GCV could fail to find the optimum value of μ in cases where the GCV objective has a shallow minimum [22], or when an inappropriate value of q was selected.
To overcome the above mentioned limitations, a new PLS smoothing algorithm is proposed. A new objective function is utilized in the proposed algorithm. The proposed objective function depends on four different parameters: two regularisation parameters and the orders of two difference matrices. By proper selection of the four parameters, the proposed algorithm can act as either an LPF, a high-pass filter (HPF) or a bandpass filter (BPF) to the DCT coefficients of the measured data. For automatic selection of the appropriate filter frequency response, closed-form expressions of the optimum values of the four parameters are derived based on the spectral characteristics of the smooth signal to be restored. Therefore, the proposed algorithm always attempts to minimize the distortion introduced to the smoothed signal by encouraging its band to be confined to the band of the embedded smooth signal. The proposed algorithm is referred to as the band-confined smoothing algorithm (BCSA).
The proposed BCSA algorithm is further developed to handle the case of having missing and/or outlier samples, and the resulting algorithm is referred to as robust BCSA (RBCSA). In applying the RBCSA algorithm, the indices of the missing samples are assumed to be known a priori, while an adaptive procedure is followed to estimate the indices of the outlier samples. Therefore, both BCSA and RBCSA are completely data driven, fast and do not require any special optimisation tools to accomplish their goals.
The remaining part of this article is organized as follows. The proposed algorithms are derived in Section 2. Section 3 presents some experiments and discussions to assess the performance of the proposed algorithms. Finally, concluding remarks are given in Section 4. In addition, the following notations are used throughout the full text. A bold upper case symbol, for example, A, refers to a matrix; similarly, α refers to a column vector; the kth element of α is referred to using either α_k or α[k]; α[t₁ : t₂] is the set of elements in α from index t₁ to t₂; diag(α) is a diagonal matrix whose main diagonal elements are the entries of α; a discrete-time signal z(t), t = 1, …, T, is represented (using vector notation) by the vector z ∈ R^T, where T is the number of time points; and ‖z‖_p ≔ (Σ_{t=1}^T |z(t)|^p)^{1/p} denotes the p-norm of z.

| Problem Formulation and Related Work
A noisy signal y(t) is modelled using the following equation:

y(t) = x(t) + v(t), t = 1, …, T, (1)

where x(t) is a smooth signal, v(t) is a zero mean AWGN and T is the number of time points (samples). It is assumed that the samples of the noisy signal y(t) are equally spaced. This holds for most applications, because many instruments sample signals on a regular time grid. The more general case of arbitrarily spaced observations [22] is not considered, because it does not result in a simple form as the one derived in this section. An estimate of the smooth signal x(t) was obtained in [1,6,19,[22][23][24]26,27] by solving the following penalized least squares problem:

min_x ‖y − x‖₂² + μ_L ‖D_q x‖₂², (2)

where D_q is the qth order difference matrix, and μ_L is the smoothing parameter, which controls the trade-off between the fidelity to the measured data (measured by ‖y − x‖₂²) and the smoothness of the estimated signal (measured by ‖D_q x‖₂²). The structure of D_q for some values of q is presented shortly.
Equation (2) is a convex optimisation problem, and its minimisation results in the following linear system that allows the determination of the smoothed signal x̂:

A_q x̂ = y, A_q ≔ I + μ_L D_qᵀ D_q, (3)

where x̂ is an estimate of the smooth signal x, and I is the T × T identity matrix. The solution vector x̂ can be calculated directly using x̂ = A_q⁻¹ y as long as the data size T is small, for example, less than 1000 samples. Beyond that size, the time to calculate the inverse of A_q becomes relatively large [6]. Utilising the sparsity and the symmetry of A_q, more computationally efficient approaches have been proposed in [6,23]. However, these two approaches lose efficiency when an estimation of the smoothing parameter μ_L is required [1].
Another approach to solve Equation (3) for the case q = 1 was suggested in [27]. It is known that the first-order difference matrix D₁ is a Toeplitz banded matrix, and its first row equals [−1 1 0 … 0]. Therefore, the matrix D ≔ D₁ᵀ D₁ has the following tridiagonal structure:

D = [  1 −1  0  …  0
      −1  2 −1  …  0
       ⋮   ⋱  ⋱  ⋱  ⋮
       0  … −1  2 −1
       0  …  0 −1  1 ], (4)

The author in [27] utilized the result previously reported in [29] that the matrix D is diagonalized by the DCT. Therefore, the eigendecomposition of D is given by:

D = U Λ Uᵀ, (5)

where U is a unitary matrix (U Uᵀ = I) whose columns represent the basis of the DCT, and Λ ≔ diag(λ) is a diagonal matrix containing the eigenvalues of D. Moreover, the kth entry of λ is given by [29]:

λ[k] = 2 − 2 cos((k − 1)π/T), k = 1, …, T. (6)

For the case q = 2, the authors of [1,[25][26][27] have assumed repeated boundary elements (x₀ = x₁ and x_{T+1} = x_T). This resulted in a second-order difference matrix D₂ = −D, where D is as defined in Equation (4). Accordingly, D₂ᵀ D₂ = D², and its eigendecomposition is given by D² = U Λ² Uᵀ. A generalisation of this result to the qth order difference matrix (q > 2) was derived in [26] by expressing D_q using D₁ and D₂. For instance, D₄ = D₂D₂ and D₅ = D₁D₂D₂. Following this procedure, it can be shown that D_qᵀ D_q = D^q, and hence Equation (2) becomes:

min_x ‖y − x‖₂² + μ_L xᵀ D^q x, (7)

which has a solution given by Equation (3), with A_q ≔ I + μ_L D^q. Substituting the eigendecomposition D^q = U Λ^q Uᵀ into the expression of A_q, and applying the unitary property of U, it is straightforward to show that the solution of Equation (7) is given by:

x̂ = U Γ_L Uᵀ y, (8)

where Γ_L = diag(γ_L) is a diagonal matrix, and its kth diagonal entry is given by:

γ_L[k] = 1/(1 + μ_L λ[k]^q). (9)

Calculating the solution vector x̂ of Equation (7) using Equation (8) can be efficiently performed using the following three steps:

Step 1: Calculate the DCT coefficients c_y (representing Uᵀ y) of the noisy input signal y.

Step 2: Calculate the filtered DCT coefficients c_x̂ using: c_x̂ = γ_L ⊙ c_y,
where ⊙ denotes the Hadamard product, that is, element-by-element multiplication, and the entries of γ_L are as defined in Equation (9).
Step 3: Calculate the smoothed signal x̂ as the inverse DCT of c_x̂.
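The three steps above can be sketched directly with SciPy's DCT routines, assuming the orthonormal DCT-II basis and the eigenvalue expression λ[k] = 2 − 2cos((k − 1)π/T) of Equation (6) (written 0-indexed below):

```python
import numpy as np
from scipy.fft import dct, idct

def pls_smooth_dct(y, mu_L, q):
    """Low-pass PLS smoothing of Equation (7) via the weights of Equation (9)."""
    T = len(y)
    lam = 2.0 - 2.0 * np.cos(np.arange(T) * np.pi / T)   # eigenvalues of D
    gamma_L = 1.0 / (1.0 + mu_L * lam ** q)              # LPF weighting vector
    c_y = dct(y, norm="ortho")                           # Step 1: DCT of y
    c_x = gamma_L * c_y                                  # Step 2: filtering
    return idct(c_x, norm="ortho")                       # Step 3: inverse DCT
```

Since λ[1] = 0, the weight on the DC coefficient is always 1 and the mean of the signal is preserved, consistent with the LPF interpretation.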
As can be observed from the three steps listed above, the solution to Equation (7) can be interpreted as a filtering (see Step 2) of the DCT coefficients c_y of the noisy signal y(t). The shape of the filter is determined by the weighting vector γ_L (the main diagonal of Γ_L), which in turn depends on the two controlling parameters q and μ_L. As can be observed from Figure 1, the shape of γ_L is equivalent to an LPF. The dependency of the shape of the LPF on the two parameters q and μ_L can be investigated by changing the value of one of the two parameters while fixing the value of the other. Following this approach, it was observed (the result is not presented due to space limitations) that increasing either of the two parameters has the effect of reducing the transition band △f from the pass-band to the stop-band of the LPF. On the other hand, it was also observed that, while increasing the value of q has the effect of increasing the BW of the LPF, increasing the value of μ_L has the opposite effect.
Since the shape of γ L is equivalent to an LPF, regardless of the spectral contents of x(t), applying Equation (7) to smooth a bandpass signal, for example, a sinusoidal signal with frequency f o > 0, will always result in a distorted signal due to the noise components in the frequency band 0 ≤ f < f o . See Figure 1(a) for the case f o = 10 Hz. This represents one of the limitations associated with Equation (7). In addition to this limitation, existing implementations of Equation (7) suffer from the following additional two limitations. The first one is related to selecting the order q (of the difference matrix D q ) independently from the spectral contents of the smooth signal x(t). As can be observed from Figure 1(a) for the case of smoothing a sinusoidal signal with frequency f o = 10 Hz, the value q = 10 produced the best performance compared to the other two values of q = 1 and q = 2. However, for the case presented in Figure 1(b), which represents the same sinusoidal signal superimposed on a ramp signal, it is obvious that the case q = 10 produced the worst performance compared to the other two values of q.
The second limitation associated with the existing implementation of Equation (7) is associated with the computation of the smoothing parameter μ. Usually, the optimum value of μ is calculated using the GCV technique [1,22,23,26]. However, the GCV technique requires a special optimisation tool to compute the optimum value of μ. In addition, the GCV could fail to obtain the optimum value of μ in case an inappropriate value of q is utilized. This case is presented in Figure 1. In this figure, the values of μ associated with q = 1, 2 and 10 were computed using the GCV approach. However, comparing the shapes of the various γ curves in Figure 1(a) and (b) associated with each value of q, it is obvious that the presence of the low-frequency ramp signal in Figure 1(b) misleads the GCV approach. As a result, and for all values of q, large attenuations were introduced to the 10 Hz sinusoidal signal in Figure 1(b).

In the remaining part of this section, a new objective function is suggested to overcome the first limitation associated with smoothing bandpass signals. In addition, closed-form expressions for the optimum values of q and μ are derived to overcome the limitations associated with the existing implementation of Equation (7).

| Proposed objective function
In this subsection, a new objective function is proposed to overcome the limitation associated with Equation (7) in smoothing bandpass signals. As will be shown later, the proposed objective function results in a weighting vector γ_B that can be adjusted to have the characteristics of either an LPF (similar to γ_L defined in Equation (9)) or a BPF. Recall that a BPF can be interpreted as a cascade of an LPF and an HPF with properly selected cutoff frequencies. Therefore, for the objective function Equation (7) to result in a weighting vector γ_B having the characteristics of a BPF, a modification to Equation (7) is necessary. This modification can be performed by incorporating another penalty term that has the capability of producing a weighting vector γ_H having the characteristics of an HPF. This can be accomplished by observing that, if the penalty term μ_L (xᵀ D^q x) in Equation (7) is replaced by the penalty term (1/μ_H)(xᵀ D^{−p} x) and the resulting optimisation problem is solved, the solution vector in this case has the expression x̂ = U Γ_H Uᵀ y, which is similar to Equation (8) with the exception that the diagonal entries γ_H of the diagonal matrix Γ_H are given by:

γ_H[k] = 1/(1 + μ_H⁻¹ λ[k]^{−p}). (10)

It is obvious from Equation (10) that the weighting vector γ_H has the characteristics of an HPF, and the effect of the two parameters μ_H and p on the shape of γ_H is similar to the effect of μ_L and q on the shape of γ_L.
Based on the above discussion, the following objective function is proposed to obtain a weighting vector γ_B having the characteristics of a BPF:

min_x ‖y − x‖₂² + μ_L xᵀ D^q x + (1/μ_H) xᵀ D^{−p} x + (μ_L/μ_H) xᵀ D^{q−p} x, (11)

where μ_L and μ_H are two regularisation parameters, q and p are two integers representing the orders of the difference matrices D_q and D_p, the matrix D^q ≔ D_qᵀ D_q, and D^{−p} is the inverse of the matrix D^p ≔ D_pᵀ D_p. The objective function Equation (7) is a special case of the proposed objective function Equation (11) for the case μ_H = ∞. Even though the first two penalty terms in Equation (11) can result in a weighting vector having the characteristics of a BPF, the incorporation of the third penalty term (μ_L/μ_H) xᵀ D^{q−p} x simplifies the expression of the resulting weighting vector, and makes the calculation of the four controlling parameters (p, q, μ_L and μ_H) straightforward, as will be shown later.
The minimisation of Equation (11) results in the following linear system of equations:

(I + μ_L D^q + μ_H⁻¹ D^{−p} + (μ_L/μ_H) D^{q−p}) x̂ = y. (12)

Applying the eigendecomposition Equation (5) of the matrix D, we get:

U (I + μ_L Λ^q + μ_H⁻¹ Λ^{−p} + (μ_L/μ_H) Λ^{q−p}) Uᵀ x̂ = y, (13)

where Λ ≔ diag(λ) is a diagonal matrix containing the eigenvalues of D, and the kth entry of λ is defined in Equation (6).
Since U is a unitary matrix, and the overall matrix between the two brackets in the LHS of Equation (13) is diagonal, the solution vector x̂ of Equation (13) can be efficiently calculated as:

x̂ = U Γ_B Uᵀ y, (14)

where Γ_B ≔ diag(γ_B) is a diagonal matrix, and its kth diagonal entry is given by:

γ_B[k] = γ_L[k] γ_H[k] = 1/[(1 + μ_L λ[k]^q)(1 + μ_H⁻¹ λ[k]^{−p})], (15)

where γ_L[k] and γ_H[k] are as defined in Equations (9) and (10), and they, respectively, represent the kth entries of the diagonal matrices Γ_L ≔ diag(γ_L) and Γ_H ≔ diag(γ_H). The weighting vectors γ_L and γ_H are presented in Figure 2. From Equation (15), it is obvious that the diagonal matrix Γ_B can be expressed as Γ_B = Γ_L Γ_H, and hence Equation (14) can be expressed as:

x̂ = U Γ_L Γ_H Uᵀ y. (16)

For the case μ_H = ∞ we have Γ_H = I, and hence Equation (16) reduces to the low-pass filtering Equation (8). In addition, setting μ_L = 0 results in Γ_L = I, and x̂ calculated using Equation (16) in this case is a high-pass filtered version of the input signal y. Moreover, as can be observed from Equation (15), the two parameters μ_L and q control the shape of γ_L, while μ_H and p control the shape of γ_H. Therefore, if these four parameters are selected such that γ_L and γ_H overlap as in Figure 2, then the weighting vector γ_B ≔ γ_L ⊙ γ_H will have the characteristics of a BPF. In the next section, the spectral information of x(t) is utilized to derive closed-form expressions for the optimum values of q, p, μ_L and μ_H.
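Since the band-confined filter is a single Hadamard product in the DCT domain, it can be sketched as follows. The γ_H expression follows Equation (10), and the λ[1] = 0 entry is set to zero explicitly because λ^{−p} diverges there (so the DC coefficient is fully attenuated whenever the HPF part is active):

```python
import numpy as np
from scipy.fft import dct, idct

def bcsa_filter(y, mu_L, q, mu_H, p):
    """x_hat = U (Gamma_L Gamma_H) U^T y, Equations (14)-(16)."""
    T = len(y)
    lam = 2.0 - 2.0 * np.cos(np.arange(T) * np.pi / T)
    gamma_L = 1.0 / (1.0 + mu_L * lam ** q)               # Equation (9): LPF part
    gamma_H = np.zeros(T)
    nz = lam > 0
    gamma_H[nz] = 1.0 / (1.0 + lam[nz] ** (-p) / mu_H)    # Equation (10): HPF part
    gamma_B = gamma_L * gamma_H                           # Equation (15)
    return idct(gamma_B * dct(y, norm="ortho"), norm="ortho")
```

With μ_L small and μ_H large the filter is nearly all-pass except at DC, so a constant signal is suppressed while an in-band oscillation passes almost unchanged.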

| Calculating the optimum values of q, p, μ_L and μ_H
In this subsection, the spectral characteristics of γ_L are utilized first to find the closed-form expressions for the optimum values of q and μ_L. A similar procedure will then be followed to find the closed-form expressions of p and μ_H using the characteristics of γ_H. The optimum values of these four parameters are derived for the objective of confining the band of the estimated signal to the band of the embedded smooth signal, and, hence, minimising the distortion introduced to the estimated signal. Referring to Figure 2, it is assumed that the nonzero DCT coefficients of the smooth bandpass signal x(t) are confined in the frequency band f_p1 ≤ f ≤ f_p2, where f_p1 and f_p2 are, respectively, the lower and upper pass-band edge frequencies. Accordingly, it is required that γ_L[j_p] = γ_p, where j_p ≔ ⌈2T f_p2/F_s⌉ is the index associated with f_p2, F_s is the sampling frequency, T is the total number of samples, ⌈a⌉ is the smallest integer greater than or equal to a, and 0 ≪ γ_p < 1. In addition, it is assumed that the upper stop-band edge frequency of the BPF is given by f_s2 = f_p2 + △f, where the transition band △f usually has a small value, for example, △f = 1 Hz. Furthermore, it is assumed that γ_L[j_s] = γ_s, where j_s ≔ ⌈2T f_s2/F_s⌉ is the index associated with f_s2, and 0 < γ_s ≪ 1. The following values γ_p = 0.99 and γ_s = 0.01 are utilized.
Substituting the values of j_p and j_s into the expression of γ_L, we get:

γ_L[j_p] = 1/(1 + μ_L λ_{j_p}^q) = γ_p, (17)

γ_L[j_s] = 1/(1 + μ_L λ_{j_s}^q) = γ_s, (18)

where λ_{j_p} and λ_{j_s} are calculated using Equation (6) by substituting k = j_p and k = j_s, respectively. Rearranging Equations (17) and (18) to eliminate μ_L, it is straightforward to show that:

(λ_{j_p}/λ_{j_s})^q = (1 − γ_p)γ_s / (γ_p(1 − γ_s)),

which can be written in the following form:

q ln(λ_{j_p}/λ_{j_s}) = ln[(1 − γ_p)γ_s / (γ_p(1 − γ_s))].

Therefore, the optimum value of q can be calculated using the following expression:

q = ln[(1 − γ_p)γ_s / (γ_p(1 − γ_s))] / ln(λ_{j_p}/λ_{j_s}). (19)

This expression can be further simplified by applying the following two constraints on the desired shape of γ_L: (1) γ_s ≪ 1 and (2) γ_p = (1 − γ_s). In this case, Equation (19) is reduced to the following simplified expression:

q = 2 ln(γ_s/(1 − γ_s)) / ln(λ_{j_p}/λ_{j_s}). (20)

Substituting the expressions of λ_{j_p} and λ_{j_s} (defined in Equation (6)) into Equation (20), and using the properties of the trigonometric functions, the value of q can be expressed in terms of the design parameters △f and f_p2 as:

q = (F_s tan(π f_p2/F_s) / (π△f)) ln((1 − γ_s)/γ_s). (21)

Note that, since the quantity π△f/F_s ≪ 1, the following approximations were applied in deriving the expression in Equation (21): cos(π△f/F_s) ≃ 1 and sin(π△f/F_s) ≃ π△f/F_s.
The optimum value of μ_L can then be obtained by substituting the calculated value of q into Equation (18) (or equivalently Equation (17)), and rearranging the equation to get the following expression:

μ_L = (1 − γ_s)/(γ_s λ_{j_s}^q). (22)

A similar procedure can be followed to determine the optimum values of p and μ_H using the expression of γ_H. In this case, the following two constraints are applied: γ_H[i_p] = γ_p and γ_H[i_s] = γ_s, where i_p and i_s are, respectively, the indices associated with the lower pass-band edge frequency f_p1 and the lower stop-band edge frequency f_s1 = f_p1 − △f of the desired BPF. See Figure 2. The derived expressions are:

p = (F_s tan(π f_p1/F_s) / (π△f)) ln((1 − γ_s)/γ_s), (23)

μ_H = γ_s / ((1 − γ_s) λ_{i_s}^p). (24)

Note that, following the proposed procedure, no special optimisation tools are required to compute the regularisation parameters (as in the GCV approach). In addition, the calculated values of q, p, μ_L and μ_H are optimum in the sense that they produce a desired response based on the spectral contents of the measured data.
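The closed-form design of the LPF part can be sketched as follows. By construction, substituting the returned μ_L back into Equation (9) reproduces γ_L[j_s] = γ_s exactly, while γ_L[j_p] ≈ γ_p holds only up to the small-angle approximations used for Equation (21); the numerical values in the usage note are illustrative:

```python
import numpy as np

def design_lpf(f_p2, delta_f, Fs, T, gamma_s=0.01):
    """Closed-form q and mu_L (Equations (21) and (22)) for the LPF part."""
    q = Fs * np.tan(np.pi * f_p2 / Fs) / (np.pi * delta_f) \
        * np.log((1 - gamma_s) / gamma_s)
    j_s = int(np.ceil(2 * T * (f_p2 + delta_f) / Fs))  # index of f_s2 = f_p2 + delta_f
    lam_js = 2.0 - 2.0 * np.cos((j_s - 1) * np.pi / T)  # Equation (6)
    mu_L = (1 - gamma_s) / (gamma_s * lam_js ** q)
    return q, mu_L
```

For example, F_s = 100 Hz, T = 1000, f_p2 = 10 Hz and △f = 1 Hz give q ≈ 47.5; the HPF parameters p and μ_H would follow symmetrically from f_p1.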

| Estimating the edge frequencies f_p1 and f_p2
It was assumed in the previous subsection that the edge frequencies f_p1 and f_p2 are known a priori. However, in practice, the values of these two frequencies are generally unknown. In this subsection, an iterative procedure based on applying a statistical test is suggested to estimate the edge frequencies f_p1 and f_p2, or equivalently their indices i_p and j_p, respectively. The statistical test applied here is the F-test, which is used for testing the equality of two variances, and it can be implemented using the Matlab function 'vartest2.m'. See [30] for more details about the F-test.
Let us start first with the procedure of estimating the value of j_p. It is obvious that all the DCT coefficients c_y[j_p : T] are noise coefficients. Therefore, in the proposed procedure, a relatively large value 1 ≪ ĵ_p < T is first used as an initial guess of the value of j_p to ensure that all the coefficients c_j ≔ c_y[ĵ_p : T] are noise coefficients. For all the examples presented, the value of ĵ_p was always initialized to ĵ_p = ⌈0.8T⌉. The value of ĵ_p is then iteratively decreased using the iterative procedure presented by Step 4 in Table 1.
A similar procedure is followed to estimate the lower index i_p. Since the coefficients c_y[1 : i_p] are noise coefficients, an initial guess î_p = 1 is first applied, and then the value of î_p is iteratively increased using the iterative procedure presented by Step 6 in Table 1. Note that, in applying this iterative procedure, the previously detected noise coefficients c_j are utilized as reference noise coefficients.
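The variance-equality test can be reproduced outside Matlab. The sketch below is a right-tailed two-sample F-test equivalent to the vartest2 call used in Table 1, returning h = 1 when the variance of the candidate block c_a is significantly larger than that of the reference noise block c_j:

```python
import numpy as np
from scipy.stats import f as f_dist

def vartest2_right(c_a, c_j, alpha):
    """Right-tailed F-test for equality of variances; h = 1 rejects H0
    (i.e. var(c_a) is significantly larger than var(c_j))."""
    F = np.var(c_a, ddof=1) / np.var(c_j, ddof=1)
    p = f_dist.sf(F, len(c_a) - 1, len(c_j) - 1)
    return int(p < alpha)
```

In the index-estimation loops, h = 1 signals that the tested block of DCT coefficients contains signal energy, so the corresponding edge index has been reached.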
In applying the proposed algorithm, it was found that the optimum value of the significance level α (used with the F-test) has to be selected inversely proportional to the input SNR. Therefore, it is suggested to select the value of α using the formula α = α₀/ŜNR, where ŜNR is an estimate of the input SNR, and α₀ is a small positive number, for example, 0.01. It is suggested to calculate ŜNR from the DCT coefficients c_y of the measured data y(t) using the following equation:

ŜNR = (‖c_y‖₂² − T σ̂_v²) / (T σ̂_v²), σ̂_v² = ‖c_j‖₂² / (T − ĵ_p + 1), (25)
[Table 1 residue — the recoverable fragments of the BCSA pseudocode: Step 4 iteratively decreases ĵ_p (the index associated with f_p2) while ĵ_p ≥ a + 2, applying the F-test via the Matlab command h = vartest2(c_a, c_j, 'tail', 'right', 'alpha', α) and checking whether h = 1 (c_a and c_j are significantly different) or h = 0 (not significantly different); Step 8 utilizes f_p2 to calculate q and μ_L using Equations (21) and (22), respectively; Step 10 utilizes q, μ_L, p and μ_H to calculate γ_B using Equation (15); Step 11 calculates the smoothed signal x̂.]

where c_j ≔ c_y[ĵ_p : T]. Note that, for a better estimate of ŜNR, the value of ŜNR has to be updated each time a new estimate of ĵ_p is obtained. The proposed BCSA is summarized in Table 1. The algorithm accepts five inputs and produces the smoothed signal x̂ as an output. Note that, in Step 9 in Table 1, the input signal y is considered a bandpass signal only if the estimated value of the lower pass-band edge frequency f_p1 satisfies f_p1 > △f.

| Dealing with missing and outlier samples
In addition to the contaminating noise, measured data are also vulnerable to missing and/or outlier samples. Missing samples can result, for instance, from instrument malfunction or cessation of measurements during maintenance periods. Missing samples can also be introduced intentionally, for example, in the process of over-sampling a data series [6]. On the other hand, outlier samples are defined as extreme values that deviate from the other sample values in the measured data. They may indicate experimental errors or variability in the measurements. Before explaining the procedure to handle missing and outlier samples, let us first define a diagonal matrix S ∈ R^{T×T} whose diagonal entries equal one except at the indices corresponding to the missing and the outlier samples, at which the diagonal entries equal zero. The complement of S, denoted as S̄, satisfies the relation S + S̄ = I, where I ∈ R^{T×T} is the identity matrix. Then, to handle the case of having missing and/or outlier samples, it is suggested to encourage the optimisation problem in Equation (11) to focus only on the most reliable samples in the measured data vector y. This can be achieved by incorporating S into Equation (11) as follows:

min_x (y − x)ᵀ S (y − x) + μ_L xᵀ D^q x + (1/μ_H) xᵀ D^{−p} x + (μ_L/μ_H) xᵀ D^{q−p} x. (26)

The minimisation of Equation (26) results in the following linear system:

(S + μ_L D^q + μ_H⁻¹ D^{−p} + (μ_L/μ_H) D^{q−p}) x̂ = S y. (27)

Substituting S = I − S̄ into the LHS of Equation (27), and rearranging the resulting equation, we get:

(I + μ_L D^q + μ_H⁻¹ D^{−p} + (μ_L/μ_H) D^{q−p}) x̂ = S y + S̄ x̂. (28)

Comparing Equation (28) with Equation (12), we find that, even though the LHS of both equations is identical, the solution vector x̂ appears on both sides of Equation (28). Therefore, Equation (28) can be solved iteratively as follows: starting with an initial solution vector x̂₀ = 0, the solution vector is updated using the following equation, which has a form similar to Equation (14):

x̂_{k+1} = U Γ_{B,k} Uᵀ y_k, (29)

where y_k ≔ S y + S̄ x̂_k, and the parameters used for constructing Γ_{B,k} are estimated from the DCT of y_k.
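The fixed-point iteration of Equation (29) can be sketched as follows, with the simplifying assumption that the filter weights γ_B are held fixed across iterations rather than re-estimated from the DCT of each y_k:

```python
import numpy as np
from scipy.fft import dct, idct

def masked_smooth(y, reliable, gamma_B, n_iter=100):
    """Iterate x_{k+1} = U Gamma_B U^T (S y + S_bar x_k), Equation (29).
    `reliable` is a boolean mask: True where the sample of y is trusted."""
    x = np.zeros_like(y)                 # initial solution x_0 = 0
    for _ in range(n_iter):
        y_k = np.where(reliable, y, x)   # S y + S_bar x_k
        x = idct(gamma_B * dct(y_k, norm="ortho"), norm="ortho")
    return x
```

At each pass, the trusted entries of y_k are taken from y while the missing/outlier entries are refilled from the current estimate, so the gaps are progressively inpainted by the filter.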
Note that the entries of y_k corresponding to the reliable samples are given by S y, and, hence, they equal the corresponding entries in y. On the other hand, S̄ x̂_k represents the entries of y_k corresponding to the missing and/or outlier samples, which are updated in each iteration by the corresponding entries in x̂_k. For the case of missing samples, the indices J_m of the missing samples are usually known a priori. However, the indices J_o of the outlier samples are generally unknown, and, hence, they have to be estimated. Starting with an initial estimate J_o = [ ] (the empty set), and running Equation (29), it was observed that most of the outlier samples appear in the residual signal v_k ≔ y − x̂_{k+1}. Therefore, it is suggested to estimate J_o from the residual signal v_k using the procedure presented in Step 2(d-g) in Table 2, which is inspired by the procedure followed in [31]. The success of the proposed algorithm in correcting outlier samples depends crucially on the success of this iterative procedure in correctly estimating the standard deviation of the noise v from the residual signal v_k, and hence estimating the optimum value of the threshold parameter ξ_k. Therefore, it is necessary to identify the breakdown point of this iterative procedure, that is, the maximum number of outlier samples beyond which the procedure fails.
Unfortunately, it is not easy to derive a mathematical expression for the breakdown point. Therefore, it is suggested to conduct the following Monte Carlo simulation to investigate the behaviour of the iterative procedure and hence to approximately identify its breakdown point. In this simulation, a random noise vector v ∈ R^{1000} is first generated from a Gaussian distribution with zero mean and unit variance. Then, δ% of the samples of v are replaced by random outlier values uniformly distributed in the range ±[α∼b]. The values of δ considered in this simulation increase from 0.1 to 50 in steps of 1. The value of α represents the smallest artefact value, and hence it has to be selected ≥3σ_v, where σ_v = 1 is the standard deviation of the samples in v. The values of α considered in this simulation are α = 3 and α = 10, and, with each value of α, b takes one of the following values [10, 15, 20, 40, 80, 120] to investigate the impact of the largest artefact value on the performance of the iterative procedure. For each combination of δ, α and b, new artefacts are generated and their locations are randomly selected. The iterative procedure is then applied to the constructed vector to estimate its standard deviation. The absolute difference between σ_v and its estimate is then calculated. This procedure is repeated 2000 times for each combination of δ, α and b, and the average value of the calculated difference is computed. The result of this simulation is shown in Figure 3.
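A scaled-down version of this Monte Carlo study (far fewer repetitions, and a simple 3σ-clipping estimator standing in for the Table 2 procedure) can be sketched as:

```python
import numpy as np

def robust_sigma(v, c=3.0, max_iter=50):
    """3-sigma-clipping estimate of the noise standard deviation."""
    keep = np.ones(len(v), dtype=bool)
    for _ in range(max_iter):
        s = v[keep].std()
        new_keep = np.abs(v) <= c * s
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return v[keep].std()

def breakdown_curve(deltas, alpha, b, n=1000, reps=50, seed=0):
    """Average |sigma_hat - sigma_v| versus the contamination level delta (%),
    for outliers drawn uniformly from +/-[alpha, b] (sigma_v = 1)."""
    rng = np.random.default_rng(seed)
    errs = []
    for d in deltas:
        acc = 0.0
        for _ in range(reps):
            v = rng.standard_normal(n)
            k = int(round(d / 100.0 * n))
            idx = rng.choice(n, size=k, replace=False)   # random artefact locations
            v[idx] = rng.uniform(alpha, b, size=k) * rng.choice([-1.0, 1.0], size=k)
            acc += abs(robust_sigma(v) - 1.0)
        errs.append(acc / reps)
    return np.array(errs)
```

Sweeping deltas over 0.1∼50% and plotting the returned errors reproduces the shape of the curves discussed next.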
As can be observed from Figure 3, the largest value of δ that the iterative procedure can tolerate (i.e. the breakdown point) equals 26.5% and 25.5% for the cases α = 3 and α = 10, respectively. However, this large value occurs only when the dynamic range of the artefacts is large, that is, when b ≳ 100. In addition, there is a direct relation between the breakdown point and the value of b, and the worst case occurs when b = α = 10. In this case, the iterative procedure can correctly estimate the standard deviation of the noise as long as the percentage of outlier samples is lower than 10%. This experiment was repeated for a wide range of values of α and b, and it was found that the breakdown point is always less than 30%. In addition, for all the considered cases in which b = α ≥ 10, the breakdown point was found to be ≃10%. The breakdown point decreased to 5% when b = α = 4 was selected. However, in this last case, the values of the added samples are too small to be considered artefacts. From the above discussion, the following conclusion can be drawn: the breakdown point of the iterative procedure is approximately confined to the range 10%∼25%, where the upper value is obtained only if the artefacts have a wide range of values, and the lower value is obtained when all the artefact values equal ±α with α ≥ 10.
The overall algorithm, referred to as RBCSA, is summarised in Table 2. As shown in Table 2, the inputs to the RBCSA algorithm are identical to those of the BCSA, in addition to the indices J_m of the missing samples. If the input vector y does not have missing samples, then J_m is simply entered as the empty set J_m = [ ]. The algorithm terminates if either a maximum number of iterations K_max is reached (e.g. K_max = 100) or the condition in Step 2(c) is satisfied, where ϵ is a small number (e.g. ϵ = 10^−4). It is worth mentioning that the developed RBCSA algorithm cannot be used for outlier correction when the original signal itself contains high-amplitude samples, for example, the R-peaks in ECG signals. In such cases, a variant of the iterative procedure presented in Step 2(d-g) in Table 2 is needed to distinguish the high-amplitude signal values from the contaminating outlier samples. This is left as a topic for future research. The Matlab codes of the two algorithms, BCSA and RBCSA, are available upon request.

| SIMULATIONS AND RESULTS
In this section, five different examples are presented to examine the performance of the proposed algorithms. In the first four examples, simulated signals are generated and AWGN is added at specified values of the SNR. The performance of the various algorithms considered in this section is quantified using the signal-to-noise ratio improvement (SNR_imp) in dB. The SNR_imp is computed using the following equation [26]:

SNR_imp = 10 log10( Σ_{t=1}^{T} (y(t) − x(t))² / Σ_{t=1}^{T} (x̂(t) − x(t))² ),

where y(t) is the measured signal, T is the number of samples, x(t) is the smooth signal and x̂(t) is its estimate. In addition, the following parameter values γ_s = 0.01, Δf = 1 Hz and α_0 = 0.01 are used with the proposed algorithms.
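Under the definition of the SNR improvement implied by the variables listed above, SNR_imp can be computed as:

```python
import numpy as np

def snr_imp_db(y, x, x_hat):
    """SNR improvement in dB: the input noise energy sum_t (y - x)^2
    divided by the residual error energy sum_t (x_hat - x)^2."""
    return 10.0 * np.log10(np.sum((y - x) ** 2) / np.sum((x_hat - x) ** 2))
```

For instance, an estimate that halves the noise amplitude yields SNR_imp = 20 log10(2) ≈ 6 dB.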

| Example 1: dependency of the performance on the lower edge frequency f_p1
As stated before, all the PLS smoothing algorithms based on Equation (2) behave as LPFs, while the proposed BCSA algorithm has the capability of switching between an LPF and a BPF based on the significant DCT coefficients of x(t). Therefore, for smoothing bandpass signals, it is expected that the performance of the existing PLS smoothing algorithms (e.g. the smooth algorithm developed in [1]) degrades as the value of the lower edge frequency f_p1 of x(t) increases. The effect of the value of f_p1 on the performance of the smooth algorithm [1] and the proposed BCSA is investigated in this example. In this comparison, a noisy bandpass signal y(t) is constructed by first generating a sinusoidal signal with frequency f_0 Hz and then adding AWGN at a specified value of the SNR. The following five values of f_0 are considered: 1, 5, 10, 15 and 20 Hz, while only the two extreme values SNR = 0 and 15 dB are considered. The constructed noisy signal y(t) is applied to the two algorithms, and the SNR_imp associated with each algorithm is then calculated using the obtained smooth signals. This procedure is repeated 200 times for each combination of f_0 and SNR, where a new AWGN realisation is generated each time. The SNR_imp associated with each algorithm is then calculated as the average over the 200 runs. The results are shown in Figure 4. As can be observed from Figure 4, and for the two considered values of the SNR, the SNR_imp associated with the proposed algorithm is almost independent of the value of f_0, while the SNR_imp associated with the smooth algorithm decreases exponentially as the value of f_0 increases, thanks to the bandpass switching capability of the proposed BCSA algorithm.
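Adding AWGN at a prescribed SNR, as done throughout these examples, amounts to scaling the noise power relative to the signal power; a minimal helper (its name and default generator are illustrative):

```python
import numpy as np

def add_awgn(x, snr_db, rng=None):
    """Add white Gaussian noise scaled so that the SNR (signal power over
    noise power) of the returned vector equals snr_db."""
    rng = np.random.default_rng(0) if rng is None else rng
    noise_power = np.mean(x ** 2) / 10.0 ** (snr_db / 10.0)
    return x + np.sqrt(noise_power) * rng.standard_normal(x.shape)
```

For example, add_awgn(np.sin(2 * np.pi * 10 * t), 0.0) produces the f_0 = 10 Hz, SNR = 0 dB test case.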

| Example 2: qualitative comparison
In this example, the performance of the proposed BCSA algorithm is qualitatively compared with that of the smooth algorithm [1], which belongs to the family of PLS smoothing algorithms. Three different signals are utilised in this comparison. The duration of each signal equals 2 s, and the signals are sampled at 256 Hz; therefore, each signal consists of T = 512 samples. The first signal consists of a ramp signal in addition to two sinusoidal signals with frequencies 2 and 15 Hz. Due to the presence of the ramp, the DCT coefficients of this composite signal start from the zero frequency, and hence it represents a low-pass signal. The second signal is a frequency-modulated signal, while the third signal consists of three sinusoidal signals with frequencies 15, 17 and 20 Hz. Neither of these two signals has DCT coefficients near the zero frequency, and hence they can be considered bandpass signals. For each signal, AWGN is added at 5 dB SNR, and the noisy signal is applied to both algorithms. The results are presented in Figure 5.
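The three test signals can be generated as follows; the ramp slope and the exact frequency-modulation law are not specified in the text, so the choices below are assumptions:

```python
import numpy as np

# Three 2-second test signals, sampled at 256 Hz (T = 512).
fs = 256
t = np.arange(2 * fs) / fs

# ramp + 2 Hz + 15 Hz tones (low-pass signal); unit ramp slope is assumed
sig1 = t + np.sin(2 * np.pi * 2 * t) + np.sin(2 * np.pi * 15 * t)
# frequency-modulated signal (bandpass); linear chirp law is assumed
sig2 = np.sin(2 * np.pi * (10.0 * t + 3.0 * t ** 2))
# three tones at 15, 17 and 20 Hz (bandpass)
sig3 = sum(np.sin(2 * np.pi * f * t) for f in (15, 17, 20))
```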
The middle column in Figure 5 presents the DCT coefficients of the three signals, as well as the weighting vectors γ_L and γ_B estimated by the smooth algorithm and the BCSA algorithm, respectively. As can be observed from the three subfigures in the middle column, the nonzero entries of the weighting vector γ_B estimated by the proposed BCSA algorithm are always confined to the BW of the embedded smooth signal x(t). Therefore, the amount of distortion introduced to the estimated smooth signal by the out-of-band noise components is minimised by the proposed algorithm. No such behaviour can be observed in the weighting vector γ_L estimated by the smooth algorithm, even in the case of the low-pass signal presented in the upper subfigures. The main reason behind the limited performance of the smooth algorithm, compared to the proposed BCSA algorithm, lies in the way the parameters q and μ are selected. In the smooth algorithm, the value of q is fixed at two, while the value of μ is estimated using the GCV method. For this low value of q, the GCV alone cannot confine the nonzero entries of γ_L to the band of x(t). As a result, and as can be observed from the right column in Figure 5, the proposed BCSA algorithm outperforms the smooth algorithm for the three signals considered in this example. The values presented in the subfigures in the right column represent the SNR_imp in dB associated with each algorithm. It is evident that the proposed algorithm has estimated the embedded smooth signals with relatively high values of the SNR_imp compared to the smooth algorithm. More qualitative and quantitative comparisons between the two algorithms are presented in the next two examples.

| Example 3: comparison with other smoothing techniques
In this example, the performance of the proposed BCSA algorithm is compared with the performance of the following algorithms: smooth [1], the cubic spline CSpline (implemented using the Matlab function 'csapsGCV.m' [32]), the SSA [10], the SG filter [4] (implemented using the Matlab function 'sgolayfilt.m'), the discrete wavelet transform coefficient shrinkage (implemented using the Matlab function 'wden.m') and the complete ensemble EMD with adaptive noise (CEEMDAN) [14], which, for simplicity, is abbreviated as EEMD. Recall that the EEMD algorithm decomposes a signal into a set of IMFs in decreasing order of their frequency contents. Therefore, to automatically identify the IMFs related to the noise, it is suggested to calculate the autocorrelation parameter c_i := Σ_{t=2}^{T} u_i(t) u_i(t − 1), where u_i(t) is the normalised ith IMF. For the IMFs associated with the noise, the value of c_i is small compared to that of the IMFs associated with the smooth signal. Therefore, for smoothing a signal, the first i_v IMFs are discarded, and the smoothed signal is then constructed by summing the remaining IMFs, where i_v is identified as the index at which the coefficients c_i exhibit an abrupt increase in value.
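The autocorrelation parameter c_i and the split index i_v can be computed as follows; the jump threshold used to detect the "abrupt increase" is an assumption:

```python
import numpy as np

def lag1_autocorr(u):
    """c_i = sum_{t=2}^{T} u(t) u(t-1) for a unit-norm component u.
    Near 0 for white-noise IMFs, near 1 for slowly varying IMFs."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)
    return float(np.dot(u[1:], u[:-1]))

def split_index(cs, jump=0.3):
    """Index i_v of the first abrupt increase in the c_i sequence; the first
    i_v components (the noise-related IMFs) are to be discarded.  The jump
    threshold 0.3 is an illustrative choice."""
    steps = np.diff(np.asarray(cs, dtype=float))
    idx = np.flatnonzero(steps > jump)
    return int(idx[0]) + 1 if idx.size else 0
```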
In this example, a noisy signal y(t) is constructed by first generating a smooth signal x(t) and then adding AWGN at a specified value of the SNR. The duration of x(t) equals 4 s, and it is sampled at 256 Hz; therefore, x(t) consists of T = 1024 samples. In addition, x(t) is composed of three different sinusoidal signals with frequencies 2.5, 5 and 10 Hz, and the considered values of the SNR are 5, 10, 15 and 20 dB. For each value of the SNR, the constructed noisy signal is applied to the competing algorithms, and a smooth signal is obtained from each algorithm. The SNR_imp associated with each algorithm is then calculated. This procedure is repeated 200 times for each value of the SNR, where a new AWGN realisation is generated each time. The SNR_imp associated with each algorithm is then calculated as the average over the 200 runs. The results are shown in Figure 6(a). As can be observed from this subfigure, the proposed algorithm outperforms all the competing algorithms for all the considered values of the SNR. On the other hand, the average convergence time of the seven algorithms considered in this comparison is presented in Figure 6(b). As can be observed from this subfigure, the competing algorithms can be classified, in terms of the average convergence time, into two classes. The first class contains the two algorithms (CEEMDAN and the cubic spline) that have much higher execution times than the remaining algorithms, which belong to the second class. In addition, it can also be observed from Figure 6(b) that the BCSA and the SSA algorithms have the highest average convergence times among the algorithms in the second class. However, the average convergence time of these two algorithms is about 10 ms, which is much smaller than the signal duration. Therefore, all the algorithms in the second class can be adapted for use in real-time applications.

| Example 4: missing and outlier samples
In this example, the performance of the proposed RBCSA algorithm in smoothing signals with missing or outlier samples is examined. For the case of missing samples, the proposed RBCSA algorithm (Table 2) is compared with the algorithms considered in Example 3. The EEMD is not considered in this comparison due to its high computation cost. For the case of outlier samples, the proposed RBCSA is compared only with the robust smooth algorithm 'RSmooth' developed in [1], because only these two algorithms are capable of identifying the indices of the outlier samples. In both cases, the noisy signal y(t) is first constructed by adding AWGN at 10 dB to the smooth signal x(t) utilised in Example 3. Then, for the case of missing samples, δ% of the constructed noisy signal y(t) is excluded, where the indices of these samples are randomly selected. The values of δ considered in this example are 10, 20, …, 50%. A sample of the constructed noisy signal for the case δ = 50% is shown in the upper panel of Figure 7(a). The smoothed signals obtained by the proposed RBCSA and the RSmooth algorithms are shown in the lower panel of Figure 7(a). To quantitatively assess the performance of the various algorithms, and for each value of δ, the SNR_imp associated with each algorithm is calculated as the average over 200 different runs, where in each run new AWGN is added at 10 dB and a new randomly selected δ% of the samples of the constructed noisy signal is discarded. The average SNR_imp associated with all algorithms is shown in Figure 7(b).
As shown in this figure, and for all the considered values of δ, the proposed RBCSA outperforms all the competing algorithms.
For the case of outlier samples, the signals are constructed as before, except that, instead of discarding δ% of the samples, these samples are replaced by random outlier values uniformly distributed in the range ±[10∼20]. The values of δ considered in this example are 5%, 10%, 15% and 20%. A sample of the constructed noisy signal for the case δ = 15% is shown in the upper panel of Figure 7(c). The smoothed signals obtained by the proposed RBCSA and the RSmooth algorithms are shown in the lower panel of Figure 7(c). To quantitatively assess the performance of the two algorithms, and for each value of δ, the SNR_imp associated with each algorithm is calculated as the average over 200 different runs, where in each run new AWGN is added at 10 dB and new outlier samples, whose indices are randomly selected, are generated. The average SNR_imp associated with the two algorithms is shown in Figure 7(d). As shown in this figure, and for all the considered values of δ, the proposed RBCSA outperforms the RSmooth algorithm.
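The corrupted test signals of this example can be generated with a small helper; marking missing samples with NaN is an implementation choice, and the function name is illustrative:

```python
import numpy as np

def corrupt(y, delta_pct, mode="outlier", lo=10.0, hi=20.0, seed=0):
    """Replace delta_pct% of randomly chosen samples of y either with NaN
    ('missing') or with uniform outliers of magnitude in [lo, hi] and
    random sign ('outlier'), as in the Example 4 setup."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).copy()
    k = int(round(delta_pct / 100.0 * len(y)))
    idx = rng.choice(len(y), size=k, replace=False)      # random locations
    if mode == "missing":
        y[idx] = np.nan
    else:
        y[idx] = rng.uniform(lo, hi, size=k) * rng.choice([-1.0, 1.0], size=k)
    return y, np.sort(idx)
```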

| Example 5: smoothing daily-infection data of the coronavirus disease 2019 (CoViD-19)
The proposed BCSA algorithm is now tested using real data of daily CoViD-19 infections. Two different datasets are considered in this example; they represent the daily infections by the coronavirus in the United States and the United Kingdom over the interval from the first of March 2020 to the end of June 2020. The data are publicly available from the European Centre for Disease Prevention and Control at [33]. The recorded data are presented as the blue waveforms in Figure 8, while the smoothed data are presented as the red waveforms. As can be observed from Figure 8, the two datasets have been accurately smoothed by the proposed BCSA algorithm. It can also be observed from the smoothed curves that the behaviour of the daily infections in the two countries was similar (albeit on different scales) over the first 70 days. In the following 50 days, however, the rate of infection in the United Kingdom decreased significantly, while in the United States the rate of infection increased significantly over the last 20 days of the considered interval.

| CONCLUSION
A new smoothing algorithm was developed to overcome the limitations associated with the existing PLS smoothing algorithms based on Equation (2). A new objective function was utilised in the developed algorithm. The proposed objective function depends on four parameters: two regularisation parameters and the orders of two difference matrices. By properly selecting the values of these four parameters, the proposed algorithm can act as either an LPF or a BPF. To minimise the distortion introduced to the estimated signal, the spectral characteristics of the embedded smooth signal were utilised to derive closed-form expressions for the optimum values of these four parameters. As a result, the distortion introduced by the proposed algorithm to the smoothed signal is always smaller than that introduced by the existing PLS smoothing algorithms based on Equation (2), which act solely as LPFs. In addition, and in contrast to the GCV method used with the existing PLS algorithms, no special optimisation tools are required to estimate the optimum values of the regularisation parameters in the proposed algorithm. The derived algorithm was further developed to handle the case of missing and/or outlier samples in the measured data. An adaptive procedure was suggested to estimate the indices of the outlier samples. Therefore, the proposed algorithms are completely data driven, and they do not require any user intervention. In addition, the proposed algorithms were also shown to produce significantly improved results compared to some existing smoothing techniques.