High‐dimensional regression coefficient estimation by nuclear norm plus l1 norm penalization

We propose a new estimator of the regression coefficients for a high-dimensional linear regression model, derived by replacing the sample predictor covariance matrix in the ordinary least squares (OLS) estimator with a different predictor covariance matrix estimate, obtained by nuclear norm plus l1 norm penalization. We call the estimator ALgebraic Covariance Estimator-regression (ALCE-reg). We make a direct theoretical comparison of the expected mean square error of ALCE-reg with OLS and RIDGE. We show in a simulation study that ALCE-reg is particularly effective when both the dimension and the sample size are large, due to its ability to find a good compromise between the large bias of shrinkage estimators (like RIDGE and the least absolute shrinkage and selection operator [LASSO]) and the large variance of estimators conditioned by the sample predictor covariance matrix (like OLS and principal orthogonal complement thresholding [POET]).

where $\|\beta\|_2^2 = \beta'\beta = \sum_{j=1}^{p}\beta_j^2$, and the LASSO estimator (Tibshirani, 1996), which is derived as a penalized least squares problem where $\|\beta\|_1 = \sum_{j=1}^{p}|\beta_j|$. As clearly highlighted by Hastie (2020), the RIDGE and LASSO estimators can be rephrased as constrained optimization problems, where the constraint is $\|\beta\|_2 < C_k$ in the case of RIDGE regression and $\|\beta\|_1 < C_k$ in the case of LASSO regression, for some $C_k > 0$.
In Hastie (2020), the link between ridge regression and the spectral decomposition of the matrix $X'X$ is elegantly pointed out, while Le et al. (2020) describe the relationship between ridge regression and covariance matrix regularization. These results show that, when $p \ge n$, $\hat\beta_{RIDGE}$ may be extremely biased, as also reported in Zou (2020). Although $\hat\beta_{LASSO}$ tends to be slightly less biased and a bit more variable, it is also subject to several drawbacks in high dimensions, particularly when the coefficient vector $\beta$ is not element-wise sparse. It follows that, when $p$ is large, $\hat\beta_{OLS}$ is either unfeasible or extremely variable, while RIDGE and LASSO are very biased.
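As a concrete illustration, the closed forms of the two estimators discussed above can be sketched in a few lines of NumPy. This is a toy check of the shrinkage effect of the ridge penalty, not the paper's implementation; all names and numerical values are ours.

```python
import numpy as np

def ols(X, y):
    # beta_OLS = (X'X)^{-1} X'y; requires X'X to be invertible (p <= n)
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    # beta_RIDGE = (X'X + k I_p)^{-1} X'y; invertible for any k > 0
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta = np.ones(p)
y = X @ beta + 0.1 * rng.standard_normal(n)

b_ols = ols(X, y)
b_ridge = ridge(X, y, k=1.0)
# Shrinkage: the ridge solution has a smaller l2 norm than OLS for k > 0
assert np.linalg.norm(b_ridge) < np.linalg.norm(b_ols)
```

This mirrors the constrained-optimization view recalled above: increasing $k$ tightens the implicit constraint $\|\beta\|_2 < C_k$ and pulls the solution toward zero.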
In this paper, we explore the possibility of replacing the sample covariance matrix of the predictors $\hat\Sigma_X$ in the OLS estimator $\hat\beta_{OLS}$ with a regularized covariance matrix estimate, obtained by solving the specific regularization problem described in Farnè and Montanari (2020). Therein, a high-dimensional covariance matrix estimator is proposed, under the assumption that the true covariance matrix of the predictors $\Sigma_X$ follows a low rank plus sparse structure. This assumption is very natural, as it results from an approximate factor model (Chamberlain & Rothschild, 1982) imposed on the vector $x$. The principal orthogonal complement thresholding (POET) estimator (Fan et al., 2013) also assumes a low rank plus sparse structure for $\Sigma_X$. That algebraic structure has been analysed and retrieved in exact form in Chandrasekaran et al. (2011) and in approximate form in Chandrasekaran et al. (2010). Following those proposals, in Farnè and Montanari (2020), $\Sigma_X$ is recast as the solution of a least squares problem penalized by the nuclear norm of the low rank component (see Fazel et al., 2001) and the $l_1$ norm of the sparse component of $\Sigma_X$. The statistical properties of this estimator, called ALCE (ALgebraic Covariance Estimator), have been studied in Farnè and Montanari (2020).
Given these premises, it is natural to replace the matrix $\hat\Sigma_X$ with the ALCE estimator in $\hat\beta_{OLS}$ and to explore the statistical properties of the resulting estimator of $\beta$. Our expectation is that the ALCE estimator of $\beta$ is able to attain a convenient balance between bias and variance when $p$ is large, thus providing a valid alternative when OLS is too unstable and RIDGE/LASSO are too biased.
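The plug-in idea is generic: any positive definite covariance estimate can replace $\hat\Sigma_X$ in $\hat\beta_{OLS} = \hat\Sigma_X^{-1}\hat\sigma_{XY}$. A minimal sketch, assuming centered data; the ridge-type reconditioning below is only a stand-in for a regularized estimate such as ALCE, not the ALCE algorithm itself.

```python
import numpy as np

def plugin_coefficients(Sigma_hat, X, y):
    # Generic plug-in: replace the sample covariance in beta = Sigma^{-1} sigma_XY
    # with any positive definite estimate Sigma_hat (ALCE, POET, ridge-type, ...)
    n = X.shape[0]
    sigma_xy = X.T @ y / n          # sample covariances between predictors and response
    return np.linalg.solve(Sigma_hat, sigma_xy)

rng = np.random.default_rng(1)
n, p = 40, 60                        # p > n: the sample covariance is singular
X = rng.standard_normal((n, p))
y = X @ np.ones(p) + rng.standard_normal(n)

S = X.T @ X / n                      # rank <= n < p, not invertible
S_reg = S + 0.5 * np.eye(p)          # any reconditioned estimate restores feasibility
beta_hat = plugin_coefficients(S_reg, X, y)
assert beta_hat.shape == (p,)
```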
The rest of the paper is structured as follows. Section 2 explores the theoretical framework behind our proposed high-dimensional regression coefficient estimator. Section 3 describes in more detail the statistical properties of our proposed estimator. Section 4 contains a wide simulation study, where a full p-dimensional regression coefficient vector is recovered, under different dimensions and sample sizes, by means of several methods, which are thoroughly compared. Finally, Section 5 provides some concluding remarks.

| Notation
Given a $p \times p$ symmetric positive semidefinite matrix $M$, we denote by $\lambda_i(M)$, $i \in \{1,\dots,p\}$, the eigenvalues of $M$ in decreasing order. We recall the following norm definitions:

1. Element-wise:
   a. $\|M\|_0$, the number of nonzero elements of $M$;
   b. $\|M\|_1 = \sum_{i,j}|M_{ij}|$.
2. Induced by vector:
   a. $\|M\|_{0,v}$, which is the maximum number of nonzeros per row/column;
   b. Spectral norm: $\|M\|_2 = \|M\| = \lambda_1(M)$.
3. Schatten:
   a. Nuclear norm of $M$, here defined as the sum of the eigenvalues of $M$: $\|M\|_* = \sum_{i=1}^{p}\lambda_i(M)$.
We denote the rank of $M$ as $\mathrm{rk}(M)$ and the sparsity pattern of $M$ as $\mathrm{sgn}(M)$, where $\mathrm{sgn}(M)$ is a $p \times p$ matrix whose $ij$ entry is 1 if $M_{ij} \neq 0$ and 0 otherwise. We indicate with $\mathrm{diag}(M)$ a diagonal $p \times p$ matrix containing only the diagonal of $M$.

The aim of this paper is to compare the performance of different estimators of the vector of linear regression coefficients in high dimensions and to test the validity of a new proposal. The covariance structure of the vector of predictors $x$ is crucial when deciding how to replace the matrix $(X'X)^{-1}$ in $\hat\beta_{OLS}$ with a feasible alternative when $p \ge n$. From Hoerl and Kennard (1970), we know that $\hat\beta_{RIDGE} = (X'X + kI_p)^{-1}X'y$, where the shrinkage term $kI_p$ has the effect of reconditioning the eigenvalues of $X'X$ in a way that avoids singularity and guarantees invertibility, although at the price of a large bias. Instead, the LASSO acts as a variable selector and is thus oriented to identify a restricted set of predictors from the $p$ input ones, even when the true coefficient vector $\beta$ is not element-wise sparse.
When $p$ is large, it is very likely to have a redundant set of predictors, that is, predictor multicollinearity, which inevitably affects the conditioning properties of the sample covariance matrix of $x$. As a consequence, it is not unreasonable to postulate for the vector of predictors $x$ an approximate factor model of the kind $x = \chi + \epsilon$, where $\chi$ is the common component of $x$, that is, $\chi = Bf$, with $B$ a $p \times r$ matrix of factor loadings such that $B'B = I_r$, and $f$ an $r \times 1$ random vector of common factors such that $E(f) = 0$ and $V(f) = I_r$, while $\epsilon$ is the vector of the so-called unique factors of $x$, that is, a $p \times 1$ random vector such that $E(\epsilon) = 0$ and $V(\epsilon) = S^*$, with $S^*$ a $p \times p$ sparse covariance matrix. From these assumptions, $\Sigma_X = L^* + S^*$, where $L^* = BB'$. In other words, (5) states that the covariance matrix of $x$, $\Sigma_X$, admits a low rank plus sparse decomposition.
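A low rank plus sparse covariance of this kind is easy to construct numerically. The following sketch builds a toy $\Sigma_X = BB' + S^*$ with orthonormal loadings; the specific values of $p$, $r$ and the sparse entries are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
p, r = 30, 3

# Loadings with orthonormal columns (B'B = I_r), via QR of a Gaussian matrix
B, _ = np.linalg.qr(rng.standard_normal((p, r)))
L_star = B @ B.T                     # low rank component, rank r

# Sparse positive definite residual covariance: diagonal plus a few tiny off-diagonals
S_star = 0.5 * np.eye(p)
S_star[0, 1] = S_star[1, 0] = 0.05

Sigma_X = L_star + S_star            # low rank plus sparse decomposition
assert np.linalg.matrix_rank(L_star) == r
assert np.all(np.linalg.eigvalsh(Sigma_X) > 0)
```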
Let us analyse the OLS estimator $\hat\beta_{OLS} = (X'X)^{-1}X'y = \hat\Sigma_X^{-1}\hat\sigma_{XY}$, where $\hat\Sigma_X = X'X/n$ is the sample covariance matrix of $x$ and $\hat\sigma_{XY}$ is the vector of sample covariances between the response variable $y$ and the predictors $x_1, x_2, \dots, x_p$. In order to obtain a computable estimator when $p \ge n$, RIDGE regression replaces $X'X$ with the matrix $X'X + kI_p$, $k > 0$, in $\hat\beta_{OLS}$. This plug-in has the effect of reconditioning the eigenvalues of $X'X$, thus producing a computable and very stable estimator of $\beta$, at the price of introducing a systematic bias in the estimate, also due to the inversion of the matrix $X'X + kI_p$. For this reason, the need arises to study an alternative estimator of $\Sigma_X$ able to limit this inevitable estimation bias, while reconditioning $\hat\Sigma_X$, which is not positive definite when $p \ge n$.
For this purpose, we propose to exploit the low rank plus sparse structure of $\Sigma_X$ displayed in (5). In particular, since we have assumed the covariance matrix of $x$ to be low rank plus sparse, we can approach the estimation of $\Sigma_X$ by solving a least squares problem penalized by $\psi\,\mathrm{rk}(L) + \rho\|S\|_0$, where $\psi$ and $\rho$ are threshold parameters. Unfortunately, this approach is not feasible, because the composite penalty $\psi\,\mathrm{rk}(L) + \rho\|S\|_0$ is nonconvex, so that problem (6) is NP-hard. A possible way to overcome this drawback is to solve the corresponding relaxed problem (7), since it has been proved that $\|L\|_*$ is the tightest convex relaxation of $\mathrm{rk}(L)$ and $\|S\|_1$ is the tightest convex relaxation of $\|S\|_0$ (see Fazel, 2002).
Problem (7) is thus nonsmooth but convex, which means it is solvable in polynomial time.
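Solvers for problems of this type typically alternate the proximal operators of the two penalties: eigenvalue soft-thresholding for the nuclear norm and element-wise soft-thresholding for the $l_1$ norm. A minimal sketch of these two building blocks, assuming symmetric positive semidefinite inputs; this is not the algorithm of Section S2, only its standard ingredients.

```python
import numpy as np

def soft_threshold(M, rho):
    # prox of the l1 norm: element-wise soft-thresholding (sparse component)
    return np.sign(M) * np.maximum(np.abs(M) - rho, 0.0)

def svt(M, psi):
    # prox of the nuclear norm for symmetric PSD M: eigenvalue soft-thresholding
    # (singular value thresholding), which yields a low rank component
    lam, U = np.linalg.eigh(M)
    lam_thr = np.maximum(lam - psi, 0.0)
    return (U * lam_thr) @ U.T

A = np.array([[2.0, 0.9], [0.9, 2.0]])   # eigenvalues 1.1 and 2.9
L = svt(A, psi=1.2)                      # thresholded eigenvalues: 0 and 1.7 -> rank 1
S = soft_threshold(A, rho=1.0)           # |0.9| - 1.0 < 0 -> off-diagonals vanish
assert np.linalg.matrix_rank(L) == 1
assert S[0, 1] == 0.0
```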
Definition 1. A pair of symmetric matrices $(L, S)$ with $L, S \in \mathbb{R}^{p \times p}$ is an algebraically consistent estimate of the low rank plus sparse decomposition (5) for the covariance matrix $\Sigma_X$ if the following conditions hold: (i) the low rank estimate $L$ is positive semidefinite with rank $\mathrm{rk}(L) = \mathrm{rk}(L^*) = r$; (ii) the residual estimate $S$ is positive definite with the true sparsity pattern $\mathrm{sgn}(S) = \mathrm{sgn}(S^*)$. Parametric consistency holds if the pair $(L, S)$ is close to $(L^*, S^*)$ in some norm with probability approaching 1.
Definition 2. A pair of symmetric matrices $(L, S)$ with $L, S \in \mathbb{R}^{p \times p}$ is a parametrically consistent estimate of the low rank plus sparse decomposition (5) for the covariance matrix $\Sigma_X$ if the norm $g_\gamma$ of the estimation errors $(L - L^*, S - S^*)$ vanishes with probability approaching 1.

Parametric consistency is a usual property in statistical analysis, while algebraic consistency is a typical feature of this approach. The word 'ALgebraic' in the ALCE acronym follows from the need to control the degree of transversality of the algebraic manifolds $\mathcal{L}(r)$ and $\mathcal{S}(s)$, where $\mathcal{L}(r)$ is the variety of matrices with rank at most $r$ and $\mathcal{S}(s)$ is the variety of (element-wise) sparse matrices with at most $s$ nonzero elements.
The two varieties $\mathcal{L}(r)$ and $\mathcal{S}(s)$ can be disentangled if $L^* \in \mathcal{L}(r)$ is far from being sparse and $S^* \in \mathcal{S}(s)$ is far from being low rank. Hence the need to impose that they be close to orthogonality, which is enforced by bounding the rank-sparsity measures $\xi(T(L^*))$ and $\mu(\Omega(S^*))$, where $T(L^*)$ and $\Omega(S^*)$ are the tangent spaces to $\mathcal{L}(r)$ and $\mathcal{S}(s)$, respectively. Further, the algebraic and parametric consistency of $(\hat L_A, \hat S_A)$ requires controlling the magnitude of the eigenvalues of $L^*$, the sparsity pattern of $S^*$, the smallest eigenvalue of $L^*$ and the minimum absolute nonzero element in $S^*$ with respect to $\xi(T(L^*))$ and $\mu(\Omega(S^*))$. The latent random processes $f$ and $\epsilon$ are assumed to be independent and identically distributed, with sub-Gaussian tails. We stress that the $r$ eigenvalues of $L^*$ are imposed to scale as $\gamma_\alpha p^\alpha$, with $\gamma_\alpha > 0$ and $\alpha \in (\frac{1}{2}, 1]$, which corresponds to allowing for weak factors in (4), and that the sparsity pattern of $S^*$ is controlled by imposing $\|S^*\|_{0,v} \le \gamma_\delta p^\delta$, with $\gamma_\delta > 0$ and $\delta \in [0, \frac{1}{2})$, which corresponds to limiting the cumulation of residual covariances in a specific row. We refer to Farnè and Montanari (2020) for more technical details.
In this paper, we focus on the ALCE estimator of the regression coefficient (ALCE-reg), defined as $\hat\beta_{ALCE} = \hat\Sigma_A^{-1}\hat\sigma_{XY}$. Following Farnè and Montanari (2020), we also perform the unshrinkage of the estimated latent eigenvalues, as this operation improves the sample total loss as much as possible in finite samples. Once we set $\hat r_A = \mathrm{rk}(\hat L_A)$ and we define the spectral decomposition of $\hat L_A$ as $\hat L_A = \hat U_A \hat D_A \hat U_A'$, with $\hat U_A$ a $p \times \hat r_A$ matrix such that $\hat U_A' \hat U_A = I_{\hat r_A}$ and $\hat D_A$ an $\hat r_A \times \hat r_A$ diagonal matrix, we can get the UNALCE (UNshrunk ALCE) estimates as in (12), (13) and (14), where $\psi > 0$ is any chosen eigenvalue threshold parameter. Importantly, it can be proved (Farnè & Montanari, 2020) that the unshrunk estimates improve the sample total loss of the corresponding ALCE estimates. For each threshold pair $(\psi, \rho)$, we can finally compute the overall UNALCE estimate $\hat\Sigma_U = \hat L_U + \hat S_U$ and derive the UNALCE estimator of the regression coefficient (UNALCE-reg) as $\hat\beta_{UNALCE} = \hat\Sigma_U^{-1}\hat\sigma_{XY}$.

| ESTIMATION FRAMEWORK
We set the standard linear regression model $y = X\beta + \varepsilon$, where $\varepsilon$, the residual vector, is assumed to be distributed as $MVN(0, \sigma^2 I_n)$ and uncorrelated with the $p \times 1$ vector of predictors $x$. First, we consider the OLS coefficient estimator $\hat\beta_{OLS} = \hat\Sigma_X^{-1}\hat\sigma_{XY}$. We know that $VAR(\hat\beta_{OLS}) = \sigma^2(X'X)^{-1}$. We write the sum of squared errors as $L^2_{OLS} = (\hat\beta_{OLS} - \beta)'(\hat\beta_{OLS} - \beta)$. From Hoerl and Kennard (1970), we get that $E(L^2_{OLS}) = \sigma^2\sum_{i=1}^{p} 1/\lambda_i(X'X)$ and $V(L^2_{OLS}) = 2\sigma^4\sum_{i=1}^{p} 1/\lambda_i(X'X)^2$, respectively. Similarly, we know that $\hat\beta_{OLS}$ is obtained by minimizing the sum of squares $\phi(\beta) = (y - X\beta)'(y - X\beta)$, so that, for a generic estimator $\hat\beta$, $\phi(\hat\beta) = \phi(\hat\beta_{OLS}) + (\hat\beta - \hat\beta_{OLS})'X'X(\hat\beta - \hat\beta_{OLS})$. When $p \ge n$, however, $(X'X)^{-1}$ does not exist, so that $\hat\beta_{OLS}$ is unfeasible. Moreover, when $p$ is large, due to the Marcenko-Pastur law (Marčenko & Pastur, 1967), it is likely that $\lambda_p(n\hat\Sigma_X)$ is really small, thus making $E(L^2_{OLS})$ and $V(L^2_{OLS})$ explode. Therefore, the need arises to recondition the eigenvalues of $\hat\Sigma_X$, in order to limit the expected sum of squared errors and its variance. For this reason, we first construct an alternative estimator of the coefficient vector with this aim, and second, we compare its statistical properties with the OLS and RIDGE ones.
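The explosion of $E(L^2_{OLS})$ under near-collinearity can be checked numerically from the Hoerl-Kennard expression above; the design matrices below are our own toy examples, not the paper's scenarios.

```python
import numpy as np

def expected_sse(XtX, sigma2):
    # E(L^2_OLS) = sigma^2 * sum_i 1/lambda_i(X'X): this explodes when the
    # smallest eigenvalue of X'X approaches zero (near-singular design)
    lam = np.linalg.eigvalsh(XtX)
    return sigma2 * np.sum(1.0 / lam)

rng = np.random.default_rng(3)
n, sigma2 = 200, 1.0
X_good = rng.standard_normal((n, 5))
# Append a column nearly collinear with the first one
X_bad = np.column_stack([X_good, X_good[:, 0] + 0.01 * rng.standard_normal(n)])

e_good = expected_sse(X_good.T @ X_good, sigma2)
e_bad = expected_sse(X_bad.T @ X_bad, sigma2)
assert e_bad > e_good                  # near-collinearity inflates the expected error
```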
Theorem 1. Suppose that $\lambda_p(\Sigma_X) = O(1)$. Under all the assumptions and conditions of Theorem 1 in Farnè and Montanari (2020), there exists a positive $\zeta_A$ such that $\lambda_p(\hat\Sigma_A(\psi, \rho)) \ge \zeta_A$ with probability approaching 1, for all $p \in \mathbb{N}$ as $n \to \infty$.

In light of Theorem 1 (proof reported in Section S1), the definition of $\hat\beta_{ALCE}(\psi, \rho)$ is thus well-posed. In practice, the ALCE solution pair $(\hat L_A(\psi, \rho), \hat S_A(\psi, \rho))$ is computed by the algorithm in Section S2. At this stage, we need to decide how to optimally select the thresholds $\psi$ and $\rho$.
We select them under a validation set scheme, that is, by selecting the pair $(\psi_{val}, \rho_{val}) = \mathrm{argmin}_{\psi \in \varphi,\, \rho \in \varrho}\,(y - X\hat\beta_{ALCE}(\psi, \rho))'(y - X\hat\beta_{ALCE}(\psi, \rho))$, where $\varphi$, the vector of candidate eigenvalue thresholds $\psi$, is composed of multiples of $1/p$, and $\varrho = \varphi/\sqrt{p}$. It is worth stressing that here, differently from Farnè and Montanari (2020), the tuning parameters $\psi$ and $\rho$ are chosen in order to optimize $\hat\Sigma_A(\psi, \rho)$ taking the linear dependence between $X$ and $y$ into account. In the same way, we derive $\hat\beta_{UNALCE}(\psi_{val}, \rho_{val})$ as in (12), (13) and (14). Under all the assumptions and conditions of Theorem 1 in Farnè and Montanari (2020), $\hat\Sigma_A(\psi, \rho)$ is both algebraically and parametrically consistent, in the sense of Definitions 1 and 2, respectively. Under the same conditions, A.7 in Farnè and Montanari (2020) ensures that $\hat\Sigma_X$ is also parametrically consistent with respect to $\Sigma_X$ in spectral norm. Moreover, imposing $\lambda_p(\Sigma_X) = O(1)$, the same rate also holds for its inverse, although the strict requirement $p < n$ is needed.
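The validation-set selection of $(\psi_{val}, \rho_{val})$ amounts to a grid search minimizing the validation residual sum of squares. A sketch with a hypothetical `fit` routine standing in for the ALCE algorithm; the ridge-type `toy_fit` below is purely illustrative and is not ALCE.

```python
import numpy as np

def select_thresholds(fit, X_tr, y_tr, X_val, y_val, psis, rhos):
    # Pick (psi, rho) minimizing the residual sum of squares on a validation
    # set; `fit` is any routine returning a coefficient vector for given thresholds
    best, best_rss = None, np.inf
    for psi in psis:
        for rho in rhos:
            beta = fit(X_tr, y_tr, psi, rho)
            resid = y_val - X_val @ beta
            rss = resid @ resid
            if rss < best_rss:
                best, best_rss = (psi, rho), rss
    return best

def toy_fit(X, y, psi, rho):
    # Illustrative stand-in only: a ridge-type fit with penalty psi + rho
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + (psi + rho) * np.eye(p), X.T @ y)

rng = np.random.default_rng(4)
X = rng.standard_normal((60, 8))
y = X @ np.ones(8) + 0.1 * rng.standard_normal(60)
pair = select_thresholds(toy_fit, X[:40], y[:40], X[40:], y[40:],
                         psis=[0.01, 0.1, 1.0], rhos=[0.01, 0.1])
```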
Theorem 2. Suppose that $\lambda_p(\Sigma_X) = O(1)$ and $p < n$. Under all the assumptions and conditions of Theorem 1 in Farnè and Montanari (2020), there exists a positive $\zeta_X$ such that $\lambda_p(\hat\Sigma_X) \ge \zeta_X$ with probability approaching 1, for all $p \in \mathbb{N}$ as $n \to \infty$.

Then, it is possible to prove that, provided that Theorem 1 in Farnè and Montanari (2020) holds, $\hat\Sigma_A(\psi, \rho)$ is the estimate with the most concentrated possible eigenvalues around the true ones among the estimators $\hat\Sigma = \hat L + \hat S$ satisfying the constraints $\|\hat L\|_* \le \phi_\psi$ and $\|\hat S\|_1 \le \phi_\rho$.
Theorem 3, proved in Section S1, guarantees that $\hat\Sigma_A(\psi, \rho)$ presents the best possible conditioning properties under the constraints $\|\hat L\|_* \le \phi_\psi$ and $\|\hat S\|_1 \le \phi_\rho$, for all $p \in \mathbb{N}$ as $n \to \infty$.
The bounds on $E(L^2_{ALCE})$ and $V(L^2_{ALCE})$ are driven by $1/\lambda_p(\hat\Sigma_A(\psi, \rho))$ and $1/\lambda_p(\hat\Sigma_A(\psi, \rho))^2$, respectively. It follows that, when $p \ge n$, or $p$ is large, the ALCE-reg solution provides a clear improvement over OLS, due to the maximum eigenvalue concentration property of Theorem 3. Also, recalling Corollary 5 in Farnè and Montanari (2020), we learn that UNALCE has more stringent requirements for positive definiteness compared with ALCE, such that $\lambda_p(\hat\Sigma_U(\psi_{val}, \rho_{val})) < \lambda_p(\hat\Sigma_A(\psi_{val}, \rho_{val}))$ by construction. Therefore, although UNALCE also improves considerably on the explosive value of $L^2_{OLS}$, it is nonetheless expected to perform worse than ALCE, because it is systematically closer to nonpositive definiteness on average. We now formally compare the performance of $\hat\beta_{ALCE}$ to that of $\hat\beta_{OLS}$ and $\hat\beta_{RIDGE}$. Let us define $\hat\Sigma_R = X'X/n + (k/n)I_p$, so that the RIDGE estimator can alternatively be written as $\hat\beta_{RIDGE} = \hat\Sigma_R^{-1}\hat\sigma_{XY}$. Then, (4.6) in Hoerl and Kennard (1970) shows that $E(L^2_{RIDGE}) = \gamma_1^R(k) + \gamma_2^R(k)$, where $\gamma_1^R(k)$ is the variance of $\hat\beta_{RIDGE}$ and $\gamma_2^R(k) = k^2\beta'(n\hat\Sigma_R)^{-2}\beta$ is the squared bias of $\hat\beta_{RIDGE}$. In Hoerl and Kennard (1970), the authors claim that there always exists a value of $k$ such that the overall expected sum of squared errors of RIDGE is smaller than that of OLS. Since the expected squared bias of $\hat\beta_{ALCE}$ is $\gamma_2^A(\psi_{val}, \rho_{val}) = 0$ under the conditions of Theorem 1, we first learn that $E(L^2_{ALCE})$ can be much lower than $E(L^2_{OLS})$ when $p$ is large, due to Theorem 3, and, second, that it will be harder to find a value of $k$ ensuring that $E(L^2_{RIDGE}) < E(L^2_{ALCE})$, again by the maximum eigenvalue concentration property of Theorem 3.
We can state the following corollary (proved in Section S1) on the error rate of $\hat\beta_{ALCE}$, which holds with probability approaching 1, for all $p \in \mathbb{N}$ as $n \to \infty$.
Corollary 1 provides the error rate of $\hat\beta_{ALCE}$, which is related to the spikiness degree of the eigenvalues of $L^*$ and the sparsity degree of $S^*$.
When $\alpha = 1$ and $\delta = 0$ (as in Fan et al., 2013), which corresponds to the case of pervasive latent factors and negligible residual sparsity, the rescaling term $\frac{1}{p^{\alpha+\delta}}$ boils down to $\frac{1}{p}$. Let us finally analyse and compare in detail the estimation errors of the three methods. We define the estimation error vectors $E_A = \hat\beta_{ALCE} - \beta$, $E_R = \hat\beta_{RIDGE} - \beta$ and $E_O = \hat\beta_{OLS} - \beta$, respectively. The comparison as $k$ varies will also depend on the value of $n$. If $n$ is not that large, it may be the case that $\|E_A\| - \|E_R\| < 0$, also because $\|E_R\|$ becomes larger and larger after a certain value of $k$, due to the increasing estimation bias (see Figure 1 in Hoerl & Kennard, 1970). It follows that, if $p$ is large and $n$ is not, ALCE may overcome RIDGE due to the excessive bias in the RIDGE estimate, provided that Theorem 3 holds.
The difference $\|E_A\| - \|E_O\|$ will intrinsically depend on the $p/n$ ratio. When $p/n$ is not smaller than 1, OLS is not feasible. When $p/n$ is slightly below 1, ALCE is going to prevail in terms of expected sum of squared errors: both estimators are asymptotically unbiased, but $E(L^2_{ALCE})$ is smaller than $E(L^2_{OLS})$ due to the eigenvalue concentration property of Theorem 3. Moreover, a high sparsity degree in the residual covariance component $S^*$ will also certainly favour ALCE, because it leads to even better conditioned covariance matrix estimates. When $p$ is reasonably small and $n$ is large, instead, the situation will be drastically different, with OLS likely to prevail.
Concerning prediction error, we stress that the optimal threshold pair $(\psi_{val}, \rho_{val})$ is specifically chosen by minimizing $\phi(\hat\beta_{ALCE}(\psi, \rho))$ in a validation set. Similarly, in practice, the penalization parameter $k$ is chosen by minimizing $\phi(\hat\beta_{RIDGE}(k))$ under a cross-validation scheme. Theoretically speaking, it is thus enough to note that $\phi(\hat\beta_{ALCE}(\psi, \rho)) = \|XE_A\|$, $\phi(\hat\beta_{RIDGE}(k)) = \|XE_R\|$ and $\phi(\hat\beta_{OLS}) = \|XE_O\|$ to claim that the properties of the regression coefficient estimators are directly transmitted to the predictions based on the estimated coefficients.

| Data generation
In this section, we describe the simulation study carried out to explore the performance of different estimators of a high-dimensional regression coefficient vector. We set the regression model $y_i = x_i'\beta + \varepsilon_i$ for $i = 1,\dots,n$, where we draw $\beta_j \sim N(10, 1)$, $j = 1,\dots,p$. The data vector $x_i$ is generated in order to have a covariance matrix $\Sigma_X$ respecting (5), which is a typical situation in a real high-dimensional setting. For this purpose, we set $\Sigma_X = L^* + S^*$, where $S^*$ is element-wise sparse positive definite such that $\mathrm{tr}(S^*) = 1 - \theta$. The key simulation parameters are as follows: the dimension $p$ and the sample size $n$; the rank $r$ and $\theta$, the variance proportion of $\Sigma_X$ explained by $L^*$; the number of off-diagonal nonzeros $s$ in the sparse component $S^*$; the percentage of nonzeros $\pi_{S^*}$ over the number of off-diagonal elements; the percentage of the (absolute) residual covariance $\varrho_{S^*}$; the condition number of $\Sigma_X$, $c(\Sigma_X) = \lambda_1(\Sigma_X)/\lambda_p(\Sigma_X)$; and $N = 100$ replicates for each setting. Table 1 describes the scenarios used to test estimation performance. We set three values of $p$, that is, $p = 100, 250, 500$, and two values of $n$, that is, $n = 100, 250$. Apart from Scenario 1, which is a classical $p < n$ scenario, all the other scenarios present a $p \ge n$ situation, where the OLS estimator $\hat\beta_{OLS} = (X'X)^{-1}X'y$ does not exist, because $\hat\Sigma_X$ is not positive definite. Under all scenarios, $\Sigma_X$ follows a low rank plus sparse decomposition of type (5), where the nonzero elements of $S^*$ are extremely small ($\varrho_{S^*}$ close to 0). The proportion of residual nonzeros $\pi_{S^*}$ is really similar across scenarios and close to 2.5%. The condition number of $\Sigma_X$ increases as $p$ increases.
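A minimal sketch of one such simulation draw, assuming for simplicity a purely diagonal $S^*$ and latent eigenvalues that are not matched to the paper's $p^\alpha$ scaling: only the trace normalization $\mathrm{tr}(L^*) = \theta$, $\mathrm{tr}(S^*) = 1 - \theta$ is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n, r, theta = 100, 250, 3, 0.8

beta = rng.normal(10.0, 1.0, size=p)         # beta_j ~ N(10, 1)

# Low rank component scaled so that tr(L*) = theta
B, _ = np.linalg.qr(rng.standard_normal((p, r)))
L_star = theta / r * (B @ B.T)               # tr = theta, since tr(BB') = r
S_star = (1.0 - theta) / p * np.eye(p)       # toy residual part with tr(S*) = 1 - theta
Sigma_X = L_star + S_star                    # total variance normalized: tr(Sigma_X) = 1

X = rng.multivariate_normal(np.zeros(p), Sigma_X, size=n)
y = X @ beta + rng.standard_normal(n)
assert np.isclose(np.trace(Sigma_X), 1.0)
```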

| Performance metrics
For each scenario, we calculate $\hat\beta_{OLS} = (X'X)^{-1}X'y$ and $\hat\beta_{POET} = \hat\Sigma_{POET}^{-1}\hat\sigma_{XY}$, where $\hat\Sigma_{POET}$ is derived by POET as in Fan et al. (2013), with the sparsity threshold selected by cross-validation. Then, we derive $\hat\beta_{ALCE}(\psi_{val}, \rho_{val})$ by the algorithm in Section S2 and $\hat\beta_{UNALCE}(\psi_{val}, \rho_{val})$ as in (12), (13) and (14), and we compute $\hat\beta_{RIDGE\text{-}min}$ and $\hat\beta_{LASSO\text{-}min}$, which are, respectively, the RIDGE and LASSO estimates with $k = k_{min}$, that is, the value of $k$ returning the minimum cross-validated mean square error of predictions.
On each replicate $t = 1,\dots,N$ of model (23), we calculate the estimates $\hat\Sigma_t = \hat L_t + \hat S_t$, obtained by ALCE, UNALCE and POET, and the matrix $\hat\Sigma_{R,min} = X'X/n + (k_{min}/n)I_p$. For the covariance estimates, we derive two metrics: the total loss $Loss_{\Sigma,t}$ and the condition number $c(\hat\Sigma_t)$. We then focus on the coefficient vector $\beta$ estimated via all the considered methods, that is, OLS, POET, ALCE, UNALCE, RIDGE and LASSO, and we measure their estimation performance through the average bias, standard deviation and root mean square error across all the coefficients ($Bias_M$, $Sd_M$ and $RMSE_M$). Finally, for each replicate $t^* = 1,\dots,N$, we generate one test observation $(x_{t^*}, y_{t^*})$ from model (23), we calculate the prediction $\hat y_{t^*} = \hat\beta' x_{t^*}$, and we derive the prediction mean square error. We then obtain the overall performance metrics by averaging over the $N$ replicates.
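The coefficient metrics can be sketched as follows, assuming the aggregation is a plain average of per-coefficient bias, standard deviation and RMSE over the $N$ replicates (our reading of $Bias_M$, $Sd_M$ and $RMSE_M$; the toy estimates below are not from any of the paper's scenarios).

```python
import numpy as np

def coefficient_metrics(beta_hats, beta):
    # Average bias, standard deviation and RMSE across all coefficients,
    # computed over N replicate estimates (rows of beta_hats)
    errors = beta_hats - beta                              # N x p estimation errors
    bias_m = np.mean(np.mean(errors, axis=0))              # average per-coefficient bias
    sd_m = np.mean(np.std(beta_hats, axis=0))              # average per-coefficient sd
    rmse_m = np.mean(np.sqrt(np.mean(errors ** 2, axis=0)))  # average per-coefficient RMSE
    return bias_m, sd_m, rmse_m

rng = np.random.default_rng(6)
beta = np.full(5, 10.0)
beta_hats = beta + 0.5 * rng.standard_normal((100, 5))     # unbiased toy estimates
bias_m, sd_m, rmse_m = coefficient_metrics(beta_hats, beta)
```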

| Simulation results
Here, we report the simulation results about coefficient estimation and prediction performance.For the results on the performance of low rank and sparse component estimates, we refer to Section S3.
We start with the analysis of Scenario 1, which is the most favourable to OLS. In Table 2, we report the error metrics relative to the overall covariance matrix estimates and the regression coefficients. We can note that OLS is by far the best method to estimate $\beta$ in this case. LASSO is the second best, due to a very limited variance; note that LASSO does not estimate any zero coefficients in this case. RIDGE does not do so well, due to a strong bias. Then, we note that UNALCE, ALCE and POET, that is, the methods based on a low rank plus sparse assumption, work poorly in this case. This happens because, in an $n > p$ case, the unnecessary variance introduced by estimation mechanisms involving thresholding procedures leads to overly variable estimates. This is also reflected in the prediction performance.
Concerning Scenario 2, Table 3 shows that OLS cannot be computed when $p \ge n$. RIDGE regression is extremely biased. UNALCE performs better than the competitors in terms of covariance loss but worse in terms of coefficient estimates. ALCE offers the best compromise between bias and variance in coefficient estimation, apart from LASSO, which nonetheless reports an average percentage of zero coefficients equal to 27.75%.
Focusing on prediction performance, we note that LASSO is the best in this case, followed by RIDGE and ALCE.
Analysing the performance in Scenario 3, we can observe in Table 4 that ALCE is able to overcome even LASSO in the RMSE. In Table 5, we observe that, under Scenario 4, RIDGE and LASSO prevail in the RMSE, but ALCE is really close and is the second best (behind RIDGE) in the prediction error. On the contrary, UNALCE and (even more so) POET lie far behind. This occurs because a more biased estimate of $\Sigma_X$, but with a lower condition number, turns out to be more effective for estimating $\beta$. We stress, however, the extreme bias of RIDGE, and that LASSO in this case produces an average of 83.8% zero coefficients.
Table 6 shows that, under Scenario 5, POET is completely off target for coefficient estimation, while ALCE prevails in the RMSE against RIDGE and LASSO by a good margin, showing the best balance between bias and variance. Concerning prediction error, ALCE, RIDGE and LASSO are really close, although LASSO presents 76.23% zero coefficients on average. It is remarkable that, when comparing the median squared prediction error, ALCE prevails over all the competitors. This means that, when $p$ is large, the larger variance of ALCE and UNALCE coefficients compared with RIDGE and LASSO may occasionally impact the prediction error, while preserving the goodness of the systematic performance.
In the end, concerning Scenario 6, Table 7 shows that the variance of ALCE explodes, in a way that favours RIDGE and LASSO in the RMSE.
The gap with RIDGE/LASSO is particularly important in the prediction error, although we must note that 93.95% of coefficients are estimated as zero by LASSO. All in all, the ratio $p/n$ is too large in this case to ensure the effectiveness of Theorems 1 and 3.
TABLE 4 Scenario 3: Performance metrics on covariance matrix and regression coefficient estimates.

Corollary 1. Under the conditions of Theorem 1, for some positive $\zeta_\beta$, it holds that $\frac{1}{p^{\alpha+\delta}}\|E_A\| \le \zeta_\beta\sqrt{\frac{\log p}{n}}$.

TABLE 1 Scenarios 1-6: key parameters. $Bias_M$, $Sd_M$ and $RMSE_M$ are the average bias, standard deviation and root mean square error across all the coefficients, respectively. Two more performance metrics are then derived by averaging over the $N$ replicates: $Loss_\Sigma = M(Loss_{\Sigma,t})$ and $c(\hat\Sigma)_M = M(c(\hat\Sigma_t))$.
In Table 4, ALCE overcomes even LASSO in the RMSE under Scenario 3, with LASSO presenting an average of 25.71% zero coefficients in the estimated $\hat\beta$. Concerning prediction error, ALCE comes first, while UNALCE comes second in this case. POET instead performs very badly, due to its excessive variability, coming from bad conditioning properties. In contrast, the RIDGE covariance estimate is too regularized and therefore very biased.

TABLE 3 Scenario 2: Performance metrics on covariance matrix and regression coefficient estimates.
TABLE 2 Scenario 1: Performance metrics on covariance matrix and regression coefficient estimates. Abbreviations: ALCE, ALgebraic Covariance Estimator; LASSO, least absolute shrinkage and selection operator; OLS, ordinary least squares; POET, principal orthogonal complement thresholding; UNALCE, UNshrunk ALCE.
TABLE 6 Scenario 5: Performance metrics on covariance matrix and regression coefficient estimates.