Keywords:

  • Asymptotic theory;
  • lasso;
  • regularized regression;
  • variable selection and estimation;
  • MSC 2010: Primary 62J05;
  • secondary 62E20

Abstract

  1. Top of page
  2. Abstract
  3. 1. INTRODUCTION
  4. 2. NOTATION AND DEFINITIONS
  5. 3. PARALLELISM AND UNIQUENESS
  6. 4. LARGE SAMPLE ASYMPTOTICS FOR THE DANTZIG SELECTOR
  7. 5. DISCUSSION
  8. Acknowledgements
  9. BIBLIOGRAPHY
  10. APPENDIX

The Dantzig selector (Candès & Tao, 2007) is a popular $\ell_1$-regularization method for variable selection and estimation in linear regression. We present a very weak geometric condition on the observed predictors which is related to parallelism and, when satisfied, ensures the uniqueness of Dantzig selector estimators. The condition holds with probability 1, if the predictors are drawn from a continuous distribution. We discuss the necessity of this condition for uniqueness and also provide a closely related condition which ensures the uniqueness of lasso estimators (Tibshirani, 1996). Large sample asymptotics for the Dantzig selector, that is, almost sure convergence and the asymptotic distribution, follow directly from our uniqueness results and a continuity argument. The limiting distribution of the Dantzig selector is generally non-normal. Though our asymptotic results require that the number of predictors is fixed (similar to Knight & Fu, 2000), our uniqueness results are valid for an arbitrary number of predictors and observations. The Canadian Journal of Statistics 41: 23–35; 2013 © 2012 Statistical Society of Canada

Le sélecteur de Dantzig [Candès et Tao, 2007] est une méthode de régularisation equation image pour le sélection de variables et l'estimation en régression linéaire. Nous présentons une condition géométrique très faible sur les prédicteurs observés qui est reliée au parallélisme, et, lorsque vérifiée, elle nous assure l'unicité des estimateurs du sélecteur de Dantzig. Si les prédicteurs proviennent d'une distribution continue, cette condition est vérifiée avec probabilité 1. Nous discutons de la nécessité de cette condition pour l'unicité et nous donnons aussi une condition étroitement reliée qui nous assure de l'unicité des estimateurs lasso [Tibshirani, 1996]. Les propriétés asymptotiques pour le sélecteur de Dantzig, c'est-à-dire la convergence presque sure et la distribution asymptotique, découlent directement de nos résultats d'unicité et d'un argument de continuité. La distribution limite du sélecteur de Dantzig n'est pas habituellement normale. Quoique nos résultats asymptotiques demandent que le nombre de prédicteurs soit fixe (similaire à [Knight et Fu, 2000]), nos résultats d'unicité sont valides pour un nombre arbitraire de prédicteurs et d'observations. La revue canadienne de statistique 41: 23–35; 2013 © 2012 Société statistique du Canada


1. INTRODUCTION

Regularized regression methods for variable selection and estimation have become an important tool for statisticians and have been the subject of intense statistical research during the past 15 years (Bickel & Li, 2006; Fan & Lv, 2010; Tibshirani, 2011). These methods provide a tractable approach to the analysis of high-dimensional datasets and are especially useful when the underlying signal is sparse. In this paper, we address some gaps in the literature, which pertain to uniqueness and large sample asymptotic theory for the Dantzig selector (Candès & Tao, 2007), a popular $\ell_1$-regularized regression method that is closely related to the lasso (Tibshirani, 1996).

First, we develop an intuitive geometric condition related to parallelism which ensures that the Dantzig selector has a unique solution and demonstrate that this condition holds in an overwhelming majority of instances (with probability 1, if the predictors follow an absolutely continuous distribution with respect to the Lebesgue measure). We also give a related necessary condition for the uniqueness of Dantzig selector solutions. These results originally appeared in the first author's Ph.D. thesis (Dicker, 2010) and, to our knowledge, are the first uniqueness results about the Dantzig selector to be found in the literature. In fact, our uniqueness condition for the Dantzig selector is easily translated into a similar prevalent condition which implies that lasso has a unique solution.

Aside from their independent interest, the uniqueness results presented here pave the way for a simple derivation of the almost sure limit and the asymptotic distribution of Dantzig selector estimators, when the number of predictors, p, is fixed (on the other hand, we emphasize that our uniqueness results are valid for arbitrary p). These asymptotic results are analogous to those found in Knight & Fu (2000) for the lasso and further highlight similarities between the two methods, which have been discussed by multiple authors (Meinshausen, Rocha, & Yu, 2007; James, Radchenko, & Lv, 2009). In fact, in comparison with Knight & Fu's (2000) results, uniqueness appears to be the major hurdle to obtaining large sample asymptotics for the Dantzig selector. The Dantzig selector is a convex—but not strictly convex—optimization problem. Thus, unique solutions are not guaranteed in general. However, once uniqueness is understood, asymptotic results for the Dantzig selector follow directly from continuity arguments. More specifically, we show that under the given uniqueness conditions the Dantzig selector may be viewed as a well-defined continuous mapping; asymptotic results then follow from the continuous mapping theorem. By contrast, for the lasso, uniqueness is assured in classical fixed p asymptotic analyses because the associated optimization problem is strictly convex (provided the predictors are non-degenerate). The foregoing discussion highlights the potential usefulness of uniqueness results for the Dantzig selector. More broadly, understanding uniqueness makes certain powerful tools—like the continuous mapping theorem—readily available for further analysis of the Dantzig selector.

Though much of the recent interest in regularized regression methods is spurred by applications that may be best approximated by an asymptotic regime where $p \to \infty$, we believe that it remains important to understand classical large sample asymptotics, where p is fixed and $n \to \infty$, in order to obtain a more complete understanding of these procedures. This paper helps shed light on this issue. Moreover, we believe that our uniqueness results, which are valid for all p, may be useful for formulating and deriving asymptotic results for regularized regression methods in settings where $p \to \infty$; however, this is a topic for future research and is beyond the scope of this paper (though it is briefly addressed again in our concluding Section 5).

The rest of this paper proceeds as follows. In Section 2 we introduce the notation and definitions. In Section 3 we discuss uniqueness. Propositions 1 and 2 are the main results in Section 3 and summarize important uniqueness properties of the Dantzig selector and lasso vis-à-vis parallelism. In Section 4, we show that the Dantzig selector may be viewed as a continuous mapping from the space of predictors and associated outcomes to the space of parameter estimates (Proposition 3). Corollaries 1 and 2 give the almost-sure limit of Dantzig selector estimators and their asymptotic distribution, respectively. Section 5 contains a brief concluding discussion. The proofs may be found in the Appendix at the end of the paper.

2. NOTATION AND DEFINITIONS

Consider the linear model

  • $y_i = x_i^T\beta^* + \epsilon_i, \quad i = 1, \ldots, n,$ (1)

where $y_i \in \mathbb{R}$ and $x_i \in \mathbb{R}^p$ are observed outcomes and predictors, respectively, $\epsilon_1, \ldots, \epsilon_n$ are unobserved iid integrable random variables with mean $E(\epsilon_i) = 0$, and $\beta^* \in \mathbb{R}^p$ is an unknown parameter to be estimated. To simplify the notation, let $y = (y_1, \ldots, y_n)^T$ denote the n-dimensional vector of outcomes and $X = (x_1, \ldots, x_n)^T$ denote the $n \times p$ matrix of predictors. Also let $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T$. Then (1) may be re-expressed as

  • $y = X\beta^* + \epsilon.$

It will be useful to have a concise method for referring to sub-vectors and sub-matrices of various vectors and matrices. For a vector $v \in \mathbb{R}^p$ and a subset $A \subseteq \{1, \ldots, p\}$, let $v_A = (v_j)_{j \in A}$. Furthermore, for $n \times p$ matrices $X$ let $X_A$ denote the $n \times |A|$ matrix obtained from X by extracting columns corresponding to elements of A. If $C$ is a $p \times p$ matrix, and $B \subseteq \{1, \ldots, p\}$ has cardinality $|B|$, let $C_{A,B}$ denote the $|A| \times |B|$ matrix obtained from C by extracting rows corresponding to elements of A and columns corresponding to elements of B. For $1 \le j \le p$, let $X_j$ denote the j-th column of X. Finally, let $\mathcal{N}(C)$ denote the null-space of the matrix C and let $\dim(V)$ denote the dimension of the vector space V.
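The indexing conventions above translate directly into array slicing. The following NumPy sketch is only an illustration: the arrays, the index sets `A` and `B`, and the helper `null_space_dim` are hypothetical names that do not appear in the paper, and indices are 0-based here whereas the text uses 1-based indices.

```python
import numpy as np


def null_space_dim(C, tol=1e-10):
    """Dimension of the null space of C: number of columns minus rank."""
    C = np.atleast_2d(np.asarray(C, dtype=float))
    return C.shape[1] - np.linalg.matrix_rank(C, tol=tol)


# Toy data (hypothetical): n = 2 observations, p = 3 predictors.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
v = np.array([10.0, 20.0, 30.0])
A = [0, 2]   # index set A (0-based)
B = [1]      # index set B (0-based)

v_A = v[A]                # sub-vector of v indexed by A
X_A = X[:, A]             # n x |A| matrix: columns of X indexed by A
C = X.T @ X               # p x p symmetric matrix
C_AB = C[np.ix_(A, B)]    # |A| x |B| matrix: rows in A, columns in B
X_j = X[:, 1]             # a single column of X
dim_null_C = null_space_dim(C)  # X has rank 2, so C has a 1-dimensional null space
```

Here `np.ix_` is used so that rows and columns are selected independently, which matches the row/column extraction described in the text.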

The main object of study in this paper is the Dantzig selector, a linear programming problem for obtaining estimates of $\beta^*$, which is defined as follows:

  • minimize $\|\beta\|_1$ subject to $\|X^T(y - X\beta)\|_\infty \le \lambda$, (2)

where $\lambda \ge 0$ is a tuning parameter, $\|\cdot\|_1$ denotes the $\ell_1$-norm and $\|\cdot\|_\infty$ denotes the $\ell_\infty$-norm. Solutions to (2), denoted $\hat\beta^{DS}$, will be referred to as Dantzig selector estimators.
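Because the Dantzig selector is a linear program, it can be handed to a generic LP solver. The sketch below is not taken from the paper: it assumes the standard unnormalized formulation of Candès & Tao (2007), that is, minimizing the $\ell_1$-norm subject to $\|X^T(y - X\beta)\|_\infty \le \lambda$, and uses the usual split $\beta = u - w$ with $u, w \ge 0$. The function name `dantzig_selector` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def dantzig_selector(X, y, lam):
    """Solve min ||beta||_1 s.t. ||X^T (y - X beta)||_inf <= lam as an LP.

    Assumed (unnormalized) formulation; beta is written as u - w with
    u, w >= 0, so that ||beta||_1 = sum(u) + sum(w) at an optimum.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    C = X.T @ X
    b = X.T @ y
    # |b - C(u - w)| <= lam  is equivalent to
    #    C u - C w <= b + lam   and   -C u + C w <= lam - b
    A_ub = np.block([[C, -C], [-C, C]])
    b_ub = np.concatenate([b + lam, lam - b])
    res = linprog(np.ones(2 * p), A_ub=A_ub, b_ub=b_ub,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]


# Toy instance with an orthonormal design, where the problem decouples.
X = np.eye(2)
y = np.array([3.0, 0.5])
beta_hat = dantzig_selector(X, y, lam=1.0)
```

With this orthonormal design the constraint reduces to $|y_j - \beta_j| \le 1$ for each j, so the minimizer is approximately (2, 0): the first coordinate is shrunk and the second is set to zero. An LP solver returns one optimal point only; it does not by itself reveal whether the solution is unique, which is the question studied in Section 3.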

We also introduce the lasso optimization problem and estimator at this time:

  • minimize $\frac{1}{2}\|y - X\beta\|^2 + \lambda\|\beta\|_1$ over $\beta \in \mathbb{R}^p$, (3)

where $\|\cdot\|^2$ is the squared $\ell_2$-norm. Though the lasso is not our primary concern in this paper, we will sometimes find it instructive to compare aspects of the Dantzig selector and lasso side-by-side. For instance, as discussed in the Introduction, notice that if X has rank p, then the lasso is a strictly convex optimization problem, which ensures that the lasso estimator, $\hat\beta^{L}$, is unique. On the other hand, the Dantzig selector (2) is a linear programming problem and the uniqueness properties are less clear, even when X has rank p.
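The strict convexity remark can be checked numerically: the quadratic part of the lasso objective has Hessian $X^TX$, which is positive definite exactly when X has rank p. The following sketch is illustrative only; the helper name `gram_is_positive_definite` and the two toy design matrices are hypothetical.

```python
import numpy as np


def gram_is_positive_definite(X, tol=1e-10):
    """True when X^T X is positive definite, i.e. X has full column rank.

    In that case the least-squares term is strictly convex, and adding
    the convex l1 penalty keeps the lasso objective strictly convex.
    """
    X = np.asarray(X, dtype=float)
    gram = X.T @ X
    smallest = np.linalg.eigvalsh(gram).min()
    # Relative tolerance guards against round-off in the zero eigenvalue.
    return bool(smallest > tol * max(1.0, np.linalg.norm(gram, 2)))


X_full_rank = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 1.0]])
X_duplicated = np.array([[1.0, 1.0],
                         [2.0, 2.0],
                         [3.0, 3.0]])  # second column repeats the first

strict_full = gram_is_positive_definite(X_full_rank)
strict_dup = gram_is_positive_definite(X_duplicated)
```

The first design has rank 2 and passes the check; the second has rank 1, so the check fails and strict convexity can no longer be used to argue uniqueness.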

In order to provide some additional context for the present study, we point out that one of the key features of both the Dantzig selector and lasso is that they perform simultaneous variable selection and estimation. By this we mean that the sets of indices j at which the Dantzig selector and lasso estimators are exactly zero are often non-empty (contrast this with the ordinary least squares estimator for $\beta^*$). This implies that the Dantzig selector and lasso estimators often have reduced dimension (i.e., only a few non-zero entries) and can greatly enhance interpretability, along with estimation accuracy (Tibshirani, 1996; Candès & Tao, 2007; Bickel, Ritov, & Tsybakov, 2009).

3. PARALLELISM AND UNIQUENESS

Parallelism plays a large role in the discussion of the uniqueness of Dantzig selector solutions. Roughly speaking, the Dantzig selector has a unique solution if the feasible set,

  • $F = \{\beta \in \mathbb{R}^p : \|X^T(y - X\beta)\|_\infty \le \lambda\},$

is not parallel to the $\ell_1$-ball. Below, we describe parallelism as a geometric concept which is relevant to the Dantzig selector and then give a more formal definition.

First note that the feasible set F is polyhedral (it is the intersection of finitely many closed half-spaces). Solutions of the Dantzig selector are points $\beta \in F$ of minimal $\ell_1$-norm. Let $\mathcal{B}_1 = \{\beta \in \mathbb{R}^p : \|\beta\|_1 \le 1\}$ be the closed unit $\ell_1$-ball centered at the origin. Geometrically, we can find solutions to the Dantzig selector by “growing” $t\mathcal{B}_1$, $t \ge 0$, until it intersects F; the points of intersection are Dantzig selector solutions. More precisely, let $t_0 = \inf\{t \ge 0 : t\mathcal{B}_1 \cap F \ne \emptyset\}$. The collection of all Dantzig selector solutions is $t_0\mathcal{B}_1 \cap F$. When $p = 2$, the 1-dimensional faces of $\mathcal{B}_1$ have slope 1 or $-1$; the Dantzig selector has multiple solutions only if a 1-dimensional face of F has slope 1 or $-1$, that is, only if F is parallel to the $\ell_1$-ball, $\mathcal{B}_1$.

As indicated by the situation when $p = 2$, if the Dantzig selector has multiple solutions, then F is parallel to $\mathcal{B}_1$ (Figure 1). When $p > 2$, the notion of parallelism which is correct for our purposes is less straightforward. Geometric intuition suggests that parallelism is invariant under translation and scalar multiplication, in the sense that F is parallel to $\mathcal{B}_1$ if and only if $aF + v$ is parallel to $\mathcal{B}_1$ for $a \ne 0$ and $v \in \mathbb{R}^p$. In particular, multiplying X by a (non-zero) scalar and adding vectors $w \in \mathbb{R}^n$ to $y$ does not affect parallelism. This leads to a definition of parallelism between F and $\mathcal{B}_1$ which depends only on the matrix $X^TX$. In fact, in our view, the primitive concept is parallelism between a $p \times p$ symmetric matrix C and the $\ell_1$-ball.

Figure 1. An instance of the Dantzig selector with multiple solutions. The region F is the feasible set for the Dantzig selector and equation image. The bold line represents the intersection of equation image with F and is the solution set for this instance of the Dantzig selector.

Definition 1.

  • (a)
    Let C be a $p \times p$ symmetric matrix. The matrix C is parallel to the $\ell_1$-ball if and only if the condition [Par] (found below) holds. [Par] There exist subsets equation image and a vector equation image such that equation image, equation image, and equation image.
  • (b)
    The feasible set for the Dantzig selector, F, is parallel to the $\ell_1$-ball if and only if $X^TX$ is parallel to the $\ell_1$-ball.

Remarks

  • (i)
    Parallelism, as defined here, is related to degenerate sub-matrices of C, which, in the context of the Dantzig selector, correspond to the nontrivial faces of F. In [Par], the requirement that equation image is related to the fact that the faces of the equation image-ball, equation image, have normal vectors equation image, where equation image for some equation image.
  • (ii)
    When $p = 2$, it is easy to see that F is parallel to the $\ell_1$-ball if and only if one of the columns of $X^TX$ is a scalar multiple of some point in $\{\pm 1\}^2$. This occurs if and only if a one-dimensional face of F has slope 1 or $-1$, as depicted in Figure 1.
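A small two-dimensional instance makes the link between parallelism and multiple solutions concrete. The sketch below is an illustration rather than an example from the paper. It assumes the unnormalized constraint $\|X^T(y - X\beta)\|_\infty \le \lambda$ and reads the $p = 2$ criterion of Remark (ii) as "some column of $X^TX$ has entries of equal absolute value" (a zero column is counted, as a multiple with scalar 0). The helper names and the data are hypothetical.

```python
import numpy as np


def is_parallel_p2(C, tol=1e-12):
    """p = 2 check: some column of C is a scalar multiple of a point in {-1, +1}^2."""
    C = np.asarray(C, dtype=float)
    return bool(np.any(np.abs(np.abs(C[0, :]) - np.abs(C[1, :])) <= tol))


def is_feasible(beta, X, y, lam, tol=1e-12):
    """Check the assumed constraint ||X^T (y - X beta)||_inf <= lam."""
    return bool(np.max(np.abs(X.T @ (y - X @ beta))) <= lam + tol)


# One observation, two identical predictors: X^T X = [[1, 1], [1, 1]].
X = np.array([[1.0, 1.0]])
y = np.array([2.0])
lam = 1.0

parallel = is_parallel_p2(X.T @ X)

# Every point on the segment {(t, 1 - t) : 0 <= t <= 1} is feasible with l1-norm 1.
segment = [np.array([t, 1.0 - t]) for t in np.linspace(0.0, 1.0, 5)]
segment_ok = all(is_feasible(b, X, y, lam) and np.isclose(np.abs(b).sum(), 1.0)
                 for b in segment)

# Feasibility forces beta_1 + beta_2 >= 1, so no feasible point has l1-norm
# below 1; a coarse grid search agrees with this.
grid = np.linspace(-2.0, 2.0, 81)
min_feasible_norm = min(abs(b1) + abs(b2)
                        for b1 in grid for b2 in grid
                        if is_feasible(np.array([b1, b2]), X, y, lam))
```

In this instance the feasible set is the strip $1 \le \beta_1 + \beta_2 \le 3$, whose boundary has slope $-1$, so the whole segment from (1, 0) to (0, 1) solves the problem. Changing `y` or `lam` moves the strip without changing its direction, which is why parallelism alone depends only on $X^TX$.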

As discussed above, parallelism is invariant under translation and scalar multiplication. On the other hand, translation and scalar multiplication of the feasible set F gives rise to various instances of the Dantzig selector, some with a unique solution and some, perhaps, with multiple solutions. This suggests that any sufficient condition for the existence of multiple Dantzig selector solutions must, unlike parallelism, involve equation image and λ. To illustrate this concept, suppose that equation image is invertible and is parallel to the equation image-ball. Figure 2a depicts equation image, which is equal to the feasible set for the Dantzig selector when equation image and equation image and is parallel to the equation image-ball. Figures 2b,c depict equation image and equation image, potential feasible sets for the Dantzig selector that are both obtained from equation image by scalar multiplication and translation. The feasible sets equation image and equation image are both parallel to the equation image-ball, and correspond to feasible sets for the Dantzig selector with the predictor matrix X and different values for equation image, λ (not given here). The instance of the Dantzig selector with feasible set equation image has multiple solutions, while the Dantzig selector with feasible set equation image has a unique solution.

Figure 2. (a) equation image is parallel to the equation image-ball, as evidenced by the bold face D. (b) equation image is obtained from equation image by scalar multiplication and translation; the Dantzig selector problem with feasible set equation image has multiple solutions, indicated by the bold line segment labelled equation image. (c) equation image is obtained from equation image by scalar multiplication and translation; the point labelled equation image is the unique solution to the Dantzig selector problem with feasible set equation image.

The following condition combines parallelism with additional constraints and is a sufficient condition for the existence of multiple Dantzig selector solutions.

  • [Mult]
    There exist subsets equation image and vectors equation image, equation image, such that
    • 1.
      equation image, equation image, and equation image.
    • 2.
      equation image for all equation image.
    • 3.
      equation image.
    • 4.
      equation image for all equation image and equation image for all equation image.

Note that Condition 1 in [Mult] implies that F is parallel to the $\ell_1$-ball. Conditions 2–4 in [Mult] constrain the location of F in $\mathbb{R}^p$ relative to the origin. Proposition 1 below characterizes the uniqueness properties of the Dantzig selector in terms of [Par] and [Mult]. A related necessary condition for the existence of multiple lasso solutions is given in Proposition 1(c). Proposition 1 is proved in the Appendix at the end of this paper.

Proposition 1.

  • (a)
    If [Mult] holds, then the Dantzig selector has multiple solutions.
  • (b)
    If F is not parallel to the $\ell_1$-ball, then the Dantzig selector has a unique solution.
  • (c)
    Suppose that equation image and that the lasso has multiple solutions (i.e., equation image contains more than a single element). Then there exists a subset equation image and a vector equation image such that equation image, equation image, and equation image.

Remarks

  • (i)
    Proposition 1 is valid for any n and p.
  • (ii)
    Proposition 1(c) may be rephrased as follows. If the lasso has multiple solutions, then $X^TX$ is parallel to the $\ell_1$-ball and, moreover, one may take $A = B$ in the definition of parallelism.
  • (iii)
    If $\lambda = 0$, then the lasso has multiple solutions whenever $X^TX$ is singular.
  • (iv)
    The condition in Proposition 1(c) implies that $X^TX$ is parallel to the $\ell_1$-ball. It follows that if $X^TX$ is not parallel to the $\ell_1$-ball, then both the Dantzig selector and lasso have unique solutions. The relationship between uniqueness for the Dantzig selector and uniqueness for lasso is discussed by Meinshausen, Rocha, & Yu (2007), who give a concrete equation image-dimensional example (with pictures) where lasso has a unique solution, but the Dantzig selector does not.
  • (v)
    A condition similar to [Mult] which ensures the existence of multiple lasso solutions may be developed. This is not pursued further here.

The next proposition suggests that the Dantzig selector and lasso have a unique solution in an overwhelming majority of instances.

Proposition 2. Suppose that $x_1, \ldots, x_n$ are iid and drawn from a distribution that is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^p$. Then $X^TX$ is parallel to the $\ell_1$-ball with probability 0. Consequently, the Dantzig selector and lasso have a unique solution with probability 1.

Remarks

  • (i)
    Proposition 2 is proved in the Appendix (a proof also appears in Dicker, 2010). To provide some intuition, note that the parallelism condition requires equation image to both (i) contain a specific point in its range (that is, an element of equation image) and (ii) to have a degenerate range (in the sense that equation image). Proposition 2 implies that this occurs with probability 0, under the specified conditions.
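The role of the continuity assumption can be illustrated with a small simulation for $p = 2$. This is only a sketch under stated assumptions, not part of the paper's argument: it uses the $p = 2$ reading of parallelism (some column of $X^TX$ has entries of equal absolute value), compares Gaussian predictors with predictors taking values in $\{-1, +1\}$, and all function names, sample sizes, and the seed are hypothetical.

```python
import numpy as np


def is_parallel_p2(C):
    """Exact p = 2 check: a column of C has entries of equal absolute value."""
    C = np.asarray(C, dtype=float)
    return bool(np.any(np.abs(C[0, :]) == np.abs(C[1, :])))


def parallel_frequency(draw_X, trials, rng):
    """Fraction of simulated designs whose Gram matrix passes the p = 2 check."""
    hits = 0
    for _ in range(trials):
        X = draw_X(rng)
        hits += is_parallel_p2(X.T @ X)
    return hits / trials


rng = np.random.default_rng(0)

# Continuous predictors: rows drawn from a standard normal distribution on R^2.
freq_gaussian = parallel_frequency(
    lambda r: r.standard_normal((5, 2)), 2000, rng)

# Discrete predictors: entries drawn uniformly from {-1, +1}, with n = 2 rows.
freq_signs = parallel_frequency(
    lambda r: r.choice([-1.0, 1.0], size=(2, 2)), 2000, rng)
```

For the Gaussian design the check requires an exact tie between two continuous quantities, so one expects a frequency of zero. For the sign design the two columns are equal up to sign about half of the time, so ties are common. This matches the message of Proposition 2: parallelism is a measure-zero event for continuous predictors but not for discrete ones.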

4. LARGE SAMPLE ASYMPTOTICS FOR THE DANTZIG SELECTOR

Throughout the rest of this article, assume that p and $\beta^*$ are fixed. In this section, we formulate the Dantzig selector as a well-defined mapping from sample covariance matrices, $n^{-1}X^TX$, marginal covariances, $n^{-1}X^Ty$, and tuning parameters, $\lambda/n$, to estimators, $\hat\beta^{DS}$. To do this, we restrict our attention to symmetric matrices that are not parallel to the $\ell_1$-ball; Proposition 2 suggests that this restriction is fairly weak. Then, we show that the Dantzig selector mapping is continuous. With this machinery in place, large sample asymptotics for the Dantzig selector follow easily.

Let equation image denote the collection of equation image positive semidefinite matrices that are not parallel to the equation image-ball and let equation image, where equation image is the collection of all invertible equation image matrices with real entries. Define the Dantzig selector mapping equation image by equation image, where equation image solves the optimization problem

  • equation image(4)

It follows directly from Proposition 1(b) that G is well-defined. Furthermore, notice that equation image. Note that the domain of G may be extended to a subset of equation image, provided one imposes conditions to ensure that the feasible set in the optimization problem (4) is non-empty. More specifically, define equation image. Then (4) defines equation image for equation image.

Proposition 3. The mapping G is continuous on equation image.

Remarks

  • (i)
    A proof of Proposition 3 is found in the Appendix. A similar proof shows that G is also continuous on equation image. In other words, assuming that the appropriate (anti-) parallelism conditions hold, if there is non-trivial regularization in the limit (i.e., equation image), then the Dantzig selector is continuous, regardless of whether or not the predictors and the limiting sample covariance matrix are singular.
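A special case in which the mapping has a closed form may help fix ideas. The worked equation below is an illustration, not a statement from the paper: it assumes that problem (4) has the form "minimize $\|\beta\|_1$ subject to $\|b - C\beta\|_\infty \le \lambda$" with inputs $(C, b, \lambda)$, and it takes $C = I_p$, so that the constraint decouples across coordinates.

```latex
% Assumption: (4) reads  minimize ||beta||_1  subject to  ||b - C beta||_inf <= lambda.
% With C = I_p the constraint is |b_j - beta_j| <= lambda for each j, so
\[
  G(I_p, b, \lambda)_j
  \;=\; \operatorname*{arg\,min}_{\beta_j \,:\, |b_j - \beta_j| \le \lambda} |\beta_j|
  \;=\; \operatorname{sign}(b_j)\,\max\{|b_j| - \lambda,\; 0\},
  \qquad j = 1, \ldots, p .
\]
```

Under this assumption each coordinate is the soft-thresholding of $b_j$ at level $\lambda$, which is jointly continuous in $(b, \lambda)$. Proposition 3 asserts continuity in a much more general setting, in which C also varies and no closed form is available.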

Corollary 1. Suppose that equation image and that equation image. Then equation image, almost surely, where equation image solves

  • equation image

Remarks

  • (i)
    The corollary follows directly from Proposition 3, which implies that equation image, almost surely.
  • (ii)
    Corollary 1 implies that under the given conditions, the Dantzig selector is consistent for equation image if and only if equation image. Furthermore, it gives the almost sure limit of equation image in cases where the Dantzig selector is not consistent (that is, when equation image).

Corollary 2. Suppose that equation image. Also assume that equation image, that equation image, and that equation image. Let equation image and let equation image denote the complement of equation image in equation image. Then equation image, where equation image denotes convergence in distribution, equation image solves the optimization problem

  • equation image(5)

and equation image.

Corollary 2 is proved in the Appendix.

Remarks 

  • (i)
    The second moment condition on equation image and the condition equation image ensure that equation image is asymptotically normal.
  • (ii)
    If equation image, then equation image has the same asymptotic distribution as the ordinary least squares estimator. If equation image, then the limiting distribution of the Dantzig selector is not normal.
  • (iii)
    Corollary 2 should be compared with Theorem 2 of Knight & Fu (2000), which describes the limiting distribution of equation image. Though the limiting distribution of lasso is determined by an unconstrained optimization problem, the term equation image in the limiting optimization problem for the Dantzig selector (5) also appears in the limiting optimization problem for lasso.

5. DISCUSSION

The results in this paper address fairly long-standing open questions about uniqueness for the Dantzig selector and lasso. To summarize, we prove that the Dantzig selector and lasso estimators are unique in almost all instances. Though these results may appear to be somewhat esoteric, Proposition 3 and its corollaries demonstrate their potential usefulness. Indeed, we have shown that once uniqueness is understood, it is straightforward to obtain the almost sure limit and limiting distribution of Dantzig selector estimators. Taking a broader view, the results presented here may help clear the path for a more operator-theoretic approach to studying the Dantzig selector, lasso, and other regularized regression procedures. Such an approach may offer additional insights into the properties of these methods in a variety of settings. For instance, one could potentially obtain a better understanding of the Dantzig selector in an asymptotic regime where $p \to \infty$ by defining the Dantzig selector operator on an appropriate infinite dimensional space (analogous to the operator G defined in Section 4 above) and studying its continuity properties in this more abstract setting. Future research in this direction is needed.

Acknowledgements

We thank an associate editor and the reviewers for their detailed and very helpful comments. This research was supported by grants from the U.S. National Cancer Institute.

BIBLIOGRAPHY

  • Asif, M. (2008). Primal dual pursuit: A homotopy based algorithm for the Dantzig selector. Master's thesis, Georgia Institute of Technology, USA.
  • Asif, M. & Romberg, J. (2010). On the lasso and Dantzig selector equivalence. In 44th Annual Conference on Information Sciences and Systems (CISS), pp. 1–6. IEEE.
  • Bickel, P. & Li, B. (2006). Regularization in statistics. Test, 15(2), 271–344.
  • Bickel, P., Ritov, Y., & Tsybakov, A. (2009). Simultaneous analysis of lasso and Dantzig selector. Annals of Statistics, 37(4), 1705–1732.
  • Candès, E. & Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics, 35(6), 2313–2351.
  • Dicker, L. (2010). Regularized Regression Methods for Variable Selection and Estimation. Ph.D. thesis, Harvard University, USA.
  • Efron, B., Hastie, T., & Tibshirani, R. (2007). Discussion: The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics, 35(6), 2358–2364.
  • Fan, J. & Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20(1), 101–148.
  • James, G., Radchenko, P., & Lv, J. (2009). DASSO: Connections between the Dantzig selector and lasso. Journal of the Royal Statistical Society: Series B, 71(1), 127–142.
  • Knight, K. & Fu, W. (2000). Asymptotics for lasso-type estimators. Annals of Statistics, 28(5), 1356–1378.
  • Meinshausen, N., Rocha, G., & Yu, B. (2007). Discussion: A tale of three cousins: Lasso, L2Boosting and Dantzig. Annals of Statistics, 35(6), 2373–2384.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1), 267–288.
  • Tibshirani, R. (2011). Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society: Series B, 73(3), 273–282.

APPENDIX

Proof of Proposition 1. The following two lemmas establish the Karush–Kuhn–Tucker (KKT) conditions for the Dantzig selector and lasso optimization problems. The lemmas appear in various forms in several references (Efron, Hastie, & Tibshirani, 2007; Asif, 2008; Dicker, 2010; Asif & Romberg, 2010), and the proofs are omitted.

Lemma A1. The vector equation image is a solution to the Dantzig selector (2) if and only if there is equation image such that

  • equation image(6)
  • equation image(7)
  • equation image(8)
  • equation image(9)

Lemma A2. The vector equation image is a solution to the lasso optimization problem (3) if and only if

  • equation image
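Optimality conditions of this kind are easy to verify numerically for a candidate vector. The sketch below is not the paper's statement of Lemma A2: it assumes the lasso objective $\frac{1}{2}\|y - X\beta\|^2 + \lambda\|\beta\|_1$, for which the standard subgradient conditions are $X_j^T(y - X\beta) = \lambda\,\mathrm{sign}(\beta_j)$ when $\beta_j \ne 0$ and $|X_j^T(y - X\beta)| \le \lambda$ when $\beta_j = 0$. The function name `satisfies_lasso_kkt` and the data are hypothetical.

```python
import numpy as np


def satisfies_lasso_kkt(beta, X, y, lam, tol=1e-8):
    """Check the subgradient (KKT) conditions for the assumed lasso objective
    0.5 * ||y - X beta||^2 + lam * ||beta||_1 at the candidate beta."""
    beta = np.asarray(beta, dtype=float)
    g = X.T @ (y - X @ beta)          # correlations of predictors with residual
    active = beta != 0
    # Active coordinates: correlation equals lam * sign(beta_j).
    ok_active = np.all(np.abs(g[active] - lam * np.sign(beta[active])) <= tol)
    # Inactive coordinates: correlation is at most lam in absolute value.
    ok_inactive = np.all(np.abs(g[~active]) <= lam + tol)
    return bool(ok_active and ok_inactive)


# Orthonormal design: the lasso solution is the soft-thresholded response.
X = np.eye(2)
y = np.array([3.0, 0.5])
lam = 1.0
beta_soft = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

kkt_soft = satisfies_lasso_kkt(beta_soft, X, y, lam)   # candidate: soft-thresholded y
kkt_ols = satisfies_lasso_kkt(y, X, y, lam)            # candidate: least-squares fit
```

In this toy case the soft-thresholded vector satisfies the conditions, while the least-squares fit does not, because its residual correlations are zero rather than $\lambda\,\mathrm{sign}(\beta_j)$ on the active coordinates.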

To prove Proposition 1 (a), we assume that [Mult] holds and show that the Dantzig selector has multiple solutions. Let equation image, equation image, and equation image be as in [Mult] and take equation image so that equation image and equation image, where equation image is the complement of A in equation image. Then it is clear from Lemma A1 that equation image is a solution to the Dantzig selector. Furthermore, using Lemma A1, it is easy to check that equation image is a solution to the Dantzig selector for equation image sufficiently small (take equation image).

Now suppose that equation image are distinct solutions to the Dantzig selector and let equation image be vectors such that equation image and equation image, equation image satisfy (6)–(9). Without loss of generality, assume that equation image and equation image, where we define equation image or 0, according to equation image or equation image, for equation image. Let equation image, equation image. Then (7) and (8) imply that equation image and equation image, equation image. Additionally, (9) implies that equation image. Hence, equation image. It follows that $X^TX$ is parallel to the $\ell_1$-ball. This proves Proposition 1(b).

Finally, to prove Proposition 1(c), suppose

  • equation image

are distinct and suppose without loss of generality that equation image. Let equation image and equation image. Notice that for equation image we have

  • equation image(10)

Since (10) must hold for all equation image and since equation image, we must have equation image and equation image. It follows that

  • equation image(11)

and equation image.

Now, let equation image, where equation image is the Moore–Penrose pseudoinverse of equation image. Then Lemma A2 implies that

  • equation image

and equation image. Proposition 1(c) follows from these observations plus (11).

Proof of Proposition 2. To prove Proposition 2, we make use of the following lemma.

Lemma A3. Suppose that equation image and that the rows of X are iid and drawn from a distribution which is continuous with respect to the Lebesgue measure on equation image. Let W be an equation image matrix of rank equation image. Then equation image has rank equation image with probability 1.

Proof of Lemma A3. Let X and W be as in the statement of the lemma. Without loss of generality, suppose that equation image. When equation image, the result is true. For equation image, let equation image. To facilitate a proof by induction, assume that equation image has rank equation image with probability 1. In the event that equation image has rank equation image, the rank of equation image is less than p if and only if

  • equation image(12)

where equation image and equation image. Since W has full rank, it follows that equation image, with probability 1. Thus, conditioning on equation image and using the fact that the conditional distribution of equation image is continuous, it follows that (12) holds with probability 0. We conclude that equation image has rank p with probability 1.
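The almost-sure rank statement of Lemma A3 can be checked numerically. The following is only a sketch under stated assumptions: because the formulas of the lemma are not legible here, it assumes the product in question is XW, with X an n × p matrix whose rows are iid standard normal (one particular continuous distribution) and W a fixed p × k matrix of rank k ≤ min(n, p). The function name `product_rank` and the specific choice of W are hypothetical and serve only as illustration.

```python
import numpy as np


def product_rank(n, p, k, seed=0):
    """Return (rank of X @ W, rank of W) for one random draw of X.

    Assumptions (not taken from the paper): X is n x p with iid N(0, 1)
    entries, and W is a fixed, non-random p x k matrix of rank k.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))  # rows are iid with a continuous law
    # Deterministic W: first k columns of the identity plus a constant shift.
    # Its columns are linearly independent, so rank(W) = k.
    W = np.eye(p)[:, :k] + 0.5 * np.ones((p, k))
    return int(np.linalg.matrix_rank(X @ W)), int(np.linalg.matrix_rank(W))


if __name__ == "__main__":
    # Repeat over several seeds: the product should keep the rank of W.
    for seed in range(5):
        print(seed, product_rank(n=20, p=6, k=4, seed=seed))
```

A finite number of draws does not prove an almost-sure statement, but a rank deficiency in any draw would signal that the assumed form of the lemma is wrong.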

Getting back to the proof of Proposition 2, suppose that the rows of X are iid and drawn from a distribution which is continuous with respect to the Lebesgue measure on equation image. Then X has rank equation image with probability 1. Let equation image and decompose equation image so that equation image, equation image, and equation image, equation image, and J are disjoint. If equation image, then equation image has a non-trivial null space. Suppose for the moment that equation image. When X has full rank, the dimension of the null space of equation image is non-zero if and only if

  • equation image

Furthermore, if X has full rank, then equation image has full rank. Conditioning on equation image and appealing to Lemma A3, it follows that the rank of equation image is equation image with probability 1. Thus the null space of equation image is non-trivial with positive probability if and only if equation image.

Now suppose that equation image. There are two cases: equation image and equation image. In each case, the probability that there exists equation image such that equation image is 0. We prove this for the case equation image; the case equation image follows similarly. Assume that equation image. Choose equation image such that equation image and let equation image. Suppose that equation image for some equation image and equation image. Then, assuming that X is full rank,

  • equation image

and

  • equation image

Thus, we have

  • equation image

where Lemma A3 guarantees that equation image is invertible with probability 1. Since, conditional on equation image, the rows of equation image are independent and have continuous distributions with respect to Lebesgue measure on equation image, it follows that

  • equation image

with probability 0. Thus, as claimed, the probability that there exists equation image such that equation image is 0.

The results from the last two paragraphs imply that

  • equation image

It follows that equation image is parallel to the ℓ1-ball with probability 0, as was to be shown.

Proof of Proposition 3. For equation image, let equation image, equation image, and equation image and assume that equation image, equation image, and equation image. Let equation image and let equation image. We show that equation image.

Since equation image, there exists a subsequence equation image and a vector equation image such that equation image. To prove the proposition, it suffices to show that equation image. By continuity of the ℓ1-norm, we must have

  • equation image

Also, by the optimality properties of equation image, we must have

  • equation image(13)

for any sequence equation image, with equation image and

  • equation image(14)

We consider two cases: equation image and equation image. First suppose equation image and define equation image. Then (14) holds and equation image. From (13), it follows that equation image and the optimality of equation image implies that equation image. Now suppose that equation image and define equation image. Then (14) holds and, as in the previous case, we conclude that equation image. Thus, in either case, equation image, as was to be shown.

Proof of Corollary 2. The conditions equation image and equation image ensure that equation image, by the Lindeberg–Feller central limit theorem. By the Skorokhod representation theorem, we may assume without loss of generality that equation image almost surely.

Now let equation image and notice that the Dantzig selector (2) is equivalent to the optimization problem

  • equation image(15)

In particular, equation image solves (15). We show that equation image, the solution to (5), almost surely. This suffices to prove the corollary.

Since equation image, equation image almost surely, and equation image, it follows that there is an almost surely finite random variable M such that equation image whenever equation image is feasible for the optimization problem (15). Let equation image and notice that if equation image and equation image is feasible for (15), then equation image. It follows that

  • equation image

whenever equation image. Taking equation image, Proposition 3 implies that equation image almost surely and it is straightforward to check that equation image.
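For readers who wish to experiment numerically with the estimator studied in these proofs, the Dantzig selector can be computed as a linear program. The sketch below assumes the standard formulation of Candès & Tao (2007), namely minimizing the ℓ1-norm of β subject to the sup-norm of X'(y − Xβ) being at most λ; the normalization used in (2) may differ, for example by a factor of 1/n. The function name `dantzig_selector` is hypothetical, and SciPy's `linprog` is used as a generic solver rather than as the authors' method.

```python
import numpy as np
from scipy.optimize import linprog


def dantzig_selector(X, y, lam):
    """Solve min ||b||_1 subject to ||X^T (y - X b)||_inf <= lam.

    Assumed (unnormalized) form of the Dantzig selector. The problem is
    rewritten as a linear program by splitting b = b_plus - b_minus with
    b_plus, b_minus >= 0, so that ||b||_1 = sum(b_plus) + sum(b_minus)
    at the optimum.
    """
    p = X.shape[1]
    G = X.T @ X          # Gram matrix
    c = X.T @ y          # correlations between predictors and response
    cost = np.ones(2 * p)
    # G b - c <= lam  and  c - G b <= lam, written in terms of [b_plus, b_minus]
    A_ub = np.vstack([np.hstack([G, -G]), np.hstack([-G, G])])
    b_ub = np.concatenate([lam + c, lam - c])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 5))
    y = X @ np.array([2.0, 0.0, 0.0, -1.0, 0.0]) + 0.1 * rng.standard_normal(50)
    print(dantzig_selector(X, y, lam=1.0))
```

When the solution is not unique, a linear programming solver returns only one of the minimizers, so this sketch gives no information about the uniqueness question treated in Propositions 1 and 2.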