Canonical Correlation Analysis Through Linear Modeling

Summary

In this paper, we introduce linear modeling of canonical correlation analysis, which estimates canonical direction matrices by minimising a quadratic objective function. The linear modeling yields a class of estimators of the canonical direction matrices, and an optimal class is derived in the sense described herein. The optimal class has several desirable properties: first, its estimates of the canonical direction matrices are asymptotically efficient; second, its test statistic for determining the number of canonical covariates always has an asymptotic chi-squared distribution; third, it is straightforward to construct tests for variable selection. Standard canonical correlation analysis and other existing methods turn out to be suboptimal members of the class. Finally, we study the role of canonical variates as a means of dimension reduction for predictors and responses in multivariate regression. Numerical studies and a data analysis are presented.

1 Introduction

Principal component analysis is one of the most popular dimension reduction tools in high-dimensional data analysis. Since it does not require the inversion of the covariance matrix of the variables, it has convenient applications to data where the sample size inline image is less than the number of variables, namely inline image.

Canonical correlation analysis (CCA) seeks pairs of linear combinations from two sets of variables based on maximisation of the Pearson correlation between each pair. We call the pairs of linear combinations and their correlations canonical variates and canonical correlations, respectively. It is believed that a few pairs of canonical variates can represent the original sets of variables to explain their relationships and variabilities (Johnson & Wichern 2007, pp. 539–574).

Since principal component analysis is based on the marginal covariance structure of each set of variables, it ignores any association between the two sets when effecting dimension reduction. In contrast, CCA reduces the dimensions of the two sets while maximising the Pearson correlation between them. Therefore, when a high-dimensional relationship between two sets of variables is of interest with inline image, the latter should be more appropriate than the former and can produce a more reasonable dimension reduction.

It should be noted here that CCA is closely related to classical multivariate regression where

display math(1)

where inline image is the random vector of responses, inline image is a vector of predictors, inline image is an intercept vector, inline image is an unknown coefficient matrix, and the error vector inline image is independent of inline image. The notation inline image indicates the multivariate normal distribution. Also, for a symmetric matrix inline image, the notations inline image and inline image indicate that inline image is positive definite and positive semi-definite, respectively.

Tso (1981) showed that the maximum likelihood analysis of reduced-rank regression under (1) has an intimate connection with CCA, and that the canonical covariates corresponding to the predictors can be used as dimension-reduced predictors in multivariate regression. Also, Yoo, Kim & Um (2012) studied the relation between CCA and regression analysis, focusing on a reduced-rank regression framework. A method of estimating the canonical variates more accurately by considering weights was proposed by Ter Braak (1990). The weights were constructed using the residuals, inline image, where inline image and inline image represent the ordinary least squares estimates under (1).
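Because the developments below are built on ordinary least squares quantities under model (1), a minimal sketch of the multivariate OLS fit may be helpful; the function and variable names here are illustrative and not taken from the paper.

```python
import numpy as np

def multivariate_ols(X, Y):
    """Fit Y = alpha + X B + e by ordinary least squares.

    X is n x p, Y is n x q; returns the intercept vector, the p x q
    coefficient matrix and the residual matrix (used, for example, to
    build the weights of Ter Braak (1990)).
    """
    Xc = X - X.mean(axis=0)          # centred predictors
    Yc = Y - Y.mean(axis=0)          # centred responses
    # B_hat = (Xc'Xc)^{-1} Xc'Yc, solved without an explicit inverse
    B_hat = np.linalg.solve(Xc.T @ Xc, Xc.T @ Yc)
    alpha_hat = Y.mean(axis=0) - X.mean(axis=0) @ B_hat
    residuals = Y - (alpha_hat + X @ B_hat)
    return alpha_hat, B_hat, residuals
```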

Even though the papers discussed above improved estimation in CCA and clarified the role of CCA in data analysis, they have several practical limitations. The equivalence of maximum likelihood estimation in reduced-rank regression to CCA shown by Tso (1981) holds only under (1). That is, if the random error inline image is not multivariate normal, then this equivalence is no longer valid. Additionally, reduced-rank regression typically reduces the dimension of the predictors alone, even though the responses are multi-dimensional. If one considers that CCA was introduced to reduce the dimensions of two sets of variables simultaneously, the relation between CCA and reduced-rank regression described in Yoo et al. (2012) is not quite satisfactory. Further, the optimality through weights developed by Ter Braak (1990) holds only under (1), just as for Tso (1981). That is, the normality of the random error inline image is a crucial condition. Finally, variable selection in CCA has been largely neglected to date, although it can substantially help the interpretation of canonical variates.

The main purpose of this paper is to overcome these deficiencies in CCA. To achieve this, several steps are required. First, we connect CCA to ordinary least squares coefficients. Then we establish a linear modeling form of CCA based on this connection. Under this setup, the unknown quantities in CCA are estimated optimally in a sense which will be discussed in later sections. Second, we show that standard CCA and the method of Ter Braak (1990) are sub-optimal. In addition, we propose a method of variable selection in CCA under the linear modeling of CCA. Finally, we investigate CCA as a dimension reduction tool in multivariate regression by adopting existing theories of sufficient dimension reduction.

The paper is organised as follows. In Section 'Classical canonical correlation analysis (CCA)', we give a brief explanation of classical canonical correlation analysis. In Section 'Linear modeling of canonical correlation analysis' we develop a linear modeling approach for canonical correlation analysis. The roles of canonical covariates as a means of dimension reduction in multivariate regression are studied in Section 'Canonical correlation in regression'. Sections 'Simulation study' and 'Minneapolis school data analysis' provide numerical studies and a real data application, respectively. Finally, a summary of our work is provided in Section 'Discussion'. To avoid interrupting the discussion, proofs for most results are given in the Appendix.

2 Classical canonical correlation analysis (CCA)

Here we give a short explanation of CCA. Suppose that we have two sets of variables inline image and inline image, and define inline image, inline image, inline image and inline image. Let two linear combinations of inline image and inline image be inline image and inline image, where inline image and inline image. Then we have inline image, inline image, and inline image. Now we determine inline image and inline image to maximise

display math(2)

Canonical correlation analysis seeks such inline image and inline image based on the following criteria:

  1. The first canonical variate pair inline image is obtained from the maximisation of (2) with the restriction that inline image.
  2. The second canonical variate pair inline image is obtained from the maximisation of (2) with the restriction that inline image and inline image and inline image are uncorrelated.
  3. At step inline image the inline imageth canonical variate pair inline image is obtained from the maximisation of (2) with the restriction that inline image and inline image are uncorrelated with the previous inline image canonical variate pairs.
  4. Repeat step 3 until inline image.
  5. Select inline image pairs of inline image to represent the relationship between inline image and inline image.

Finally, it can be shown that pairs inline image based on the criteria above are constructed as follows: inline image and inline image for inline image, where inline image and inline image are respectively the inline image eigenvectors of inline image and inline image with corresponding common ordered-eigenvalues inline image. The matrices inline image and inline image are called canonical direction matrices.
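As an illustration of the construction just described, the following sketch computes sample canonical correlations and direction matrices from the usual moment estimates of the covariance matrices. It uses the standard whitened singular value decomposition, which is one common way of obtaining the eigenvectors in question; all names are ours.

```python
import numpy as np

def classical_cca(X, Y):
    """Sample canonical correlations and direction matrices.

    The singular values of Sx^{-1/2} Sxy Sy^{-1/2} are the sample
    canonical correlations; back-transforming the singular vectors
    gives the canonical direction matrices.  The first min(p, q)
    columns of A and B form the canonical variate pairs.
    """
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Sx = Xc.T @ Xc / (n - 1)
    Sy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        vals, vecs = np.linalg.eigh(S)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Sx_isqrt, Sy_isqrt = inv_sqrt(Sx), inv_sqrt(Sy)
    U, rho, Vt = np.linalg.svd(Sx_isqrt @ Sxy @ Sy_isqrt)
    A = Sx_isqrt @ U      # canonical direction matrix for x (columns)
    B = Sy_isqrt @ Vt.T   # canonical direction matrix for y (columns)
    return rho, A, B
```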

The selection of the inline image pairs of canonical variates is equivalent to testing how many non-zero eigenvalues the matrix inline image has. This is also the same as the estimation of the rank of inline image. Throughout the rest of the paper, the symbol inline image represents the true rank of inline image. A selection criterion for determining the value of inline image will be discussed in later sections. For more details regarding CCA, readers are referred to Johnson & Wichern (2007).

3 Linear modeling of canonical correlation analysis

3.1 Relation of canonical direction matrices and least squares

In this section, we introduce a new approach to CCA for two sets of variables inline image and inline image. Throughout the rest of the paper we will directly follow the notation of the previous two sections. For inline image, the expressions inline image and inline image represent the column rank and the subspace spanned by the columns of inline image, respectively. In addition, we define inline image, inline image, and inline image. By symmetry we have inline image.

Using inline image and inline image, inline image is simplified as follows:

display math(3)

The relation (3) directly implies that inline image.

Let inline image. It can be shown that the columns of inline image form an orthonormal basis for inline image because inline image. It follows that the columns of the matrix inline image form a basis for inline image, equivalently inline image. Since post-multiplication by any non-singular matrix does not change the rank and column space of a matrix, we establish the following key equivalences:

display math(4)

The quantity inline image in the last equivalence of (4) consists precisely of the ordinary least squares (OLS) coefficients of the regression of inline image given inline image. This directly indicates that any orthonormal basis matrix for inline image can also be used to construct a canonical direction matrix for inline image. By applying the same arguments to inline image, it is easily shown that inline image. Again, the quantity inline image consists of the OLS coefficients of inline image. Based on these results, we will present linear modeling of CCA by means of inline image and inline image.
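The equivalence in (4) can be checked numerically. The following sketch, with simulated data and illustrative names, compares the orthogonal projection onto the column space of the sample OLS coefficient matrix with the projection onto the span of the sample canonical direction matrix for the predictors; the two projections agree up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 500, 5, 4
X = rng.normal(size=(n, p))
Y = X @ rng.normal(size=(p, q)) + rng.normal(size=(n, q))

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Sx, Sy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n

beta_xy = np.linalg.solve(Sx, Sxy)       # OLS coefficient matrix, p x q

def inv_sqrt(S):
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

# canonical direction matrix for x, built as in Section 2
U, rho, Vt = np.linalg.svd(inv_sqrt(Sx) @ Sxy @ inv_sqrt(Sy))
A = inv_sqrt(Sx) @ U[:, :min(p, q)]

def proj(M):
    """Orthogonal projection onto the column space of M."""
    Q, _ = np.linalg.qr(M)
    return Q @ Q.T

# identical projections => identical column spaces, as claimed in (4)
print(np.max(np.abs(proj(beta_xy) - proj(A))))   # of the order 1e-14
```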

3.2 Linear modeling of canonical correlation analysis

Recalling the definitions of inline image and inline image, we first consider inference regarding inline image. The estimation of inline image requires two parts, which consist of determining its dimension, inline image, and its orthonormal basis, inline image.

Since inline image is also an orthonormal basis matrix of inline image according to (4), we have the relation inline image where inline image. Unknown population quantities inline image and inline image are replaced by their usual moment estimators inline image and inline image. We can then consider the estimation of inline image and inline image with arguments inline image and inline image that minimise the following objective function over inline image and inline image:

display math(5)

where inline image is a inline image inner-product matrix and inline image indicates a inline image vector constructed by stacking the columns of a inline image matrix inline image; inline image.

As discussed in Shapiro (1986), any solution inline image provides a consistent estimator inline image for any choice of inline image in (5) in the sense that inline image converges to inline image, where inline image stands for the orthogonal projection operator with respect to the usual inner-product space structure, onto inline image. Therefore we can construct a class of estimators for canonical direction matrices depending on choices of inline image. In addition it is clear that the minimisation of (5) and its asymptotic behavior depend on the choice of inline image. According to Shapiro (1986) a best choice for inline image is any consistent estimate of the inverse of the covariance matrix of the asymptotic distribution of inline image, which will be denoted inline image. The quantity inline image is exactly equal to inline image in Yoo & Cook (2007) and its explicit expression is

display math

where inline image.

Here the following quantity is used as a consistent estimate of inline image:

display math

where inline image and inline image.

We then define the best quadratic objective function as follows:

display math(6)

We define inline image and inline image to be the solutions of the minimisation of (6) with respect to inline image and inline image.
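The exact parametrisation used in (6) is hidden by the placeholders above, but the minimisation can be sketched in the spirit of minimum-discrepancy estimation: factor the coefficient matrix as a basis matrix times a coordinate matrix and alternate generalised least squares updates of the two factors. The sketch below assumes that form; the function names, initialisation and stopping rule are ours, and only the product of the two factors is identified.

```python
import numpy as np

def quad_objective(b_vec, Vn, Gamma, nu):
    """Weighted quadratic discrepancy between vec(beta_hat) and vec(Gamma nu)."""
    r = b_vec - (Gamma @ nu).ravel(order="F")   # vec() stacks columns
    return r @ Vn @ r

def minimise_objective(beta_hat, Vn, m, n_iter=200):
    """Alternating minimisation of the quadratic objective at working rank m.

    beta_hat : p x q matrix of OLS coefficients
    Vn       : pq x pq positive definite inner-product matrix
    """
    p, q = beta_hat.shape
    b_vec = beta_hat.ravel(order="F")
    # initialise Gamma with the leading left singular vectors of beta_hat
    Gamma = np.linalg.svd(beta_hat)[0][:, :m]
    nu = Gamma.T @ beta_hat
    for _ in range(n_iter):
        # update nu for fixed Gamma: GLS with design (I_q kron Gamma)
        D = np.kron(np.eye(q), Gamma)
        nu = np.linalg.solve(D.T @ Vn @ D, D.T @ Vn @ b_vec).reshape(m, q, order="F")
        # update Gamma for fixed nu: GLS with design (nu' kron I_p)
        E = np.kron(nu.T, np.eye(p))
        Gamma = np.linalg.solve(E.T @ Vn @ E, E.T @ Vn @ b_vec).reshape(p, m, order="F")
    return Gamma, nu, quad_objective(b_vec, Vn, Gamma, nu)
```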

Now the estimation of inline image is performed via a sequence of hypothesis tests (Rao 1965): beginning with inline image, test inline image vs. inline image. If inline image is rejected, increment inline image by 1 and repeat the test, stopping the first time inline image is not rejected and setting inline image. This estimation procedure relies on obtaining a test statistic for inline image, and as the statistic we propose inline image. Under inline image, the statistic inline image is distributed asymptotically as inline image. Then inline image is estimated by inline image, where inline image is constructed from (6) with inline image. Also, the estimator inline image is asymptotically efficient. The asymptotic efficiency of inline image means that it has minimum asymptotic variance within the family (5). The results regarding the asymptotic chi-squared distribution of inline image and the asymptotic efficiency of inline image are directly derived from theorem 2 in Yoo & Cook (2007), and these results are guaranteed by the existence of finite fourth moments of the simple random samples inline image of inline image, inline image, according to theorem 1 of Cook & Ni (2005).
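The form of the statistic is not recoverable from the placeholders above. Assuming, as in minimum-discrepancy approaches of this type, that it is n times the minimised value of the objective and that the limiting chi-squared distribution has (p − m)(q − m) degrees of freedom, the sequential estimation can be sketched as follows, reusing minimise_objective from the previous sketch.

```python
from scipy.stats import chi2

def estimate_rank(beta_hat, Vn, n, alpha=0.05):
    """Sequential tests of H0: d = m versus H1: d > m, m = 0, 1, ...

    Assumes the statistic is n times the minimised objective, referred
    to a chi-squared distribution with (p - m)(q - m) degrees of freedom.
    """
    p, q = beta_hat.shape
    b_vec = beta_hat.ravel(order="F")
    for m in range(min(p, q)):
        if m == 0:
            Fmin = b_vec @ Vn @ b_vec        # rank-0 fit leaves beta_hat unexplained
        else:
            _, _, Fmin = minimise_objective(beta_hat, Vn, m)
        p_value = chi2.sf(n * Fmin, df=(p - m) * (q - m))
        if p_value > alpha:                  # first non-rejection: stop
            return m
    return min(p, q)
```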

Applying the same rationale to inline image, we can estimate inline image and its orthonormal basis of inline image by minimising the following quadratic objective function over inline image and inline image:

display math(7)

where inline image is a consistent estimator of the inverse of the asymptotic variance of inline image.

Let inline image and inline image be the solutions of the minimisation of (7). As a test statistic for inline image vs. inline image, we propose inline image which is asymptotically inline image under inline image.

The estimates of inline image determined from (6) and (7) are not always equal, but the same number of pairs of canonical variates for inline image and inline image should be selected in the CCA. Therefore, we will consider the following Bonferroni determination of inline image throughout the rest of the paper:

  1. Starting with inline image, compute the inline image-values from inline image and inline image with inline image. Let them be inline image and inline image, respectively.
  2. If either inline image or inline image is less than inline image, reject inline image and increment inline image by 1.
  3. Repeat Step 2 until both inline image and inline image are bigger than inline image for the first time. Then set inline image.

Effecting CCA by means of the Bonferroni determination of inline image and the estimation of canonical direction matrices through minimising (6) and (7) will be called optimal linear modeling (OLM) of CCA.
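A sketch of the Bonferroni determination of inline image is given below. It assumes that the rejection threshold in Step 2 is half the nominal level and that p-value functions for the x-side and y-side tests (for example, wrappers around estimate_rank above) are available; both assumptions are ours.

```python
def bonferroni_d(pval_x, pval_y, max_rank, alpha=0.05):
    """Bonferroni determination of the number of pairs of canonical variates.

    pval_x(m) and pval_y(m) return the p-values of the x-side and y-side
    tests of H0: d = m; H0 is rejected if either p-value falls below alpha / 2.
    """
    for m in range(max_rank):
        if pval_x(m) >= alpha / 2 and pval_y(m) >= alpha / 2:
            return m        # first m at which neither test rejects
    return max_rank
```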

3.3 Sub-optimality of existing approaches

From Section 'Classical canonical correlation analysis (CCA)', the sample canonical direction matrix inline image for inline image, from the standard application of CCA, is constructed from the spectral decomposition of inline image, by taking the eigenvectors corresponding to the inline image largest eigenvalues inline image. For the determination of inline image through sequential hypothesis tests of inline image versus inline image, inline image, the following statistic proposed by Bartlett (1938) is widely used:

display math

Under the joint normality of inline image and inline image, inline image is asymptotically distributed as inline image.
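For comparison, Bartlett's statistic can be computed directly from the sample canonical correlations. The sketch below uses the familiar form of the statistic, which may differ in minor details from the expression hidden above; it is valid only under joint normality of the two sets of variables.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_test(rho, n, p, q, m):
    """Bartlett's (1938) test of H0: exactly m non-zero canonical correlations.

    rho : sample canonical correlations, largest first
    Uses -(n - 1 - (p + q + 1) / 2) * sum_{i > m} log(1 - rho_i^2),
    referred to chi-squared with (p - m)(q - m) degrees of freedom.
    """
    stat = -(n - 1 - (p + q + 1) / 2) * np.sum(np.log(1.0 - rho[m:] ** 2))
    df = (p - m) * (q - m)
    return stat, chi2.sf(stat, df)
```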

Let the pairs inline image, inline image, be the eigenvalues and their corresponding eigenvectors of inline image. Let inline image. Consider the minimisation of the following quadratic objective function over inline image and inline image:

display math(8)

Let inline image and inline image be the minimisers of (8) and define inline image. Then a pair inline image can be obtained from inline image and inline image by the following lemma.

Lemma 1. The following two equations hold: (i) inline image for inline image. (ii) inline image with inline image, if inline image.

The next lemma demonstrates that the proposed approach is asymptotically more efficient than standard canonical correlation application.

Lemma 2. Let inline image be the covariance matrix of the asymptotic distribution of inline image constructed from (6). Define inline image to be that of inline image constructed from (8). Then, for any inline image, inline image.

Another difference between (6) and (8) is the asymptotic distributions of inline image and inline image. The former is always asymptotically inline image regardless of the joint normality of inline image and inline image, while the latter requires a multivariate normal distribution of inline image and inline image for inline image. Therefore, if normality is a cause for concern, inline image will be problematic with respect to the determination of inline image. Being deficient in these two desirable properties, namely efficiency and the validity of the associated inline image distribution, the standard CCA application can be said to be sub-optimal.

Also, Ter Braak's approach can be viewed as a particular case of the linear modeling approach by setting inline image, where inline image. The optimality of Ter Braak's approach is guaranteed under the following condition:

display math

If inline image and inline image are independent, then the two quantities are equal, and hence Ter Braak's approach is optimal. However, if they are not independent, equality is not guaranteed in general, so Ter Braak's approach may be sub-optimal. It is interesting to note that Ter Braak's approach coincides with the results of Cook & Setodji (2003), who developed a model-free reduced-rank regression.

Simulation studies in Section 5.1 show that the potential advantages of using the OLM by minimising (6) and (7) are most noticeable in the estimation of the canonical direction matrices when there exists a complicated association between inline image and inline image, such as non-trivial noise and high skewness in variables.

3.4 Variable selection

Variable selection in canonical correlation analysis has been largely neglected possibly due to the difficulty of deriving a proper methodology for it. Since CCA is done through inline image and inline image, variable selection in CCA should be based on choosing variables in inline image and inline image which contribute substantially to inline image and inline image. Direct use of (6) and (7) enables us to do this without knowing inline image. In other words, we can perform variable selection prior to carrying out canonical correlation analysis.

In canonical correlation analysis, the importance of inline image and inline image is measured only through inline image and inline image, respectively. For example, if inline image, which is the first coordinate of inline image, does not contribute to canonical correlation analysis, the corresponding first row of inline image should be zero. This condition can be written as inline image where inline image. It is straightforward to test an hypothesis such as inline image through the linear modeling of canonical correlation analysis given in (6) and (7).

Since the variables significant to inline image and inline image must be significant to inline image and inline image, we test the following hypotheses for variable selection of inline image and inline image:

display math

where inline image represents the inline image-dimensional canonical basis vector with the inline imageth entry equal to one and other entries equal to zero.

If inline image is not rejected, then the inline imageth coordinate inline image in inline image does not contribute to inline image. Thus we can remove inline image before conducting canonical correlation analysis. Also, if inline image is not rejected, the inline imageth coordinate inline image in inline image can be removed.

The hypotheses are tested by using the Wald-type statistic:

display math

Under inline image and inline image, inline image and inline image asymptotically converge to inline image and inline image, respectively, which immediately follows from Slutsky's theorem.
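The exact Wald-type statistic is hidden by the placeholders above. A generic Wald construction for testing that a single row of the OLS coefficient matrix is zero can nevertheless be sketched as follows; the estimated asymptotic covariance matrix Gamma_hat and all other names are ours.

```python
import numpy as np
from scipy.stats import chi2

def wald_row_test(beta_hat, Gamma_hat, n, i):
    """Wald-type test that the i-th row of the coefficient matrix is zero.

    beta_hat  : p x q OLS coefficient matrix
    Gamma_hat : pq x pq estimate of the asymptotic covariance of
                sqrt(n) * vec(beta_hat), with columns stacked
    Under H0 the statistic is referred to chi-squared with q degrees of freedom.
    """
    p, q = beta_hat.shape
    idx = i + p * np.arange(q)       # positions of row i inside vec(beta_hat)
    r = beta_hat[i, :]
    cov_r = Gamma_hat[np.ix_(idx, idx)]
    stat = n * r @ np.linalg.solve(cov_r, r)
    return stat, chi2.sf(stat, df=q)
```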

4 Canonical correlation in regression

We consider a multivariate regression of inline image with inline image. Define inline image and inline image with the smallest possible ranks of inline image and inline image so that inline image, where inline image is a inline image matrix. If inline image, we are interested in the equation inline image.

This says that inline image can be thought of as influencing inline image and all other conditional mean components are determined from inline image via inline image. It can be shown that inline image is a generalised inverse of inline image: inline image. Without loss of generality, we take inline image. Then inline image forms the orthogonal projection operator inline image for inline image relative to the inner product inline image, so we have

display math(9)

This says that inline image varies in the subspace spanned by inline image only through dependence on inline image. In other words, we pursue dimension reduction of inline image and inline image through linear projection without loss of information about inline image. We call this type of dimension reduction in regression sufficient dimension reduction for inline image (Yoo & Cook 2007, 2008).

Considering sufficient dimension reduction of inline image for inline image, the following condition on the marginal distribution of inline image is typically imposed:

C1. inline image is linear in inline image.

Condition C1 is called the linearity condition and will hold to a reasonable approximation in many problems (Hall & Li 1993). If inline image has an elliptically contoured distribution, condition C1 is automatically satisfied. In the case that condition C1 does not hold, inline image can often be one-to-one transformed so as to satisfy this condition. Under condition C1, inline image. Hereafter, we will assume that inline image for exhaustive estimation of inline image. Then, from (4), we have

display math

This relation implies that the canonical variates for inline image can replace the original predictor inline image without loss of information on inline image under condition C1.

The use of inline image as given in (9) is implicit in the method in Yoo & Cook (2008, section 2.1). Expressing their results in our terms we would say that inline image has full information on inline image in the sense that inline image. We then have the following equivalence:

display math

The quantity inline image in the last equivalence above is used for the usual construction of the canonical variates for inline image. Thus the original response inline image can be replaced by the canonical variates for inline image without loss of information on inline image.
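To illustrate the use of canonical variates as dimension-reduced predictors and responses, the following sketch (with simulated data and illustrative names, reusing classical_cca from Section 2) retains the leading pairs of canonical variates and fits a multivariate regression on the reduced variables; any consistent estimate of the canonical direction matrices, such as the OLM estimates of Section 3, could be used in place of the classical one.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q, d = 400, 6, 4, 2
X = rng.normal(size=(n, p))
# y depends on x only through two linear combinations of x
Y = np.column_stack([X[:, 0] + X[:, 1],
                     X[:, 0] - X[:, 2],
                     rng.normal(size=n),
                     rng.normal(size=n)]) + 0.5 * rng.normal(size=(n, q))

rho, A, B = classical_cca(X, Y)

U = (X - X.mean(0)) @ A[:, :d]     # canonical variates for x: reduced predictors
V = (Y - Y.mean(0)) @ B[:, :d]     # canonical variates for y: reduced responses

# multivariate regression of the reduced responses on the reduced predictors
coef = np.linalg.solve(U.T @ U, U.T @ V)
fitted = U @ coef
```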

5 Simulation study

To confirm that the proposed OLM method has potential advantages in the estimation of the dimension and bases of the canonical correlation subspace, we consider joint distributions between inline image and inline image. First, the coordinate variables of inline image were independently generated from inline imageinline image Gamma (0.25, 1) and inline imageinline imageinline image. Based on this, we have constructed the following four joint distributions between inline image and inline image: inline image, inline image, inline image, and inline image, where the variates inline image, inline image 4 are independent standard normal variates independent of inline image. In the simulation, sample sizes of 100, 200, 400 and 800 were considered, and the number of simulation replicates for each sample size was 500.

In the simulation model, the inline image variates are quite skewed and prone to outliers. If we reduce the dimensions of inline image and inline image and focus on inline image, which is a very common target in regression, then it can be seen that inline image. Under the simulated model, the directions of inline image and inline image for inline image and inline image and inline image for inline image are the results of the dimension reduction. In other words, two-dimensional sets of variables inline image and inline image should be sufficient to represent the association between inline image and inline image for inline image. Therefore the true number of pairs of canonical variates, which is inline image, is two. For methodological comparison, we have used the proposed OLM method, the standard CCA method, and the methods of Ter Braak (1990) and Yoo & Cook (2007).

In the estimation of inline image, we sequentially tested the hypotheses H0: inline image = m versus H1: inline image > m for m = 0, 1, 2, 3 with a significance level equal to 0.05. In respect of dimension estimation, we computed the percentages of the time that the estimate inline image > 2 and those of the time that the estimate inline image = 2. The former percentages are called the observed significance levels. The two percentages are reported in Figure 1. In Figures 1(a) and (b), the horizontal lines represent the reference 5% and 95% lines respectively. The observed significance levels were reported in Figure 1(a) by computing the rejection percentages of estimates inline image > 2. The Yoo-Cook method shows the best results, and the standard CCA application and the Ter Braak method are similar to each other. Since the proposed method uses the Bonferroni procedure, its estimated significance levels should be at most 5%, which is observed in Figure 1(a). Figure 1(b) shows that the OLM method, which invokes the Bonferroni procedure, produces the highest percentages of inline image = 2 regardless of sample size, although the Yoo-Cook method is close to the OLM method. The other two methods under consideration do not quite match the OLM method at smaller sample sizes. This confirms that the OLM with the Bonferroni procedure has potential advantages over the three other methods in dimension estimation.

Figure 1.

Dimension estimation of the canonical correlation subspace in Section 5.1. (a) The observed significance levels. (b) Percentages of the decisions that inline image. OLM, the optimal linear modeling approach; CCA, standard canonical correlation analysis; YC, Yoo–Cook method; TR, Ter Braak method.

Let inline image be the estimated canonical direction matrices of inline image and inline image from a given method, given inline image. To measure how well the true basis of inline image for inline image was estimated, inline image and inline image were computed as the square roots of the inline images from the ordinary least squares regressions of inline image on inline image and of inline image on inline image. The same criteria were applied to summarise the estimation of inline image and inline image, and similar notation, inline image and inline image, was adopted. Since there were no notable differences among the methods in the basis estimation of inline image for both inline image and inline image, we report the averages of inline image and inline image in Figure 2 and those of inline image and inline image in Figure 3.
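The accuracy measure just described can be computed as below. Whether the underlying least squares regressions include an intercept is not stated in the text, so the sketch uses the no-intercept version, which is simply the cosine of the angle between a true direction and the estimated subspace; the names are illustrative.

```python
import numpy as np

def basis_accuracy(true_dir, est_basis):
    """Square root of the R^2 from regressing a true direction on the
    columns of an estimated canonical direction matrix.

    true_dir  : length-p vector (a column of the true direction matrix)
    est_basis : p x d estimated canonical direction matrix
    Values near 1 indicate that the true direction lies close to the
    span of the estimated directions.
    """
    Q, _ = np.linalg.qr(est_basis)
    fitted = Q @ (Q.T @ true_dir)          # projection onto the estimated span
    r2 = (fitted @ fitted) / (true_dir @ true_dir)
    return np.sqrt(r2)
```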

Figure 2 shows that the OLM method outperforms the other three approaches in the estimation of the direction of inline image. For the estimation of the direction of inline image, inline image and inline image, all four methods are quite similar. So, it can be concluded that the OLM method shows equally good or better asymptotic performances in the estimation of canonical direction matrices over the three other methods.

Figure 2.

Basis estimation of the canonical correlation subspace for inline image in Section 5.1. (a) Percentages of inline images. (b) Percentages of inline images. OLM, the optimal linear modeling approach; CCA, standard canonical correlation analysis; YC, Yoo-Cook method; TR, Ter Braak method.

Figure 3.

Basis estimation of the canonical correlation subspace for inline image in Section 5.1. (a) Percentages of inline images. (b) Percentages of inline images. OLM, the optimal linear modeling approach; CCA, standard canonical correlation analysis; YC, Yoo-Cook method; TR, Ter Braak method.

Next, we tested the effect of each coordinate on both canonical direction matrices at the 5% significance level. Since inline image, inline image, inline image and inline image contribute to inline image and inline image, the hypotheses inline image and inline image for inline image and inline image should be rejected close to 100% of the time, while those for inline image and inline image should be rejected about 5% of the time. We report the percentages of rejections of inline image, inline image, inline image and inline image in Table 1, because the testing behaviours of inline image, inline image, inline image and inline image are similar to those of inline image, inline image, inline image and inline image, respectively. Table 1 shows that the variable selection with respect to each canonical direction matrix is not a cause for concern with moderate sample sizes.

Table 1. Percentages of rejections of inline image, inline image, inline image and inline image in Section 5.1.

inline image   inline image   inline image   inline image   inline image
100            100.0          11.3           100.0          18.5
200            100.0          6.40           100.0          12.2

6 Minneapolis school data analysis

To illustrate the CCA methodology explained in Section 'Canonical correlation in regression', we use data on the performance of students in inline image Minneapolis schools introduced in Cook (1998). The inline image-dimensional variables inline image consist of the percentages inline image of students in a school scoring above (A) and below (B) average on standardised fourth and sixth grade reading comprehension tests, inline image. Subtracting either pair of grade-specific percentages from 100 gives the percentage of students scoring about average on the test. From the collection of all variables in the dataset, the following five variables were chosen as components of inline image for the purpose of illustration: (i) the percentage of children receiving Aid to Families with Dependent Children (AFDC); (ii) the percentage of children not living with both biological parents (B); (iii) the percentage of adults in the school area who completed high school (HS); (iv) the percentage of persons in the area below the federal poverty level (PL); and (v) the pupil-teacher ratio (PT). The first four variables in inline image were square-root transformed to satisfy Condition C1. The efficacy of the transformation was confirmed by graphical inspection (not reported).

Variable selection for inline image and inline image was performed; the related inline image-values are summarised in Table 2. According to the table, inline image and inline image in inline image and inline image and inline image in inline image are determined to be significant at level 0.05. For the purpose of illustration, both the proposed linear modeling approach and the standard CCA procedure were considered.

Table 2. inline image-values for variable selections in Minneapolis school data.
inline image   inline image   inline image   inline image   inline image   inline image   inline image   inline image   inline image
0.000          0.087          0.001          0.384          0.449          0.384          0.649          0.000          0.000

To determine the number of pairs of canonical variates, the Bonferroni procedure and Bartlett statistics were applied to two different cases of before and after variable selection. Table 3 presents the corresponding inline image-values. Before the selection, with significance level 5%, the Bonferroni procedure yields the estimate inline image, while the standard CCA yields inline image. However, after selection, both procedures yield inline image. The different conclusions before and after variable selections may result from standard CCA failing to properly detect the quadratic relationship between the first two canonical variates of inline image because of noise induced by extraneous variables.

Table 3. inline image-values for the rank estimations in canonical correlation analysis in Minneapolis school data.
                 Before variable selection                    After variable selection
                 inline image   inline image   inline image   inline image   inline image   inline image
inline image     0.000          0.000          0.000          0.000          0.000          0.000
inline image     0.123          0.017          0.070          0.002          0.000          0.002
inline image     0.583          0.575          0.613          N/A            N/A            N/A

To further explain the relationship between inline image and inline image through multivariate regression analysis, one can commence modeling using inline image and inline image as responses and inline image and inline image as predictors.

7 Discussion

In this article, we propose linear modeling of canonical correlation analysis by considering a quadratic objective function. Within the linear modeling approach, we construct a class that is optimal in the sense discussed herein. Canonical direction matrices are then estimated through the minimisation of the objective functions in the optimal class, and a Bonferroni procedure is proposed to determine the number of pairs of canonical variates. It can be shown that standard canonical correlation analysis, as well as other existing methods, can be expressed in the same form as the linear modeling approach, and that they are sub-optimal cases of this approach. In addition, the proposed approach enables us to conduct variable selection in canonical correlation analysis.

We investigate the role of canonical correlation analysis in multivariate regression, and it turns out that the canonical variates can be used as dimension-reduced responses and predictors without loss of information on the conditional mean under mild conditions.

It is believed that this paper will re-emphasise the importance and usefulness of canonical correlation analysis in multivariate data analysis. The code for the proposed approach is available upon request.

Appendix: Justifications

Proof of Lemma 1. For notational convenience, let inline image, and set inline image, where inline image is the inline imageth column of inline image. Therefore inline image. Defining inline image, the maximum number of non-zero eigenvalues acquired from the spectral decomposition of inline image is inline image. We denote the eigensystem of inline image as follows: inline image with inline image, where inline image is the eigenvector corresponding to inline image, for inline image. We will not consider the eigenvectors corresponding to the inline image zero eigenvalues, because they are not informative in the reduction of inline image.

Let inline image. According to lemma A.1 of Cook & Ni (2005), we have

display math

where inline image is the usual Euclidean norm and inline image is the value of inline image that minimises inline image.

We re-express inline image as inline image and define inline image, inline image, and inline image. It then follows that:

display math

The last equation is the same as the quadratic objective function in (8). Therefore, we have that inline image for inline image, and inline image. From these results, the conclusions follow directly.

Proof of Lemma 2. Recall that inline image is the covariance matrix of the asymptotic distribution of inline image and that the pairs of inline image and inline image are the OLM solutions of the minimisation of (6). Define inline image and let inline image and inline image be the solutions of the minimisation of (8).

According to theorem 2 in Yoo & Cook (2007) and lemma A.4 in Cook & Ni (2005), the explicit expressions for inline image and inline image are as follows:

display math(10)

where inline image and inline image is the Jacobian matrix

display math

Define inline image. Then for any inline image, inline image, because inline image. Using (10), the explicit form of inline image is as follows:

display math

Replacing inline image to inline image by the corresponding quantities, we see that inline image is equivalent to inline image, and this completes the proof.

Acknowledgements

This work was supported by Basic Science Research Program through the National Research Foundation of Korea (KRF) funded by the Ministry of Education, Science and Technology (2012-004002) for Keunbaik Lee and (2012-040077) for Jae Keun Yoo, respectively. The authors are also grateful to the associate editor and the three referees for many insightful and helpful comments.
