Summary
In many clinical studies, Lin's concordance correlation coefficient (CCC) is a common tool for assessing the agreement of a continuous response measured by two raters or methods. However, measures of agreement are also needed in more complex situations, such as when each rater or method measures the response on more than one occasion. In this work, we propose a new CCC for repeated measurements, called the matrix-based concordance correlation coefficient (MCCC). It is based on a matrix norm and possesses the properties needed to characterize the level of agreement between two p × 1 vectors of random variables. The MCCC reduces to Lin's CCC when p = 1. For inference, we propose an estimator of the MCCC based on U-statistics and derive its asymptotic distribution, which is normal. Simulation studies confirm that, in terms of accuracy, precision, and coverage probability, the estimator performs well overall, especially when n is at least 40. Finally, we demonstrate the method with real data from an Asthma Clinical Research Network (ACRN) study and the Penn State Young Women's Health Study.
1. Introduction
In many scientific studies, one of the research objectives is to assess the agreement of observations made by two raters or methods. For example, in medical diagnostic testing, the main research interest is to compare the results of a new technique with the gold standard practice. When the observations are measured on a continuous scale, the concordance correlation coefficient (CCC), introduced by Lin (1989), is one of the most popular measures of agreement. The CCC evaluates the agreement between two readings from the same sample by measuring how far each pair of data points deviates from the 45° line through the origin, called the concordance line. The CCC ranges between −1 and 1, taking the value 1 for perfect positive agreement, 0 for no agreement, and −1 for perfect negative agreement. Traditional approaches, such as the Pearson correlation coefficient, the paired t-test, and the least squares test, sometimes fail to detect departure from the concordance line or falsely reject strong agreement; the CCC, by contrast, can fully assess the desired reproducibility characteristics.
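To make the contrast with the Pearson correlation concrete, the following minimal sketch computes a sample version of Lin's CCC, 2σ_XY / (σ_X² + σ_Y² + (μ_X − μ_Y)²), for paired readings. The function name `lin_ccc` and the use of moment estimates with divisor n are choices made here for illustration.

```python
import numpy as np

def lin_ccc(x, y):
    """Sample version of Lin's CCC built from moment estimates (divisor n)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()                  # divisor n (illustrative choice)
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    # Penalizes both weak correlation and shifts in location or scale.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0                                    # perfectly correlated, off the 45° line

print("Pearson:", np.corrcoef(x, y)[0, 1])
print("CCC    :", lin_ccc(x, y))
```

In this toy example the Pearson correlation cannot distinguish y from x, whereas the CCC is pulled well below 1 by the location and scale shift.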
However, in many fields of science, especially the medical sciences, a measure of agreement between two raters or methods is often needed when the data are obtained on several occasions. For example, in a longitudinal asthma clinical trial, one of the research goals was to study the agreement between plasma cortisol AUC (area under the curve) computed from measurements taken every hour and every two hours at several visits. In this situation, we need a repeated measure CCC that can quantify the overall agreement between two random vectors of repeated measurements.
For paired or unpaired repeated-measurement study designs, Chinchilli et al. (1996) developed a weighted CCC based on a random coefficient model that allows the within-subject variances to change across subjects. For each subject, the CCC was constructed as an average of q CCCs of the least squares random vectors, whose variance–covariance matrices were of dimension q × q. Then, the global CCC was defined as a weighted average of the coefficients using a weight function based on the amount of variation within each subject.
King, Chinchilli, and Carrasco (2007) proposed another version of the CCC in the presence of repeated measurements. They characterized the amount of agreement between two p× 1 random vectors, X and Y, by , where D is a p×p nonnegative definite matrix of weights among the different repeated measurements. Then, the repeated measure CCC was defined as
Carrasco, King, and Chinchilli (2009) developed a CCC for longitudinal repeated measurements through the appropriate specification of the intraclass correlation coefficient from a variance components linear mixed model. The authors showed that this CCC is equivalent to the repeated measure CCC proposed by King et al. (2007) when the weight matrix D is the identity matrix.
In this work, we introduce a new repeated measure CCC that not only can be proven to possess the properties needed to measure the amount of overall agreement between two p× 1 vectors of random variables but also has more intuitive appeal than the former methods. In Section 2, we first construct a matrix that characterizes the overall agreement between the two vectors. Then, to ease the problem of interpretation, we transform this matrix to a scalar based on a matrix norm and scale its value to range between −1 and 1. We call our repeated measure CCC the matrix-based concordance correlation coefficient (MCCC). To estimate the MCCC, we consider an estimator based on U-statistics. For inference, we derive the asymptotic distribution of the proposed estimator. To obtain a confidence interval or a test statistic, we consider a consistent estimator of the asymptotic variance and the Z-transformation to improve the normal approximation and bound the confidence limits. In Section 3, a Monte Carlo simulation is performed to assess the properties of the estimator of the MCCC based on finite samples. Finally, in Section 4, some real examples are used to demonstrate the application of the MCCC.
Let (X, Y) be a 2p× 1 random vector from a 2p-variate distribution with a finite 2p× 1 mean vector and a positive definite 2p× 2p covariance matrix
where X and Y are the measurements of each method with
To characterize the level of agreement between the two p× 1 vectors X and Y, let us consider the following p×p matrices,
Then, we construct a matrix that characterizes the overall agreement between the two vectors denoted by , as follows:
where Ip×p denotes the p×p identity matrix, VD is nonnegative definite, VI is positive definite, and VI^{−1/2} denotes the symmetric square-root decomposition of the inverse of VI.
For ease of notation, we write A > 0p×p if a p×p symmetric matrix A is positive definite and A ≥ 0p×p if A is nonnegative definite.
The agreement matrix has the following properties (see Web Appendix A for the proof of each property).
2 if and only if .
3. The agreement matrix equals Ip×p if and only if X = Y with probability one.
4 if and only if X=−Y with probability one and .
5. If p = 1, then the agreement matrix reduces to Lin's CCC.
6If and are diagonal matrices and , then each of the diagonal elements of corresponds to Lin's CCC.
Based on these properties, one can use the agreement matrix to measure the amount of agreement between two vectors of random variables. The closer the agreement matrix is to the identity matrix, the higher the level of positive agreement between the two vectors. Conversely, the closer it is to the negative identity matrix, the higher the level of negative agreement between the two vectors. If the agreement matrix is equal to the zero matrix, then the two vectors are independent; in other words, there is no agreement between the two vectors.
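The construction can be checked numerically. The sketch below is written under stated assumptions rather than as the authors' exact formulas: it takes VD = E[(X − Y)(X − Y)'] under the joint distribution, VI as the same expectation when X and Y are independent, and the agreement matrix as Ip×p − VI^{−1/2} VD VI^{−1/2}. The moment forms of VD and VI are modeled on Lin's construction for p = 1, and the function names are hypothetical.

```python
import numpy as np

def inv_sqrt_sym(m):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(m)
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T

def agreement_matrix(mu_x, mu_y, s_xx, s_yy, s_xy):
    """I - V_I^{-1/2} V_D V_I^{-1/2} under the assumed forms of V_D and V_I."""
    delta = np.asarray(mu_x, dtype=float) - np.asarray(mu_y, dtype=float)
    s_xx = np.asarray(s_xx, dtype=float)
    s_yy = np.asarray(s_yy, dtype=float)
    s_xy = np.asarray(s_xy, dtype=float)
    v_i = s_xx + s_yy + np.outer(delta, delta)   # assumed: X and Y independent
    v_d = v_i - s_xy - s_xy.T                    # assumed: actual joint distribution
    root = inv_sqrt_sym(v_i)
    return np.eye(delta.size) - root @ v_d @ root

# Boundary case X = Y with probability one: the result is the identity matrix.
s = np.array([[2.0, 0.5], [0.5, 1.0]])
print(agreement_matrix([0.0, 0.0], [0.0, 0.0], s, s, s))

# p = 1: the single entry equals Lin's CCC, 2*3 / (4 + 9 + 1).
print(agreement_matrix([1.0], [2.0], [[4.0]], [[9.0]], [[3.0]]))
```

Under these assumptions the two printed cases correspond to the identity-matrix property and the reduction to Lin's CCC when p = 1.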
In most circumstances, it will not be straightforward to gauge the “closeness” of the agreement matrix to the identity matrix, and most researchers will want a numerical value to represent “closeness.” For example, suppose that (X, Y) follows a joint distribution with
The resultant matrix is
which is somewhat close to the identity matrix, indicating a good level of agreement between X and Y.
Suppose that in the above example, the expectation vector for Y actually is Then
and it is not clear how to describe the level of agreement between X and Y.
Therefore, we recommend the construction of a matrix norm to assess the distance between the agreement matrix and the identity matrix. For p×p symmetric matrices A and B, a function g is a matrix norm if it satisfies the following conditions (Stewart, 1973):
1. If A ≠ 0, then g(A) > 0.
2. If c is a constant, then g(cA) = |c| g(A).
3. g(A + B) ≤ g(A) + g(B).
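An appealing matrix norm for our situation is the Frobenius norm, defined for a symmetric matrix A as
$$ g(A) = \sqrt{\operatorname{tr}(A^{\top}A)} = \sqrt{\sum_{i=1}^{p} \lambda_i^{2}}, $$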
where λ1, λ2, … , λp represent the eigenvalues of A. A generalization of the Frobenius norm is to let D be a positive definite matrix of weights and construct . Setting D equal to the identity matrix yields the Frobenius norm, which we use throughout the remainder of the manuscript.
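As a quick numerical illustration, the Frobenius norm of a symmetric matrix can be computed from its entries, from the trace of A'A, or from its eigenvalues, and all three agree with NumPy's built-in `np.linalg.norm(A, "fro")`. The matrix below is an arbitrary example, not one from the paper.

```python
import numpy as np

a = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])                  # arbitrary symmetric matrix

by_entries = np.sqrt(np.sum(a ** 2))             # square root of the sum of squared entries
by_trace = np.sqrt(np.trace(a.T @ a))            # sqrt(tr(A'A))
by_eigenvalues = np.sqrt(np.sum(np.linalg.eigvalsh(a) ** 2))  # symmetric A only
by_numpy = np.linalg.norm(a, "fro")

print(by_entries, by_trace, by_eigenvalues, by_numpy)

# Norm condition 2: g(cA) = |c| g(A).
print(np.linalg.norm(-3.0 * a, "fro"), 3.0 * by_numpy)
```

The eigenvalue form holds because A is symmetric; for a general matrix the singular values would take the place of the eigenvalues.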
In addition to determining the distance between the agreement matrix and the identity matrix, we want to scale and center the matrix norm so that it ranges between −1 and +1, where +1 corresponds to perfect positive agreement and −1 corresponds to perfect negative agreement (in a manner comparable to other agreement coefficients). Therefore, we propose the MCCC, ρg, as follows:
Note that ρg ranges between −1 and 1. By the definition of ρg in (4) and the properties of g, it can easily be seen that ρg = 0 if the agreement matrix is the zero matrix, which is the case of no agreement between the two vectors; ρg = 1 if the agreement matrix is Ip×p, which is the case of perfect positive agreement; and ρg = −1 if the agreement matrix is −Ip×p, which is the case of perfect negative agreement. In addition, the closer the agreement matrix is to Ip×p, the closer ρg is to 1, and the closer it is to −Ip×p, the closer ρg is to −1.
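The three anchor values above can be reproduced with a short sketch. It assumes the scaling ρg = 1 − g(Ip×p − M) / g(Ip×p), where M is the agreement matrix and g is the Frobenius norm. This form is an assumption chosen because it returns 0, 1, and −1 at the zero, identity, and negative identity matrices; it is not a quotation of (4), and the function name is hypothetical.

```python
import numpy as np

def mccc_from_agreement(m):
    """Scalar summary 1 - g(I - M) / g(I), with g the Frobenius norm (assumed form)."""
    m = np.asarray(m, dtype=float)
    identity = np.eye(m.shape[0])
    distance = np.linalg.norm(identity - m, "fro")   # distance of M from the identity
    return 1.0 - distance / np.linalg.norm(identity, "fro")

p = 3
print(mccc_from_agreement(np.zeros((p, p))))   # no agreement
print(mccc_from_agreement(np.eye(p)))          # perfect positive agreement
print(mccc_from_agreement(-np.eye(p)))         # perfect negative agreement
print(mccc_from_agreement(np.diag([0.9, 0.8, 0.7])))
```

Dividing by g(Ip×p) = √p keeps the coefficient on the same scale regardless of the number of repeated measurements.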
To estimate the agreement matrix and ρg, we first estimate VD and VI with unbiased estimators based on U-statistics.
Assume that (X1, Y1), …, (Xn, Yn) are independent and identically distributed random vectors from a 2p-variate distribution with finite fourth moments. Define
Now, we construct the estimators of the agreement matrix and ρg as
where the vec operator vectorizes a matrix by stacking its columns.
That is, and are U-statistics with kernels and , respectively. Since and and are unbiased estimators of and , respectively.
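A minimal sketch of U-statistic estimators of this kind follows. It assumes VD = E[(X − Y)(X − Y)'] and VI equal to the same expectation under independence, with the degree-one kernel (Xi − Yi)(Xi − Yi)' for VD and the average of (Xi − Yj)(Xi − Yj)' over all ordered pairs i ≠ j for VI. These kernels are assumptions that make both estimators unbiased under those definitions; they are not necessarily the exact kernels used in the paper, and the function name is hypothetical.

```python
import numpy as np

def u_statistic_estimates(x, y):
    """Return (v_d_hat, v_i_hat) for paired data x, y of shape (n, p)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    within = x - y                                    # X_i - Y_i, shape (n, p)
    sum_same = within.T @ within                      # sum_i (X_i - Y_i)(X_i - Y_i)'
    cross = x[:, None, :] - y[None, :, :]             # X_i - Y_j, shape (n, n, p)
    sum_all = np.einsum("ijk,ijl->kl", cross, cross)  # sum over all pairs (i, j)
    v_d_hat = sum_same / n
    v_i_hat = (sum_all - sum_same) / (n * (n - 1))    # keeps only pairs with i != j
    return v_d_hat, v_i_hat

rng = np.random.default_rng(0)
x = rng.normal(size=(30, 3))
y = x + rng.normal(scale=0.3, size=(30, 3))           # simulated, strongly agreeing data
v_d_hat, v_i_hat = u_statistic_estimates(x, y)
print(np.round(v_d_hat, 3))
print(np.round(v_i_hat, 3))
```

The cross-pair array needs memory of order n²p, which is unproblematic for the sample sizes considered here; a moment-based rearrangement would avoid it for large n.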
To make inference about ρg, we derive the asymptotic distribution of its estimator. By (8), the estimator is a function of the U-statistics that estimate VD and VI. Thus, we first derive the limiting distribution of these U-statistics. The proof of the following theorem appears in Web Appendix B.
Theorem 1. Assume that (X1, Y1), …, (Xn, Yn) are independent and identically distributed random vectors from a 2p-variate distribution with finite fourth moments. Let and be defined as in (9) and (10), respectively. Then
where is an arbitrary fixed vector and the expected value is taken with respect to the random vector .
Finally, we apply the theory on functions of asymptotically normal vectors (Serfling, 1980) to the result from the above theorem to obtain the asymptotic distribution of as follows.
Theorem 2. Assume that (X1, Y1), …, (Xn, Yn) are independent and identically distributed random vectors from a 2p-variate distribution with finite fourth moments. Let the estimator of ρg be defined as in (8), and let g be the Frobenius norm. Then
where , and is defined as in Theorem 1.
Proof. By applying the theory on functions of asymptotically normal vectors (Serfling, 1980) to (11), we have
By the properties of matrix derivatives,
To obtain confidence intervals or test statistics for hypothesis testing about ρg, we need to calculate the estimates of the parameters of the asymptotic variances in (12). In addition to the estimates of and , defined in (9) and (10), we need the estimate of the variance–covariance matrix . According to Sen (1960), can be consistently estimated by
where , and
As shown by Lin (1989), the normal approximation for Lin's CCC can be improved by using the inverse hyperbolic tangent transformation, or Z-transformation. The Monte Carlo study in Lin's paper confirmed that the Z-transformation accelerates the convergence to normality of the sample CCC not only when the samples are from the normal distribution but also when they are from (a) short-tailed symmetric distributions like the uniform and (b) long-tailed skewed-to-the-right distributions like the Poisson. King et al. (2007) also showed that the Z-transformation effectively improves the normal approximation of the sample repeated measure CCC for both normal and contaminated normal data.
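To improve the normal approximation for our sample MCCC, we also invoke the Z-transformation for inference about the MCCC, ρg. Let the transformed estimate be
$$ \hat{Z} = \tanh^{-1}(\hat{\rho}_g) = \frac{1}{2}\ln\frac{1+\hat{\rho}_g}{1-\hat{\rho}_g}. $$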
Then it follows from the theory on functions of asymptotically normal statistics (Serfling, 1980, Theorem 3.1) that is asymptotically normal with mean
By replacing the parameters in the variance of with their estimates, we can obtain the confidence interval for Z denoted by and then by transformation we can obtain the confidence interval for ρg based on the Z-transformation as follows:
As Lin (1989) noted for the CCC, a confidence interval based on the Z-transformation is bounded within the open interval (−1, 1) and is more realistically asymmetric.
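The interval construction can be sketched as follows, assuming a point estimate of ρg and an estimate of its variance are already available. By the delta method, the variance of the inverse hyperbolic tangent of the estimate is approximately the variance of the estimate divided by (1 − ρg²)². The function name, argument names, and numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist

def z_transform_interval(rho_hat, var_rho_hat, level=0.95):
    """Confidence interval for rho built on the atanh scale and mapped back."""
    z_hat = math.atanh(rho_hat)
    # Delta method: d/d rho atanh(rho) = 1 / (1 - rho^2).
    se_z = math.sqrt(var_rho_hat) / (1.0 - rho_hat ** 2)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # standard normal quantile
    lower = math.tanh(z_hat - crit * se_z)
    upper = math.tanh(z_hat + crit * se_z)
    return lower, upper

# Hypothetical inputs: estimate 0.9 with estimated variance 0.002.
lower, upper = z_transform_interval(0.9, 0.002)
print(lower, upper)
```

Because tanh maps the real line into (−1, 1), the limits cannot leave that range, and for an estimate near 1 the interval extends further below the estimate than above it.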
3. Simulation Studies
To assess the finite-sample properties of the sample MCCC and its Z-transformation when g is the Frobenius norm, as described in Section 2.1, we performed a Monte Carlo simulation with data generated from three distributions: the multivariate normal, the multivariate Student's t, and the multivariate lognormal. For each distribution, we used three combinations of location and scale shifts and levels of correlation between X and Y. In each case, we considered three and five repeated measurements per unit and three levels of within-unit correlation (ρ = 0, 0.4, 0.8), with sample sizes of n = 20, 40, 80, and 160. For each of these situations, 1000 runs were performed using SAS/IML software. The scenarios considered in this simulation study are similar to those considered by King et al. (2007).
3.1 Multivariate Normal Distribution
In this section, paired samples with five repeated measures were generated from each of the following cases of the multivariate normal distribution.
Case 1 : Means and and covariance matrix Λ1⊗Λ2 where
and Λ2 is a 5 × 5 compound symmetric within-unit correlation structure with ρ= 0.4, assuming repeated measures have equal variance for both X and Y. This case represents a slight difference in location and scale parameters, and strong positive correlation between X and Y.
Case 2 : Means and and covariance matrix Λ1⊗Λ2 where
and Λ2 is a 5 × 5 compound symmetric within-unit correlation structure with ρ= 0.4, assuming repeated measures have equal variance for both X and Y. This case represents a moderate difference in location and scale parameters, and moderate positive correlation between X and Y.
Case 3 : Means and and covariance matrix Λ1⊗Λ2 where
and Λ2 is a 5 × 5 compound symmetric within-unit correlation structure with ρ= 0.4, assuming repeated measures have equal variance for both X and Y. This case represents a large difference in location and scale parameters, and weaker positive correlation between X and Y.
Then, the three repeated measures paired samples were generated from the same three cases of the multivariate normal distribution using the first three observations in and and a 3 × 3 compound symmetric within-unit correlation structure. In addition, all six situations were repeated with ρ= 0 and 0.8 instead of 0.4.
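The Kronecker structure of these covariance matrices can be sketched as below. The 2 × 2 matrix `lambda_1`, the zero mean vector, and the seed are hypothetical placeholders, since the case-specific values are not reproduced here; only the compound symmetric 5 × 5 block with ρ = 0.4 follows the description above.

```python
import numpy as np

def compound_symmetric(p, rho):
    """p x p correlation matrix with ones on the diagonal and rho elsewhere."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

lambda_1 = np.array([[1.0, 0.9],
                     [0.9, 1.0]])        # hypothetical between-method block
lambda_2 = compound_symmetric(5, 0.4)   # within-unit correlation structure
cov = np.kron(lambda_1, lambda_2)       # 10 x 10 covariance of (X, Y)

rng = np.random.default_rng(2024)
mean = np.zeros(10)                     # hypothetical mean vector
sample = rng.multivariate_normal(mean, cov, size=40)
x, y = sample[:, :5], sample[:, 5:]     # first five columns are X, last five are Y
print(cov.shape, x.shape, y.shape)
```

With this ordering, the upper-left and lower-right 5 × 5 blocks of `cov` are the covariance matrices of X and Y, and the off-diagonal blocks hold the cross-covariances.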
In each run, we calculated , their estimated asymptotic variances, and the 95% confidence intervals as described in Sections 2.2 and 2.3. Based on 1000 runs, for each scenario, we evaluated the normality, accuracy, precision, and coverage probability of the confidence intervals. To assess the normality, we examined Q–Q plots of and . To evaluate accuracy, we calculated the means of the estimates and their relative biases (relative bias = ). To assess precision, we calculated the average estimated asymptotic variances and the empirical variances of .
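The per-scenario summaries can be computed along the following lines. The sketch assumes relative bias is defined as (mean estimate − true value) / |true value|; the sign convention is an assumption, and the function and argument names are hypothetical.

```python
import numpy as np

def summarize_runs(estimates, est_variances, lowers, uppers, true_value):
    """Accuracy, precision, and coverage summaries over simulation runs."""
    estimates = np.asarray(estimates, dtype=float)
    lowers = np.asarray(lowers, dtype=float)
    uppers = np.asarray(uppers, dtype=float)
    mean_est = estimates.mean()
    covered = (lowers <= true_value) & (true_value <= uppers)
    return {
        "mean": mean_est,
        "relative_bias": (mean_est - true_value) / abs(true_value),  # assumed definition
        "mean_estimated_variance": float(np.mean(est_variances)),
        "empirical_variance": estimates.var(ddof=1),
        "coverage": covered.mean(),
    }

# Toy inputs standing in for the results of two runs.
print(summarize_runs([0.5, 0.7], [0.01, 0.03], [0.3, 0.65], [0.6, 0.9], 0.6))
```

Comparing `mean_estimated_variance` with `empirical_variance` mirrors the precision check described above.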
Web Figure 1 shows the Q–Q plots of the sample MCCC and its Z-transformation for sample size 20, based on a simulation of 1000 runs from the multivariate normal distribution for the scenario with three repeated measures and high within-unit correlation (ρ = 0.8). These Q–Q plots confirm that normality is much improved by the Z-transformation in all cases. As expected, the distributions of both statistics move closer to the normal distribution in all cases as the sample size increases. These results are similar for the scenarios with zero and moderate within-unit correlation and for the case of five repeated measures.
Table 1 shows the means of the estimates and their relative biases, the average estimated asymptotic variances, the empirical variances, and the empirical coverage probabilities for the 95% confidence intervals of ρg, based on a simulation of 1000 runs from the multivariate normal distribution with three repeated measurements (p = 3). For all cases, the average relative biases of the estimates are positive and move closer to zero as the sample size increases. With strong between-unit correlation (Case 1), the average relative biases are less than 0.035 for all sample sizes and all levels of within-unit correlation. With moderate between-unit correlation (Case 2), the average relative biases are less than 0.04 when n is greater than or equal to 40 and about 0.08 when n is equal to 20, for all levels of within-unit correlation. With weaker between-unit correlation (Case 3), the average relative biases are less than 0.07 when n is greater than or equal to 40 and about 0.14 when n is equal to 20. To evaluate precision, we compared the mean of the estimated asymptotic variances with the empirical variance of the estimates in each situation. For all scenarios, the average asymptotic variance estimates were very close to the empirical variances. Lastly, the coverage probabilities for the 95% confidence intervals of ρg based on the Z-transformation approach 0.95 more quickly for Cases 1 and 3 than for Case 2. The results for the scenarios with five repeated measurements are similar to those with three repeated measures, but the rate of convergence is somewhat slower (results are shown in Web Table 1). In general, based on the accuracy, precision, and coverage probabilities, when the data are from the multivariate normal distribution, the estimator of the MCCC based on the Frobenius norm performs notably well, especially when n is greater than or equal to 40.
Table 1. Means of the estimates and their relative biases, the asymptotic and empirical variances, and the coverage probabilities for the 95% confidence intervals of ρg, based on a simulation of 1000 runs from the multivariate normal distribution with three repeated measurements (p = 3)
3.2 Multivariate Student's t-distribution
In this section, three and five repeated measures paired samples were generated from the multivariate Student's t-distribution with 10 degrees of freedom using the same scenarios as in the case of the multivariate normal distribution. As in Section 3.1, for each scenario, we evaluated the normality, accuracy, precision, and coverage probability of the confidence intervals based on a simulation of 1000 runs.
As in the normal case, for small sample sizes, the distribution of the Z-transformed estimate is much closer to normality than that of the untransformed sample MCCC for all levels of correlation (some results are shown in Web Figure 2). Overall, based on the accuracy, precision, and coverage probabilities, when the data are from the multivariate Student's t-distribution, the estimator of the MCCC based on the Frobenius norm performs as well as in the normal case for all scenarios (results are shown in Web Tables 2 and 3).
3.3 Multivariate Lognormal Distribution
In this section, paired samples with three and five repeated measures were generated from each of the three cases of the multivariate normal distribution and then transformed to the multivariate lognormal distribution. As in Sections 3.1 and 3.2, for each scenario, we assessed the normality, accuracy, precision, and coverage probability of the confidence intervals based on a simulation of 1000 runs.
In general, based on the accuracy, precision, and coverage probabilities, when the data are from the multivariate lognormal distribution, the estimator of the MCCC based on the Frobenius norm performs very well for all levels of correlation especially when sample size is greater than or equal to 40 (results are shown in Web Tables 4 and 5).
4. Examples
In this section, we use real examples to demonstrate the MCCC, presented in Section 2, as a measure of overall agreement between two vectors of repeated measures.
4.1 Blood Draws Data
The data for this example are taken from an Asthma Clinical Research Network (ACRN) study reported by Martin et al. (2002). The main objective of this trial was to develop a reliable method to compare six different available inhaled corticosteroid (ICS) preparations in terms of systemic bioavailability as measured by effect on cortisol suppression. Three different outcomes were considered to evaluate this systemic effect, namely hourly plasma cortisol concentrations, 12- and 24-hour urine cortisol concentrations, and a morning blood osteocalcin. After a one-week placebo run-in period, corticosteroid-naive asthma subjects enrolled at six ACRN centers were randomized to one of the six ICS and matched placebo groups. Following randomization, placebo was continued for another week, and then the subjects were admitted for overnight testing at each of the next five weekly visits. During an overnight stay, an out-of-laboratory 12-hour urine collection was conducted between 8 A.M. and 8 P.M., and then in-laboratory urine cortisol collection and hourly blood sampling for cortisol were performed between 8 P.M. and 8 A.M.; blood for osteocalcin concentration was taken at 7 A.M. The area under the concentration-time curve (AUC) for hourly plasma cortisol measurements is considered the most reliable method to assess systemic effect. An additional goal of interest was to assess the agreement between the plasma cortisol AUC calculated from measurements taken every hour and from measurements taken every two hours. This is very useful for future studies because the every two-hour analysis requires less sleep interruption and a lower budget.
Table 2 shows the summary statistics of the hourly data (CortAuc1) and the every two-hour data (CortAuc2) for each visit, including the Pearson correlation coefficients between the two measurements. The scatter plots of the blood draw data for each visit are shown in Figure 1, together with Lin's sample CCC and the corresponding 95% confidence interval for each visit. The scatter plots and Lin's CCCs indicate strong agreement between the plasma AUCs based on the hourly and every two-hour data for all five visits. The MCCC is an appropriate coefficient for measuring the overall agreement between the hourly and every two-hour data across all five visits without any specific assumption about the pattern of agreement. The estimated agreement matrix is
which is somewhat close to the identity matrix, indicating a good level of agreement between hourly and every two-hour measurements. For these data, the point estimate and the 95% confidence interval of the MCCC based on the Frobenius norm are calculated as . Using the same data, the estimate of the repeated measures CCC proposed by King et al. (2007) is 0.958 with 95% confidence interval = . This result is based on a weight matrix with equal on-diagonal elements and zero off-diagonal elements. According to King et al. (2007), the point estimate of the weighted CCC of Chinchilli et al. (1996) for these data is 0.971 and the corresponding 95% confidence interval is . All of these results suggest a high level of overall agreement between the two sets of plasma AUCs.
Table 2. Summary statistics of hourly data (CortAuc1) and every two-hour data (CortAuc2) for each visit with Pearson correlation coefficients
Mean of CortAuc1
Std dev of CortAuc1
Mean of CortAuc2
Std dev of CortAuc2
Pearson correlation coefficients
4.2 Body Fat Data
For this example, we use the data from the Penn State Young Women's Health Study conducted by Lloyd et al. (1998). In these data, percentages of body fat were obtained from 82 white female subjects at ages 12.5, 13, and 13.5 years based on whole-body composition measurements made by dual-energy x-ray absorptiometer (DEXA) and skinfold caliper. The summary statistics of these data are shown in Table 3. Figure 2 shows the scatter plots of the percentages of body fat along with the estimates of Lin's CCC and 95% confidence intervals, indicating moderate agreement for all three visits. The estimated agreement matrix for these data is
which is definitely not close to the identity matrix, although it is not clear how far from the identity matrix it is. For ease of interpretation, the point estimate of the MCCC based on the Frobenius norm along with the corresponding 95% confidence interval is . This result indicates moderate agreement between the percentages of body fat measured by DEXA and skinfold caliper. Based on this data set, the estimate of the repeated measures CCC using the approach of King et al. (2007), with uneven weighting of the diagonal elements and zero off-diagonal elements, is 0.568 and the 95% confidence interval is . As reported by King et al. (2007), the weighted CCC estimate proposed by Chinchilli et al. (1996) for the same data is 0.597 with the corresponding 95% confidence interval = . These results also suggest moderate agreement between the two data sets.
Table 3. Summary statistics of percentages of body fat measured by DEXA and skinfold caliper for each visit with Pearson correlation coefficients
Mean of DEXA
Std dev of DEXA
Mean of CALIP
Std dev of CALIP
Pearson correlation coefficients
5. Discussion
We have introduced an index of overall agreement between two responses in the presence of repeated measurements, which is an extension of Lin's CCC. First, we developed a matrix that possesses the properties needed for assessing the amount of agreement between two vectors of random variables. For ease of interpretation we used a matrix norm called the Frobenius norm to transform this matrix to a scalar and scale its value to range between −1 and 1. We called this new repeated measures CCC “the MCCC.” This MCCC has desirable characteristics and can easily be used without any specific assumption about the model. For inference, we constructed an asymptotically unbiased estimator based on U-statistics and derived its asymptotic distribution. A consistent estimator of its asymptotic variance has also been proposed for obtaining confidence intervals or testing hypotheses. Moreover, we used the Z-transformation to bound the confidence limits and improve the rate of convergence. The simulation results confirmed that overall in terms of accuracy, precision, and the coverage probabilities, the estimator of the MCCC based on the Frobenius norm works very well in general cases especially when n is greater than or equal to 40.
At first glance, the MCCC proposed here may appear similar to the repeated measure CCC (RMCCC) suggested by King et al. (2007) and Carrasco et al. (2009). However, it can be shown that our MCCC differs fundamentally from these two existing methods. First, the King et al. (2007) article proposes the RMCCC as
where D is a p×p nonnegative definite matrix of weights. Although ρg and ρc,rm have a similar structure when D is the identity matrix, they define very different parameters because the latter is based on the trace function and the former is based on a matrix norm. The trace function is not a matrix norm because property (2) is violated: for example, tr(−A) = −tr(A), which differs from |−1| tr(A) whenever tr(A) ≠ 0. Given the rigorous construction and more intuitive appeal of the statistical approach in our manuscript, we prefer the MCCC to the RMCCC of King et al. (2007) for assessing agreement between X and Y in a repeated measurement setting. The article by Carrasco et al. (2009) builds on the King et al. (2007) article by invoking random effects assumptions and estimating the variance components accordingly, but it uses the same construction as ρc,rm.
Here, we used the U-statistics approach instead of applying the sample counterparts of the means, variances, and covariances because U-statistics possess many desirable properties, such as unbiasedness and asymptotic normality under mild conditions (Lenth, 1983). Estimating the MCCC ρg requires sums of dependent random variables, which are more complex than the usual sums of independent random variables. However, U-statistics can cope with such sums and have been proven to have desirable properties under minimal assumptions (Hoeffding, 1948).
In the future, the MCCC based on other forms of distance function may be considered and compared to the one based on the Frobenius norm. Furthermore, when the data are obtained by stratified random sampling, where each sample comes from a different subpopulation, one may need a weighted average of the MCCCs to evaluate overall agreement across strata. In addition, the MCCC may be generalized to evaluate agreement among more than two vectors of variables. These topics for extensions will be explored in future work.
6. Supplementary Materials
Web Appendices, Tables, and Figures referenced in Sections 2 and 3 are available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.
Acknowledgements
We would like to thank the two anonymous reviewers and the associate editor for valuable comments and suggestions that greatly helped improve this current manuscript. We would also like to thank Professor Tonya S. King for her kindness and very useful recommendations about programming.