Meta-analysis of skewed data: Combining results reported on log-transformed or raw scales

When literature-based meta-analyses involve outcomes with skewed distributions, the best available data can sometimes be a mixture of results presented on the raw scale and results presented on the logarithmic scale. We review and develop methods for transforming between these results for two-group studies, such as clinical trials and prospective or cross-sectional epidemiological studies. These allow meta-analyses to be conducted using all studies and on a common scale. The methods can also be used to produce a meta-analysis of ratios of geometric means when skewed data are reported on the raw scale for every study. We compare three methods, two of which have alternative standard error formulae, in an application and in a series of simulation studies. We conclude that an approach based on a log-normal assumption for the raw data is reasonably robust to different true distributions; and we provide new standard error approximations for this method. An assumption can be made that the standard deviations in the two groups are equal. This increases precision of the estimates, but if incorrect can lead to very misleading results. Copyright © 2008 John Wiley & Sons, Ltd.


INTRODUCTION
Meta-analyses of clinical trials, epidemiological studies and other types of study may involve continuous outcome data. Continuous data can be skewed, typical examples being concentrations (e.g. of plasma triglycerides), other ratio or reciprocal measures (e.g. percentage reduction), measures related to resource use (e.g. recovery time) or assessment scales when there is a large proportion of 'normal' participants with scores towards one extreme of the scale (e.g. measures of cognition in population-based studies). Standard inferences on the means of skewed data are valid for large sample sizes due to the central limit theorem, which determines that the mean of a sufficiently large sample is approximately normally distributed. [...]
The geometric mean may be obtained as $g = \exp(\bar{z})$, where $\bar{z}$ is the mean of the log-transformed measurements. A 95 per cent confidence interval for the geometric mean is given by $(g_l, g_u) = (\exp(\bar{z}_l), \exp(\bar{z}_u))$, where $\bar{z}_l$ and $\bar{z}_u$ are the limits of the 95 per cent confidence interval for the mean of the log-transformed measurements.
Data available to a meta-analyst might be in one of the following formats, although the list is not exhaustive:
(1) Mean and standard deviation of raw measurements ($\bar{x}$ and $s_x$).
(2) Mean and standard error for raw measurements ($\bar{x}$ and $s_x/\sqrt{n}$).
(3) Mean and confidence interval for raw measurements ($\bar{x}$, $\bar{x}_l$ and $\bar{x}_u$).
(4) Mean and standard deviation of log-transformed measurements ($\bar{z}$ and $s_z$).
(5) Mean and standard error for log-transformed measurements ($\bar{z}$ and $s_z/\sqrt{n}$).
(6) Mean and confidence interval for log-transformed measurements ($\bar{z}$, $\bar{z}_l$ and $\bar{z}_u$).
(7) Geometric mean and confidence interval ($g$, $g_l$ and $g_u$).
(8) Geometric mean and incorrect standard deviation ($g$ and $\exp(s_z)$).
The formulae above can be used to convert any of (1)-(8) to either (i) the mean ($\bar{x}$) and standard deviation ($s_x$) for raw measurements or (ii) the mean ($\bar{z}$) and standard deviation ($s_z$) for log-transformed measurements. This should be undertaken before applying the transformation methods below.
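The reduction of the reported formats to a mean and standard deviation can be sketched in Python. This is a minimal illustration, not the authors' formulae: it assumes normal-theory intervals with a 1.96 multiplier (a t quantile may be more appropriate for small samples), and all function names are hypothetical.

```python
import math

Z95 = 1.96  # assumed normal quantile for a 95 per cent interval


def sd_from_se(se, n):
    """Formats (2) and (5): standard deviation from a standard error."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, z=Z95):
    """Formats (3) and (6): standard deviation from a symmetric interval for a mean."""
    return math.sqrt(n) * (upper - lower) / (2.0 * z)


def log_summary_from_geometric_ci(g, g_lower, g_upper, n, z=Z95):
    """Format (7): log-scale mean and SD from a geometric mean and its interval."""
    z_mean = math.log(g)
    s_z = sd_from_ci(math.log(g_lower), math.log(g_upper), n, z)
    return z_mean, s_z


def log_summary_from_geometric_sd(g, exp_s_z):
    """Format (8): log-scale mean and SD from g and the 'incorrect' SD exp(s_z)."""
    return math.log(g), math.log(exp_s_z)
```

Formats (1) and (4) need no conversion, so every study ends up summarized either by a raw mean and standard deviation or by a log-scale mean and standard deviation.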

METHODS FOR TRANSFORMING SUMMARY DATA
For a variable $X$ with a log-normal distribution, such that $Z = \ln(X) \sim N(\mu, \sigma_z^2)$, it is a standard result that the mean and variance of $X$ are given by
$$E[X] = \exp\left(\mu + \frac{\sigma_z^2}{2}\right) \quad \text{and} \quad \mathrm{var}(X) = \left(\exp(\sigma_z^2) - 1\right)\exp(2\mu + \sigma_z^2).$$
We consider three methods for transforming between log-transformed and raw scales, that is, for estimating the mean and variance of $X$ from the sample mean and variance of $Z$, or vice versa. The first two methods exploit the result above. In Method 1, we transform the mean and standard deviation within each group, and then make the comparison across groups. The standard deviations are thus allowed to differ in the two groups. Method 2 follows the same approach as Method 1, but assumes a common standard deviation underlying both groups. This assumption of common standard deviation could be made on either the raw or the log-transformed scale; we choose the latter as a generally more plausible assumption. Method 3 targets the difference between the groups rather than the group means separately. It does not assume a log-normal distribution for the raw data, and is applicable to other transformations as well as the log transformation.
We also derive expressions for the standard errors of the estimators. One possibility for Methods 1 and 2 is to apply standard methods to the converted means and standard deviations for the two groups to obtain a difference in means and its standard error: we call this the 'ad hoc' approach. However, estimators based on the mean and standard deviation on the log scale are more efficient (have smaller standard errors); hence, the resulting standard errors are too small for conversions from raw to logarithm and too large for conversions from logarithm to raw. We therefore derive alternative standard errors from asymptotic Taylor series approximations.
All the estimators below are 'plug-in' estimators derived by replacing the population parameters with their estimates; they are therefore likely to be unbiased in large samples but biased in small samples. Further work would be required to remove the small-sample bias.

Method 1 (separate standard deviations)
An approximate transformation from $Z$ to $X$ is obtained by substituting estimates for the unknown quantities in the standard result above. Solving the formulae for $\mu$ and $\sigma_z^2$ yields the expressions for the opposite conversion. This moment-based approach has been described previously by Whitehead et al. [3]. For this method and Method 2, we denote the two exposure (or treatment) groups as $i = 1$ and $i = 2$.
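From raw to logarithm: To convert $\bar{x}_i$ and $s_{x,i}$ to an approximate mean and standard deviation on the log-transformed scale, take
$$\bar{z}'_i = \ln(\bar{x}_i) - \frac{1}{2}\ln\left(\frac{s_{x,i}^2}{\bar{x}_i^2} + 1\right)$$
(where the single dash on $\bar{z}'_i$ denotes transformation using Method 1), and
$$s'_{z,i} = \sqrt{\ln\left(\frac{s_{x,i}^2}{\bar{x}_i^2} + 1\right)}.$$
The required difference in means on the log scale from Method 1 is given by
$$d'_z = \bar{z}'_1 - \bar{z}'_2.$$
The standard error is given by
$$\mathrm{SE}(d'_z) = \sqrt{\mathrm{var}(\bar{z}'_1) + \mathrm{var}(\bar{z}'_2)}.$$
The 'ad hoc' estimator of $\mathrm{var}(\bar{z}'_i)$ uses the t-test formula:
$$\mathrm{var}_A(\bar{z}'_i) = \frac{s'^2_{z,i}}{n_i},$$
where $n_i$ is the sample size in group $i$. However, this wrongly assumes that $\bar{z}'_i$ has been computed as an arithmetic mean. The alternative standard error is given by the Taylor series approximation $\mathrm{var}_B(\bar{z}'_i)$ [...] The last two expressions were obtained by approximating [...], whose asymptotic accuracy was confirmed by simulation. Then we computed [...] by expanding and using $E[X^n] = E[e^{nZ}] = e^{n\mu + n^2\sigma_z^2/2}$. A similar argument applies for the covariance.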
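From logarithm to raw: To convert $\bar{z}_i$ and $s_{z,i}$ to an approximate mean and standard deviation on the raw scale, take
$$\bar{x}'_i = \exp\left(\bar{z}_i + \frac{s_{z,i}^2}{2}\right)$$
and
$$s'_{x,i} = \sqrt{\left(\exp(s_{z,i}^2) - 1\right)\exp(2\bar{z}_i + s_{z,i}^2)}.$$
The required difference in means is now $d'_x = \bar{x}'_1 - \bar{x}'_2$. The 'ad hoc' standard error is estimated using $\mathrm{var}_A(\bar{x}'_i) = s'^2_{x,i}/n_i$ [...] It can be seen that $\mathrm{var}_B(\bar{x}'_i) < \mathrm{var}_A(\bar{x}'_i)$, and so the alternative standard error is smaller than the 'ad hoc' one.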
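The Method 1 conversions follow directly from the log-normal moment result, and a minimal Python sketch may make them concrete. The function names are hypothetical. The last function is an assumption of this sketch rather than the authors' expression: it is a generic first-order delta-method variance for the log-to-raw mean, using the asymptotic approximations var(mean of logs) ≈ s_z²/n and var(s_z²) ≈ 2 s_z⁴/n under normality of the logs.

```python
import math


def raw_to_log(mean_x, sd_x):
    """Moment-based conversion of a raw mean and SD to the log scale."""
    var_z = math.log(sd_x**2 / mean_x**2 + 1.0)
    return math.log(mean_x) - var_z / 2.0, math.sqrt(var_z)


def log_to_raw(mean_z, sd_z):
    """Log-normal mean and SD on the raw scale from a log-scale mean and SD."""
    mean_x = math.exp(mean_z + sd_z**2 / 2.0)
    sd_x = math.sqrt((math.exp(sd_z**2) - 1.0) * math.exp(2.0 * mean_z + sd_z**2))
    return mean_x, sd_x


def method1(group1, group2, convert):
    """Difference in converted means with the 'ad hoc' (t-test type) standard error.

    Each group is a tuple (mean, sd, n) on the reported scale; the standard
    deviations are converted separately and are not pooled.
    """
    m1, s1 = convert(group1[0], group1[1])
    m2, s2 = convert(group2[0], group2[1])
    se_ad_hoc = math.sqrt(s1**2 / group1[2] + s2**2 / group2[2])
    return m1 - m2, se_ad_hoc


def delta_var_log_to_raw(mean_z, sd_z, n):
    """ASSUMED delta-method variance of exp(mean_z + sd_z^2 / 2); see lead-in."""
    return math.exp(2.0 * mean_z + sd_z**2) * (sd_z**2 + sd_z**4 / 2.0) / n
```

For example, `method1((1.5, 0.9, 100), (1.2, 0.6, 100), raw_to_log)` converts two hypothetical raw summaries to a log-scale difference. Because exp(s²) − 1 exceeds s² + s⁴/2 for any s > 0, the delta-method variance in this sketch is always below the 'ad hoc' one for the log-to-raw direction.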

Method 2 (common standard deviation)
Method 2 is similar to Method 1, but assumes a pooled standard deviation on the log-transformed scale.
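From raw to logarithm: To convert $\bar{x}_i$ and $s_{x,i}$ to an approximate mean and standard deviation on the logarithmic scale, we first transform the standard deviations and then pool them:
$$s''_z = \sqrt{\frac{(n_1 - 1)\ln\left(s_{x,1}^2/\bar{x}_1^2 + 1\right) + (n_2 - 1)\ln\left(s_{x,2}^2/\bar{x}_2^2 + 1\right)}{n_1 + n_2 - 2}}, \qquad \bar{z}''_i = \ln(\bar{x}_i) - \frac{s''^2_z}{2}$$
(where the double dash denotes transformation using Method 2). The required difference in means on the logarithmic scale is given by $d''_z = \bar{z}''_1 - \bar{z}''_2 = \ln(\bar{x}_1) - \ln(\bar{x}_2)$. The 'ad hoc', t-test-type, standard error is given by $\mathrm{SE}_A(d''_z) = s''_z\sqrt{1/n_1 + 1/n_2}$ and the standard error, based on Taylor approximation, is given by $\mathrm{SE}_B(d''_z)$ [...]
From logarithm to raw: To convert $\bar{z}_i$ and $s_{z,i}$ to an approximate mean and standard deviation on the raw scale, we first pool the standard deviations, $s_z = \sqrt{\left((n_1 - 1)s_{z,1}^2 + (n_2 - 1)s_{z,2}^2\right)/(n_1 + n_2 - 2)}$, and take $\bar{x}''_i = \exp(\bar{z}_i + s_z^2/2)$. The required difference in means, an 'ad hoc' standard error and a standard error by Taylor series approximation are given respectively by $d''_x = \bar{x}''_1 - \bar{x}''_2$, [...]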
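A minimal Python sketch of Method 2 is given below. It assumes the usual pooled standard deviation with weights n_i − 1, which is an assumption of this sketch; the function names are hypothetical, and only the point estimate is returned for the log-to-raw direction because no standard error formula is assumed here.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation with (n - 1) weights (assumed form)."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def method2_raw_to_log(mean1, sd1, n1, mean2, sd2, n2):
    """Log-scale difference in means and 'ad hoc' SE from raw summaries."""
    # Transform each standard deviation to the log scale, then pool.
    s1 = math.sqrt(math.log(sd1**2 / mean1**2 + 1.0))
    s2 = math.sqrt(math.log(sd2**2 / mean2**2 + 1.0))
    s = pooled_sd(s1, n1, s2, n2)
    z1 = math.log(mean1) - s**2 / 2.0
    z2 = math.log(mean2) - s**2 / 2.0
    se_ad_hoc = s * math.sqrt(1.0 / n1 + 1.0 / n2)
    return z1 - z2, se_ad_hoc


def method2_log_to_raw(mean1, sd1, n1, mean2, sd2, n2):
    """Raw-scale difference in means from log-scale summaries (point estimate only)."""
    s = pooled_sd(sd1, n1, sd2, n2)
    return math.exp(mean1 + s**2 / 2.0) - math.exp(mean2 + s**2 / 2.0)
```

Because the same pooled variance is subtracted from both log means, the raw-to-log point estimate reduces to the log of the ratio of the two raw means; the pooled standard deviation affects only the standard error.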

Method 3
Our third method follows from the following general result and applies directly to the difference between groups rather than the two group means separately. Let $A = g(B)$ be the transformation of interest. Then, for example, $g(B) = \ln(B)$ or $g(B) = \exp(B)$ for the current application. Suppose the data have been analysed under a linear model for $B$:
$$E[B_k] = \alpha + \beta T_k,$$
where $T_k$ represents covariates for individual $k$. For the simple comparison of two groups, $T_k$ represents only group allocation, and $\beta$ is the difference in means. Now let $\mu_B$ be the overall mean, across values of $T$. Then a first-order Taylor series expansion about $\mu_B$ gives
$$E[A_k] \approx g(\mu_B) + g'(\mu_B)\left(\alpha + \beta T_k - \mu_B\right).$$
The difference between the means of the two groups can then be estimated, by subtraction, as $g'(\hat{\mu}_B)\hat{\beta}$. The standard error is obtained similarly as $g'(\hat{\mu}_B)\,\mathrm{SE}(\hat{\beta})$. This first-order approximation neglects terms involving $\beta^2$ and beyond, and neglects the term involving the variance of $B$. The former should be acceptable for small effect size $\beta$, and the latter if the variance does not depend on $T$, i.e. if the spread of the distribution is similar across groups. The derivatives $g'(\hat{\mu}_B)$ turn out to be the overall geometric mean when transforming from logarithm to raw, and the reciprocal of the overall arithmetic (raw) mean when transforming from raw to logarithm.
From raw to logarithm: To convert a difference in means on the raw scale to an approximate difference on the logarithmic scale, take $\bar{x}$ to be the overall arithmetic mean across groups on the raw scale, and use
$$d'''_z = \frac{d_x}{\bar{x}} \quad \text{and} \quad \mathrm{SE}(d'''_z) = \frac{\mathrm{SE}(d_x)}{\bar{x}},$$
where $d_x$ and $\mathrm{SE}(d_x)$ are the difference in means and its standard error from raw means.
From logarithm to raw: To convert a difference in means on the logarithmic scale to an approximate difference on the raw scale, take $\bar{x}_{geom}$ to be the geometric mean of the geometric means across groups (equivalent to the exponential of the arithmetic mean of the means of log-transformed values), and use
$$d'''_x = \bar{x}_{geom}\, d_z \quad \text{and} \quad \mathrm{SE}(d'''_x) = \bar{x}_{geom}\, \mathrm{SE}(d_z),$$
where $d_z$ and $\mathrm{SE}(d_z)$ are the difference in means and its standard error from log-transformed values.
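Method 3 amounts to rescaling the difference and its standard error by the derivative of the transformation at the overall mean. A minimal Python sketch follows; the function names are hypothetical, and the overall mean is passed in by the caller so that no particular weighting across groups is assumed.

```python
import math


def method3_raw_to_log(d_x, se_d_x, overall_raw_mean):
    """Scale a raw difference and its SE by 1 / (overall arithmetic mean)."""
    return d_x / overall_raw_mean, se_d_x / overall_raw_mean


def method3_log_to_raw(d_z, se_d_z, overall_log_mean):
    """Scale a log-scale difference and its SE by the overall geometric mean."""
    x_geom = math.exp(overall_log_mean)
    return x_geom * d_z, x_geom * se_d_z
```

For instance, a raw difference of 0.2 (standard error 0.1) with an overall raw mean of 2.0 corresponds to an approximate log-scale difference of 0.1 with standard error 0.05. The ratio of estimate to standard error is unchanged by the conversion, so the study's z statistic is preserved.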

APPLICATION: D9N POLYMORPHISM IN THE LIPOPROTEIN LIPASE GENE AND TRIGLYCERIDE LEVELS
Sagoo et al. conducted a systematic review of the association between polymorphisms in the lipoprotein lipase (LPL) gene and coronary heart disease, and also studied plasma levels of cholesterol and triglycerides [4]. We address one particular meta-analysis of 14 studies of the association between triglyceride level and being a carrier or non-carrier of the D9N polymorphism in the LPL gene. Triglyceride levels are typically skewed, and are sometimes presented on the log scale. Through a combination of data extraction from the published reports and correspondence with the original investigators, the review authors obtained means and standard deviations on both logarithmic and raw scales for five studies, on the logarithmic scale only for one study and on the raw scale only for eight studies (Table I). Results for individual studies and meta-analyses are provided in Table II and Figures 1 and 2, for available ('true') data and for transformations using our various methods. Available data on the raw scale allowed meta-analysis of 13 of the studies. We also undertook meta-analyses of all 14 studies, making transformations from the logarithmic to the raw scale wherever this was possible. For five studies, the 'true' results can be compared directly with transformations from logarithmic data, and the results are similar in all cases (Figure 1). Furthermore, there are no substantial differences across the different transformation methods (Table II, Figure 1). It is possible for the effect direction to change on transforming between metrics when assuming separate standard deviations. For example, the Boer 2003b study transformed to the raw scale using Method 1 (Table II) [...] the carriers than in the non-carriers, compared with a lower mean (by 0.022) of logs in the observed data. This is because of the larger standard deviation of carriers than non-carriers on the log scale. However, the change in the point estimate is trivial in the context of its confidence interval.
Available data on the log scale allowed meta-analysis of six of the studies. We also undertook meta-analyses of all 14 studies, making transformations from the raw to the logarithmic scale wherever this was possible. Again, for five studies, the 'true' and transformed results can be compared directly (Figure 2). One notable discrepancy is in the Copenhagen study, in which the 'true' mean difference is smaller than the values estimated by our transformations, and has a somewhat smaller standard error. The bias in the transformations may be because the standard deviations of raw triglyceride levels are relatively large compared with their means, combined with sample size imbalance (see also later simulation results, Table V), or because the data depart more substantially from a log-normal distribution in this study. Point estimates for Method 3 agree well with those for Method 2. In three studies (EARS, FOS and Reykjavik), the assumption of a common standard deviation has a more noticeable effect on the point estimate, so that Method 1 differs from Methods 2 and 3. These three studies have substantially different observed standard deviations between carriers and non-carriers (see also later simulation results, Table VI). They are also responsible for introducing heterogeneity into the meta-analyses and increasing the summary effect estimate for Method 1.

SIMULATION STUDY
We undertook a simulation study to compare the methods. Continuous outcome data were simulated for a single, two-group study, according to various distributions, and subjected to the three transformation methods, both converting the raw simulated data to the logarithmic scale and converting the logs of the simulated data to the raw scale. Since we knew the means and standard deviations on both scales (either theoretically or empirically), we could compare the estimated differences in means (and their standard errors) with those that would have been obtained had the data been analysed on the desired scale.
Our initial set of simulations used log-normally distributed data with equal standard deviations across groups (on the log scale), so that the distributional assumptions underlying all methods hold exactly and only the asymptotic approximations would affect results. Each group had a sample size of 100. We then evaluated, with further simulations, (i) small sample sizes (10 rather than 100); (ii) imbalance in sample sizes across the two groups; (iii) different standard deviations in the two groups; (iv) a different skewed distribution (gamma distribution); and (v) lack of serious skew (normal distribution, with negative values rejected). The gamma and normal distributions were chosen to have (before rejection of samples) identical means and standard deviations on the raw scale to the initial log-normally distributed data. Full details of the data generation and the parameter values are provided in Table III. Illustrations of all distributions simulated are included in Figure 3.
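The data generation described above can be sketched in Python with the standard library. This is an illustration only: the function names and any parameter values passed to it are hypothetical (the actual values are those of Table III), and the gamma shape and scale are chosen by moment matching, which is an assumption of this sketch.

```python
import math
import random


def lognormal_moments(mu, sigma):
    """Raw-scale mean and SD of a log-normal with log-scale mean mu and SD sigma."""
    mean = math.exp(mu + sigma**2 / 2.0)
    sd = math.sqrt((math.exp(sigma**2) - 1.0) * math.exp(2.0 * mu + sigma**2))
    return mean, sd


def simulate_group(dist, mu, sigma, n, rng):
    """Draw n positive values whose raw-scale mean and SD match the log-normal."""
    mean, sd = lognormal_moments(mu, sigma)
    if dist == "lognormal":
        return [rng.lognormvariate(mu, sigma) for _ in range(n)]
    if dist == "gamma":
        # Moment matching: mean = shape * scale, variance = shape * scale**2.
        shape, scale = mean**2 / sd**2, sd**2 / mean
        return [rng.gammavariate(shape, scale) for _ in range(n)]
    if dist == "normal":
        values = []
        while len(values) < n:
            x = rng.gauss(mean, sd)
            if x > 0:  # reject non-positive draws so that logs can be taken
                values.append(x)
        return values
    raise ValueError(f"unknown distribution: {dist}")
```

A call such as `simulate_group("gamma", 0.0, 0.5, 100, random.Random(1))` gives one simulated group; note that rejection of negative draws shifts the mean of the truncated normal slightly above its nominal value.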
For each scenario and parameter set (each row in Table III), we undertook 10 000 simulations. Each simulation produced three estimates ($d'_z$, $d''_z$ and $d'''_z$) with five standard errors ($\mathrm{SE}_A(d'_z)$, $\mathrm{SE}_B(d'_z)$, $\mathrm{SE}_A(d''_z)$, $\mathrm{SE}_B(d''_z)$ and $\mathrm{SE}(d'''_z)$) for transformations from the raw to the log scale, and the corresponding numbers for transformations from the log to the raw scale. We summarized them using measures of bias, precision and coverage as follows, where $d$ represents one of the three estimates.
Bias: Bias was defined as mean estimated difference in means ($d$) minus true difference in means. For log-normal simulations and gamma simulations (raw scale only), the true values were known theoretically. For the others, the true mean differences were estimated empirically across simulations. We present mean bias for log-normal simulations and median bias for gamma and normal simulations due to some extreme and influential values.
Precision: We present mean values of estimated standard errors across simulations, separately for the t-test ('ad hoc') method, $\mathrm{SE}_A(d)$, and the Taylor series method, $\mathrm{SE}_B(d)$. We also present empirical standard errors of the estimated mean differences. For the log-normal simulations, these are calculated as empirical standard deviations over all 10 000 simulations. For the gamma and normal simulations, we present the difference between the 69th and 31st percentiles as an approximately equivalent measure (for a normal distribution, this difference approximately equals the standard deviation).
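The percentile-based measure of spread can be checked numerically. The sketch below, with a hypothetical function name, computes the distance between the 69th and 31st percentiles of a standard normal distribution and provides a helper that applies the same measure to a list of simulated estimates.

```python
from statistics import NormalDist, quantiles

# Width between the 69th and 31st percentiles of a standard normal
# distribution, in units of the standard deviation.
normal_width = NormalDist().inv_cdf(0.69) - NormalDist().inv_cdf(0.31)


def percentile_spread(estimates):
    """Difference between the 69th and 31st percentiles of the estimates."""
    cuts = quantiles(estimates, n=100)  # 99 cut points; cuts[k - 1] is the k-th percentile
    return cuts[68] - cuts[30]
```

The value of `normal_width` is close to one, which is why this spread is a robust stand-in for the empirical standard deviation when a few extreme estimates are present.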
Coverage: Coverage was defined as the percentage of simulations in which a 95 per cent confidence interval, obtained as $d \pm 1.96 \times \mathrm{SE}(d)$, included the true difference in means (theoretically or empirically obtained).
Monte Carlo errors for each reported value were calculated, as $\mathrm{SD}(d)/\sqrt{10\,000}$ for mean bias, as $\mathrm{SD}(\mathrm{SE}(d))/\sqrt{10\,000}$ for estimated standard errors, as $\sqrt{P(1 - P)/10\,000}$ for estimated coverage $P$, and from confidence intervals for medians.
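The performance measures for mean-based summaries can be collected in one function. The following Python sketch uses hypothetical names and covers mean bias, mean estimated standard error, empirical standard error, coverage and their Monte Carlo errors; the median-based variants are not included.

```python
import math
from statistics import mean, stdev


def summarize(estimates, std_errors, true_diff):
    """Bias, precision and coverage with Monte Carlo errors across simulations."""
    n_sim = len(estimates)
    # A 95 per cent interval d +/- 1.96 * SE(d) covers the truth when
    # the estimate lies within 1.96 standard errors of it.
    covered = [abs(d - true_diff) <= 1.96 * se
               for d, se in zip(estimates, std_errors)]
    coverage = sum(covered) / n_sim
    return {
        "bias": mean(estimates) - true_diff,
        "mc_error_bias": stdev(estimates) / math.sqrt(n_sim),
        "mean_se": mean(std_errors),
        "mc_error_se": stdev(std_errors) / math.sqrt(n_sim),
        "empirical_se": stdev(estimates),
        "coverage": coverage,
        "mc_error_coverage": math.sqrt(coverage * (1.0 - coverage) / n_sim),
    }
```

With 10 000 simulations and coverage near 95 per cent, the Monte Carlo error of the coverage estimate is about 0.2 percentage points, which gives a sense of how small a coverage difference can be distinguished.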

Results of simulation study
Results for some of the simulations are provided in Tables IV-VI.
Distributional assumptions met (log-normal distribution): Table IV. For log-normally distributed data with equal standard deviations (on the log scale) and equal sample size, all methods work well when the standard deviation is small (Table IV, Sets 2 and 4). With a large standard deviation, however, three potential problems are apparent (Table IV, Sets 1 and 3). First, there is bias towards the null in Method 3 for the transformation from the log to the raw scale when the means are not equal. This is because of the omitted third and higher-order terms in the Taylor series. Indeed, we can show that for small difference between the groups and equal standard deviations, Method 3 estimates a fraction $e^{-\sigma_z^2/2}$ of the true difference. Second, standard errors using the Taylor approximation are inflated when transforming from the raw to the log scale for Method 1. We believe this is because the asymptotic formula requires very large samples to be valid in this case, perhaps because of the large exponential terms. Third, t-test-based standard errors are too low for raw to log, and too high for log to raw, with corresponding under- or over-coverage. The conversion is in reality less efficient in the former direction and more efficient in the latter direction than is reflected in these 'naïve' standard errors. Empirical standard errors for large standard deviations are larger for Method 1 than for Method 2 converting from log to raw (Table IV, Sets 1 and 3) since in Method 1 the two standard deviations (which are not pooled) are subject to greater variability than the pooled standard deviation in Method 2; empirical standard errors are much smaller for Method 3 from log to raw due to the bias towards the null.
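The attenuation of Method 3 in the log-to-raw direction can be illustrated with population values rather than simulated data. The Python sketch below, with a hypothetical function name, assumes two log-normal groups with a common log-scale standard deviation and takes the overall log mean as the midpoint of the two group means (i.e. equal group sizes).

```python
import math


def method3_fraction(mu1, mu2, sigma):
    """Ratio of the Method 3 log-to-raw difference to the true raw difference."""
    true_diff = math.exp(mu1 + sigma**2 / 2.0) - math.exp(mu2 + sigma**2 / 2.0)
    x_geom = math.exp((mu1 + mu2) / 2.0)  # overall geometric mean (equal group sizes assumed)
    return x_geom * (mu1 - mu2) / true_diff
```

For a small difference such as `method3_fraction(1.005, 0.995, 1.0)`, the ratio is very close to exp(−1/2) ≈ 0.61, so with a log-scale standard deviation of 1 roughly two-fifths of the true raw difference is lost; with a standard deviation of 0.2 the ratio is about 0.98.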
Small sample sizes: results not shown; and imbalanced sample sizes: Table V. Findings were very similar for small sample sizes. The only identifiable sample-size-related problem is an increase in the standard errors for the Taylor approximation method from raw to log, resulting in lower coverage compared with the larger sample size (although in fact producing coverage around 95 per cent for Method 1). When sample sizes are unbalanced, there is bias for large standard deviations in all methods for both transformations (Table V, Sets 9 and 11). This is likely due to a small-sample bias that cancels out across groups when the sample sizes are equal. Coverage for Method 3, which is adequate when sample sizes are the same, is reduced for unbalanced sample sizes when transforming from raw to log with large standard deviations.
Different standard deviations: Table VI. Bias in Methods 2 and 3 (which assume a common standard deviation on the log scale) can be considerable (Table VI, Sets 13 and 15) when the standard deviations are genuinely different. Coverage drops to as low as 1 per cent in one scenario. Method 1 has broadly similar properties to the case of equal standard deviations, although there is a small bias in the point estimate.
Alternative skew (gamma distribution); and no skew (normal distribution): results not shown. The transformation from raw to log scales is associated with very little bias. Taylor approximation standard errors are again high for Method 1 when the standard deviation is large. T-test-based [...] Transformations of normally distributed data to the logarithmic scale have good properties in the scenarios simulated. The opposite transformation produced some bias for all three methods for one scenario with non-zero effect and large standard deviations.

DISCUSSION
In meta-analysis, it is desirable to combine effects measured on a common scale from as many studies as possible. One obstacle to achieving this is when results are reported on a log-transformed scale for some studies, but on the raw scale for other studies. We have presented several methods for transforming data from two-group studies presented on a logarithmic scale to a raw scale and from a raw scale to a logarithmic scale, thus enabling meta-analyses of all studies to be conducted on one or other scale. The methods also allow a meta-analysis to be undertaken on a log-transformed scale even if all studies report data on the raw scale. This enables estimation of a meta-analytic ratio of geometric means. Such a metric may provide a natural 'standardization' across studies, hence reducing heterogeneity, and provides an alternative to the ratio of arithmetic means that is sometimes used [5].
Our first method (Method 1) assumes log-normal distributions with different standard deviations, Method 2 assumes log-normal distributions with a common standard deviation (on the log scale), while Method 3 assumes no particular distribution, but requires similar distributional shapes in the two groups and small effect sizes. On application of the methods to an example, in which most data were reported on the raw scale, we observed some differences between the three methods. Some studies gave substantially different results for Method 1 because of a difference in standard deviations across groups; other studies gave different results for Method 3 because its associated standard errors can be different. In one study, all transformations produced a biased result.
We evaluated the properties of the three methods in a simulation study. This did not reveal a uniformly preferable method. All methods were reasonably robust to data having distributions other than the log-normal. The most serious threat to validity from among the scenarios we simulated was when the standard deviations differed between the groups. Method 1 offers clear advantages in this situation. When standard deviations are large compared with means, biased estimates (in either direction) can be obtained and there is variation in the precision with which the three methods estimate differences in means: Method 3 produces the most precise estimates when transforming from the log to the raw scale; methods are similar when transforming from the raw to the log scale. We derived a Taylor approximation to the standard error and compared it with a t-test-based approach. The Taylor approximation can overestimate standard errors (particularly for raw to log transformations with large standard deviations), but otherwise seems to perform well. The more naïve t-test-based approach performs less well, as it treats transformed means and standard deviations as if they were simple arithmetic means. However, it can be implemented more readily in commonly used meta-analysis software such as RevMan [6], metan [7] (for Stata) and Comprehensive Meta-Analysis [8]. Its performance is probably adequate for most meta-analytic purposes.
One possible extension to our proposed methods would be to replace our estimators, which are maximum likelihood and therefore may have small-sample bias, with bias-corrected estimators [9]. However, no closed-form standard error is available to our knowledge.
In conclusion, we recommend the use of Method 1 whenever standard deviations are likely to be different in the two groups, with Taylor approximation standard errors for the log to raw transformation. For transformations from raw to log scales, the Taylor approximation standard errors can be large, resulting in down-weighting of these studies in a meta-analysis. When standard deviations are similar, greater precision can be obtained using Method 2, especially when transforming to the raw scale. Method 3 offers a general framework that can be used for different data transformations.
Since the methods allow meta-analyses to be conducted on either the raw or the log-transformed scale, decisions on which scale to use will be required. Several considerations may guide the choice of scale, including (i) fidelity to the data available, by using the scale most frequently reported; (ii) best meeting meta-analytic assumptions, by using the scale believed to have less skew; (iii) maximizing consistency (minimizing heterogeneity) of results; and (iv) applying the results to another problem (for example, if the results are to feed into a further analysis that requires data on a specific scale). The simulation study did not indicate consistently better properties of one direction of transformation over the other.