Seriously misleading results using inverse of Freeman‐Tukey double arcsine transformation in meta‐analysis of single proportions

Standard generic inverse variance methods for the combination of single proportions are based on transformed proportions using the logit, arcsine, and Freeman‐Tukey double arcsine transformations. Generalized linear mixed models are another more elaborate approach. Irrespective of the approach, meta‐analysis results are typically back‐transformed to the original scale in order to ease interpretation. Whereas the back‐transformation of meta‐analysis results is straightforward for most transformations, this is not the case for the Freeman‐Tukey double arcsine transformation, albeit possible. In this case study with five studies, we demonstrate how seriously misleading the back‐transformation of the Freeman‐Tukey double arcsine transformation can be. We conclude that this transformation should only be used with special caution for the meta‐analysis of single proportions due to potential problems with the back‐transformation. Generalized linear mixed models seem to be a promising alternative.


INTRODUCTION
A key application of meta-analytical methods is the pooling of proportions, such as the prevalence of a specific infection or disease. [1][2][3][4] Classic fixed-effect and random-effects meta-analysis methods 5 are typically used to combine single proportions. In order to use these methods, proportions are generally transformed using either the log, 6 logit, 7 arcsine, 8 or Freeman-Tukey double arcsine 9 transformation. These transformations are applied for purely mathematical reasons, eg, variance stabilization (details on the transformations are given in Appendix A and summarized in Table A1). For pooling, the transformed proportions and corresponding standard errors are used in the generic inverse variance method. 5 An alternative yet more elaborate approach based on the logit transformation is the generalized linear mixed model (GLMM), 10 which accounts for the binomial structure of the data and thus avoids the generic inverse variance method. Irrespective of the meta-analysis method and transformation, results are usually presented on the original probability scale after applying the corresponding back-transformation.
Whereas the back-transformation of meta-analysis results is straightforward for the log, logit, and arcsine transformations, this is not the case for the Freeman-Tukey double arcsine transformation, albeit possible. 11 In order to calculate the inverse of the Freeman-Tukey double arcsine transformation, a single sample size has to be specified. Accordingly, for a single study, a one-to-one relation exists between the transformation and its inverse. In a meta-analysis with different sample sizes, however, the value of the back-transformation depends on the specified sample size. Typically, the harmonic mean of sample sizes is used in the back-transformation. 11

CASE STUDY: META-ANALYSIS ON PREVALENCE OF HEPATITIS C VIRUS INFECTIONS
We report results of meta-analyses with five studies estimating the prevalence of hepatitis C virus (HCV) infections in the general population of Nepal, which constitute a subset of an unpublished dataset with 28 studies. 12 This unpublished dataset comprises testing for a total of 972 123 individuals among whom 3696 were HCV antibody positive. The prevalence across studies ranged from 0% to 18.4% with a median of 0.5%. We restrict ourselves to the five-study subset for didactic reasons; the same issues encountered in this subset also exist in the full dataset.
We conducted classic meta-analyses using the arcsine, Freeman-Tukey double arcsine, and logit transformations, respectively. Furthermore, we fitted GLMMs, which implicitly use the logit transformation. Details on the statistical methods are provided in Appendix A. We used the R function metaprop() from the R package meta 13 (see Supporting Information). Results are summarized in Table 1.
Under the fixed-effect model, results depicted as transformed proportions (middle column in Table 1) are very similar for the two methods using the arcsine and logit transformations, respectively. Whereas the random-effects estimates are also very similar with a slightly smaller confidence interval for the arcsine transformation, the results for the two logit methods are rather different due to a very different estimate for the between-study variance.
For easier interpretation, results are back-transformed to the original scale. Due to the small prevalences, we express results as HCV infections per 1000 observations. In Table 1
Looking at Figure 1, we see that the meta-analysis estimators are reasonable summaries of transformed prevalences. On the other hand, back-transformed meta-analysis results are clearly off the mark in Figure 2 with meta-analysis estimators smaller than all individual study results. Note that the back-transformation works as expected for individual study results, eg, the prevalence is 1∕29 = 0.03448 for study 26, which corresponds to 34.48 HCV infections per 1000 observations. The harmonic mean of 85 is obviously the wrong choice in this meta-analysis with sample sizes ranging from 29 to more than 200 000. Figure 3 shows the influence of sample size on meta-analysis results (see also Table A2). For sample sizes between 10 and around 120, results are exactly zero for the back-transformation of the Freeman-Tukey double arcsine transformation. The number of HCV infections per 1000 observations then steeply increases up to a sample size of 500 when the effect of sample size starts to slowly level out.
As noted earlier, the results of the random-effects model are very different for the two logit methods due to different between-study variance estimates. This discrepancy can be explained by looking at the confidence intervals of individual studies in the corresponding forest plots (Figures 4 and 5). Confidence intervals based on the normal approximation are much narrower for the two smallest studies in the classic random-effects meta-analysis (Figure 4) than the confidence intervals based on the Clopper-Pearson method, which takes the binomial distribution into account, 14,15 in the GLMM meta-analysis (Figure 5). Apparently, in these two small studies with only 1 HCV infection and fewer than 50 observations, the assumption of a normally distributed logit-transformed proportion is not fulfilled. With increasing numbers of infections and sample sizes, approximate and Clopper-Pearson confidence intervals get closer to each other. Obviously, the very narrow confidence intervals of the two smallest studies result in an inflated between-study variance estimate leading to a larger estimate for the pooled mean HCV prevalence and a

DISCUSSION
Our case study shows that meta-analysis results based on the back-transformation of the Freeman-Tukey double arcsine transformation 11 can be very misleading and even smaller than all individual study results. We observe similar undesirable results in a meta-analysis using the complete dataset with 28 studies. To our knowledge, this is the first publication reporting such an anomaly and erratic results.
In our view, the main reason for this unexpected behaviour is the very extreme pattern of sample sizes that range from 29 to more than 200 000. The harmonic mean of 85 is much smaller than 3 of the 5 sample sizes. For such highly skewed sample sizes, the harmonic mean is by definition rather small, which may result in nonsensical back-transformed probabilities.
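The pull of a few small studies on the harmonic mean can be illustrated with a short sketch. The five sample sizes below are hypothetical; they were chosen only to span roughly the reported range from 29 to more than 200 000 and are not the study data, so the result will not reproduce the harmonic mean of 85.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical sample sizes (NOT the study data): two small and three larger studies.
sample_sizes = [29, 40, 500, 20_000, 200_000]


def candidate_sample_sizes(sizes):
    """Return three means that could serve as the single sample size
    required by the Freeman-Tukey back-transformation."""
    return {
        "harmonic": harmonic_mean(sizes),
        "geometric": geometric_mean(sizes),
        "arithmetic": mean(sizes),
    }


means = candidate_sample_sizes(sample_sizes)
for name, value in means.items():
    print(f"{name:>10}: {value:,.1f}")

# Count how many studies are larger than the harmonic mean.
larger = sum(n > means["harmonic"] for n in sample_sizes)
print(f"studies larger than the harmonic mean: {larger} of {len(sample_sizes)}")
```

With sample sizes this skewed, the harmonic mean is dominated by the smallest studies, whereas the arithmetic mean is dominated by the largest one.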
In order to prevent misleading conclusions for the Freeman-Tukey double arcsine transformation, several sample sizes could be used to evaluate the sensitivity of meta-analysis results; however, this may lead to diverging meta-analysis estimates. In our example, using the arithmetic or geometric mean in the back-transformation (see Table A2) would result in random-effects estimates of 1.96 and 1.59 HCV infections per 1000 observations, respectively. Here, results for the harmonic mean are obviously wrong; however, it is rather unclear whether to rely on the results for the arithmetic or geometric mean. All other transformations (arcsine, logit, and log) do not have this intrinsic problem in the presentation of meta-analysis results.
Overall, the arcsine transformation appears to be the best classic method for the meta-analysis of single proportions. However, as application of GLMMs for meta-analysis is nowadays straightforward due to their implementation in common software, there is neither a real reason nor a clear advantage for using an approximate method. Accordingly, we support the viewpoint of previous works, 10,16-18 recommending the use of GLMMs for the meta-analysis of single proportions. From our perspective, the only disadvantage of a GLMM is that individual study weights are not available, which we consider a minor drawback; analysts seeing this differently should use the arcsine transformation.
Our recommendation is seemingly in contrast to advice by Barendregt et al 1 promoting the use of the Freeman-Tukey double arcsine transformation over the logit transformation. However, that publication only considered these transformations under the classic meta-analysis model. We agree with Barendregt et al 1 that the use of the logit transformation is problematic in inverse variance meta-analyses with small event numbers or sample sizes; this is also visible in our example. These problems with the logit transformation under the classic meta-analysis model do not translate to GLMMs. The classic meta-analysis model assumes that the estimates of individual studies follow a normal distribution, an assumption that is obviously critical in studies with small numbers of events and observations. The arcsine and Freeman-Tukey double arcsine transformations are less affected by this normality assumption than the logit transformation. However, GLMMs taking into account the binomial structure of the data are not affected by this problem at all. 10,16

CONCLUSIONS
Our case study shows that the Freeman-Tukey double arcsine transformation should only be used with special caution for the meta-analysis of single proportions due to potential problems in the back-transformation of meta-analysis results. In our view, a sensitivity analysis using other sample sizes is mandatory for this transformation. GLMMs seem to be a promising alternative which is nowadays available in common meta-analysis software.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of the article.

How to cite this article:
Schwarzer G, Chemaitelly H, Abu-Raddad LJ, Rücker G. Seriously misleading results using inverse of

APPENDIX A: STATISTICAL METHODS
We consider a meta-analysis of $K$ studies where each study reports the number of events, $a_k$, and the number of observations, $n_k$, $k = 1, \ldots, K$. We assume that the number of events follows a binomial distribution. Specifically, cell count $a_k \sim \mathrm{Binomial}(n_k, p_k)$, where $p_k$ denotes the probability of the event in study $k$. These probabilities are estimated from the observed number of events and sample sizes by $\hat{p}_k = a_k / n_k$.

A.1 Transformations
In this subsection, we briefly introduce the arcsine, Freeman-Tukey double arcsine, and logit transformations in the context of a single study. In the next subsection, the use of these transformations in meta-analyses will be described.

A.1.1 Arcsine transformation
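The arcsine-transformed event probability $AS_k$ 8 is defined as

$$AS_k = \arcsin\sqrt{p_k}.$$

An estimate $\widehat{AS}_k$ of $AS_k$ is given by replacing $p_k$ with $\hat{p}_k$. The main advantage of this transformation is the property of variance stabilization. The approximate variance of $\widehat{AS}_k$ is calculated using

$$\widehat{\mathrm{Var}}(\widehat{AS}_k) = \frac{1}{4 n_k},$$

where the approximation improves as $n_k$ increases. Notice that the approximate variance of $\widehat{AS}_k$ only depends on the sample size. A confidence interval for $AS_k$ can be constructed as

$$\widehat{AS}_k \pm z_{1-\alpha/2} \sqrt{\widehat{\mathrm{Var}}(\widehat{AS}_k)},$$

where $z_{1-\alpha/2}$ denotes the $1-\alpha/2$ quantile of the standard normal distribution.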
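As a minimal sketch of this subsection, the following code computes the arcsine-transformed proportion, its approximate standard error, and a normal-approximation confidence interval. It assumes the usual variance approximation 1/(4n); the function name `arcsine_transform` and the `level` argument are illustrative and not taken from any package. The input is study 26 from the case study (1 event among 29 observations).

```python
import math
from statistics import NormalDist


def arcsine_transform(events, n, level=0.95):
    """Arcsine-transformed proportion, approximate SE, and Wald-type CI."""
    estimate = math.asin(math.sqrt(events / n))
    std_err = math.sqrt(1.0 / (4.0 * n))  # assumed approximation; depends on n only
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return estimate, std_err, (estimate - z * std_err, estimate + z * std_err)


est, se, (lower, upper) = arcsine_transform(events=1, n=29)  # study 26: 1/29
print(f"AS = {est:.4f}, SE = {se:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Because the standard error does not involve the number of events, two studies of equal size receive the same inverse variance weight regardless of their observed prevalence.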

A.1.2 Freeman-Tukey double arcsine transformation
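The Freeman-Tukey double arcsine-transformed event probability $FT_k$ 9 is an average of two arcsine-transformed probabilities. Its estimate is given by

$$\widehat{FT}_k = \frac{1}{2}\left(\arcsin\sqrt{\frac{a_k}{n_k+1}} + \arcsin\sqrt{\frac{a_k+1}{n_k+1}}\right)$$

with approximate variance

$$\widehat{\mathrm{Var}}(\widehat{FT}_k) = \frac{1}{4 n_k + 2},$$

where the approximation, again, improves as $n_k$ increases. A confidence interval for $FT_k$ can be constructed following the same methodology as for the arcsine-transformed probability described above.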
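The following sketch evaluates the Freeman-Tukey double arcsine transformation for a single study. It assumes the averaged form, (arcsin √(a/(n+1)) + arcsin √((a+1)/(n+1)))/2, with approximate variance 1/(4n + 2); the function name `freeman_tukey` is illustrative.

```python
import math


def freeman_tukey(events, n):
    """Freeman-Tukey double arcsine-transformed proportion and approximate SE
    (averaged form, assumed variance 1 / (4n + 2))."""
    estimate = 0.5 * (
        math.asin(math.sqrt(events / (n + 1)))
        + math.asin(math.sqrt((events + 1) / (n + 1)))
    )
    std_err = math.sqrt(1.0 / (4.0 * n + 2.0))
    return estimate, std_err


ft, ft_se = freeman_tukey(events=1, n=29)    # study 26: 1/29
ft_zero, _ = freeman_tukey(events=0, n=29)   # a study with no events
print(f"FT(1/29) = {ft:.4f}, SE = {ft_se:.4f}")
print(f"FT(0/29) = {ft_zero:.4f}")
```

Note that a study without events is mapped to a strictly positive value that depends on its sample size; this lower bound matters when a pooled value is later back-transformed with a different sample size.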

A.1.3 Logit transformation
The logit transformation is another classic transformation 7 defined as

$$LO_k = \log\left(\frac{p_k}{1 - p_k}\right).$$

Again, an estimate $\widehat{LO}_k$ of $LO_k$ is given by replacing $p_k$ with $\hat{p}_k$. The approximate variance of $\widehat{LO}_k$ is

$$\widehat{\mathrm{Var}}(\widehat{LO}_k) = \frac{1}{a_k} + \frac{1}{n_k - a_k}.$$

It is clear from this variance formula that the approximate variance of a logit-transformed proportion becomes infinite if the number of events is zero or equal to the sample size. Typically, in this situation, a small increment is added to each denominator in order to yield a finite variance estimate. A confidence interval for $LO_k$ can be constructed following the same methodology as for the arcsine-transformed probability described earlier.
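A minimal sketch of the logit transformation with a continuity correction is given below. The increment of 0.5 and the choice to apply it only when the number of events is zero or equal to the sample size are assumptions made for illustration; the text only states that a small increment is typically added.

```python
import math


def logit_transform(events, n, increment=0.5):
    """Logit-transformed proportion and approximate SE.

    `increment` is an assumed continuity correction that is added to both
    cell counts only if events == 0 or events == n.
    """
    a, b = events, n - events
    if a == 0 or b == 0:
        a, b = a + increment, b + increment
    estimate = math.log(a / b)
    std_err = math.sqrt(1.0 / a + 1.0 / b)
    return estimate, std_err


lo, lo_se = logit_transform(events=1, n=29)        # study 26: 1/29
lo_zero, lo_zero_se = logit_transform(events=0, n=29)
print(f"LO(1/29) = {lo:.4f}, SE = {lo_se:.4f}")
print(f"LO(0/29) = {lo_zero:.4f}, SE = {lo_zero_se:.4f}")
```

With a single event, the standard error exceeds 1 on the logit scale, which illustrates how strongly the variance depends on the event count for this transformation.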

A.2 Meta-analysis of single proportions
We briefly describe both the classic meta-analysis method, which assumes approximately normally distributed study effects (ie, prevalence measures), and the generalized linear mixed model, which takes the binomial structure of the data into account.
All methods are available in R function metaprop() from R package meta. 13

A.2.1 Classic random-effects model
Classic fixed-effect and random-effects meta-analysis methods using the inverse variance method 5 can be used to combine single proportions. As the random-effects model is a generalization of the fixed-effect model, we only introduce the random-effects model, which is defined as

$$\hat{\theta}_k = \theta + u_k + \sigma_k \epsilon_k, \quad \epsilon_k \sim N(0, 1), \quad u_k \sim N(0, \tau^2),$$

where the $\epsilon$'s and $u$'s are independent. This model contains two sources of variation: the within-study variances $\sigma_k^2$, $k = 1, \ldots, K$, and the between-study variance $\tau^2$. The classic meta-analysis methods assume that the variances $\sigma_k^2$ are estimated without error by $\hat{\sigma}_k^2$. The estimated effects $\hat{\theta}_k$ and corresponding standard errors $\hat{\sigma}_k$ (which are assumed known) are used to estimate $\tau^2$ with the restricted maximum likelihood method. 19 Results are very similar using the classic DerSimonian and Laird estimator, which is still the default in most statistical software for meta-analysis. The fixed-effect model is a special case when $\tau^2 = 0$. Accordingly, results of fixed-effect and random-effects meta-analysis are identical if the estimate $\hat{\tau}^2$ equals zero. Given estimates $(\hat{\theta}_k, \hat{\sigma}_k)$, the random-effects estimate of $\theta$, denoted by $\hat{\theta}_R$, is the inverse variance weighted average

$$\hat{\theta}_R = \frac{\sum_{k=1}^{K} w_k^* \hat{\theta}_k}{\sum_{k=1}^{K} w_k^*} \quad \text{with weights} \quad w_k^* = \frac{1}{\hat{\sigma}_k^2 + \hat{\tau}^2}.$$

[Table A1, fragment recovered for the Freeman-Tukey double arcsine transformation: outperforms arcsine for small prevalences; sample size needed in back-transformation. 11]
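The classic random-effects model described in this subsection can be sketched as follows. For brevity, the sketch uses the DerSimonian and Laird moment estimator of the between-study variance rather than restricted maximum likelihood, which is the method used in the article. The function name `random_effects_dl` and all input values are hypothetical and are not the study data.

```python
import math


def random_effects_dl(estimates, std_errs):
    """Inverse variance pooling with the DerSimonian-Laird estimate of tau^2."""
    k = len(estimates)
    w = [1.0 / se**2 for se in std_errs]                      # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0      # truncated at zero
    w_star = [1.0 / (se**2 + tau2) for se in std_errs]        # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    return {
        "fixed": fixed,
        "random": pooled,
        "se_random": math.sqrt(1.0 / sum(w_star)),
        "tau2": tau2,
    }


# Hypothetical arcsine-scale estimates with SE = sqrt(1 / (4 n)).
sizes = [29, 40, 500, 20_000, 200_000]
estimates = [0.187, 0.159, 0.089, 0.039, 0.027]
std_errs = [math.sqrt(1.0 / (4.0 * n)) for n in sizes]

result = random_effects_dl(estimates, std_errs)
print(result)

# With identical study estimates, tau^2 is zero and both models coincide.
homogeneous = random_effects_dl([0.1, 0.1, 0.1], [0.05, 0.02, 0.01])
```

Adding the between-study variance to every within-study variance flattens the weights, so small studies gain influence under the random-effects model relative to the fixed-effect model.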

A.2.2 Generalized linear mixed model
An excellent tutorial 10 describes how generalized linear mixed models can be utilized in the meta-analysis of event outcomes. One special case considered in the paper is the meta-analysis of single proportions, which, like the classic meta-analysis model, assumes a normal distribution for the effect size (ie, transformed proportion) across studies. However, a binomial distribution is assumed for the number of events within a study, ie, $a_k \sim \mathrm{Binomial}(n_k, p_k)$.
Using the above defined logit-transformed proportion $LO_k$, this relation can be re-expressed in the following way to define the random-effects model

$$LO_k = \theta + u_k, \quad u_k \sim N(0, \tau^2).$$

This model uses the binomial likelihood

$$\binom{n_k}{a_k} \frac{\exp(LO_k)^{a_k}}{(1 + \exp(LO_k))^{n_k}}$$

instead of the likelihood from the normal distribution 10 and is also known as a random intercept logistic regression model that implicitly uses the logit transformation. Accordingly, the GLMM estimates $\hat{\theta}^{GL}_F$ and $\hat{\theta}^{GL}_R$ correspond to the logit-transformed probabilities in the fixed-effect and random-effects model, respectively.
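To make the role of the binomial likelihood concrete, the sketch below treats only the fixed-effect special case, in which the between-study variance is zero and all studies share one logit. In that case the maximum likelihood estimate has a closed form, the logit of the total number of events divided by the total number of observations. The full random-effects GLMM additionally requires integrating over the random intercepts and is not attempted here. Function names and the event counts are hypothetical.

```python
import math


def binomial_loglik(theta, events, sizes):
    """Binomial log-likelihood of a common logit `theta` (fixed-effect case)."""
    return sum(
        math.log(math.comb(n, a)) + a * theta - n * math.log1p(math.exp(theta))
        for a, n in zip(events, sizes)
    )


def fixed_effect_logit(events, sizes):
    """Closed-form maximizer of the log-likelihood above."""
    p = sum(events) / sum(sizes)
    return math.log(p / (1.0 - p))


# Hypothetical data (NOT the study data).
events = [1, 1, 4, 30, 150]
sizes = [29, 40, 500, 20_000, 200_000]

theta_hat = fixed_effect_logit(events, sizes)
pooled_p = 1.0 / (1.0 + math.exp(-theta_hat))
print(f"theta = {theta_hat:.4f}, i.e. {1000 * pooled_p:.2f} events per 1000")
```

No study-specific variance estimate enters this calculation, which is why studies with zero events need no continuity correction under the binomial likelihood.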
Estimation of GLMMs for meta-analysis of single proportions is straightforward with R function metaprop() by specifying argument method = "GLMM".
In principle, individual study weights could be derived from the likelihood contribution of each individual study; however, this information is at the moment not available in the utilized R software. Alternatively, the width of the Clopper-Pearson confidence intervals that also takes the binomial data structure into account 14,15 could be used to get approximate study weights.
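A Clopper-Pearson interval can be computed directly from the binomial distribution. The sketch below finds both limits by bisection on the binomial distribution function, using only the standard library. It sums the probability mass function term by term and is therefore intended for small studies only; the function names are illustrative.

```python
import math


def binom_cdf(a, n, p):
    """P(X <= a) for X ~ Binomial(n, p); direct summation, small n only."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(a + 1))


def _solve(func, target, increasing, iterations=100):
    """Bisection on [0, 1] for a monotone function `func`."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def clopper_pearson(events, n, level=0.95):
    """Exact (Clopper-Pearson) confidence interval for a single proportion."""
    alpha = 1.0 - level
    # Lower limit: P(X >= events | p) = alpha / 2, or 0 if there are no events.
    lower = 0.0 if events == 0 else _solve(
        lambda p: 1.0 - binom_cdf(events - 1, n, p), alpha / 2, increasing=True
    )
    # Upper limit: P(X <= events | p) = alpha / 2, or 1 if all are events.
    upper = 1.0 if events == n else _solve(
        lambda p: binom_cdf(events, n, p), alpha / 2, increasing=False
    )
    return lower, upper


cp_lower, cp_upper = clopper_pearson(events=1, n=29)  # study 26: 1/29
print(f"Clopper-Pearson 95% CI for 1/29: ({cp_lower:.5f}, {cp_upper:.5f})")
```

The interval is asymmetric around the observed proportion, reflecting the skewness of the binomial distribution when events are rare.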

A.3 Back-transformations
For a single study, several statistical methods exist to calculate a confidence interval for a single proportion. 14,15 These methods do not use the arcsine or the Freeman-Tukey double arcsine transformations, and therefore, the back-transformation is not strictly relevant for individual study results. However, in a meta-analysis context, the back-transformation of the (double) arcsine as well as the logit transformation is essential to report results on the original scale, ie, as proportions.

A.3.1 Arcsine back-transformation
The back-transformation/inverse of the arcsine transformation is defined as

$$p^{AS}_k = \left(\sin AS_k\right)^2.$$

This back-transformation can be used for a single study as well as the result of a meta-analysis, eg, for the random-effects estimate $\hat{\theta}^{AS}_R$ and its lower and upper confidence limits.

A.3.2 Inverse of Freeman-Tukey double arcsine transformation
Miller 11 introduced the back-transformation of the Freeman-Tukey double arcsine transformation, which was published almost 30 years after the initial publication. 9 For study $k$, the back-transformation is defined as

$$p^{FT}_k = \frac{1}{2}\left(1 - \operatorname{sgn}\left(\cos(2 FT_k)\right) \sqrt{1 - \left(\sin(2 FT_k) + \frac{\sin(2 FT_k) - 1/\sin(2 FT_k)}{n_k}\right)^2}\right).$$

This rather complex back-transformation arises from using an average of two arcsine-transformed proportions. The sample size $n_k$ is included in the back-transformation, which is no problem for a single study. However, in a meta-analysis with different sample sizes, a single sample size has to be specified to apply the back-transformation. Miller 11 suggested using the harmonic mean of the sample sizes, ie,

$$\tilde{n} = \frac{K}{\sum_{k=1}^{K} 1/n_k}.$$

Accordingly, this harmonic mean $\tilde{n}$ and the meta-analysis estimate $\hat{\theta}^{FT}_F$ or $\hat{\theta}^{FT}_R$ are used in the back-transformation.
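The dependence of the back-transformed value on the chosen sample size can be sketched as follows. The code assumes Miller's inverse applied to twice the averaged Freeman-Tukey value. As a further assumption, values below the transform of 0 events (or above the transform of n events) for the given sample size are set to 0 (or 1); this is a common convention that the text does not spell out. The pooled value 0.04 is hypothetical and is not a result from the case study.

```python
import math


def ft_transform(events, n):
    """Averaged Freeman-Tukey double arcsine transformation."""
    return 0.5 * (
        math.asin(math.sqrt(events / (n + 1)))
        + math.asin(math.sqrt((events + 1) / (n + 1)))
    )


def ft_inverse(ft, n):
    """Miller's inverse for the averaged transformation with sample size n.

    Assumption: values outside the range attainable with sample size n
    are truncated to 0 or 1.
    """
    if ft < ft_transform(0, n):
        return 0.0
    if ft > ft_transform(n, n):
        return 1.0
    s = math.sin(2.0 * ft)
    inner = 1.0 - (s + (s - 1.0 / s) / n) ** 2
    sign = math.copysign(1.0, math.cos(2.0 * ft))
    return 0.5 * (1.0 - sign * math.sqrt(max(inner, 0.0)))


# Single study: the inverse recovers the observed proportion 1/29.
single = ft_inverse(ft_transform(1, 29), 29)
print(f"single study: {1000 * single:.2f} per 1000")

# Hypothetical pooled value back-transformed with different sample sizes.
pooled_ft = 0.04
for n in (85, 500, 1000, 50_000):
    print(f"n = {n:>6}: {1000 * ft_inverse(pooled_ft, n):.3f} per 1000")
```

Under these assumptions, a pooled value that lies below the smallest value attainable with the chosen sample size is back-transformed to exactly zero, and the result changes markedly as the sample size increases.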

A.3.3 Inverse of logit transformation
The inverse of the logit transformation is defined as

$$p^{LO}_k = \frac{\exp(LO_k)}{1 + \exp(LO_k)}.$$
This well-known back-transformation can be used both for a single study and in a meta-analysis setting (classic method or GLMM).
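In contrast to the Freeman-Tukey case, the arcsine and logit back-transformations need no sample size. A minimal sketch with illustrative function names, checked on study 26 (1 event among 29 observations):

```python
import math


def inverse_arcsine(value):
    """Back-transform an arcsine-scale value (expected within [0, pi/2])."""
    return math.sin(value) ** 2


def inverse_logit(value):
    """Back-transform a logit-scale value; written to avoid overflow."""
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    e = math.exp(value)
    return e / (1.0 + e)


p = 1 / 29  # study 26
from_arcsine = inverse_arcsine(math.asin(math.sqrt(p)))
from_logit = inverse_logit(math.log(p / (1.0 - p)))
print(f"arcsine: {1000 * from_arcsine:.2f} per 1000")  # 34.48
print(f"logit:   {1000 * from_logit:.2f} per 1000")    # 34.48
```

The same functions apply unchanged to a pooled estimate and its confidence limits, which is why these transformations do not share the ambiguity of the Freeman-Tukey back-transformation.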