Arcsine‐based transformations for meta‐analysis of proportions: Pros, cons, and alternatives

Abstract Meta‐analyses have been increasingly used to synthesize proportions (eg, disease prevalence) from multiple studies in recent years. Arcsine‐based transformations, especially the Freeman–Tukey double‐arcsine transformation, are popular tools for stabilizing the variance of each study's proportion in two‐step meta‐analysis methods. Although they offer some benefits over the conventional logit transformation, they also suffer from several important limitations (eg, lack of interpretability) and may lead to misleading conclusions. Generalized linear mixed models and Bayesian models are intuitive one‐step alternative approaches, and can be readily implemented via many software programs. This article explains various pros and cons of the arcsine‐based transformations, and discusses the alternatives that may be generally superior to the currently popular practice.


| BACKGROUND
Many research findings in the health sciences are presented in the form of proportions, such as disease prevalence, case fatality rate, and a diagnostic test's sensitivity and specificity. 1,2 Meta-analyses have been increasingly used to synthesize proportions that are reported from multiple studies on the same research topic. [3][4][5][6][7][8][9][10] Many meta-analyses of proportions are performed using conventional two-step methods. First, a specific transformation is applied to each study's proportion estimate to better approximate the normal distribution, as required by the assumptions of conventional meta-analysis models. 11 Second, the meta-analysis is performed on the transformed scale, and the synthesized result is then back-transformed to the original proportion scale, which ranges from 0% to 100%.
Of note, one may also directly synthesize proportions without any transformation; however, this approach is not optimal, because the proportion estimates may not be approximately normally distributed, especially for rare events and small sample sizes. The Wald-type confidence intervals (CIs) of proportions may even fall outside the range of 0% to 100%. 12 Various transformations are available for proportions, including the log, logit, arcsine-square-root, and Freeman-Tukey double-arcsine transformations. [13][14][15][16][17] Among them, the Freeman-Tukey double-arcsine transformation is a popular tool in the current practice of synthesizing proportions. 10 We searched Google Scholar on June 17, 2020; for each year between 2000 and 2019, we searched for the exact terms "meta-analysis" and "double-arcsine" to obtain the number of research items using the double-arcsine transformation in meta-analyses. We also searched for the exact term "meta-analysis," restricted to article titles, to obtain the rough number of meta-analysis publications in each year, and calculated the corresponding proportion of research items using the double-arcsine transformation. Figure 1 shows that the double-arcsine transformation has been increasingly used over the past two decades.
Despite their rising popularity, several authors have previously expressed concerns about arcsine-based transformations. 18,19 In addition, many meta-analyses do not even specify the transformation used for synthesizing proportions. 10 Even if a transformation is specified, meta-analysts frequently fail to provide sufficient justification for its selection.
This article discusses the purported benefits of the arcsine-based transformations that potentially explain their popularity in current practice. We also describe the limitations of such transformations, and recommend alternative methods for meta-analysis of proportions that may be superior. We focus on meta-analysis of single proportions, where the arcsine-based transformations are widely used.

| METHODS
Suppose a meta-analysis contains $N$ studies that report single proportions on a common topic. Let $p_i$ be the proportion estimate from study $i$ in the meta-analysis ($i = 1, \ldots, N$). The proportion is simply calculated as $p_i = e_i/n_i$, where $e_i$ and $n_i$ denote study $i$'s event count and sample size, respectively. The arcsine-square-root transformation is
$$y_i = g(p_i) = \arcsin\sqrt{p_i},$$
with variance $v_i = 1/(4n_i)$. The Freeman-Tukey double-arcsine transformation is
$$y_i = g(p_i) = \arcsin\sqrt{\frac{e_i}{n_i + 1}} + \arcsin\sqrt{\frac{e_i + 1}{n_i + 1}}, \tag{1}$$
with variance $v_i = 1/(n_i + 0.5)$. 13 Of note, the formula above is the version of the double-arcsine transformation originally presented in the article by Freeman and Tukey. 13 One may also take the average of the two arcsine values (by dividing by 2), which gives the variance $v_i = 1/(4n_i + 2)$ and puts the result on the same scale as the arcsine-square-root transformation; such a linear rescaling does not affect the back-transformed proportion.
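The two arcsine-based transformations defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the three studies are hypothetical, and the double-arcsine is computed in its sum form (not divided by 2).

```python
import math

def arcsine_sqrt(e, n):
    """Arcsine-square-root transform of e/n and its approximate variance."""
    y = math.asin(math.sqrt(e / n))
    v = 1.0 / (4.0 * n)
    return y, v

def double_arcsine(e, n):
    """Freeman-Tukey double-arcsine transform (sum form) and its variance."""
    y = math.asin(math.sqrt(e / (n + 1))) + math.asin(math.sqrt((e + 1) / (n + 1)))
    v = 1.0 / (n + 0.5)
    return y, v

# Hypothetical studies as (event count, sample size); the first has zero events.
studies = [(0, 25), (3, 40), (12, 150)]
for e, n in studies:
    y_as, v_as = arcsine_sqrt(e, n)
    y_ft, v_ft = double_arcsine(e, n)
    print(f"e={e:3d} n={n:4d}  arcsine: y={y_as:.4f} v={v_as:.5f}  "
          f"double-arcsine: y={y_ft:.4f} v={v_ft:.5f}")
```

Both functions accept a zero event count without any adjustment, and both variances depend only on the sample size, not on the observed proportion.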
Besides the arcsine-based transformations, the log and logit transformations are also frequently used for proportions. 10,20,21 Their formulas are more straightforward: the log transformation is $y_i = g(p_i) = \log p_i$, with variance $v_i = 1/e_i - 1/n_i$, and the logit transformation is $y_i = g(p_i) = \log[p_i/(1 - p_i)]$, with variance $v_i = 1/e_i + 1/(n_i - e_i)$.
After applying a specific transformation to each study's proportion, conventional meta-analysis methods 22 are performed using the transformed data, that is, $y_i$ and $v_i$, leading to the synthesized result $y$ with a 95% CI. The synthesized result is finally back-transformed to the original proportion scale; the overall proportion is usually estimated as $p = g^{-1}(y)$, and its CI limits are back-transformed in the same manner.
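The two-step procedure described above can be illustrated with a minimal Python sketch. It assumes the logit transformation and, for brevity, a common-effect inverse-variance pooling step as a simplified stand-in for the conventional meta-analysis methods (a random-effects model would also require a between-study variance estimate). The function names and study data are hypothetical.

```python
import math

def logit_transform(e, n):
    """Logit of the proportion e/n and its approximate variance."""
    y = math.log(e / (n - e))
    v = 1.0 / e + 1.0 / (n - e)
    return y, v

def expit(y):
    """Inverse of the logit, mapping back to the (0, 1) proportion scale."""
    return 1.0 / (1.0 + math.exp(-y))

def pool_common_effect(ys, vs):
    """Inverse-variance weighted mean with a 95% Wald CI on the transformed scale."""
    weights = [1.0 / v for v in vs]
    y_bar = sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return y_bar, y_bar - 1.96 * se, y_bar + 1.96 * se

# Hypothetical studies as (event count, sample size); none has a zero count.
studies = [(3, 40), (12, 150), (9, 90)]

# Step 1: transform each study's proportion.
ys, vs = zip(*(logit_transform(e, n) for e, n in studies))

# Step 2: pool on the transformed scale, then back-transform estimate and CI limits.
y_bar, lo, hi = pool_common_effect(ys, vs)
p, p_lo, p_hi = expit(y_bar), expit(lo), expit(hi)
print(f"pooled proportion = {p:.4f} (95% CI {p_lo:.4f} to {p_hi:.4f})")
```

Because the CI limits are back-transformed through the same inverse function, the resulting interval always stays inside the range of 0% to 100%.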

| PROS
Because conventional meta-analysis models assume normally distributed data, 11 […] especially when the sample sizes are small. [23][24][25][26][27] In addition, in the presence of zero event counts, neither log- nor logit-transformed proportions can be calculated, and a continuity correction must be applied to the zero counts, usually by adding 0.5. [28][29][30] This correction may have considerable impact on the synthesized proportion for rare events. 31 In this sense, the arcsine-based transformations have the important […]
[…] depends additionally on a sample size that represents the overall synthesized result. 34 This "overall sample size" is not well defined in the meta-analysis setting; it may be selected as the harmonic, geometric, or arithmetic mean of the study-specific sample sizes, 19,34 or as the inverse of the variance of the synthesized result 15; in any case, it is generally difficult to justify the value used as the "overall sample size." More importantly, different values may lead to substantially different proportions in some cases, potentially leading to misleading conclusions. 19
Moreover, numerical problems may occur when using the Freeman-Tukey double-arcsine transformation. Although this transformation refines the usual arcsine-square-root transformation by averaging over the double arcsines to better stabilize variances, it may have low accuracy at values close to its domain limits, which are likely to occur in cases of rare events and small sample sizes. 35 Specifically, because the event count $e_i$ is between 0 and $n_i$, as indicated in Equation (1) […]
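The dependence of the back-transformed proportion on the "overall sample size" can be sketched numerically. The Python example below assumes the closed-form inverse commonly attributed to Miller (1978) for the sum form of the double-arcsine; that this is the inverse meant in the text is an assumption, and the pooled value, the candidate sample sizes, and the function names are hypothetical.

```python
import math

def double_arcsine(e, n):
    """Freeman-Tukey double-arcsine transform (sum form)."""
    return math.asin(math.sqrt(e / (n + 1))) + math.asin(math.sqrt((e + 1) / (n + 1)))

def inverse_double_arcsine(t, n):
    """Closed-form inverse for a given sample size n (assumed Miller-type formula)."""
    # Values outside the range attainable with sample size n are clamped to 0 or 1.
    if t <= double_arcsine(0, n):
        return 0.0
    if t >= double_arcsine(n, n):
        return 1.0
    s = math.sin(t)
    inner = 1.0 - (s + (s - 1.0 / s) / n) ** 2
    # max(...) guards against tiny negative values caused by rounding error.
    return 0.5 * (1.0 - math.copysign(1.0, math.cos(t)) * math.sqrt(max(inner, 0.0)))

# Hypothetical pooled result on the double-arcsine scale.
t_pooled = 0.4

# The same pooled value is back-transformed with different "overall sample sizes".
for n in (10, 100, 1000):
    print(f"n = {n:5d}  back-transformed proportion = {inverse_double_arcsine(t_pooled, n):.4f}")
```

In this sketch the same pooled value maps to noticeably different proportions depending on the sample size plugged into the inverse, which is the ambiguity discussed above.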

| ALTERNATIVES
From a statistical perspective, event counts are typically assumed to follow binomial distributions, 36 and all transformations discussed above are applied to the binomial data to approximate normal distributions within studies. With advances in statistical computing techniques, these approximations in the two-step methods may be unnecessary, because they can feasibly be replaced with one-step meta-analysis methods, including generalized linear mixed models (GLMMs) and Bayesian hierarchical models. 11,18,19,[37][38][39][40][41][42][43] GLMMs directly model event counts with binomial likelihoods and fully account for within-study uncertainties. [37][38][39] They use a specific link function to transform each study's latent true proportion to a linear scale, on which random effects are specified in a manner similar to the conventional two-step methods. The logit link is the canonical link function for proportions (ie, binomial data), while many other links, such as the log and probit links, may also be used. 36 […]
[…] superior to the conventional two-step methods that require transformations of proportions within studies. 55 Importantly, in some cases, the one-step methods may lead to substantially different results from the two-step methods. 19 In future studies, it is worthwhile to explore the performance of the different methods with various transformations or link functions based on a large collection of empirical meta-analysis datasets, and to quantitatively investigate the differences between the synthesized proportions produced by these methods. […]
[…] are not recommended at the between-study level. In fact, many early articles discussed the arcsine-based transformations in the setting of an individual study 13,59,60; these articles did not directly suggest extending the arcsine-based transformations to the meta-analysis setting.
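As a sketch of the one-step GLMM with the logit link described above, a random-intercept logistic model can be written as follows. The symbols $\pi_i$ (study $i$'s latent true proportion), $\mu$ (overall mean on the logit scale), $u_i$ (study-specific random effect), and $\tau^2$ (between-study variance) are notation introduced here for illustration only.

$$e_i \mid \pi_i \sim \mathrm{Binomial}(n_i, \pi_i), \qquad \mathrm{logit}(\pi_i) = \log\frac{\pi_i}{1 - \pi_i} = \mu + u_i, \qquad u_i \sim N(0, \tau^2), \qquad i = 1, \ldots, N.$$

Under this formulation, the overall proportion is commonly summarized as $1/[1 + \exp(-\mu)]$. The binomial likelihood is well defined at $e_i = 0$ and $e_i = n_i$, so no continuity correction is needed, and no "overall sample size" enters the back-transformation.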
In summary, we strongly recommend the use of GLMMs or Bayesian models for synthesizing proportions; nowadays, many software programs are readily available for implementing them. Most meta-analyses of proportions published in recent years continue to use the Freeman-Tukey double-arcsine transformation, and the rate is increasing (Figure 1); it is time for a change.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

TRANSPARENCY STATEMENT
Lifeng Lin affirms that this manuscript is an honest, accurate, and transparent account of the study being reported, and that no important aspects of the study have been omitted.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created in this study.