Beyond the forest plot: The drapery plot

In the era of the “reproducibility crisis” and the “P-value controversy,” new ways of presenting and interpreting the results of a meta-analysis are desirable. One suggestion, made for single studies almost six decades ago and taken up now and then, is the P-value function. For a given outcome, this function assigns a P-value to each possible hypothetical value, given the data. Moreover, the P-value function simultaneously provides two-sided confidence intervals for all possible alpha levels. An application to meta-analysis, while suggested early, has not been widely established. We introduce the drapery plot, which presents the P-value functions of all individual studies and pooled estimates in a meta-analysis as curves, together with the prediction range for a single future study. We also present a scaled variant with the test statistic on the y-axis. Both plots visualize the full information of a pairwise meta-analysis. We see a drapery plot as a complementary figure to a forest plot. It may even be an alternative in meta-analyses with many studies, where forest plots tend to become very large and complex.


| INTRODUCTION
Scientists have proclaimed a "reproducibility crisis" that manifests itself, among other things, in selective publication of statistically significant results, often based on the conventional but arbitrary significance level of .05. [1][2][3] In the biomedical literature an excessive use of P-values has been observed and criticized, leading to a "P-value controversy." 4 Wasserstein et al went a step further and called for abandoning statistical significance. 5 While these and other authors mainly question the concept of statistical significance, one journal has banned P-values completely (Basic and Applied Social Psychology 6 ).
It is widely agreed that new ways of presentation and interpretation of the results of pre-clinical experiments, trials, and observational studies are desirable. 7 One suggestion is the P-value function which was introduced almost six decades ago, [8][9][10] taken up now and then, 11,12 and has recently been picked up again. 13,14 For a given outcome, this function assigns a P-value to each possible hypothetical value of the estimand, given the data. As the P-value function also simultaneously provides two-sided confidence intervals for all possible alpha levels, it is also called the confidence interval function or confidence curve. 12,15 An application of P-value functions in meta-analysis was also suggested, 15 but has not been widely established in the field.
In this paper we suggest a graph, the drapery plot, that presents P-value curves for (a) all individual studies in a meta-analysis and (b) the pooled estimates, and (c) contains a prediction region for a single future study. Like a forest plot, the drapery plot thus visualizes the main information of a pairwise meta-analysis.
In Section 2 we present the definition of the P-value function, discuss its properties, and explain how a meta-analysis can be visualized as a drapery plot. We describe the interpretation of a drapery plot using two examples in Section 3, discuss strengths and limitations in Section 4, and conclude with a recommendation in Section 5.

| METHODS
The P-value is a frequentist concept, defined in the context of a statistical hypothesis test. Let $\theta$ be the estimand, for example, a treatment effect. To test whether $\theta$ differs from a hypothetical value $\theta_0$, two hypotheses are formulated, the null-hypothesis $H_0: \{\theta = \theta_0\}$ and an alternative hypothesis, $H_1: \{\theta \neq \theta_0\}$ (two-sided test). A one-sided test can also be formulated comparing $H_0: \{\theta \leq \theta_0\}$ and $H_1: \{\theta > \theta_0\}$ (or vice versa). In this paper, we focus on the two-sided setting.
Whether to reject the null-hypothesis is decided based on an estimator $\hat\theta$ informed by the data. To construct a statistical test, an assumption about the statistical distribution of the estimator under the null-hypothesis $H_0$ is needed. For example, the assumption of a normal distribution for a mean difference (MD) leads to a test statistic $(\hat\theta_{MD} - \theta_0)/\mathrm{SE}_{MD}$, where $\hat\theta_{MD}$ denotes the observed mean difference and $\mathrm{SE}_{MD}$ its SE.
To keep it simple, we assume throughout this paper that the test statistic is approximated by the standard normal distribution. For a fixed value $\theta_0$, the two-sided P-value $p$ is defined as the probability under the null-hypothesis $H_0: \{\theta = \theta_0\}$ that the absolute value of the test statistic is greater than or equal to its observed value. Under the normality assumption, we may write this as
$$p = 2\left(1 - \Phi\left(\frac{|\hat\theta - \theta_0|}{\mathrm{SE}(\hat\theta)}\right)\right),$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution. If $p$ is large (eg, 0.8), the observed data do not contradict the null-hypothesis. The smaller the P-value, the less compatible we consider the data to be with the null-hypothesis. A common convention for deciding whether the null-hypothesis can be rejected is to prespecify a threshold, the significance level (denoted by $\alpha$), and to reject the null-hypothesis if $p < \alpha$.
Frequently used values for α are 0.05 and 0.01.
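As a minimal sketch of the definition above, the two-sided P-value under the normal approximation can be computed with the Python standard library alone. The function name `two_sided_p_value` and the numbers used are hypothetical and serve only as an illustration; they are not taken from the examples in this paper.

```python
from statistics import NormalDist


def two_sided_p_value(estimate: float, se: float, theta0: float = 0.0) -> float:
    """Two-sided P-value for H0: theta = theta0 under a normal approximation."""
    if se <= 0:
        raise ValueError("se must be positive")
    z = (estimate - theta0) / se  # test statistic
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical estimate and standard error, for illustration only.
p = two_sided_p_value(estimate=0.5, se=0.25, theta0=0.0)
reject_at_5_percent = p < 0.05  # conventional decision rule p < alpha
print(f"p = {p:.4f}, reject at alpha = 0.05: {reject_at_5_percent}")
```

The decision rule in the last lines is the conventional one described above; the threshold itself remains a matter of convention.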

| The P-value function
The concept of the P-value function goes back to Birnbaum (1961), Miettinen (1985), and Poole (1987). [8][9][10] Given the data and the distribution assumption, the P-value depends on the hypothetical value $\theta_0$. The idea of the P-value function $F$ is to consider the P-value as a function of $\theta_0$, given the data, that is, under the normality assumption:
$$F(\theta_0) = 2\left(1 - \Phi\left(\frac{|\hat\theta - \theta_0|}{\mathrm{SE}(\hat\theta)}\right)\right).$$
We interpret values of $\theta_0$ resulting in large P-values as well compatible with the data and values with small P-values as less compatible with the data.
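The idea of treating the P-value as a function of the hypothetical value can be sketched as follows. The helper `p_value_function`, the grid, and the study values are assumptions made for illustration; the sketch relies on the normal approximation used throughout this section.

```python
from statistics import NormalDist


def p_value_function(estimate: float, se: float):
    """Return F, which maps a hypothetical value theta0 to its two-sided P-value."""
    std_normal = NormalDist()

    def F(theta0: float) -> float:
        return 2.0 * (1.0 - std_normal.cdf(abs(estimate - theta0) / se))

    return F


# Hypothetical single study, for illustration only.
F = p_value_function(estimate=0.5, se=0.25)

# Evaluate the curve on a grid of hypothetical values from -0.5 to 1.5.
grid = [i / 10 for i in range(-5, 16)]
curve = [(theta0, F(theta0)) for theta0 in grid]

# The grid value most compatible with the data is the one with the largest P-value.
most_compatible = max(curve, key=lambda point: point[1])[0]
print(most_compatible)
```

Plotting `curve` for each study in a meta-analysis on common axes gives the raw material of a drapery plot.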

| Range and symmetry
P-values range between 0 and 1. Because $F$ depends only on the absolute distance $|\hat\theta - \theta_0|$, the P-value function is symmetric with respect to the observed estimate, $\hat\theta$. It takes its unique maximum 1 if $\theta_0$ is exactly equal to the observed effect estimate, $\hat\theta$. With $\theta_0$ increasing from $-\infty$ to $\hat\theta$, $|\hat\theta - \theta_0|$ strictly decreases and thus $F(\theta_0)$ strictly increases, while it strictly decreases for $\theta_0 > \hat\theta$, approaching zero for $\theta_0 \to +\infty$.
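Because the P-value function is symmetric around the estimate and strictly monotone on either side, the set of hypothetical values with a P-value of at least alpha is an interval, which is the usual two-sided confidence interval at level 1 - alpha. The following sketch, again under the normal approximation and with hypothetical numbers, reads off these intervals for several alpha levels; the function name `confidence_interval` is an assumption of this example.

```python
from statistics import NormalDist


def confidence_interval(estimate: float, se: float, alpha: float):
    """Interval of theta0 values whose two-sided P-value is at least alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    half_width = NormalDist().inv_cdf(1.0 - alpha / 2.0) * se
    return estimate - half_width, estimate + half_width


# Hypothetical study; the same curve yields an interval for every alpha level.
for alpha in (0.1, 0.05, 0.01):
    lower, upper = confidence_interval(0.5, 0.25, alpha)
    print(f"{100 * (1 - alpha):.0f}% CI: [{lower:.3f}, {upper:.3f}]")
```

In a plot of the P-value function, these limits are the two points where a horizontal line at height alpha crosses the curve.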

Highlights

What is already known
The P-value function, though often suggested in the literature, has still not been widely established, particularly not in meta-analysis.

What is new
The drapery plot, a visualization of P-value functions for all elements of a meta-analysis, is now available in the R package meta. We offer two variants of the drapery plot, providing P-values, test statistics and confidence intervals for all alpha levels.

Potential impact for RSM readers outside the authors' field
Drapery plots can serve as a complementary addition to forest plots in meta-analyses of all areas of application.

| A scaled version of the P-value function
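The P-value function maps the hypothetical parameters to their P-values. Alternatively, the test statistic may be displayed directly. Applying the quantile function of the standard normal distribution to half the P-value provides
$$\Phi^{-1}\left(\frac{F(\theta_0)}{2}\right) = -\frac{|\hat\theta - \theta_0|}{\mathrm{SE}(\hat\theta)},$$
that is, the negative absolute value of the test statistic, which is piecewise linear in $\theta_0$.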
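Under the normal approximation, halving the two-sided P-value and applying the standard normal quantile function returns the negative absolute test statistic, so the curve becomes a triangle with its peak at the estimate. The sketch below checks this numerically for a hypothetical study; the function names are assumptions of this example, and for hypothetical values very far from the estimate the P-value underflows to zero, where the quantile function is undefined.

```python
from statistics import NormalDist

std_normal = NormalDist()


def p_value(theta0: float, estimate: float, se: float) -> float:
    """Two-sided P-value for the hypothetical value theta0."""
    return 2.0 * (1.0 - std_normal.cdf(abs(estimate - theta0) / se))


def scaled_p_value(theta0: float, estimate: float, se: float) -> float:
    """Quantile-scaled P-value; equals -|estimate - theta0| / se."""
    return std_normal.inv_cdf(p_value(theta0, estimate, se) / 2.0)


# Hypothetical study: the scaled curve falls by one unit per standard error.
for theta0 in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(theta0, round(scaled_p_value(theta0, estimate=0.5, se=0.25), 6))
```

On this scale, equal steps in the hypothetical value give equal steps on the y-axis, which is what makes small P-values easy to read off.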

| Visualization of a pairwise meta-analysis
Pairwise meta-analysis combines the results of similar studies reporting the same outcome of interest. Typically a weighted average of study-specific estimates is calculated under a fixed effect (also called common effect) or random effects model. The random effects model assumes that the true study means are drawn from a distribution. Typically, a normal distribution is assumed with a variance parameter that is estimated from the data. The pooled estimate is defined as the mean of the between-study distribution. Its variance depends both on the standard errors of individual studies and the estimated between-study variance.
An established and useful visualization of the results of a meta-analysis is the forest plot, showing the confidence intervals of the primary studies along with a confidence interval for each pooled estimate. It is recommended to add a prediction interval, which gives a region where the estimate from a single future study is expected, based on the random effects model. 19
In this paper we propose a drapery plot as a complementary figure to a forest plot. The drapery plot shows the P-value functions of all primary studies in one graph, together with a P-value curve for each pooled estimate and a shaded prediction region. In contrast to the forest plot, the drapery plot shows results not only for a single (arbitrary) confidence level but for every possible confidence level. We will elaborate on this feature in the next section.
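The pooling step described above can be sketched with inverse-variance weights. This is not the implementation in R package meta: the DerSimonian-Laird moment estimator of the between-study variance is an assumption chosen here for simplicity, the function name `pool` is hypothetical, and the study values are made up for illustration.

```python
from math import sqrt


def pool(estimates, ses):
    """Common effect and random effects pooling with inverse-variance weights.

    The between-study variance tau^2 uses the DerSimonian-Laird moment
    estimator (an assumption of this sketch, not stated in the text).
    """
    k = len(estimates)
    w = [1.0 / se**2 for se in ses]
    sum_w = sum(w)
    common = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    se_common = sqrt(1.0 / sum_w)

    # Cochran's Q and the DerSimonian-Laird estimate of tau^2, truncated at 0.
    q = sum(wi * (yi - common) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random effects weights add tau^2 to each study's variance.
    w_re = [1.0 / (se**2 + tau2) for se in ses]
    sum_w_re = sum(w_re)
    random = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum_w_re
    se_random = sqrt(1.0 / sum_w_re)

    return {
        "common": common, "se_common": se_common,
        "random": random, "se_random": se_random,
        "tau2": tau2,
        # Standard deviation commonly used for a prediction interval; the
        # quantile (often from a t-distribution) is not computed here.
        "sd_prediction": sqrt(tau2 + se_random**2),
    }


# Hypothetical study estimates and standard errors, for illustration only.
result = pool([0.2, 0.5, 0.9], [0.1, 0.2, 0.3])
print(result)
```

Each pooled estimate together with its standard error defines a P-value curve in the same way as a single study does.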
Drapery plots are implemented in function drapery of R package meta. 20,21

| APPLICATION
We will use two meta-analyses, which are available in R package meta, 21 as illustrative examples to describe drapery plots. R code to reproduce our analyses and figures is given in Data S1 of this article.

| Smoking and lung cancer
The data were reconstructed based on the Smoking and Health Report to the Surgeon General 22 and reanalyzed in Schumacher et al. 23 They include the number of lung-cancer deaths in smokers and non-smokers and the number of person-years in each group. The effect measure is the mortality rate ratio of smokers compared to non-smokers. Figure 1 shows a forest plot with the results for the common effect (blue diamond) and random effects meta-analysis (red diamond). The pooled lung-cancer mortality was about 11 times higher for smokers compared to non-smokers, and the prediction interval, though wide, was also far away from 1. There is clear evidence that smokers have a much higher risk of lung-cancer mortality.
Figure 2 (left) shows a drapery plot. Gray curves correspond to primary studies, with study weights from the random effects model represented on a grayscale (studies with higher precision shown in dark gray, studies with low precision in light gray). Each point estimate can be read off at the peak of the respective curve. The prediction region (light blue) is broader than the P-value curve of the pooled estimates (red and black curves), indicating heterogeneity. Horizontal dashed lines can be used to identify confidence (prediction) intervals for common alpha levels (0.1, 0.05, 0.01). Figure 2 (right) shows a variant of the drapery plot with a quantile-scaled y-axis, such that all curves become triangles. Here, an alpha value of 0.001 is also printed in the figure. None of the curves crosses the null effect line (incidence rate ratio, IRR = 1).
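For a rate ratio, the P-value curve is usually computed on the log scale and back-transformed for display. The sketch below assumes the standard large-sample standard error of a log rate ratio, the square root of the sum of the reciprocal event counts. The counts are hypothetical and are not the Surgeon General data; they were chosen only so that the rate ratio is 11.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def log_rate_ratio(events_1, time_1, events_0, time_0):
    """Log incidence rate ratio (group 1 vs group 0) and its standard error."""
    log_irr = log((events_1 / time_1) / (events_0 / time_0))
    se = sqrt(1.0 / events_1 + 1.0 / events_0)
    return log_irr, se


def p_value_at(irr0, log_irr, se):
    """Two-sided P-value for the hypothetical rate ratio irr0."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(log_irr - log(irr0)) / se))


def interval(log_irr, se, alpha):
    """Confidence interval for the rate ratio at level 1 - alpha."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return exp(log_irr - z * se), exp(log_irr + z * se)


# Hypothetical counts: deaths and person-years in smokers and non-smokers.
log_irr, se = log_rate_ratio(100, 200_000, 10, 220_000)
print(f"IRR = {exp(log_irr):.1f}, P-value at IRR = 1: {p_value_at(1.0, log_irr, se):.2e}")
for alpha in (0.1, 0.05, 0.01, 0.001):
    lower, upper = interval(log_irr, se, alpha)
    print(f"alpha = {alpha}: [{lower:.2f}, {upper:.2f}]")
```

A curve that does not cross the null effect line at a given alpha level corresponds to an interval in this loop that excludes 1.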

| Effects of elevated CO2 on total biomass of woody plants
This example comes from ecology. It is a meta-analysis on effects of elevated CO2 on total biomass of woody plants, used as an example in Hedges (1999). 24 Figure 3 shows the rescaled variant of the drapery plot. Despite the large number of 102 studies with many overlapping P-value curves, we get a good impression of the data. The estimated effect is larger than 1 in the majority of studies; only five studies have an estimate below 1 (green curves). The common effect (blue curve) and random effects estimate (red curve) differ clearly, indicating a small-study effect. As in a funnel plot, this also leads to visible asymmetry: there are a number of studies with large uncertainty on the right-hand side without counterparts on the left-hand side of the plot. These studies pull the random effects estimate (which is more susceptible to small studies) to the right. The large between-study heterogeneity is visible in the wide prediction region (in light blue). While the 90% prediction interval does not include the null effect, the 95% prediction interval clearly includes the null effect. A forest plot for these data is available as Figure S1.

| DISCUSSION
In this paper, we introduce a graph, the drapery plot, to summarize the results of a pairwise meta-analysis. The graph builds on the well-established P-value function which, in our experience, has rarely been used in practice. Our implementation in R function drapery presents P-value curves for individual studies and meta-analysis estimates as well as a prediction region for a single future study. At present, R function drapery only provides P-value curves under the normality assumption.
Bender et al, reviewing the history of P-value curves, critically discussed various terms for this function occurring in the literature. 12 They opted for the term confidence curve, arguing "If the confidence interpretation is applied, the terms confidence curve, confidence interval function and confidence distribution function are appropriate, whereas the term P-value function is misleading as the confidence level is plotted on the y-axis and not the P-value". Depending on the viewpoint, the function can be interpreted as providing P-values for shifted null hypotheses (varying θ 0 ) or confidence intervals for all levels, given the data. While we mainly used the term P-value curve in this manuscript due to its current popularity, 13,14 the drapery plot provides two y-axes, on the P-value and the confidence level scale, enabling both interpretations.
The classic drapery plot, like the P-value curve, shows the P-values directly on the y-axis (Figure 2, left panel). Berrar showed a kind of drapery plot he called "average confidence curve," without reference to meta-analysis (his Figure 5f). 18 We also implemented a variant of the drapery plot showing the negative test statistic, that is, a transformation of the P-value, on the y-axis (Figure 2, right panel), which has been utilized before. 14 The main advantages of this scaled variant of the drapery plot are that (a) changes in P-values and confidence limits are linear and that (b) results for smaller P-values and larger confidence levels are clearly visible. This feature is similar to a funnel plot using the recommended reversed SE on the y-axis instead of the inverse of the variance or SE. 25 Accordingly, the scaled version of the drapery plot is the default in R function drapery.
Thompson suggested a very similar plot based on a completely different consideration in the framework of Fuzzy Set Theory, called "fuzzy number plot," 26 see also Kossmeier et al. 27 Here, the y-axis represents the "membership grade," in this framework meaning the grade (between 0 and 1) to which an element belongs to a (fuzzy) set. Grade 1 means that the fuzzy set reduces to a one-point interval of length 0, here the observed point estimate. Intervals of positive length around the point estimate correspond to grades between 0 and 1, with longer intervals corresponding to lower grades of membership. Thompson's view is that "…in the absence of fuzzy set theory (ie, simply plotting confidence intervals in the same manner of fuzzy numbers), the metric of the vertical axis is unclear. Using fuzzy set theory, an established and interpretable membership grade is assigned to the vertical axis." [26, p. 965]. However, the membership grade is just a linear transformation of the test statistic, and we consider the test statistic a concept more widely known to statisticians. For examples of fuzzy number plots, see the figures in the work by Thompson. 26
Bender et al 12 also discussed the scaling of the y-axis and recommended an inverse (top-down) scale for the y-axis; however, they did not consider our linearization transformation. Another potential transformation of the y-axis is the double-square root transformation, which also emphasizes smaller P-values and has been used, for example, to compare P-values from different meta-analysis models. 28 However, this transformation does not have the linearization property.
We recommend the drapery plot as a complementary figure to a forest plot, which is the most popular graph to summarize a meta-analysis. The distinctive advantage of the drapery plot is that confidence intervals for individual studies and pooled estimates can be read off directly for any confidence level. In contrast, a forest plot can only display one confidence level at a time. For example, looking at Figure 2, we immediately notice (a) that all individual studies are statistically significant at the alpha level 0.001 and (b) that the prediction region is well above an incidence rate ratio of 2.
Furthermore, we argue that a drapery plot could be a suitable alternative to a forest plot in meta-analyses with very many studies. Figure 3 shows that while a drapery plot can become busy in this situation, the overall figure is not large and fits on a single page. Accordingly, this graph could be used in an article or presentation instead of (or in addition to) a very long forest plot spanning several pages.
Our implementation of drapery plots is very flexible and enables the user, for example, to label all or some studies or to choose the appearance of curves for individual studies (line color, width, and type) based on study-level information. In Figure 2, some study labels have been rotated individually to avoid overlapping study labels. In Figure 3, study labels have been suppressed; however, curves of studies with an estimated ratio of means below 1 are printed in green. Similarly, study labels or curves could be modified based on other study-level information like P-values of treatment effects or whether a treatment effect or confidence interval falls outside the prediction interval.
Nevertheless, a limitation of the drapery plot is that individual studies are not easily recognized or identified if there is a large number of studies, as in our second example. This is a property shared by other types of plots such as scatter plots: like a scatter plot with a large number of points, a drapery plot gives a good impression of trends in the data, while it is not meant to inform readers about each single point. The drapery plot also shares a feature with the funnel plot: as in a funnel plot (and in contrast to a forest plot), there is no arbitrariness with respect to the order of the studies, as both axes have a specific meaning. Based on a drapery plot, heterogeneity, small study effects, and outliers can also be identified.
There are other R packages for creating P-value curves. Infanger and Schmidt-Trucksäss released a special R package pvaluefunctions, not directly intended for application to meta-analysis. 14 The R package concurve allows visualizing the results of a meta-analysis, based on R package metafor 29,30 ; however, it does not offer a simple way to draw P-value curves for all primary studies.

| CONCLUSION
In summary, we see drapery plots as a valuable addition to the toolbox of visualization methods for meta-analysis, either as a complementary figure for sensitivity analyses or as a potential alternative in meta-analyses with very many studies.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of this article.