A Bayesian multivariate approach to estimating the prevalence of a superordinate category of disorders

Abstract Objective Epidemiological research plays an important role in public health, facilitated by the meta‐analytic aggregation of epidemiological trials into a single, more powerful estimate. This form of aggregation is complicated when estimating the prevalence of a superordinate category of disorders (e.g., “any anxiety disorder,” “any cardiac disorder”) because epidemiological studies rarely include all of the disorders selected to define the superordinate category. In this paper, we suggest that estimating the prevalence of a superordinate category based on studies with differing operationalization of that category (in the form of different disorders measured) is both common and ill‐advised. Our objective is to provide a better approach. Methods We propose a multivariate method using individual disorder prevalences to produce a fully Bayesian estimate of the probability of having one or more of those disorders. We validate this approach using a recent case study and parameter recovery simulations. Results Our approach produced less biased and more reliable estimates than other common approaches, which were at times highly biased. Conclusion Although our approach entails additional effort (e.g., contacting authors for individual participant data), the improved accuracy of the prevalence estimates obtained is significant and therefore recommended.

. Bayesian techniques are often used due to their flexibility and capacity to produce a credible estimate of beliefs given data from multiple sources (e.g., Ades & Sutton, 2006;Greenland, 2006).
Whereas aggregating individual disorder estimates is routine, it is unclear how to model the prevalence of superordinate categories-such as "any anxiety disorder" or "any cardiac disorder"-operationalized by combining multiple underlying conditions. We focus on an example from mental health, but our methods apply to any superordinate category. Within the health professions, superordinate categories play an important role by easing the interpretation and categorization of related symptoms and simplifying the identification of at-risk populations without becoming lost in the minutiae of individual disorders. They also serve important social functions. For example, nonexperts better comprehend broad statements (one in five women suffer from anxiety disorders) than more specialized statements (one in 10 women suffer from social anxiety). Superordinate estimates therefore help generate the "big picture" within which public campaigns are most effective.
One challenge is that the category itself is often operationalized differently across the literature. For example, studies of anxiety disorders often include only a subsample of the eligible disorders-because they are the focal target or because a thorough examination of the individual disorders is too intensive. Prevalence might therefore be defined in one study as the probability of having panic disorder or agoraphobia but in another study as the probability of having social phobia or obsessive-compulsive disorder. This represents the "apples and oranges" problem described in most textbooks (e.g., Borenstein, Hedges, Higgins, & Rothstein, 2009, p. 379): If the underlying construct varies to such a degree between studies, their aggregation rests on questionable grounds.
Although straightforward, either approach is likely to produce poor estimates. The exclusion of individual disorders in a category-level estimate for a given study should negatively bias category prevalence.
Including the number of disorders as a predictor and generating the prevalence of a study measuring all disorders may mitigate estimation bias, but this approach assumes that eligible disorders are equally prevalent-meaning that an increase of one disorder always influences the category-level prevalence in the same manner. Although potentially true in some cases, it is often a questionable assumption.
An alternative approach is to perform a multivariate meta-analysis of the individual disorders accounting for the fact that disorder prevalences are likely to be correlated both within and between studies (Jackson, Riley, & White, 2011). However, current multivariate approaches cannot estimate the prevalence of a superordinate category. This paper proposes a novel multivariate Bayesian approach that estimates the prevalence of a superordinate category while providing a more complete picture of its constituent disorders. We avoid aggregating prevalence estimates that vary in their operationalization, and instead model the prevalence estimates pertaining to the individual disorders and their interrelations. These parameters can be used to estimate the probability of having at least one of those disorders. In the following sections, we describe and then validate our model using a case study and parameter recovery simulations.

| Model
Put simply, our model uses aggregate and individual participant data (IPD) to estimate the prevalence of each individual disorder and the IPD to estimate the comorbidity between disorders. This portion of our model is implemented using Version 2.16 of the Stan modeling language (Stan Development Team, 2016) based on a binomial likelihood with a probit link function to estimate the mean prevalence and variability of each disorder across studies. Relevant code is available in Appendix A. In that appendix, Model 1 applies in the absence of IPD and assumes that the probit-transformed latent correlations between the disorders are known. Model 2 uses IPD to estimate the probit-transformed latent correlations between the disorders, incorporating uncertainty in these values into model estimates. In either case, we use those parameters to simulate a large sample of subjects from which to estimate the prevalence of having at least one disorder. This is a separate step conducted using the R programming language.
We define μ s,d to be the probit-transformed prevalence of disorder d in study s, and μ s, * = (μ s, 1 , …, μ s, D ) to be the vector of all probit-transformed prevalences in study s. For the between-study portion of our model, we assume that μ s,d follows a normal distribution with mean θ d and standard deviation τ d , and μ s, * follows a multivariate normal distribution with correlations given by matrix ω B . The calculations were implemented using a Cholesky factor decomposition of the correlation matrix to improve sampling efficiency, with a uniform LKJ prior (Lewandowski, Kurowicka, & Joe, 2009) for the Cholesky decomposition of ω B (see the Stan Reference Manual). Thus, we have the following: Predictors could be incorporated by adding an additional term β * x s to θ * where x s is the predictor value in study s.
For the probit-transformed population prevalences (θ d ), we used a mildly informative prior based on expert opinion. Because our case study models rare disorders, we used a normal distribution θ d ∼ N(−1.88, 0.30 2 ),implying that population prevalences Φ(θ d ) ranging from 0.6% to 10.0% would be considered probable (within 2 SD) for any given disorder. For the between-study estimates of standard deviation for each disorder (τ d ), we employed a half-normal distribution (truncated at 0) τ d ∼ N(0,0.25 2 ) with a location parameter equal to 0 and a standard deviation of 0.25, such that values larger than 0.5 would be unlikely. Presuming a mean prevalence of 3% for a given disorder, this would mean that 95% of "true" study-specific prevalences lie between 0.2% and 18.9%; we felt this to be a reasonable range that balances the influence of the prior with convergence.
Models using broader priors on τ d produced similar results, albeit with less shrinkage, but did not converge as readily and were characterized by greater uncertainty in τ d due to the relatively small amount of data available for any given disorder.
The within-study model is broken into two parts-one dealing with the aggregate data and the other dealing with the IPD. For studies where the IPD were unavailable, the number of participants with a given disorder (n s,d ) was modeled as arising independently from a binomial distribution with sample size N s equal to the sample size of study s and probability defined earlier: For studies where the IPD were available, prevalence was estimated using individual participant diagnoses. The binary (0 = no diagnosis, 1 = diagnosis) outcomes for each combination of participant and disorder were modelled as dichotomizations of an underlying multivariate normal distribution, with a threshold at 0 and a latent corre- These binary values are summed for each row to simulate the number of diagnoses for a given hypothetical participant, with the overall prevalence representing the proportion of hypothetical participants with one or more disorder. This approach estimates the probability of a participant suffering from one or more AD while propagating our uncertainty across parameters, including between-study heterogeneity (τ d ). Prediction intervals for a new study are calculated in the same manner, with the exception that only a single "true" prevalence is estimated for each posterior sample (i.e., 5,000 hypothetical participants drawn from the same hypothetical study).

| Implementation and sampling parameters
Stan is based on a variant of Markov Chain Monte Carlo sampling and is described in greater detail elsewhere (Hoffman & Gelman, 2014).
Each model employed four independent chains. For the case study, chains included 5,000 iterations minus a warm-up period of 2,500 resulting in 10,000 usable samples; for the simulations, chains included 1,000 iterations minus a warm-up period of 500 resulting in 2,000 usable samples. We recommend the former but reduced this number to make the simulations tractable; fewer samples in our simulations should at worst handicap our estimates. Convergence was tested via visual inspection and using the R-hat statistic (in all cases R-hat ≈ 1 and N Effective > 200, indicating convergence; Gelman & Hill, 2007).
We report all parameters in terms of their median value as well as their highest density interval (HDI; Kruschke, 2014 Zar, Wijma, & Wijma, 2002). The average number of disorders measured by a given study was 3.4; Figure 1 depicts the percentage of participants diagnosed with at least one disorder as a function of the number of disorders measured. These studies represent a partial sample from a meta-analysis of AD within peripartum populations (Fawcett, Fairbrother, Cox, White, & Fawcett, 2018). A partial sample was selected for illustration purposes as it allowed us to focus on the methodology itself rather than becoming lost in the details of the included articles; for this reason, it is important that the present analyses be used for demonstration purposes only. All but two studies (Fisher, Tran, Kriitmaa, Rosenthal, & Tran, 2010;Wenzel, Haugen, Jackson, & Brendle, 2005) permitted calculation of a superordinate (i.e., "any disorder") prevalence estimate-representing the probability of having at least one of the disorders provided above; these studies were excluded from Section 3.1.2 but included in Section 3.1.3.
They were selected despite the missing "any disorder" prevalence estimates to highlight the fact that our model makes better use of the available data.
3.1.2 | Univariate random-effects models using "any disorder" estimates We first analyzed the "any disorder" prevalences, using a series of univariate random-effects models meant to emulate current practice. In keeping with how these analyses have been conducted in the past, we logit-transformed the estimates for each study and aggregated them using the rma function from the metafor package (Viechtbauer, 2010). We fit this model twice-once to the full data set, including studies measuring any number of disorders, and again including only studies measuring three or more disorders. All analyses are summarized in Table 2. Clearly, requiring the inclusion of at least half of the measured disorders increased the estimated prevalence. This might suggest that researchers should require a minimum number of disorders for inclusion; however, the only logical criterion would require inclusion of all disorders. Unfortunately, such a criterion would exclude most of the extant data; only three studies (Fairbrother et al., 2016;Uguz, Gezginc, Kayhan, Sarı, & Büyüköz, 2010;Zar et al., 2002) in the current sample provided "any disorder" prevalence estimates inclusive of all six disorders.
Another approach is to use the number of measured disorders to predict the prevalence of a hypothetical study measuring all disorders (e.g., Baxter et al., 2013). We again fit this model twice, once including all studies and again including only studies measuring three or more disorders. This approach predicts a higher prevalence estimate but is unstable (see Sections 3.1.2 and 3.1.3). Importantly, whereas the effect of number of disorders was significant in the former model (P < 0.001), it was not significant in the latter model (P = 0.693)-likely owing to reduced statistical power combined with range restriction.

| Bayesian multivariate models
We next applied our model to the same data using the approach described in Sections 2.1 and 2.2. From the posterior of this model, we calculated the probability of having one or more, two or more, and so forth disorders. As depicted in Table 3, this estimate is close to those produced by either the univariate analysis of studies measuring all disorders or the initial regression model. However, the HDI is equivalent to or narrower than the confidence intervals from any of the previous models-even though Bayesian confidence intervals for aggregate effects are generally broader than their Frequentist counterparts because the former take uncertainty in τ into account (in this case for each disorder) whereas the latter assume that τ is known. This reflects-in part-the greater ability of our Bayesian model to make use of all data.
We next examined how well our model captured the data.
Figure 2 depicts observed and predicted disorder prevalence for each FIGURE 1 Prevalence (%) of "any" (i.e., having at least one) anxiety or related disorder (panic disorder, obsessive-compulsive disorder, generalized anxiety disorder, social phobia, specific phobia, and posttraumatic stress disorder) plotted against the number of disorders that were measured in that sample. Marker size represents the relative sample size pertaining to each point, ranging from N = 24 to N = 2,202 Kersting  (2011) were calculated from raw data inclusive of additional subjects beyond those reported in their article. Studies contributing IPD are marked with an asterisk (*) and "-" represent values that were not reported. IPD: individual patient data; Preg.: pregnant group; Post.: postpartum group; OCD: obsessive-compulsive disorder; GAD: generalized anxiety disorder; PTSD: posttraumatic stress disorder.
study. All estimates fall well within the boundaries of the model predictions. We also generated "any disorder" predictions for each study in the same manner, with the "any disorder" prevalence defined as the probability of having at least one of the disorders measured in that particular study. These predictions are provided in Figure 3 alongside the reported "any disorder" prevalence values, where available. Despite these data not entering our model, they are nonetheless well represented by the model's predictions.
The above analyses made use of mildly informative priors based on expert knowledge. We repeated the analysis using less informative (hence less realistic) priors. The uninformative prior placed on mean prevalence estimates was based on a normal distribution with a mean of −1.88 and a standard deviation of 1, calibrated such that mean prevalences ranging from <0.1% to 54.8% would be considered probable for any given disorder. For the between-study estimates of standard deviation for each disorder (τ d ), the SD of the half-normal was increased to 0.35, meaning that values higher than 0.70 would be uncommon. Presuming a mean prevalence of 3% for a given disorder, a probit-transformed standard deviation of 0.70 means that the "true" prevalence within any given study might vary credibly from <0.1% to 31.6%. Despite these unrealistic prior expectations, the model output remained largely unchanged-producing an overall estimate of 20%, HDI 95% (15%, 26%). Although the HDI has increased in size, this is because the priors allocated credibility to the possibility that as many as 80% of the participants in a typical study suffered from a given disorder, increasing uncertainty in τ d .

| General method
We next conducted three parameter recovery simulations comparing our model against the approaches described in Section 3.   Note. The estimated probability of having 5+ disorders was negligible and therefore excluded. Prevalence estimates refer to the prevalence within a "typical" study whereas the prediction intervals indicate instead the range of credible values estimated from a new study similar to those included in the model.
Our second and third simulations explored the effect of heterogeneity by changing τ to 0.1 (low heterogeneity) or 0.4 (high heterogeneity) for all disorders. In both cases, we also made "true" prevalences more diverse than in our case study (1.0%, 2.0%, 3.0%, 5.0%, 7.0%, and 9.0%). Our goal in doing so was to evaluate how the predictor approach would perform when the assumption of equal prevalence across disorders was violated.
For each simulated data set, the prevalence of having one or more of the individual disorders was estimated using the univariate approaches presented in Section 3.1.1 (e.g., Baxter et al., 2013;Goodman et al., 2016). We also used our Bayesian approach, first assuming that the between-disorder (ω C ) correlation matrix was known (Model 1), then estimating this matrix from independent participant data (Model 2). We fit the following to each simulated sample: a.) FRE: Frequentist random effects model fit to the "any disorder" prevalences ignoring the number of disorders measured; b.) FRE-H: as FRE but only including studies that reported at least half of the disorders; FIGURE 2 Prevalence (%) for each anxiety or related disorder (panic disorder, obsessive-compulsive disorder, generalized anxiety disorder, social phobia, specific phobia, and posttraumatic stress disorder) and sample within our case study, simulated based on sample size and parameter estimates derived from our model. Circles represent median prevalence estimates, and error bars represent HDI 95% ; the reported prevalence for each study is denoted by an "X." Predictions are provided for studies not measuring a given disorder, representing the probable prevalence of that disorder within that sample. These studies can be identified by the absence of an "X" in the plot and the absence of a numerical prevalence value in the third column For each model including a predictor, we also recorded the associated P value. We ran each simulation 100 times-with every iteration representing a simulated meta-analysis. Iterations required~6 hr each.
For each simulation study, the "true" prevalence of having at least one disorder was estimated by simulating IPD for 10,000 participantseach from a separate study-10,000 times, calculating the overall prevalence for each and taking the median.

| Simulation results
Each panel of Figure 4 depicts the prevalence of having at least one disorder as estimated by each procedure within each simulation. The "true" prevalence for each is depicted by a dotted line; these "true" prevalences depend on individual disorder heterogeneity and hence differ across simulations.
Neither ignoring the number of disorders in each study nor including them as a predictor is adequate. Ignoring variation in the number of measured disorders tended to catastrophically underestimate the "true" prevalence. This underestimation was mildly improved by requiring that a nominal number of disorders be measured for inclusion; we included as our cutoff the midpoint of the number of disorders measured, but this procedure should approach an unbiased FIGURE 3 Prevalence (%) of having at least one anxiety or related disorder (panic, obsessive-compulsive disorder, generalized anxiety disorder, social phobia, specific phobia, and posttraumatic stress disorder) for each study within our case study simulated based on sample size and parameter estimates derived from our model. Circles represent median prevalence estimates and error bars represent HDI 95% ; the reported prevalence for each study is denoted by an "X." Predictions are provided for studies for which the "any disorder" prevalence was unavailable, representing the probable prevalence of having at least one disorder within that sample. These studies can be identified by the absence of an "X" in the plot and the absence of a numerical prevalence value in the third column.
FIGURE 4 Prevalence (%) of having one or more disorder within each simulated meta-analysis using each of the estimation approaches described in text for Simulation 1 (τ and prevalence based on case study), Simulation 2 (τ = 0.1),s and Simulation 3 (τ = 0.4); the "true" prevalence is represented by a dotted line within each plot estimate as the required number of disorders approaches the total number of possible disorders. Imposing this limitation necessarily decreases the number of studies available for analysis, making it inefficient. Including the number of disorders as a predictor resulted in an unbiased but variable estimate. A slight positive bias emerged in Simulations 2 and 3, where the prevalences differ from one another to a greater degree, and the variability of the estimate scaled with increasing heterogeneity.
In contrast, the Bayesian models produced efficient estimates with no discernable bias when heterogeneity was moderate or low (Simulations 1 and 2), with a slight negative bias for BMV-K (median bias of approximately −1%) when heterogeneity was high (Simulation 3). This bias is slight, so we do not wish to over-interpret; nonetheless, we speculate that it arises because all values of τ within this simulation are near the upper range of our prior. Therefore, a small amount of shrinkage is expected. Estimation of τ has consequences for the overall prevalence in part because of the use of a normal distribution on the probit scale: for a fixed probit-transformed prevalence of less than 0 (corresponding to prevalences below 50%), increasing τ increases both left and right tails on the probit scale, but this has a greater impact for the right tail on the probability scale. As a result, larger τ means larger prevalence (assuming μ is unchanged). Inspection of the posterior estimates across our simulations supports this convictionwith the τ for each disorder being underestimated on average by 0.05 (results not shown). Such underestimation is negligible for individual disorders but could aggregate to produce a slight bias overall.

BMV-IPD was unaffected.
Our model is further supported by coverage estimates and confidence interval widths provided in Differences in the precision of estimates from FRE-M and BMV-IPD are larger in the simulations than in the case study. Because the BMV-IPD approach uses individual disorder estimates, the sample on which our case study is based includes a preponderance of single disorder papers that are otherwise rare in this type of analysis (e.g., Goodman et al., 2016) where studies measuring multiple disorders are preferred. Our simulations match the overall mean number of disorders measured from the case study but do not specifically reflect this idiosyncratic quality. Single disorder estimates will be less variable (they are affected by heterogeneity from only a single disorder) and would also have a high degree of influence in the predictor analysis, stabilizing the slope and therefore increasing precision of the estimate. Additional simulations support this interpretation, showing that meta-analyses forced to emulate the distribution of disorders measured from the case study produce more precise estimates for the predictor approaches due to reduced uncertainty in the slope (results not shown). Importantly, inclusion of so many single disorder estimates is uncommon and still rests on the assumption that disorders are equally prevalent and comorbid. Our model outperforms a purely predictor based model-even under these conditions-just not as drastically.  whereas most studies include individual disorder prevalence estimates -not all studies provide workable category-level estimates, either due to inclusion of ineligible disorders (e.g., depression) or omission of the "any disorder" prevalence. Therefore, the number of studies eligible for inclusion is higher for our model, and the data obtained are put to better use.

| Access to an informative posterior distribution
Our model produces a posterior distribution representing the combination of all pertinent knowledge. In addition to prevalence estimates for the individual disorders, this posterior provides the probability of any arbitrary event or confluence of events. For example, Table 3

| Easy handling of dependencies between samples
One common challenge that faces meta-analysts is dealing with multiple samples from the same study or population; such estimates are dependent on one another in a manner that will artificially deflate uncertainty in the aggregate prevalence estimate if not addressed.
For the Frequentist approaches described in Section 3, this dependency is often ignored; however, they are readily handled in our Bayesian approach-and in fact, complex random structure could be implemented. Our case study included two studies with multiple samples (Fisher, Tran, et al., 2010;Mota, Cox, Enns, Calhoun, & Sareen, 2008), which were handled by estimating a single "true" prevalence used by all samples from the study in question.

| Use of prior knowledge
Our approach incorporates expert knowledge into the model to improve estimation. As demonstrated earlier, the improvement in the precision of an estimate due to a reasonable (but skeptical) prior can be substantial. Given the availability of expert knowledge, this would seem to be an easy way to improve model efficiency.
4.2 | Potential limitations 4.2.1 | Assumes access to IPD The first limitation is that we assume access to IPD from which to estimate the within-study correlations amongst the individual disorders (ω C ). Without those data, our model must assume that ω C is known (Model 1), which is uncommon. Although gathering individual data can be difficult, the improved accuracy is worth the effort. This effort could be lessened if those reporting epidemiological studies shared their data or provided a table summarizing each participant suffering from at least one disorder, with a list of each diagnosis they received (e.g., 5 × panic disorder, 2 × panic disorder + posttraumatic stress disorder; e.g., Zar et al., 2002).

| Assumes homogeneity of within-study correlations between disorders (ω C )
This assumption was made partially to simplify our model but also because we do not believe that IPD from a broad enough sample of studies is generally available to estimate heterogeneity amongst the correlations. We highlight this as a potential area for future development.

| Assumes studies are sampled from a larger population
Interpretation of any meta-analysis-including those based on the current approach-is complicated by the presence of heterogeneity. This is because samples included in the model are assumed to be drawn randomly from a broader population of potential samples. Therefore, consideration must be given to the populations being studied, and the results must be interpreted in light of the distribution of prevalence estimates (e.g., using prediction intervals).

| Assumes generality based on a single application
Current findings suggest our model to be an improvement over existing approaches; however, these findings (including the parameter recovery simulations) are based on a particular application in the field of Psychology. Although we expect fully that our model will generalize to other topics, as with all new techniques, it is possible that the observed benefits may be linked to the circumstances of the selected case in unexpected ways. Future applications to other fields will determine whether this is true.

| Alternative approaches and extensions
Although we demonstrate our multivariate Bayesian model to be in many ways superior to the univariate Frequentist models previously used to estimate the prevalence of a superordinate category (e.g., Goodman et al., 2016), there are certainly alternative approaches to address this problem. For example, one could use the expectation maximization algorithm to estimate the prevalence of each of the individual disorders, treating unmeasured disorders within a given sample as missing data. One could also implement a multilevel, multivariate Frequentist model (e.g., using metafor; Viechtbauer, 2010) to estimate the prevalence of the individual disorders. However, in either case, it would be necessary to combine the individual disorder estimates into an estimate of the superordinate category prevalence.
This would most likely involve a simulation procedure similar to that described in Appendix B-and would necessitate inclusion of IPD to estimate the comorbidity across disorders. For that reason, we do not expect either solution to be simpler than the current approach and neither would not benefit from most strengths detailed earlier. In short, whereas other approaches may be preferable under certain circumstances, we believe that complex evidence synthesis often benefits from a Bayesian approach such as ours (e.g. Ades & Sutton, 2006).
Even so, the current model represents only an initial step towards developing a general method for estimating the prevalence of a superordinate category. For that reason, there are many possible extensions. As one representative example, our model could be modified to address bias caused by variation in measurement or selection across studies, perhaps using bias modelling methods such as those of Turner, Spiegelhalter, Smith, and Thompson (2009). This would permit quantification of-and adjustment for-biases due to variation in the quality of the included studies. This would reflect a clear improvement and is a potential target for future development.

| CONCLUSION
Estimating the prevalence of a superordinate category of disorders based on studies with differing operationalization of that category is both common and ill-advised (e.g., Baxter et al., 2013;Goodman et al., 2016;Guo et al., 2016;Steel et al., 2014). We propose instead a Bayesian model using IPD that we have shown to produce unbiased, efficient estimates where other approaches are biased and/or inefficient. The accurate estimation of disease prevalence is of profound clinical importance, because it serves to guide public policy.
To use our case study as an example, a shift from 9% to 19% in the estimated prevalence of peripartum AD means a change from one in 10 to one in five peripartum women suffering from one or more AD. Such a shift has implications for the allocation of public funds and even screening procedures. For this reason, we believe that the current Bayesian approach will have real clinical importance and hope that the present article will encourage future meta-analysts to adopt a similar approach when estimating superordinate category prevalence in the future.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of the article. The R code necessary to generate the overall prevalence estimates of the models reported in Appendix B is included as a separate file.