The consequences of neglected confounding and interactions in mixed‐effects meta‐regression: An illustrative example

Analysts seldom include interaction terms in their meta-regression models, which can introduce bias if an interaction is present. We illustrate this by reanalysing a meta-regression study in acute heart failure. Based on a total of 285 studies, we consider the 1-year mortality rate related to acute heart failure and its association with the study-level covariates year of recruitment and average age of study participants. We show that neglecting a possibly confounding variable and an interaction term might lead to erroneous inference and conclusions. Based on our results and accompanying simulations, we recommend including possible confounders and interaction terms, whenever they are plausible, in mixed-effects meta-regression models.


Highlights
What we know so far
• In meta-regression, interaction terms of moderators are often not considered.
• Omitting important variables in a regression model can lead to biased estimates, whereas including redundant variables increases the variance of estimates.
What is new
• Re-analysis of a meta-analysis on acute heart failure illustrates how neglecting possibly confounding variables and interaction terms can alter the results of a meta-regression model.
• A small simulation study shows how neglecting or unnecessarily including moderators and interactions affects estimates and confidence intervals in meta-regression.
Potential impact for RSM readers outside the authors' field
• The results suggest including possible confounding variables and interaction terms in meta-regression, whenever they are plausible, because otherwise wrong conclusions might be drawn.

1 | INTRODUCTION
In meta-regression, interactions reflecting effect moderators are often neglected. Certainly, in many situations, ignoring interactions is due to the small number of available studies. 1 But even if enough studies are available, interactions are often not considered. 2,3 As we know from ordinary least squares regression, this can lead to biased estimates, whereas including redundant moderators and interactions only increases the variance of the estimates (see Reference [4], pp. 99-101). A further common issue in statistical analyses that can lead to incorrect or misleading results is confounding. It arises when the relationship between an independent variable and a dependent variable is influenced by a third variable that is ignored in the analysis. This third variable is known as a confounding variable. A formal definition can be found elsewhere. 5 The topic is extremely important when distinguishing between mere associations and causal effects; see, for example, the work of Friedrich et al. 6 A discussion of confounding in meta-analysis can be found in the article of Durlak and Lipsey. 7 In our work, we illustrate these points by a reanalysis of a recent meta-analysis that included mixed-effects meta-regression analyses: Kimmoun et al. 3 analysed mortality and readmission to hospital after acute heart failure in clinical studies. They found a statistically significant decline of mortality over calendar time. However, the average age of the patients also decreased over calendar time. This suggests that the observed trend might at least partially be explained by the confounding variable average age or by a neglected interaction between the average age and the median recruitment year. Therefore, we conducted meta-regressions in which we included (i) not only the recruitment year but also the average age as moderators and (ii) both variables together with their interaction.
We conclude with some practical recommendations on when possible confounding variables and interaction terms should be included.

2 | METHODS
A mixed-effects meta-regression model with two moderators and an interaction term is of the form

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{12} x_{1i} x_{2i} + u_i + e_i, \quad i = 1, \ldots, k, \tag{1}$$

where $k$ denotes the number of studies, $y_i$ is a function of the effect measure, $x_{1i}$ and $x_{2i}$ denote the moderators, $u_i$ the random effect and $e_i$ the sampling error of study $i$. The $u_i$ and $e_i$ are usually assumed to be independent and to follow normal distributions with $u_i \sim N(0, \tau^2)$ and $e_i \sim N(0, v_i)$. 8 In the following analysis, the parameter vector $\beta = (\beta_0, \beta_1, \beta_2, \beta_{12})^T$ is estimated via weighted least squares regression using inverse variance weights. 9 The variance of the random effect, $\tau^2$, is estimated via restricted maximum likelihood (REML) estimation (Reference [10], pp. 372-373), since REML has been recommended as a good choice of estimator for continuous outcomes in other research. 11 The confidence intervals for the moderators are calculated using the Knapp-Hartung method, 12 because it was shown to be more accurate than other covariance estimators in simulation studies. 13,14 The models with only one and only two moderators in Section 3 can be seen as special cases of the model in Equation (1). All analyses were conducted in R 4.1.1 15 using the metafor package. 16 The R code will be made publicly available.
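The weighted least squares step for $\beta$ can be sketched in a few lines. The Python snippet below is only a minimal illustration and not the code used for the analyses (those were run in R with metafor). It assumes that $\tau^2$ is already known instead of estimating it by REML, and it omits the Knapp-Hartung adjustment. The function names `solve_linear` and `fit_meta_regression` and the toy data are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_meta_regression(y, v, x1, x2, tau2):
    """Weighted least squares estimate of (b0, b1, b2, b12).

    Weights are the inverse total variances 1 / (v_i + tau2); tau2 is
    assumed known here (the paper estimates it by REML).
    """
    design = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]
    w = [1.0 / (vi + tau2) for vi in v]
    p = 4
    # Normal equations: (X' W X) beta = X' W y
    xtwx = [[sum(wi * row[j] * row[l] for wi, row in zip(w, design))
             for l in range(p)] for j in range(p)]
    xtwy = [sum(wi * row[j] * yi for wi, row, yi in zip(w, design, y))
            for j in range(p)]
    return solve_linear(xtwx, xtwy)


# Hypothetical toy data: six studies, noise-free, generated from beta_true.
beta_true = [-1.0, 0.5, 0.25, -0.1]
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]     # centred moderator 1
x2 = [1.0, -1.0, 2.0, -2.0, 0.0, 1.0]     # centred moderator 2
v = [0.02, 0.05, 0.01, 0.04, 0.03, 0.02]  # sampling variances
y = [beta_true[0] + beta_true[1] * a + beta_true[2] * b + beta_true[3] * a * b
     for a, b in zip(x1, x2)]

estimate = fit_meta_regression(y, v, x1, x2, tau2=0.05)
print([round(e, 6) for e in estimate])  # recovers beta_true up to rounding
```

Because the toy outcomes contain no random effect or sampling error, any positive weights return the generating coefficients. With real data, the weights and hence the estimates depend on the estimate of $\tau^2$.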

3 | META-REGRESSIONS FOR ACUTE HEART FAILURE STUDIES
The research synthesis by Kimmoun et al. 3 included 285 studies on acute heart failure, published between 1998 and 2017. Outcome measures were 30-day and 1-year readmission rates and 30-day and 1-year mortality in 108, 61, 148 and 204 of the studies, respectively. Furthermore, study characteristics such as the median year of recruitment, the number of patients for the follow-up and the average age of the patients (in 260 studies) were reported. Other variables provided information about the medication and medical history of the patients. For more information, we refer to Kimmoun et al., 3 who made their dataset publicly available.

3.1 | Meta-regression analysis with a univariable model
A major finding of the analysis of Kimmoun et al. 3 was a statistically significant decline in the 1-year mortality over calendar time. The decline is shown in the left panel of Figure 1 (adapted from figures 1A and 2B in Reference [3]). A meta-regression with the logit-transformed 1-year mortality as the outcome variable $y_i$ from Equation (1) and the year of recruitment as the only explanatory variable $x_{1i}$ was conducted for $k = 204$ studies. The estimates of $\beta_0$ and $\beta_1$ are $\hat{\beta}_0 = 29.432$ (95% CI: [7.328, 51.537]) and $\hat{\beta}_1 = -0.015$ (95% CI: [-0.263, -0.042]), so a significant decline in mortality over calendar time was observed. However, the average age also decreases with the recruitment year: the right panel of Figure 1 shows that the average age of the participants decreased by 1.56 years every 10 years. Therefore, it is of interest whether the observed effect of the recruitment year is confounded by the average age or whether there is an interaction between those two variables.
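To make the outcome scale concrete, the following sketch shows how a study's 1-year mortality could be turned into the logit outcome $y_i$ and a sampling variance $v_i$. It assumes the standard delta-method variance $1/d + 1/(n - d)$ for $d$ deaths among $n$ patients; the source does not state which variance formula was used. The function names and the example study (250 deaths among 1000 patients) are hypothetical.

```python
import math


def logit_with_variance(deaths, n):
    """Logit-transformed proportion and its approximate sampling variance."""
    p = deaths / n
    y = math.log(p / (1.0 - p))
    v = 1.0 / deaths + 1.0 / (n - deaths)  # delta-method approximation
    return y, v


def inverse_logit(y):
    """Map a value on the logit scale back to a proportion."""
    return 1.0 / (1.0 + math.exp(-y))


# Hypothetical study: 250 deaths within one year among 1000 patients.
y_i, v_i = logit_with_variance(deaths=250, n=1000)
print(round(y_i, 4), round(v_i, 6), round(inverse_logit(y_i), 4))
# -1.0986 0.005333 0.25
```

Regression coefficients in the models of this section are therefore changes in log-odds of death per unit of the moderator, and predictions have to be back-transformed with the inverse logit to be read as mortality rates.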

3.2 | Meta-regression analysis with a two-variable model
To check for possible confounding, we considered a model with two moderators. In this model, the (logit-transformed) 1-year mortality was regressed on the year of recruitment ($x_{1i}$ in Equation (1)) and the average age ($x_{2i}$) for 181 studies. Neither the estimated intercept ($\hat{\beta}_0 = 12.9584$, 95% CI: [-11.0633, 36.980]) nor the effect of the recruitment year ($\hat{\beta}_1 = -0.0081$, 95% CI: [-0.0200, 0.0038]) was significant, while the estimated effect of the average age was significant ($\hat{\beta}_2 = 0.0299$, 95% CI: [0.0178, 0.0420]). This suggests that, in the univariable model, the effect of the recruitment year was in fact confounded by the average age. This is also visible in Figure 2, which shows the predicted effect of each explanatory variable at the median level of the other variable. Hence, fitting a mixed-effects meta-regression model that also includes the average age as a moderator reveals that the apparent time trend is essentially captured by the changes in age over calendar time. The time trend observed by Kimmoun et al. 3 thus seems to stem from the confounding variable average age, which was not considered in their meta-regression model with one moderator only. However, it could be that the recruitment year influences the mortality not directly but via an interaction with the average age.
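A rough plausibility check of this confounding argument borrows the omitted-variable formula from ordinary least squares. It is a back-of-the-envelope sketch, not part of the original analysis. The notation is ours: $\tilde{\beta}_1$ is the slope of the recruitment year when age is omitted, and $\hat{\delta}$ is the slope of average age on recruitment year, taken as $-1.56/10 = -0.156$ years per calendar year from the right panel of Figure 1.

```latex
\tilde{\beta}_1 \;\approx\; \hat{\beta}_1 + \hat{\beta}_2\,\hat{\delta}
  \;=\; -0.0081 + 0.0299 \times (-0.156)
  \;\approx\; -0.0081 - 0.0047
  \;=\; -0.0128
```

This is of the same order as the univariable estimate of $-0.015$. Exact agreement should not be expected, since the two models use different sets of studies (204 versus 181) and inverse variance weights, and the age trend in Figure 1 need not have been fitted with the same weights.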

3.3 | Meta-regression analysis with interaction
To check this, we considered a model with two moderators and their interaction, using centred explanatory variables to reduce the impact of multicollinearity (Reference [17], pp. 28-31). In this model, the (logit-transformed) 1-year mortality was regressed on the year of recruitment ($x_{1i}$ in Equation (1)), the average age ($x_{2i}$) and their interaction for 181 studies. As in the two-variable model, the average age has a significant effect on the 1-year mortality ($\hat{\beta}_2 = 0.0333$, 95% CI: [0.0208, 0.0457]), while the main effect of the year of recruitment is not significant ($\hat{\beta}_1 = -0.0066$, 95% CI: [-0.0185, 0.0052]). However, the interaction term ($\hat{\beta}_{12} = -0.0018$, 95% CI: [-0.0035, -0.0001]) is significant at level 0.05, as is the intercept ($\hat{\beta}_0 = -1.1477$, 95% CI: [-1.2271, -1.0684]). This suggests that the recruitment year does not influence the 1-year mortality directly, but via an interaction with the average age. Figure 3 shows how this interaction influences the predicted 1-year mortality. The left panel shows the predicted time trend in mortality for a comparatively low average age of 60.5, while the right panel shows it for a higher average age of 79.5. The average ages 60.5 and 79.5 correspond to the lower and upper quartile among all average ages of the included studies, respectively. While the time trend is increasing for studies with a lower average age, it is decreasing for studies with a higher average age. Hence, this model leads to the conclusion that there is a time trend present in the data, which depends on the age of the patients included in the study.
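The age dependence of the time trend can be read off the point estimates directly: in the interaction model, the slope of the recruitment year at a given centred average age $a$ is $\hat{\beta}_1 + \hat{\beta}_{12}\,a$. The sketch below evaluates this using only the estimates reported above. Because the centring constants are not reported here, all ages are expressed relative to the (unknown) centring value, the uncertainty of the estimates is ignored, and the function and variable names are hypothetical.

```python
import math

# Point estimates reported for the centred interaction model.
B0, B1, B2, B12 = -1.1477, -0.0066, 0.0333, -0.0018


def year_slope(age_centred):
    """Change in logit mortality per recruitment year at a centred average age."""
    return B1 + B12 * age_centred


def inverse_logit(y):
    """Map a value on the logit scale back to a proportion."""
    return 1.0 / (1.0 + math.exp(-y))


# Centred average age at which the estimated time trend changes sign.
crossover = -B1 / B12

# Difference in yearly slope between the upper (79.5) and lower (60.5)
# quartile of average age; the centring constant cancels out.
slope_gap = B12 * (79.5 - 60.5)

# Predicted 1-year mortality when both centred moderators equal zero.
baseline = inverse_logit(B0)

print(round(crossover, 2), round(slope_gap, 4), round(baseline, 3))
# -3.67 -0.0342 0.241
```

Read this way, the estimated trend switches from increasing to decreasing at an average age about 3.7 years below the centring value, which is consistent with the opposite trends at the two quartiles shown in Figure 3.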

4 | DISCUSSION
Motivated by a meta-regression on a temporal trend in mortality of acute heart failure patients, 3 we considered the impact of a neglected confounding variable and interaction in a meta-regression. Re-analysing the study of Kimmoun et al., 3 we showed that the time trend found in the study is questionable. A model that includes not only the median year of recruitment but also the average age as a moderator leads to the conclusion that the time trend is confounded by the effect of the patients' age. However, when the interaction of those variables is included as well, the results indicate that there is in fact a time trend, but one that depends on the age of the patients. This shows the importance of including possible confounding variables and their interactions in meta-regression models. In the supplement to this paper, we also report a small simulation study in which we analysed the impact of omitted and redundantly included interactions more generally. The simulation settings are based on the acute heart failure data above. The results are analogous to insights from the multiple-regression literature, that is, omitting regressors leads to biased estimates, while including unnecessary regressors increases the variance (see e.g. Reference [4], pp. 99-101). Finally, we would like to stress that limitations in software 18 can be another potential reason why most meta-regression studies include only one covariate at a time.
A limitation of this paper is that the conducted meta-regressions include 204 and 181 studies, respectively. This is many more than is common in meta-analyses. 1 We conducted another simulation study, which showed that with a subsample of only 30 studies, the significant impact of the interaction term often cannot be detected. For this simulation, we refer to the supplement as well. Nevertheless, based on our results, we suggest always including possible confounders and interactions in meta-regression models. Although the ideal modelling approach remains unclear, this work has shown the dangers of using overly simple univariable meta-regression models for statistical inference. Even though many applications will have a much smaller sample size than the analysis above, considering possible confounding variables and interactions may generally improve insights into the underlying data.

ACKNOWLEDGMENT
Open Access funding enabled and organized by Projekt DEAL.