The relationship between risk of bias criteria, research outcomes, and study sponsorship in a cohort of preclinical thiazolidinedione animal studies: a meta‐analysis

ABSTRACT Introduction There is little evidence regarding the influence of conflicts of interest (COIs) on preclinical research. This study examines whether industry sponsorship is associated with increased risks of bias and/or effect sizes of outcomes in published preclinical thiazolidinedione (TZD) studies. Methods We identified preclinical TZD studies published between January 1, 1965, and November 14, 2012. Coders independently extracted information on study design criteria aimed at reducing bias, results for all relevant outcomes, sponsorship source and investigator financial ties from the 112 studies meeting the inclusion criteria. The average standardized mean difference (SMD) across studies was calculated for plasma glucose (efficacy outcome) and weight gain (harm outcome). In subgroup analyses, TZD outcomes were assessed by sponsorship source and risk of bias criteria. Results Seven studies were funded by industry alone, 17 studies by both industry and non-industry sources, 49 studies by non-industry sources alone, and 39 studies had no disclosures. None of the studies used sample size calculations, intention-to-treat analyses, blinding of investigators or concealment of allocation. Most studies reported favourable results (88 of 112) and conclusions (95 of 112) supporting TZD use. Efficacy estimates were significantly larger in six studies sponsored by industry alone (−3.41; 95% CI −5.21, −1.53; I² = 93%) than in 42 studies sponsored by non-industry sources (−0.97; 95% CI −1.37, −0.56; I² = 81%; p-value = 0.01). Harms estimates were significantly larger in four studies sponsored by industry alone (5.00; 95% CI 1.22, 8.77; I² = 93%) than in 38 studies sponsored by non-industry sources (0.30; 95% CI −0.08, 0.68; I² = 79%; p-value = 0.02). TZD efficacy and harms did not differ by disclosure of financial COIs or by risks of bias.
Conclusions Industry‐sponsored TZD animal studies have exaggerated efficacy and harms outcomes compared with studies funded by non‐industry sources. There was poor reporting of COIs.


Introduction
Medications play an essential role in the treatment of disease but often have harmful side effects that may put patients at risk. The safety and efficacy profiles of medications approved by the U.S. Food and Drug Administration (FDA) are based on data from preclinical animal studies and clinical studies in humans. Over the past few decades, pharmaceutical companies have increasingly funded drug research. 1 This has allowed conflicts of interest (COIs) to arise between researchers and their funders and has made research findings vulnerable to a number of methodological biases. When clinical guidelines and healthcare decisions are based on drug studies with biased outcomes, patients may receive suboptimal medication therapies and/or suffer serious adverse effects that could otherwise have been avoided or at least better monitored.
Knowing that various types of bias can be found in human clinical studies funded by pharmaceutical companies, 2,3 it is reasonable to suspect that industry-sponsored preclinical animal studies also have a high potential for bias. However, little is known about the level of bias in the design of preclinical animal studies, as previous investigations have been limited to case studies documenting discrepancies between industry- and government-sponsored animal research. 4,5 The Institute of Medicine (IOM) 2010 report on Conflicts of Interest in Medical Research, Education and Practice highlights the need for systematic reviews to reveal the extent of financial relationships and their consequences in preclinical research. 6 One systematic review found that, in contrast to clinical studies, industry-sponsored preclinical studies underestimate the effect sizes of the drugs being tested compared with non-industry-sponsored studies. 7 Thus, industry sponsors may have different incentives that could influence the outcomes of clinical versus preclinical studies. 8 Further research is needed to identify any consistent biases associated with industry sponsorship of animal studies.
Risks of bias are methodological features of a study that can introduce systematic error into the magnitude or direction of its results. Risk of bias criteria empirically identified in human clinical research, as well as in animal experiments, 9 include randomization, concealment of allocation, blinding of investigators, accounting for all animals, sample size calculations, intention-to-treat analyses and animal inclusion/exclusion criteria. The objective of this study is to determine whether industry-sponsored preclinical trials are more likely to have different efficacy and/or harm estimates than non-industry-sponsored trials, even when controlling for these risk of bias criteria. This systematic review focuses on animal studies of thiazolidinediones (TZDs), also known as glitazones, which are intended for the management of type II diabetes. These oral hypoglycaemic agents were targeted because the market for these drugs is competitive and the vast majority of their safety and efficacy studies are funded by industry. Furthermore, previous research has identified the factors associated with biased results and conclusions in human trials of TZDs. 10 We hypothesize that industry-sponsored TZD animal studies will have different efficacy and/or harm estimates compared with non-industry-sponsored trials, regardless of their risk of bias.

Identification of studies
The selection criteria for studies, data extraction and analyses were all determined prior to data collection. This research was exempt from Institutional Review Board review because it does not involve human subjects.

Search strategy
The Medline® database was searched from January 1, 1965, to November 14, 2012, for all published TZD animal studies that compared a TZD with another drug or placebo and reported outcomes of plasma glucose, weight gain and/or other diabetes-related measures. We included studies of marketed TZDs (e.g. rosiglitazone, pioglitazone, troglitazone) and investigational TZDs (e.g. ciglitazone, netoglitazone, CP 68722).
An expert librarian (GW) was consulted to develop a search strategy containing the following MeSH terms, text words and word variants: (…)

Inclusion/exclusion criteria
One investigator (DK) screened abstracts and full texts from our Medline search to identify the 112 studies meeting the inclusion and exclusion criteria. Included studies had to (1) be published between January 1, 1965, and November 14, 2012, (2) contain results for plasma glucose, weight gain and/or other diabetes-related measures, (3) have an intervention group receiving only the TZD and (4) compare the TZD with placebo and/or an active comparator. We excluded (1) pharmacokinetic, pharmacodynamic and mechanism of action studies, (2) review articles, systematic reviews, meta-analyses, editorials, letters-to-the-editor and […]

Three coders (MAS, DK and CG) received training in the data extraction and quality assessment instrument developed for this systematic review. This instrument was modelled after previous studies that followed a similar protocol [10][11][12][13][14] and included a coding manual. Methodological criteria were based on a published systematic review of tools for assessing bias in animal studies. 9 Data were extracted into an Excel database. Articles in the database were randomized using the Excel "RAND" function and assessed in random order by the coders. Discrepancies between coders were adjudicated by discussion among the investigators. Extracted data and coder assessments for risks of bias, study characteristics and outcomes from the included articles are available from the Dryad Digital Repository: http://doi.org/10.5061/dryad.4c2bj. 15
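The random coding order can be illustrated with a short Python sketch. It mirrors the idea of adding an Excel RAND() column and sorting on it: each article gets an independent uniform key and the list is sorted by that key. The function name and article IDs are hypothetical, and this is not the authors' actual workflow, which used Excel.

```python
import random


def randomize_review_order(article_ids, seed=None):
    """Return article IDs in a random coding order.

    Analogous to adding an Excel RAND() column and sorting on it:
    each article receives an independent uniform key in [0, 1),
    and the articles are then sorted by that key.
    """
    rng = random.Random(seed)  # seeded generator so an order can be reproduced
    keyed = [(rng.random(), article_id) for article_id in article_ids]
    keyed.sort(key=lambda pair: pair[0])
    return [article_id for _, article_id in keyed]


if __name__ == "__main__":
    # Hypothetical article IDs, for illustration only.
    ids = ["study_001", "study_002", "study_003", "study_004"]
    print(randomize_review_order(ids, seed=2012))
```

Seeding the generator is an added convenience (Excel's RAND() recalculates on every edit), so that a given coding order could be regenerated if needed.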

Single-coded data collection
Single-coded data collection was limited to the extraction of information that required no judgement by the coder. The following characteristics were collected from each included study by a single coder (DK):
Study characteristics. Title of the study, month of publication, year of publication and journal name.
Author affiliation. The affiliation(s) of the author(s) were obtained from the study by-line and classified as (1) industry, if all authors were employed by industry, (2) non-industry, if no author was employed by industry, or (3) combined, if at least one author was employed by industry and at least one author was not. If a single author had affiliations with both industry and non-industry sources, the study was coded as "combined".
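The affiliation rule can be written as a small Python function. This is a sketch under an assumed data layout, not the coding manual itself: each author is represented as a set of affiliation types ("industry" and/or "non-industry"), and the function name is hypothetical.

```python
def classify_affiliation(authors):
    """Classify a study's by-line as "industry", "non-industry" or "combined".

    `authors` is a list with one set per author; each set holds that
    author's affiliation types, "industry" and/or "non-industry".
    """
    any_industry = any("industry" in affiliations for affiliations in authors)
    any_non_industry = any("non-industry" in affiliations for affiliations in authors)
    if any_industry and any_non_industry:
        # Covers mixed by-lines and a single author with both affiliation types.
        return "combined"
    if any_industry:
        return "industry"  # every author is industry-affiliated only
    return "non-industry"  # no author is employed by industry


if __name__ == "__main__":
    print(classify_affiliation([{"industry"}, {"non-industry"}]))
```

Because "combined" is checked first, the special case of a single author with dual affiliations needs no separate branch.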
Sponsorship source. The source of sponsorship for each study was categorized as (1) any industry, (2) non-industry, (3) no sponsorship and (4) no sponsorship statement. For studies with disclosed sponsorship, we determined if there was a statement about the role of the sponsor.
Financial ties of authors. Information about disclosed financial ties was coded as (1) at least one author of the study reported having a financial conflict of interest, (2) all authors reported having no conflicts of interest and (3) there was no disclosure statement.

Study design characteristics.
For each study, the following study design criteria were collected: (1) name of TZD used in the study, (2) the comparison groups (e.g. comparator TZD, active comparator non-TZD drug or placebo), (3) animal species and strain used in the study, (4) number of control and treated animals at the start of the study and (5) whether the study reported morbidity and mortality data or only surrogate outcomes of efficacy and harms based on laboratory analyses.

Double-coded data collection
All risk of bias criteria were coded as (1) yes, if the criterion was met, (2) no, if the criterion was not met and, when applicable, (3) partial, if the criterion was partially met. Because a level of judgement by coders was required in this process, the following criteria were independently assessed by two coders for each publication:
Randomization. Was the treatment randomly allocated to animal subjects so that each subject had an equal likelihood of receiving the intervention? Randomization was coded as (1) yes, (2) no and (3) partial. A partial rating was assigned to studies whose authors mentioned randomizing animals but provided no details on how the randomization was designed or executed.

Concealment of allocation.
Were processes used to protect against selection bias by concealing from the investigators how treatment was allocated at the start of the study? Concealment of allocation was coded as (1) yes, (2) no and (3) partial.
Blinding. Were the investigator(s) involved in performing the experiment, collecting data and assessing the outcome of the experiment unaware of which subjects received the treatment and which did not? Blinding was coded as (1) yes, (2) no and (3) partial.
Inclusion/exclusion criteria. Were the criteria used for including or excluding subjects specified? Inclusion/exclusion criteria were coded as (1) yes, (2) no and (3) partial.
Test animal description. Did the author(s) describe in detail the test animal characteristics, including the animal species, strain, sub-strain, genetic background, age, supplier, sex and weight? At least one of these characteristics must be present for this criterion to be met. Test animal description was coded as (1) yes, (2) no and (3) partial.
Animal environment described. Did the author(s) adequately describe the housing and husbandry, nutrition, water, temperature and lighting conditions? At least one of these characteristics must be present for this criterion to be met. Environmental parameters were coded as (1) yes, (2) no […]
All animals accounted for. Did the investigator(s) account for attrition bias by detailing when animals were removed from the study and why they were removed? All animals accounted for was coded as (1) yes, (2) no and (3) partial. A partial rating was given when the number of animals was listed and justified at the beginning and end of some experiments but not others within the same publication.
Statement of compliance with animal welfare requirements. Did the author(s) state whether or not they complied with regulatory requirements for the handling and treatment of test animals? Statement of compliance with animal welfare requirements was coded as (1) yes or (2) no.
Sample size calculation. Did the authors perform a sample size calculation to justify the total number of animals used in the study? Sample size calculation was coded as (1) yes or (2) no.
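The double-coding step can be sketched as a comparison of two coders' ratings that flags every criterion needing adjudication by discussion. This is a minimal illustration, assuming each coder's ratings are stored as a dictionary from criterion name to "yes", "no" or "partial"; the function and criterion keys are hypothetical. A criterion rated by only one coder is also flagged.

```python
ALLOWED_RATINGS = {"yes", "no", "partial"}


def find_discrepancies(coder_a, coder_b):
    """Return the criteria on which two coders disagree, sorted by name.

    Each argument maps a criterion name to "yes", "no" or "partial".
    A criterion missing from one coder's sheet counts as a disagreement.
    """
    for ratings in (coder_a, coder_b):
        invalid = set(ratings.values()) - ALLOWED_RATINGS
        if invalid:
            raise ValueError(f"unexpected ratings: {sorted(invalid)}")
    criteria = sorted(set(coder_a) | set(coder_b))
    return [c for c in criteria if coder_a.get(c) != coder_b.get(c)]


if __name__ == "__main__":
    # Hypothetical ratings for one publication.
    a = {"randomization": "partial", "blinding": "no", "sample_size": "no"}
    b = {"randomization": "yes", "blinding": "no", "sample_size": "no"}
    print(find_discrepancies(a, b))
```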

Coding of primary outcomes
Four data extractors (DK, AA, CL and MAS) recorded results for diabetes-related outcomes defined a priori by the investigators, including plasma glucose as the primary efficacy measure and weight gain as the primary harms measure. If multiple time points were reported, all time points were included in the meta-analysis so as not to assume a primary endpoint or arbitrarily assign one. For each result, we recorded the raw data (often derived from tables, graphs and figures), measure of effect, confidence interval, measure of variability, p-value and statistical test used.
Results were categorized as (1) favourable, if the result was statistically significant (p < 0.05) and in the direction of the TZD being more efficacious or less harmful; (2) unfavourable, if the result was not statistically significant (p > 0.05) or significant in the wrong direction (e.g. TZD statistically more harmful than non-TZD treatment group); (3) neutral, if the TZD was significantly different in the direction favouring the TZD against one control group (e.g. early control) but not significantly different compared to a second control group (e.g. late control).
If an outcome was measured over multiple time points or concentrations, it was categorized as (1) favourable if at least one measurement was in favour of the TZD or (2) unfavourable if there were no measurements in favour of the TZD. For each included result, data were extracted for mean outcome, standard deviation (SD) or standard error (SE), and the number of treated and untreated animals.
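The categorization rules above can be sketched in Python. This is an illustrative simplification, not the authors' coding manual: each comparison is assumed to be summarized as a p-value plus a flag for whether the difference favours the TZD, p exactly 0.05 is treated as not significant (the text does not say), and "neutral" is applied whenever the TZD is favourable against some but not all control groups. Function names are hypothetical.

```python
ALPHA = 0.05  # significance threshold used in the coding rules


def categorize_comparison(p_value, favours_tzd):
    """Label one TZD-versus-control comparison.

    Assumption: p exactly 0.05 is treated as not significant.
    """
    if p_value < ALPHA and favours_tzd:
        return "favourable"
    return "unfavourable"


def categorize_result(comparisons):
    """Label a result from (p_value, favours_tzd) pairs, one per control group.

    Favourable against every control -> "favourable"; against some but not
    all -> "neutral" (a simplification); against none -> "unfavourable".
    """
    if not comparisons:
        raise ValueError("at least one comparison is required")
    labels = [categorize_comparison(p, fav) for p, fav in comparisons]
    n_favourable = labels.count("favourable")
    if n_favourable == len(labels):
        return "favourable"
    if n_favourable > 0:
        return "neutral"
    return "unfavourable"


def categorize_over_timepoints(per_timepoint_labels):
    """Favourable if at least one time point or concentration favours the TZD."""
    return "favourable" if "favourable" in per_timepoint_labels else "unfavourable"


if __name__ == "__main__":
    # Significant versus an early control but not versus a late control.
    print(categorize_result([(0.01, True), (0.30, True)]))
```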

Statistical analysis
We report the frequencies of each study design criterion and the coding of the results and conclusions by sponsorship source.
To test our hypothesis, we conducted a meta-analysis of the studies that had analyzable data. For a study to have analyzable data, an author needed to report both a mean value and a measure of dispersion (SE or SD) or provide adequate data so that we could calculate these measures ourselves. Not all studies containing quantitative (numerical) data had analyzable data.
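One routine way to "calculate these measures ourselves" is to convert a reported standard error to a standard deviation using SD = SE × √n, where n is the number of animals in the group. The sketch below applies that conversion and a simple analyzability check; the dictionary layout and function names are assumptions for illustration, not the authors' extraction sheet.

```python
import math


def to_sd(n, sd=None, se=None):
    """Return a standard deviation, converting from a standard error if needed.

    Uses SD = SE * sqrt(n), where n is the number of animals in the group.
    Returns None when neither an SD nor an SE is available.
    """
    if sd is not None:
        return sd
    if se is not None:
        return se * math.sqrt(n)
    return None


def is_analyzable(group):
    """True if a group reports a mean, a group size, and an SD or SE."""
    if group.get("mean") is None or not group.get("n"):
        return False
    return to_sd(group["n"], group.get("sd"), group.get("se")) is not None


if __name__ == "__main__":
    # Hypothetical group: mean glucose with SE reported for 8 animals.
    group = {"mean": 7.1, "n": 8, "se": 0.4}
    print(is_analyzable(group), to_sd(group["n"], se=group["se"]))
```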
We calculated the effect of TZDs using a standardized mean difference (SMD) for each outcome. Because animals were not independent across outcomes within a study, we averaged SMDs and variances across outcomes for each study, yielding k average SMDs and variances for k studies. We pooled the data across studies and estimated summary average SMDs using random-effects models. 16 Specifically, we estimated the average SMD for each included study and used the inverse variance method to calculate study weights. The inverse variance method weights each study by the inverse of its variance, so studies with less variance receive more weight than studies with greater variance. The SMD null hypothesis (H₀: estimate = 0) states that there is no difference in the effect of TZD use on body weight or glucose outcomes compared with a control or placebo. A value less than zero suggests that the TZD reduces body weight or plasma glucose compared with control or placebo; a value greater than zero suggests that the TZD increases body weight or plasma glucose compared with control or placebo.
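The pipeline described above (per-outcome SMD, averaging within each study, inverse-variance random-effects pooling) can be sketched in Python. The text does not state which SMD estimator or which random-effects method was used, so this sketch assumes Hedges' g with its usual small-sample correction and the DerSimonian-Laird estimator of between-study variance (tau²); both are labelled assumptions, and the function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Assumed SMD estimator: Hedges' g and its approximate variance.

    Negative g means the TZD group mean is lower than the control mean
    (e.g. lower plasma glucose); positive g means it is higher.
    """
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j**2 * var_d


def study_average(outcomes):
    """Average the (g, variance) pairs of one study's outcomes."""
    gs = [g for g, _ in outcomes]
    vs = [v for _, v in outcomes]
    return sum(gs) / len(gs), sum(vs) / len(vs)


def dersimonian_laird(ys, vs):
    """Assumed pooling method: DerSimonian-Laird random effects.

    Returns the pooled estimate, its 95% CI and tau^2.
    """
    w = [1 / v for v in vs]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in vs]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, ys)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - Z_95 * se, pooled + Z_95 * se), tau2


if __name__ == "__main__":
    # Hypothetical studies: each lists (g, variance) for its glucose outcomes.
    study_1 = study_average([hedges_g(5.0, 1.0, 10, 7.0, 1.2, 10),
                             hedges_g(6.1, 1.5, 10, 7.4, 1.4, 10)])
    study_2 = study_average([hedges_g(8.0, 2.0, 6, 8.5, 2.2, 6)])
    ys, vs = zip(study_1, study_2)
    print(dersimonian_laird(list(ys), list(vs)))
```

Averaging each study's outcomes before pooling keeps every study to a single row, so animals measured on several outcomes are not counted more than once in the weights.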
We examined heterogeneity among the studies using the I² statistic and interpreted an I² estimate greater than 50% as indicating moderate or high heterogeneity. We anticipated high levels of heterogeneity because previous meta-analyses of animal studies have found high heterogeneity between studies, potentially resulting from the small sample sizes typical of animal models. 17 We further investigated potential causes of heterogeneity by conducting a priori subgroup analyses using the χ² statistic with a significance level of 0.10. We performed subgroup analyses by study criteria that we hypothesized would be associated with effect sizes: sponsorship source, financial ties of authors, randomization, stating inclusion/exclusion criteria for animals, accounting for all animals, use of a dose/response model to justify the TZD dose, and investigation of an optimal time window.
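For reference, I² is conventionally derived from Cochran's Q as I² = max(0, (Q − df) / Q) × 100, where Q is the inverse-variance-weighted sum of squared deviations from the fixed-effect mean and df = k − 1. A minimal Python sketch follows; the function names are hypothetical and the 50% cut-off is the one stated above.

```python
def heterogeneity(ys, vs):
    """Cochran's Q, degrees of freedom and I^2 (%) for study-level estimates.

    `ys` are study effect sizes (e.g. average SMDs); `vs` their variances.
    """
    w = [1 / v for v in vs]  # inverse-variance weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


def is_substantial(i2, threshold=50.0):
    """Apply the threshold used in the text: I^2 above 50% is moderate or high."""
    return i2 > threshold


if __name__ == "__main__":
    # Hypothetical average SMDs and variances for three studies.
    q, df, i2 = heterogeneity([-2.0, -0.5, 0.1], [0.2, 0.1, 0.15])
    print(round(q, 2), df, round(i2, 1), is_substantial(i2))
```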
Within each risk of bias criterion, we compared pooled effect estimates between declared sponsorship sources to determine whether the difference between sponsorship sources varied with specific risks of bias.

Identification of studies
The initial literature search identified 3,576 articles for review (Figure 1). After screening the abstracts, 130 articles were selected based on the inclusion criteria. Of the 130 publications of interest, 11 were excluded because they were not available in English. Seven of the 119 remaining studies were excluded after full-text evaluation because they did not have any TZD efficacy or safety data. The final number of studies included was 112.

2015 | Volume 1 | Issue 1 | e00005

Source of sponsorship
Our cohort of 112 TZD animal studies included 7 studies funded by industry alone, 17 studies funded by both industry and non-industry sources, 49 studies funded by non-industry sources alone and 39 studies with no disclosure of funding source (Table 1). Among the 73 studies that disclosed a sponsor of any type, none stated that the sponsor was directly involved in the study, only 1 explicitly stated that the sponsor was not involved, and 72 did not mention whether the sponsor was involved.

Reporting of quality and risk of bias criteria
The most commonly reported methodological criteria were test animal characteristics (100%) and description of the animal environment (95.5%; Table 1). These criteria are descriptive in nature and have not been empirically associated with biased research outcomes. The most commonly reported risk of bias criteria were accounting for all animals (48.2% of studies) and randomization (35.7%). Only a few studies justified their optimal time window for observing TZD efficacy and/or harms (4.5%), specified inclusion/exclusion criteria (8.0%) or applied a dose/response model to justify the chosen TZD dose (19.6%). Moreover, none of the studies in our cohort used concealment of allocation, blinding of investigators, sample size calculations or intention-to-treat analyses (Table 1).

Reported outcomes
The most commonly reported outcomes in TZD animal studies were plasma glucose (83.9%) and plasma insulin (75%), followed by weight gain (64.3%) and free fatty acids (53.6%).

A priori subgroup analyses
Across 94 studies with analyzable plasma glucose measures, the effect size significantly favoured TZDs (−1.04; 95% CI −1.34, −0.75), with substantial heterogeneity (I² = 85%; Figure 2). The effect of TZDs on plasma glucose was greater in 6 industry-sponsored studies than in 42 studies with no industry sponsorship (test for subgroup differences: p = 0.01) and than in 34 studies with no sponsorship statement (test for subgroup differences: p = 0.02; Figure 2). For weight gain, a common side effect of TZDs, the effect size based on 72 studies with analyzable harms data was 0.48 (95% CI 0.19, 0.78), with substantial heterogeneity (I² = 81%; Figure 3). The effect of TZDs on body weight gain was greater in 4 industry-sponsored studies than in 38 studies with no industry sponsorship (test for subgroup differences: p = 0.02) and than in 20 studies with no sponsorship statement (test for subgroup differences: p = 0.03; Figure 3).
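As a rough arithmetic check, a two-subgroup test can be approximated from the rounded estimates and 95% CIs reported in the Abstract for industry-only versus non-industry studies. This sketch assumes a normal approximation, recovers each standard error as the CI width divided by 2 × 1.96 (the reported intervals are slightly asymmetric, so this is approximate), and uses Q = (y₁ − y₂)² / (SE₁² + SE₂²) with one degree of freedom, whose upper-tail probability is erfc(√(Q/2)). It is not the software the authors used, and the function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def se_from_ci(lower, upper):
    """Approximate a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * Z_95)


def two_subgroup_test(est_a, ci_a, est_b, ci_b):
    """Chi-square test (1 df) for a difference between two pooled estimates.

    Returns (Q, p); for 1 df, P(chi2 > Q) = erfc(sqrt(Q / 2)).
    """
    var_a = se_from_ci(*ci_a) ** 2
    var_b = se_from_ci(*ci_b) ** 2
    q = (est_a - est_b) ** 2 / (var_a + var_b)
    return q, math.erfc(math.sqrt(q / 2))


if __name__ == "__main__":
    # Industry alone vs non-industry, values as reported in the Abstract.
    q, p = two_subgroup_test(-3.41, (-5.21, -1.53), -0.97, (-1.37, -0.56))
    print(f"plasma glucose: Q = {q:.2f}, p = {p:.3f}")
    q, p = two_subgroup_test(5.00, (1.22, 8.77), 0.30, (-0.08, 0.68))
    print(f"weight gain: Q = {q:.2f}, p = {p:.3f}")
```

With these inputs the sketch gives Q ≈ 6.44 (p ≈ 0.011) for plasma glucose and Q ≈ 5.89 (p ≈ 0.015) for weight gain, consistent with the reported p = 0.01 and p = 0.02 after rounding.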

Risks of bias by sources of sponsorship
For both plasma glucose and weight gain measures, none of the risk of bias criteria resulted in remarkably different effect sizes in comparison with pooled estimates (Figures 2 and 3).

Discussion
Building on evidence that biases are common in human clinical drug studies funded by pharmaceutical companies, including studies of TZDs, 2,10 this systematic review investigated bias in the design of TZD preclinical studies and examined the association between industry support and study outcomes. Assessment of the 112 included TZD animal studies showed poor reporting of risk of bias criteria regardless of sponsorship source, exaggerated effect sizes for efficacy and harms outcomes in industry-sponsored studies, and non-disclosure of funding sources (34.8%) or financial ties of investigators (83.0%) in a substantial number of articles.
Owing to the poor reporting of risk of bias criteria, we could not identify differences in risks of bias between the non-industry and industry-sponsored studies. None of the studies in our cohort had sample size calculations, intention-to-treat analyses, concealment of allocation or blinding of investigators. Descriptive criteria specifying the type of animals used (100%) and their environment (95.5%) were readily available. These descriptive criteria may be better reported because a number of guidelines for publishing animal research require them, 9 including the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines 18 released in 2010. However, risk of bias criteria, such as randomization and blinding, should be held to the same reporting standard as these descriptive criteria since there is empirical evidence that they affect the outcomes of animal research. [19][20][21][22][23] In order to gather additional evidence on the association between risks of bias and efficacy or harm effect sizes in future meta-analyses, better reporting of risk of bias criteria needs to be implemented in animal research. Recent calls for reporting criteria in animal studies recognize the need for the adoption and enforcement of journal reporting standards. 24,25 In clinical research, reporting of risk of bias criteria improved once investigators began performing risk of bias assessments through systematic reviews and once journals began adopting reporting standards. 26 Similarly, we expect reporting in animal research to improve if risk of bias assessments become more common.
Our findings confirmed that industry funding of animal research can lead to different effect sizes being reported for both efficacy and harms outcomes compared with non-industry-supported research. The exaggerated efficacy estimate (plasma glucose) in industry-sponsored studies suggests that these studies are biased towards reporting more efficacious results. However, this overestimation of efficacy was accompanied by an increase in harms (weight gain) in the same industry-sponsored studies. This contrasts with the underestimation of harms reported in industry-sponsored human drug trials. 2 Industry-sponsored studies may test higher doses on more animals for longer periods of time, which could enhance both efficacy and harm measures. The observed differences in effect size for efficacy and harms need replication, as only a limited number of studies in our cohort were sponsored solely by industry (n = 7) and reported analyzable outcomes for plasma glucose (n = 6) and body weight (n = 4).
A number of studies had conclusions favouring TZDs, regardless of whether their results supported TZDs. A previous analysis of preclinical studies of statins found notable discordance between results and conclusions in industry-sponsored studies compared with non-industry-sponsored studies. 7 This discrepancy between results and conclusions has also been observed in meta-analyses of randomized controlled trials and other drug trials conducted in humans. 2,3 However, a sponsorship-related discordance was less evident in our cohort of TZD animal studies, as both industry- and non-industry-sponsored studies had conclusions that were more favourable to the test drug than the results reported within those same studies.

Limitations
This systematic review is based on a search strategy limited to articles accessible through the Medline database and available in English. Despite these limitations, we identified a sufficient number of studies (n = 112) to test our hypothesis examining the association of industry sponsorship, risks of bias and research outcomes for TZDs. A comprehensive inventory of all TZD animal research publications was not necessary in this type of study since we did not seek to report an overall TZD efficacy or harms estimate.
Given that many of the studies included in our meta-analysis had small sample sizes and often measured multiple outcomes in each animal, we did not include every reported outcome, to avoid double-counting animals within studies. Instead, we selected the most commonly reported measures of change in glucose, the primary efficacy outcome, and body weight, the primary harms outcome. For example, if a study reported fasting plasma glucose, hepatic glucose output, glucose uptake by tissues and glucose tolerance test results, we included the fasting plasma glucose data and not the other glucose measures. Although this strategy does not capture all outcomes reported by the investigators in each study, it allowed us to avoid falsely exaggerating effect sizes in our meta-analysis.

Conclusions
Non-disclosure of funding sources or financial ties of investigators was very common in our cohort of animal studies. Risk of bias criteria were poorly reported across studies, regardless of funding source. The majority of studies had favourable TZD outcomes and conclusions. Industry-sponsored studies had exaggerated effect sizes for both efficacy and harms compared with studies sponsored by non-industry sources or by a combination of industry and non-industry sources, and these differences could not be explained by methodological differences among the studies. We expect reporting of risk of bias criteria in animal research to improve as risk of bias assessments become more common and as research funders and journals adopt and enforce better reporting standards.