Preliminary results were presented at the Congress for Research Workers in Animal Diseases in December 2008 and at the ACVIM Forum in June 2009.
Corresponding author: Jan M. Sargeant, Centre for Public Health and Zoonoses, Ontario Veterinary College, University of Guelph, 103 MacNabb House, Guelph, ON, Canada N1G 2W1; e-mail: email@example.com.
Background: To address concerns about the quality of reporting of randomized controlled trials and the potential for biased treatment effects in poorly reported trials, medical journals have adopted a common set of reporting guidelines, the Consolidated Standards of Reporting Trials (CONSORT) statement.
Hypothesis: The reporting of clinical trials involving dogs and cats might not be ideal, and this might be associated with biased treatment effects.
Animals: Dogs and cats used in 100 randomly selected reports of clinical trials.
Methods: Data related to methodological quality and completeness of reporting were extracted from each trial. Associations between reporting of trial features and the proportion of positive treatment effects within trials were evaluated by generalized linear models.
Results: There were substantive deficiencies in the reporting of key trial features. An increased proportion of positive treatment effects within a trial was associated with not reporting the method used to generate the random allocation sequence (P < .001), the use of double blinding (P < .001), the inclusion criteria for study subjects (P = .003), baseline differences between treatment groups (P = .006), the measurement used for all outcomes (P = .002), and possible study limitations (P = .03).
Conclusions and Clinical Importance: Many clinical trials involving dogs and cats in the literature do not report details related to methodological quality and aspects necessary to evaluate external validity. There is some evidence that these deficiencies are associated with a higher proportion of positive treatment effects. There is a need to improve the reporting of clinical trials, and guidelines such as the CONSORT statement can provide a valuable tool for meeting this need.
Randomized clinical trials are the gold standard for testing the efficacy of treatments and preventive strategies under real-life conditions. However, previous studies have illustrated substantive issues related to the reporting of clinical trials in the small animal literature1 and in veterinary medicine in general.2 Authors of systematic reviews of therapeutic interventions in dogs and cats also have reported shortcomings in the reporting of key features in the published research.3–6
Numerous studies have reported shortcomings in the reporting of important methodological features of clinical trials in human clinical research.7–13 Trials conducted without sound methodology are associated with biased treatment effects.14–20 To address deficiencies in the reporting of clinical trials, scientists and editors developed the CONSORT (Consolidated Standards of Reporting Trials) statement,21 which was later revised and published by 4 leading medical journals.22–25 The CONSORT statement consists of a checklist of 22 items to include when reporting the results of a clinical trial and a diagram to describe the flow of participants through each stage of the trial. A companion document provides an explanation for each item in the checklist.26 CONSORT is now endorsed by several hundred journals,27 including 2 veterinary journals (Equine Veterinary Journal and The Veterinary Journal). There is empirical evidence that CONSORT has led to improvements in clinical trial reporting.28
The primary objective of this study was to describe the current status of reporting of clinical trials involving dogs and cats using the CONSORT statement as a guideline. For the purposes of this paper, a clinical trial was defined as a parallel or crossover trial with a concurrent control group conducted in client-owned or research dogs or cats. A secondary objective was to investigate the potential for bias associated with nonreporting by evaluating whether reporting of key components of clinical trials was associated with the likelihood of finding a positive treatment effect.
Materials and Methods
Clinical trials were identified by searching 2 electronic databases (PubMed-MEDLINE and Centre for Agricultural Bioscience International [CABI] Direct). Search terms were created to identify trials related to either dogs or cats, and to identify 4 types of interventions: drug treatments, dietary treatments, surgical interventions, and vaccine evaluations. The specific search string used was (dog OR dogs OR canine OR cat OR cats OR feline) AND (drug* OR diet OR nutrition OR surg* OR vacc*). The search was restricted to articles published in 2006 or later.
Abstracts identified by the search were screened for relevance by 1 reviewer (AT). Criteria for inclusion were that the article described a clinical trial involving dogs or cats, evaluated 1 of the intervention types using a clinical endpoint, and was in English. “Clinical trial” included trials in client-owned or research dogs or cats, and included both natural disease exposure and deliberate disease challenge of study subjects to an infectious disease agent of interest (“challenge studies”).
Relevant abstracts were imported into a reference management program.a A random number sequence was generated to select 100 articles from these abstracts. If, during evaluation of the full article, a trial was found not to meet the relevance criteria or if a copy of the article could not be located, then the article was replaced by the next citation in the random sequence.
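The replacement rule described above amounts to walking a single random ordering of the citations and keeping the first 100 that pass full-text review. A minimal Python sketch, assuming each citation has an ID and that eligibility can be expressed as a yes/no check; the function name `select_trials`, the seed, and the example data are hypothetical, and the software the authors actually used is not specified.

```python
import random

def select_trials(citation_ids, is_eligible, n_target=100, seed=2008):
    """Keep the first n_target eligible citations from one random ordering.

    A citation that fails full-text review (or cannot be located) is simply
    skipped, so it is replaced by the next citation in the random sequence.
    """
    rng = random.Random(seed)        # hypothetical seed, for reproducibility
    sequence = list(citation_ids)
    rng.shuffle(sequence)            # the random number sequence
    selected = []
    for cid in sequence:
        if len(selected) == n_target:
            break
        if is_eligible(cid):         # full-text check against the relevance criteria
            selected.append(cid)
    return selected, sequence

# Hypothetical example: 405 citations, every 7th of which fails full-text review
ids = list(range(1, 406))
chosen, order = select_trials(ids, lambda cid: cid % 7 != 0)
print(len(chosen))  # 100
```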
A copy of the full article for each selected abstract was obtained. If an article included more than 1 trial, a single trial was randomly selected using a coin toss or number draw. A checklist was created to evaluate methodological quality and completeness of reporting. The checklist was based on the CONSORT statement, with additional items to address issues relevant to veterinary medicine. Initially, 10 trials were evaluated by all reviewers (all authors of this report), and interrater agreement for each item on the checklist was assessed using the κ statistic. Based on this pretest, the wording of several questions was modified. Thereafter, each trial was evaluated independently by 2 reviewers. Disagreements on any item between reviewers were resolved by consensus.
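The text does not state which κ statistic was used. For a pair of reviewers scoring one checklist item, Cohen's κ compares observed agreement with the agreement expected by chance from each reviewer's marginal frequencies. The sketch below is illustrative only; the ratings are hypothetical, and with more than 2 raters a multi-rater statistic such as Fleiss' κ would be the usual choice.

```python
import math
from collections import Counter

def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa for two raters who scored the same items on a nominal scale."""
    if not rater_1 or len(rater_1) != len(rater_2):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_1)
    # proportion of items on which the two raters agree
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    # agreement expected by chance from each rater's marginal category frequencies
    freq_1, freq_2 = Counter(rater_1), Counter(rater_2)
    expected = sum(freq_1[c] * freq_2[c] for c in set(freq_1) | set(freq_2)) / n ** 2
    if expected == 1:
        return math.nan  # both raters used one identical category: kappa is undefined
    return (observed - expected) / (1 - expected)

# Hypothetical yes/no ratings of one checklist item for the 10 pretest trials
reviewer_1 = ["y", "y", "n", "y", "n", "y", "y", "n", "y", "y"]
reviewer_2 = ["y", "y", "n", "n", "n", "y", "y", "n", "y", "y"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))  # 0.78
```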
A separate data form was created to obtain information on outcomes evaluated in the trials. The form categorized the outcomes into mortality-related outcomes, morbidity-related outcomes, presence or concentration of an infectious disease agent, quality of life, mobility, immunology, physiology, performance, or “other.” For each trial, 2 reviewers (different from the reviewers on the 1st checklist) independently entered the number of outcomes in each category that were positively associated with treatment (ie, a beneficial treatment effect), negatively associated with treatment, not significantly associated with treatment, descriptive only, or described in “Materials and methods” but with no results subsequently reported. If there were more than 2 levels of the treatment variable, comparisons between the highest level of treatment and any other level were used. If more than 1 treatment was included in the trial, and the primary treatment of interest was not identified, the 1st treatment described in the article was compared with the comparison group. Disagreements between reviewers were resolved by consensus.
The number of trials addressing each checklist item and the results of the outcome form were summarized.
Six methodological quality criteria and 7 items related to completeness of reporting were selected as key components. The methodological quality criteria were (1) random allocation to treatment group, (2) a description of the method used to generate the random allocation sequence, (3) double blinding (ie, blinding of person(s) administering treatment and person(s) evaluating the outcome), (4) a description of the mechanism of blinding, (5) a description of the number of study subjects lost to follow-up, and (6) a description of the statistical methods used for all outcomes. The key items related to completeness of reporting were (1) a description of the inclusion/exclusion criteria for study subjects, (2) a description of the intervention protocol in sufficient detail for replication, (3) a description of animal signalment (sex, age, weight, and breed), (4) a description of baseline differences among treatment groups for at least 1 variable, (5) a description of the measurement of all outcomes, (6) a clearly stated sample size, and (7) a discussion of potential study limitations.
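The summary scores used in the analyses (0–6 methodological quality criteria, 0–7 completeness-of-reporting items) are simple counts of the key items a trial addresses. A hedged sketch with hypothetical item labels and trial records; the same function applies to the 7 completeness-of-reporting items by passing a different tuple as `items`.

```python
from statistics import median

# Hypothetical labels for the 6 key methodological quality criteria
METHOD_ITEMS = (
    "random_allocation",
    "randomization_method",
    "double_blinding",
    "blinding_mechanism",
    "losses_to_follow_up",
    "statistics_all_outcomes",
)

def items_reported(trial, items=METHOD_ITEMS):
    """Count the key items a trial adequately addresses; absent keys count as not reported."""
    return sum(bool(trial.get(item, False)) for item in items)

# Hypothetical trial records (True = item adequately reported)
trials = [
    {"random_allocation": True, "double_blinding": True, "statistics_all_outcomes": True},
    {"random_allocation": True},
    {item: True for item in METHOD_ITEMS},
]
scores = [items_reported(t) for t in trials]
print(scores, median(scores))  # [3, 1, 6] 3
```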
Simple (bivariable) associations between each key methodological quality criterion or completeness-of-reporting item and the proportion of positive treatment associations within a trial were evaluated. Generalized linear models were used wherein the outcome was the number of statistically significant positive treatment outcomes divided by the number of outcomes that were statistically evaluated (ie, excluding outcomes for which no statistical analysis was reported). A log link function was used with a binomial distribution for the error structure.
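For a single binary predictor (feature reported or not), the log-binomial model described above has a closed-form maximum likelihood solution: the fitted proportion in each group equals that group's pooled proportion of positive outcomes, so the β coefficient is the log ratio of those proportions. The standard-library sketch below relies on that property; it is not the authors' code, the data are hypothetical, and, like an ordinary binomial GLM, it treats outcomes as independent. A negative β indicates a lower proportion of positive treatment effects among trials that reported the feature.

```python
import math
from statistics import NormalDist

def log_binomial_binary(trials):
    """Log-binomial model log(p) = b0 + b1*x for one binary covariate x.

    trials: iterable of (x, positives, evaluated), where x = 1 if the trial
    reported the feature, positives = statistically significant positive
    outcomes, and evaluated = outcomes with a formal statistical test.
    Returns (b1, SE of b1, two-sided Wald P-value).
    """
    pos, tot = [0, 0], [0, 0]
    for x, k, n in trials:
        pos[x] += k   # pool positive outcomes within each group
        tot[x] += n   # pool statistically evaluated outcomes within each group
    if min(pos) == 0:
        raise ValueError("each group needs at least 1 positive outcome")
    p0, p1 = pos[0] / tot[0], pos[1] / tot[1]
    # With one binary covariate the MLE reproduces each group's pooled proportion,
    # so b1 is the log ratio of proportions (reported vs not reported).
    b1 = math.log(p1 / p0)
    # Wald SE: var(log p_hat) = 1/k - 1/n in each group; the groups are independent.
    se_b1 = math.sqrt(1 / pos[1] - 1 / tot[1] + 1 / pos[0] - 1 / tot[0])
    if se_b1 == 0:
        raise ValueError("SE is zero: every evaluated outcome was positive in both groups")
    z = b1 / se_b1
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return b1, se_b1, p_value

# Hypothetical trials: (feature reported?, positive outcomes, outcomes tested)
example = [(1, 2, 10), (1, 3, 12), (0, 6, 10), (0, 5, 9)]
b1, se_b1, p = log_binomial_binary(example)
print(f"beta = {b1:.3f} (SE {se_b1:.3f}), P = {p:.3f}")
```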
Least squares linear regression was used to investigate bivariable associations with the total number of key features adequately addressed in each trial. Methodological quality and completeness of reporting were evaluated separately. The outcome for the former was the number of key methodological quality criteria adequately addressed (0–6), and the outcome for the latter was the number of completeness-of-reporting items adequately addressed (0–7). The independent variables were the total number of outcomes in each trial, the proportion of the total number of outcomes that were statistically evaluated, the proportion of outcomes that were positively associated with treatment, and a binary variable representing whether the disease outcome was the result of natural disease exposure (0) or an artificial disease challenge (1). Results were considered statistically significant at a P-value of ≤.05. Assumptions of the least squares regression models were assessed by standard diagnostic procedures, which included assessment of the normality of the residuals.
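Each bivariable least squares model regresses the count of key items addressed on one independent variable at a time. A minimal sketch with hypothetical data; it returns the slope, its standard error, the t statistic, and the residuals for diagnostic checks such as normality. Converting t to a P-value needs a t distribution with n - 2 degrees of freedom (for example, `scipy.stats.t.sf`), which is outside the standard library.

```python
import math
from statistics import mean

def simple_ols(x, y):
    """Least squares fit of y = a + b*x.

    Returns (a, b, se_b, t, residuals); residuals are returned so they can be
    checked for normality as part of the regression diagnostics.
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least 3 paired observations")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)   # residual variance on n - 2 df
    se_b = math.sqrt(s2 / sxx)
    if se_b == 0:
        raise ValueError("perfect fit; t statistic undefined")
    return a, b, se_b, b / se_b, residuals

# Hypothetical data: artificial challenge (1) vs natural exposure (0) and the
# number of key methodological quality criteria reported (0-6)
challenge = [0, 0, 0, 1, 1, 1]
score = [4, 5, 3, 2, 3, 1]
a, b, se_b, t, resid = simple_ols(challenge, score)
print(a, b, round(se_b, 3), round(t, 2))
```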
Results
The electronic database search was conducted in May 2008 and identified 6,097 abstracts. After removing duplicate records and screening for relevance, 405 unique citations were identified. From these, 100 trials were randomly selected. There were 25 trials published in 2006, 59 in 2007, and 16 in 2008. The trials were from 32 journals, with the number of trials per journal ranging from 1 to 12 (median = 2). There were 72 trials involving dogs and 28 involving cats. The trials evaluated medical therapeutic treatments (n = 74 trials), medical preventive interventions (6 trials), surgical therapeutic treatments (10 trials), surgical preventive interventions (2 trials), dietary therapeutic treatments (6 trials), and management manipulation (2 trials).
There were substantive deficiencies in the reporting of important criteria for both methodological quality and completeness of reporting (Table 1). Of note, 15 trials did not include a comparison group. These studies represented case series rather than clinical trials; however, they passed the relevance screening because the authors made explicit statements regarding treatment efficacy in the abstract. These 15 case series were excluded from further analyses, leaving 85 trials for evaluation. Fewer than half the trials included a description of the inclusion/exclusion criteria for study subjects. The sample size was justified in only 1 trial and, of the 67 trials where no statistically significant treatment effect was observed, only 5 described the statistical power of the trial. The majority of trials (70/85, 82%) stated that allocation to treatment group was random, with 11/70 (16%) providing details on the method of random sequence generation. Although blinding was feasible in the majority of trials (80/85, 94%), blinding of the person evaluating the outcome was reported in 38/80 (48%), and blinding of both the person administering the treatment and the person assessing the outcome was described even less frequently (14/80, 18%). Formal statistical analyses were performed for 82/85 trials. However, although 76/82 (93%) trials reported the use of repeated measurements for at least 1 outcome, only 49 (64%) of these trials described whether the repeated measures were accounted for in the analysis.
Table 1. Measures of methodological quality and completeness of reporting reported for 85 clinical trials involving dogs and cats published between 2006 and 2008.
Number of Trials Reporting Measure
Number of Trials Examined
a Trials with single or double blinding.
b Trials with statistical analyses performed and level of treatment allocation described.
c Trials of client-owned animals, statistical analysis performed, and level of treatment allocation described.
d Trials with comparison group and formal statistical analysis performed.
Title and abstract
Term “random” or “randomized” used in title or abstract
Study objectives stated
If yes, more than one objective described
If more than one objective, primary objective identified
Inclusion and exclusion criteria for study subjects
Geographic location of trial
Intervention described in sufficient detail for replication
More than one outcome described
If yes, primary outcome identified
Measurement of all outcomes described
Methods used to enhance quality of measurement (eg, multiple observations or observers, explicit training)
Infectious disease outcome
If yes, was deliberate exposure to disease challenge used
If yes, challenge protocol described in sufficient detail for replication
Sample size stated
If yes, sample size justified
If no significant results, statistical power discussed
Trial stopped early
If yes, a priori stopping rules described
Random allocation to treatment described
If yes, method to generate random allocation sequence described
Random allocation sequence concealed until interventions assigned
Description of who generated the allocation sequence, and who enrolled and assigned participants
Additional restrictions in treatment assignment (eg, blocking/stratification) described
(1) Client-owned animals
Client blinded to treatment
Person administering treatment blinded (single blinding)
Outcome evaluator blinded
Both person administering treatment and person evaluating outcome blinded (double blinding)
Mechanism of blinding described
(2) Research animals
Person administering treatment blinded (single blinding)
Outcome evaluator blinded
Both person administering treatment and person evaluating outcome blinded (double blinding)
Confidence intervals or measures of variability reported
Exact P-value given or nominal P-value for significance described
Reported results of analyses not described in objectives
If yes, described as exploratory
Description of adverse effects or lack thereof
Possible study limitations discussed
The mean number of outcomes per trial was 10.8 (Fig 1). The average number of outcomes by category was 1.3 outcomes related to mortality (in 6 trials reporting 1 or more outcomes in this category), 3.9 outcomes related to morbidity (in 33 trials), 2.6 outcomes describing bacterial, viral, or parasitic presence or concentration (in 12 trials), 3.6 mobility outcomes (in 7 trials), 2.9 quality of life outcomes (in 15 trials), 4.9 immunological outcomes (in 8 trials), 9.1 physiological outcomes (in 57 trials), 4.5 performance outcomes (in 9 trials), and 4.5 “other” outcomes (in 19 trials), including histologic outcomes, genetics or gene expression, and time-related measures (eg, time to extubation). Of the 915 outcomes, 281 (30.7%) had no formal statistical comparison reported, and 28 (3.1%) were described in “Materials and methods” with no results subsequently reported. Of the 606 outcomes for which a statistical test was reported, 206 (34.0%) had a statistically significant positive treatment effect, 42 (6.9%) had a significant negative treatment effect, and 358 (59.1%) had no significant treatment effect.
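The outcome counts above can be checked for internal consistency: the outcomes with a reported statistical test should equal the total minus the untested and unreported outcomes, and the 3 categories of test results should partition that number. A short Python check using only counts stated in the text:

```python
# Outcome counts reported for the 85 trials with a comparison group
total = 915
no_test = 281          # no formal statistical comparison reported
not_reported = 28      # described in "Materials and methods", no results given
positive, negative, no_effect = 206, 42, 358

tested = total - no_test - not_reported
assert tested == positive + negative + no_effect   # 606 statistically tested outcomes

def pct(part, whole):
    """Percentage rounded to 1 decimal place."""
    return round(100 * part / whole, 1)

print(tested, pct(positive, tested), pct(negative, tested), pct(no_effect, tested))
# 606 34.0 6.9 59.1
print(pct(not_reported, total), round(total / 85, 1))
# 3.1 10.8
```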
Simple associations between key methodological and completeness-of-reporting items and the proportion of statistically evaluated outcomes that were positively associated with treatment are shown in Table 2. Trials reporting double blinding had a significantly lower proportion of statistically significant positive treatment effects (P < .001), as did trials describing the method used to generate the random allocation sequence (P < .001). There was a significantly lower proportion of positive treatment associations for trials describing inclusion and exclusion criteria for study subjects (P = .003), trials where baseline differences were reported for at least 1 variable (P = .006), trials where the measurement of all outcomes was described (P = .002), and trials that described study limitations (P = .03).
Table 2. Simple associations between key measures of methodological quality and completeness-of-reporting items and the proportion of statistically evaluated outcomes with a positive treatment effect for 76 trials(a) involving dogs and cats published between 2006 and 2008.
β Coefficient (Standard Error)
Estimated using generalized linear models with a log link and a binomial error structure.
a Seventy-six trials with at least 1 outcome evaluated for an association with the treatment using a formal statistical comparison.
b All trials reported the sample size.
Methodological soundness items
Random allocation to treatment reported
Method to generate random allocation sequence described
Double blinding reported
Mechanism of blinding described
Reported number of animals not completing study (ie, lost to follow-up)
Statistical methods described for all outcomes
Inclusion and exclusion criteria described
Intervention described in sufficient detail for replication
Animal signalment described (sex, age, weight, and breed)
Of the 6 key methodological quality criteria, the number reported within each trial ranged from 1 to 6 (median = 3.0) (Fig 2). In simple associations, the use of an artificial disease challenge was negatively associated with the number of methodological quality criteria reported (P = .003), although the number of such trials was small (n = 13 trials with infectious disease outcomes). There were no significant associations between the number of key methodological quality criteria reported and the number of outcomes (P = .20), the proportion of outcomes statistically evaluated (P = .07), or the proportion of statistically evaluated outcomes that were positively associated with treatment (P = .66). Regression diagnostics indicated that no assumptions were violated for any of the comparisons.
The total number of the 7 key completeness-of-reporting items addressed ranged from 1 to 7 (median = 5.0) (Fig 3). There was a positive association between the number of outcomes in a trial and the number of key completeness-of-reporting items addressed (P = .001). There were no significant associations between the number of completeness-of-reporting items addressed and the proportion of outcomes that were statistically evaluated (P = .77), the proportion of statistically evaluated outcomes that were positively associated with treatment (P = .30), or the use of a deliberate disease challenge (P = .11). Regression diagnostics indicated that no assumptions were violated for any of the comparisons.
Discussion
Substantive deficiencies were found in the reporting of methodological quality criteria and in the completeness of reporting of items necessary for the interpretation of trials examined in this study. This is consistent with the observations of trial quality from systematic reviews of trials involving dogs and cats.3–6 Our appraisal template was based on the CONSORT statement, which was developed to include items where there was empirical evidence of an association with bias in studies in human medicine, or where the information was deemed essential by the CONSORT development group for evaluating the external validity of trial findings.22 The descriptive results (Table 1) serve as a guide to trial components that are poorly reported and require consideration when writing trial reports for publication.
A number of case series were identified as clinical trials by our search strategy because they were described as clinical trials in the abstract or because the authors drew conclusions regarding treatment efficacy. Case series are used to describe a group of cases with a similar clinical presentation and can provide valuable information on the clinical presentation and course of a disease.29 However, because they do not include a comparison group, it is inappropriate to make inferences with regard to disease causation, risk factors, or efficacy of treatment.30 Thus, when reporting this type of study, authors should clearly document the limitations of the study design, and readers should be alert to such limitations.
Our findings do not necessarily mean that the trials were poorly designed, but rather that key features were not reported. Studies in the human health care literature, where trial authors were contacted to clarify whether their trials used concealment of randomization and blinding, found that these features were often used despite not having been reported in the manuscript.31,32 Thus, authors of clinical trials should be aware of which features of the trial design should be reported and should provide this information in reports of trials.
Information on outcomes, and the associations between treatment and outcomes, was similarly poorly reported in some trials, with only 7% of trials identifying the primary outcome of interest and 34% of treatment comparisons not having formal statistical assessments reported. The identification of clinically significant differences in a specific outcome (the primary outcome) between treatment groups is a key component of sample size determinations, and thus the clear identification of the primary outcome is an important trial feature. A formal statistical analysis of the trial results is necessary to evaluate whether any observed numerical differences in the outcome between treatment groups are likely to be due to chance. There is evidence from the human health care literature that treatment effects that are statistically significant are more likely to be completely reported than are treatment effects that are not statistically significant.33 Thus, trials with a number of incompletely reported comparisons might be associated with bias in terms of reporting of treatments and outcomes, with positive outcomes overreported.
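As an illustration of why a prespecified primary outcome matters, one common normal-approximation formula for the number of animals per group needed to compare 2 proportions takes the expected proportion in each group, the significance level, and the desired power. The proportions in the example are hypothetical, and other formulas (eg, with a continuity correction) give somewhat larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Animals per group to detect p_control vs p_treated with a two-sided test
    of 2 independent proportions (normal approximation, no continuity correction)."""
    if p_control == p_treated:
        raise ValueError("proportions must differ")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = std_normal.inv_cdf(power)            # standard normal quantile for the power
    p_bar = (p_control + p_treated) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    ) ** 2
    return math.ceil(numerator / (p_control - p_treated) ** 2)

# Hypothetical primary outcome: clinical signs persist in 60% of controls vs 30% of treated animals
print(n_per_group(0.60, 0.30))  # 42
```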
The overall quality of trials was assessed by selecting key trial components and summing the number of these components that were reported within trials. The methodological quality criteria included trial components from the Jadad scale,34 which is commonly used as a quality assessment tool in the human health care literature.16 The Jadad scale includes randomization and method of randomization, double blinding and description of blinding, and a description of losses to follow-up. We added description of the statistical methods for all outcomes, based on our finding that this criterion was not consistently reported in clinical trials involving dogs and cats. Our results showed a considerable range in the number of methodological quality and completeness-of-reporting items reported, with some trials reporting most or all of the components and others reporting few. This highlights the importance of critically evaluating reports of clinical trials when using this information to make clinical decisions.
Reporting of several of the key methodological quality criteria and completeness-of-reporting items was negatively associated with the proportion of positive treatment effects within a trial (Table 2); conversely, trials that did not report these items had a higher proportion of positive treatment effects. Although studies on the association between trial quality and the magnitude of treatment effects in the human health care literature have not always shown consistent results, some studies have shown exaggerated treatment effects in trials that fail to report random allocation to treatment group,14,15 the method used to generate the random allocation sequence,17,19 allocation concealment,15,16,18–20 and double blinding.16,20 In contrast, other studies have found no significant association between double blinding and treatment effect.19,35 We did not find associations between the total number of methodological quality criteria or completeness-of-reporting items that were reported and the proportion of positive treatment effects. However, in the human health care literature, a low score on the Jadad scale has been associated with exaggerated treatment effects.16,17,19 Results from the human health care literature are not directly comparable to our results, as those studies evaluated associations with the primary treatment effect. In the current study, only 6 trials identified the primary outcome. Therefore, we investigated associations between reporting of trial features and the proportion of positive treatment effects within trials. Nonetheless, the evidence from both types of analyses suggests that poor reporting is associated with overestimation of treatment effects.
The current study had several potential limitations. Reviewers were not blinded to author names, raising the possibility that knowledge of the identity of the authors could have influenced the reviewers' assessment. However, the use of 2 reviewers independently assessing the trials should reduce inadvertent bias due to knowledge of author information in the data collection. We recommend blinding reviewers to author information in future studies. A 2nd potential limitation was the relatively small sample size. We selected 100 trials (of which 85 were used in analysis) to provide a descriptive summary of current reporting, rather than on the basis of the number of trials needed to test hypotheses related to potential biases. Therefore, power to detect significant associations could have been low, particularly for trial features that were reported with high or low frequency. We also did not adjust for the multiplicity of associations evaluated, so the study-wide type I error rate was not controlled. Therefore, it is possible that some of the statistically significant associations represent type I errors.
We did not consider source of funding in our analysis, as it is not a design feature of clinical trials per se. However, further studies might consider including funding source to evaluate whether this is associated with trial quality or with the probability of reporting positive treatment effects.
In summary, our results indicate that published clinical trials involving dogs and cats often have substantive deficiencies in the reporting of features related to methodological quality and of the detail needed to evaluate external validity. There is some evidence that these deficiencies are associated with a higher proportion of reported positive treatment effects. There is a need to improve the reporting of clinical trials involving dogs and cats, and guidelines such as the CONSORT statement could provide a valuable tool for meeting this need.
The authors thank Jim Brett for library assistance, Annette Wilkins for helpful comments on a draft of this manuscript, and Aodhan Wall for technical assistance in manuscript preparation. Funding was obtained from the Laboratory for Foodborne Zoonoses, Public Health Agency of Canada, and the Canadian Institutes of Health Research (CIHR) Institute of Population and Public Health/Public Health Agency of Canada Applied Public Health Chair.