To evaluate data reporting related to external validity from randomized controlled trials (RCTs) assessing pharmacologic and nonpharmacologic treatment for hip and knee osteoarthritis (OA).
All RCTs assessing pharmacologic and nonpharmacologic treatments for hip and knee OA indexed between January 2002 and December 2006 were identified. A sample of 120 articles was randomly selected: 30 each assessing pharmacologic treatments, surgery or technical interventions, rehabilitation, and nonimplantable devices.
The country was clearly reported in 25 (21%) reports, the setting described in 40 (33%) reports, and the number of centers in 54 (45%). Details about the centers (volume of care) were given in 24 (20%) reports. Rates were lower for surgical trials for the country (3%), the setting (3%), the number of centers (13%), and details about the centers (7%). The intervention was adequately described in all pharmacologic reports and in >80% of rehabilitation reports. The technical procedure was given in all surgical intervention trial reports, but the type of anesthesia was reported in 4 (13%), preoperative care in 2 (7%), and postoperative care in 15 (50%). The device was described in 93% of device trial reports, but the manufacturer was reported in only 33%.
There is low reporting of data related to external validity in reports of RCTs assessing pharmacologic and nonpharmacologic treatments for hip and knee OA.
Well-conducted randomized controlled trials (RCTs) are adopted as the gold standard for evaluating medical interventions (1–4). For results to be clinically useful, RCTs must take into account the internal validity (i.e., the extent to which systematic errors or bias are avoided) and the external validity (sometimes called applicability, i.e., whether the results of a trial can be reasonably applied or generalized to a definable group of patients in a particular setting in routine practice) (5, 6).
Historically, internal validity has been considered a priority for research. Several publications have identified methods to avoid bias (7, 8). The Consolidated Standards of Reporting Trials (CONSORT) statements, endorsed by many major medical journals, improved the reporting of data related to internal validity (1, 9). Tools (10–13) have been developed mainly to evaluate internal validity in reports of trial results included in systematic reviews (14).
Funding agencies and journals have tended to be more concerned with the scientific rigor of interventions studied than with the applicability of the results. Consequently, external validity has been frequently neglected (6, 15–17). This neglect has probably contributed to the failure to translate research into clinical practice. Lack of external validity is frequently cited as the reason why interventions found to be effective in clinical trials are underused in clinical practice (5). However, assessing the external validity of a trial in order to turn research into action presupposes that the relevant information is adequately reported in published articles. Further, as highlighted by the extension of the CONSORT statements to nonpharmacologic treatment, assessing external validity is probably more difficult for trials assessing nonpharmacologic treatments (e.g., surgery, technical interventions, rehabilitation, psychotherapy, devices) than pharmacologic treatments (e.g., oral drugs) (18, 19).
The aim of this study was to evaluate and compare the reporting of external validity in RCTs assessing pharmacologic and nonpharmacologic treatments for hip and knee osteoarthritis (OA). We chose these conditions because they are highly prevalent and can result in disability and reduced quality of life. Further, international guidelines require the use of a combination of pharmacologic and nonpharmacologic treatments for the optimal management of patients with these conditions (20, 21).
We identified all English-language reports of RCTs indexed in Medline via PubMed between January 2002 and December 2006, using the search terms “osteoarthritis hip” OR “osteoarthritis knee” and limiting the results to RCTs published in English. A similar search strategy was used in a previous study on internal validity (22).
We collected the electronic records in an EndNote data file (Thomson Reuters, New York, NY). One author (NA) assessed each report by screening the title and abstract to identify relevant studies. A second author (IB) checked for adequate selection of the abstracts. Articles were included if the study was identified as an RCT assessing pharmacologic or nonpharmacologic treatment for hip or knee OA in a parallel-group or crossover design. We excluded reports of cluster RCTs, nonrandomized trials, observational studies (cohort and case–control studies), extended followup trials (i.e., extended followup of patients included in an RCT beyond the last outcome assessment), nontherapeutic trials (metrologic studies, epidemiologic studies), pathophysiologic studies, letters, ancillary studies of an RCT such as a subgroup analysis, cost-effectiveness evaluation, systematic review, and/or meta-analysis. We also excluded reports of trials assessing the organization of the health care system or interventions provided to care providers. We excluded reports with these designs because we wanted to have a relatively homogeneous sample.
The selected abstracts were classified according to the category of treatment assessed: pharmacologic treatments, surgery or technical interventions (e.g., joint lavage), rehabilitation, or nonimplantable devices.
For each category of treatment, we used a computer-generated list to randomly select 30 articles and then retrieved the full-text articles. Articles not fulfilling the inclusion criteria were replaced by a random selection of articles in the corresponding category. We chose a total of 120 articles for practical reasons, mainly to provide enough articles describing each category of treatment, and enough randomly selected articles to avoid selection bias.
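The selection procedure described above can be sketched as follows. This is a minimal illustration, not the authors' actual program: the function name, the seed, and the `is_eligible` callback (standing in for the full-text eligibility review) are assumptions made for the example. Walking down a randomly shuffled list and skipping ineligible articles is equivalent to replacing each rejected article with another random draw from the same category.

```python
import random


def sample_per_category(candidates, is_eligible, n=30, seed=2006):
    """Draw n eligible articles per category, replacing ineligible draws.

    candidates: dict mapping category name -> list of article ids.
    is_eligible: hypothetical callable standing in for full-text review.
    """
    rng = random.Random(seed)  # fixed seed so the list can be regenerated
    selected = {}
    for category, ids in candidates.items():
        order = ids[:]       # copy so the caller's list is left untouched
        rng.shuffle(order)   # random order plays the role of the generated list
        kept = []
        for article_id in order:
            if is_eligible(article_id):
                kept.append(article_id)  # ineligible articles are skipped
            if len(kept) == n:
                break
        selected[category] = kept
    return selected
```

Fixing the seed is a design choice for auditability: anyone rerunning the sketch with the same candidate lists obtains the same sample.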
To assess external validity as well as internal validity of the selected reports, we reviewed the literature and generated a standardized data extraction form (available from the corresponding author upon request). We used items related to external validity proposed by the CONSORT statement for RCTs (1), the extension of the CONSORT statement for nonpharmacologic trials (18, 19), and Rothwell et al (5). Before data extraction, as a calibration exercise the standardized form was tested independently by 2 authors (NA, IB) on a separate set of 20 reports. A meeting followed in which the ratings were reviewed and any disagreements were resolved by consensus. One author (NA) independently completed all of the data extraction. A random sample of 20 articles was reviewed for quality assurance.
The data extraction form covered the following data: the characteristics of the selected studies, including the year of publication, journal, medical area of the study (hip OA, knee OA, or hip and knee OA), type of treatment (pharmacologic treatment, surgical intervention, rehabilitation or education, or nonimplantable device), type of control intervention (active intervention, placebo, or usual care), funding sources (public, private, both, no funding, not reported, or unclear), study design (parallel-group or crossover), and sample size.
Internal validity of the selected reports was assessed with use of specific criteria recommended by the Cochrane Collaboration and by quality tools for assessing the results of pharmacologic and nonpharmacologic trials (10, 12), including allocation sequence generation; allocation concealment; blinding of patients, care providers, and outcome assessors; and intent-to-treat (ITT) analysis.
The reporting of data related to external validity was also evaluated.
Data on the method of recruitment (i.e., referral from a rheumatologist or general physician, self-selection of patients through advertisement) and duration of recruitment were evaluated.
We evaluated each study's criteria for patient eligibility (as defined in a previous work), inclusion (i.e., criteria governing entry or recruitment of individuals into the trial and describing the medical conditions of interest), and exclusion (all other criteria limiting the eligibility of individuals) (23). The exclusion criteria were classified as strongly justified, potentially justified, or poorly justified reasons for excluding individuals from an RCT according to the classification proposed by van Spall et al (23). Exclusion criteria were considered strongly justified if an individual or substitute decision-maker was unable to grant informed consent, if the intervention or placebo would likely be harmful, if the intervention would likely be ineffective, or if the effect of the intervention would be difficult to interpret.
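Once each exclusion criterion in a report has been labeled by a reviewer, the per-report rate in each justification category is a simple proportion. The sketch below assumes hypothetical label strings ("strong", "potential", "poor"); the labeling itself is a human judgment and is not automated here.

```python
from collections import Counter

# Hypothetical short labels for the three justification categories.
CATEGORIES = ("strong", "potential", "poor")


def justification_rates(labels):
    """Return the percentage of one report's exclusion criteria per category.

    labels: list with one label per exclusion criterion in the report.
    Returns None when the report lists no exclusion criteria.
    """
    unknown = set(labels) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if not labels:
        return None
    counts = Counter(labels)
    return {c: 100.0 * counts[c] / len(labels) for c in CATEGORIES}
```

For instance, a report with four exclusion criteria, two of them strongly justified, one potentially justified, and one poorly justified, yields rates of 50%, 25%, and 25%. Averaging these per-report rates across reports gives a summary of the mean ± SD kind.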
Data on the number of eligible patients, the number of patients not meeting inclusion criteria, and the number of patients refusing to participate were collected. We also checked whether the article reported baseline characteristics of excluded patients, as well as essential data on baseline characteristics of randomized patients (i.e., age, sex, weight/body mass index [BMI], ethnicity, coexisting diseases or comorbidities, duration of the disease, measure of function status, level of pain, description of radiographic evidence of damage, and use of nonsteroidal antiinflammatory drugs).
We collected data on the number of centers/care providers, expertise of centers/care providers, and details about the centers (name, sources, organization, and expertise). The reporting of the number of patients recruited in each center or by each care provider was recorded.
We collected data on whether and how details on the interventions were reported. For pharmacologic treatments, we evaluated the route of administration, dose, duration, frequency of treatment, and patient compliance. For rehabilitation, we evaluated the number, timing, duration, and content of each session; mode of delivery; whether there was supervision; and patient compliance. For surgical interventions, we evaluated the type of anesthesia, preoperative care, postoperative care, description of the technical procedure, and surgeons' compliance with the planned procedure. For nonimplantable devices, we evaluated the reporting of the manufacturer, description of the devices, and patient compliance.
We collected information related to external validity reported in abstracts (i.e., country where the trial took place, setting, number of centers, number of eligible patients, number of patients randomized, length of recruitment, length of followup, and data on care providers), and noted whether the external validity was discussed in the discussion section of the study as is recommended by the CONSORT statement (1).
Quantitative assessment of external validity reporting may offer complementary information. Although it is difficult to specify which aspect of external validity is the most important, we decided to focus on 3 important components that are probably indispensable to assessing the external validity of a trial: the participants, the description of the experimental treatment, and the context of care (centers, setting, care providers' expertise). For each component, we identified items that were considered essential to an adequate assessment of the external validity of a published trial. These items are described in Supplemental Appendix A (available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home). The quantitative assessment of external validity was evaluated by the percentage of the selected items that were adequately reported for each component.
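As an illustration of this scoring, the sketch below computes, for one report, the percentage of essential items adequately reported within each component. The item names are hypothetical placeholders; the actual item lists are those of Supplemental Appendix A, which is not reproduced here.

```python
def percent_reported(items):
    """items: dict mapping item name -> True if adequately reported."""
    if not items:
        raise ValueError("a component needs at least one item")
    n_reported = sum(1 for reported in items.values() if reported)
    return 100.0 * n_reported / len(items)


def component_scores(report):
    """report: dict mapping component name -> dict of item flags."""
    return {name: percent_reported(items) for name, items in report.items()}


# Hypothetical report; item names are placeholders, not the appendix items.
example_report = {
    "participants": {"age": True, "sex": True, "bmi": False, "pain": False},
    "intervention": {"dose": True, "duration": True},
    "context": {"country": False, "setting": True,
                "n_centers": False, "provider_expertise": False},
}

scores = component_scores(example_report)
# participants: 50.0, intervention: 100.0, context: 25.0
```

Scoring each component separately, rather than pooling all items into one total, keeps a well-described intervention from masking a poorly described context of care.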
Data were analyzed using SAS software, version 9.1 (SAS Institute, Cary, NC). We used descriptive statistics for continuous variables: mean, SD, median (lower quartile, upper quartile), and minimum and maximum values. Categorical variables were described with frequencies and percentages. The results were adjusted for the potential journal clustering effect as has been recommended (24). The reporting of data related to external validity was compared across categories of treatment with a linear mixed-effects model, with the percentage of external validity items adequately reported as the dependent variable, the treatment category as a fixed effect, and the journal as a random effect.
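One way to write such a model is given below as a sketch. The notation is introduced here for illustration and does not appear in the original analysis; the normality assumptions are the usual defaults for a linear mixed-effects model and are assumed rather than stated by the authors.

```latex
% y_{ij}: percentage of essential items adequately reported in report i of journal j
% x_{ijk}: indicator that report i of journal j assesses treatment category k
%          (three indicators, with the fourth category as the reference)
% u_j: random intercept for journal j, capturing clustering of reports within journals
\[
  y_{ij} = \beta_0 + \sum_{k=1}^{3} \beta_k \, x_{ijk} + u_j + \varepsilon_{ij},
  \qquad
  u_j \sim \mathcal{N}(0, \sigma_u^2),
  \quad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\]
```

Under this formulation, the comparison across categories of treatment corresponds to testing whether the fixed effects \(\beta_1, \beta_2, \beta_3\) are all zero.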
Our electronic search identified 388 citations, of which 123 were excluded. Among the 265 included reports, we randomly chose 120 reports, 30 for each category of treatment. After obtaining and reviewing the full texts, 11 articles were replaced. The flow of articles through the study is presented in Supplemental Appendix B (available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home).
Characteristics of the included studies are reported in Table 1. The 120 articles were indexed in 53 journals. Among them, 13 (11%) were published in a general medical journal with a high impact factor and 107 (89%) in a general medical journal with a low impact factor or in a specialized medical journal. Most trials (n = 118 [98%]) had a parallel-group design. Three-quarters of the reports assessed knee OA (n = 90). The source of funding was described as public in 45 (38%) articles and as completely or partially private in 25 (21%). The remaining 50 (42%) reports either declared no funding (n = 8) or did not report a funding source (n = 42).
Table 1. Characteristics of the included studies. Values are the number (%) of reports unless otherwise indicated.

| Characteristic | All treatment (n = 120) | Pharmacologic treatment (n = 30) | Nonimplantable devices (n = 30) | Rehabilitation (n = 30) | Surgery (n = 30) |
|---|---|---|---|---|---|
| Type of journal | | | | | |
| General medical journal, high impact factor | 13 (11) | 4 (13) | 4 (13) | 4 (13) | 1 (3) |
| Specialized medical journal, or general medical journal with low impact factor | 107 (89) | 26 (87) | 26 (87) | 26 (87) | 29 (97) |
| Medical area | | | | | |
| Hip OA | 18 (15) | 1 (3) | 1 (3) | 5 (17) | 11 (37) |
| Knee OA | 90 (75) | 23 (77) | 27 (90) | 21 (70) | 19 (63) |
| Hip and knee OA | 12 (10) | 6 (20) | 2 (7) | 4 (13) | 0 |
| Funding source | | | | | |
| Public | 45 (38) | 6 (20) | 15 (50) | 17 (57) | 7 (23) |
| Manufacturer | 18 (15) | 8 (27) | 4 (13) | 0 | 6 (20) |
| Public and manufacturer | 7 (6) | 2 (7) | 0 | 2 (7) | 3 (10) |
| No funding | 8 (7) | 2 (7) | 0 | 0 | 6 (20) |
| Not reported | 42 (35) | 12 (40) | 11 (37) | 11 (37) | 8 (27) |
| Sample size, median (IQR) | 100.0 (60–216) | 213.5 (85–431) | 66 (38–128) | 107 (77–140) | 95.5 (52–180) |
| Control intervention | | | | | |
| Placebo intervention | 43 (36) | 18 (60) | 17 (57) | 6 (20) | 2 (7) |
| Active treatment | 63 (52) | 12 (40) | 12 (40) | 11 (37) | 28 (93) |
| Usual care | 14 (12) | 0 | 1 (3) | 13 (43) | 0 |
| Internal validity: adequate | | | | | |
| Generation of allocation sequences | 61 (51) | 19 (63) | 11 (37) | 18 (60) | 13 (43) |
| Allocation concealment | 49 (41) | 16 (53) | 12 (40) | 14 (47) | 7 (23) |
| Blinding of patients | 52 (43) | 26 (87) | 16 (53) | 2 (7) | 8 (27) |
| Blinding of care providers | 38 (32) | 24 (80) | 11 (37) | 2 (7) | 1 (3) |
| Blinding of outcome assessors | 71 (59) | 25 (83) | 22 (73) | 11 (37) | 13 (43) |
| Intent-to-treat analyses | 38 (32) | 10 (33) | 10 (33) | 12 (40) | 6 (20) |

The median sample size was 100 (interquartile range [IQR] 60–216) and was twice as high for reports of pharmacologic trials as for reports of nonpharmacologic trials.
The control group was described as receiving active treatment in 63 (52%) reports, a placebo intervention in 43 (36%), and usual care in 14 (12%). Pharmacologic treatments and nonimplantable devices were mainly compared with placebo or active treatments, whereas rehabilitation interventions were mainly compared with usual care or active treatments, and surgical procedures were compared with active treatment in most reports.
The generation of allocation sequences was adequate in 51% of the reports. The treatment allocation was adequately concealed in 49 (41%) reports. Blinding was reported and was adequate for patients in 43% of reports, for care providers in 32%, and for outcome assessors in 59%. An ITT analysis was described in only one-third of the reports.
Table 2. Reporting of data related to recruitment, participants, and context of care, by category of treatment. Values are the number (%) of reports unless otherwise indicated.

| Reporting of | All treatment (n = 120) | Pharmacologic treatment (n = 30) | Devices (n = 30) | Rehabilitation (n = 30) | Surgery (n = 30) |
|---|---|---|---|---|---|
| Method of recruitment | 43 (36) | 8 (27) | 13 (43) | 18 (60) | 4 (13) |
| Specific method to enrich patient recruitment | 23 (19) | 18 (60) | 5 (17) | 0 | 0 |
| Duration of recruitment (10 patients/month) | 56 (47) | 10 (33) | 11 (37) | 15 (50) | 20 (67) |
| Inclusion criteria | 118 (98) | 30 (100) | 30 (100) | 30 (100) | 28 (93) |
| Exclusion criteria | 106 (88) | 30 (100) | 27 (90) | 28 (93) | 21 (70) |
| Rate of exclusion criteria in each article, mean ± SD | | | | | |
| Strongly justified | 75.5 ± 23.6 | 77.7 ± 20.7 | 79.0 ± 21.3 | 66.9 ± 26.1 | 79.3 ± 25.6 |
| Potentially justified | 1.5 ± 4.7 | 1.1 ± 3.4 | 1.5 ± 4.7 | 2.8 ± 6.8 | 0.5 ± 2.3 |
| Poorly justified | 22.9 ± 23.0 | 21.3 ± 20.5 | 19.5 ± 19.4 | 30.3 ± 26.3 | 20.2 ± 25.3 |
| Flow diagram | 48 (40) | 18 (60) | 11 (37) | 17 (57) | 2 (7) |
| Number of eligible patients | 50 (42) | 12 (40) | 14 (47) | 21 (70) | 3 (10) |
| Number of patients not meeting inclusion criteria | 39 (33) | 9 (30) | 8 (27) | 19 (63) | 3 (10) |
| Number of patients refusing participation | 31 (26) | 6 (20) | 8 (27) | 14 (47) | 3 (10) |
| Baseline characteristics of randomized patients | 109 (91) | 28 (93) | 28 (93) | 27 (90) | 26 (87) |
| Age | 108 (90) | 28 (93) | 28 (93) | 27 (90) | 25 (83) |
| Sex | 101 (84) | 27 (90) | 24 (80) | 25 (83) | 25 (83) |
| Weight/body mass index | 74 (62) | 22 (73) | 18 (60) | 17 (57) | 17 (57) |
| Ethnicity | 18 (15) | 8 (27) | 3 (10) | 5 (17) | 2 (7) |
| Duration of disease | 47 (39) | 13 (43) | 20 (67) | 9 (30) | 5 (17) |
| Measure of function status | 55 (46) | 17 (57) | 15 (50) | 16 (53) | 7 (23) |
| Level of pain | 47 (39) | 14 (47) | 15 (50) | 15 (50) | 3 (10) |
| Description of radiographic damage | 27 (23) | 5 (17) | 11 (37) | 4 (13) | 7 (23) |
| NSAIDs/other drugs | 19 (16) | 6 (20) | 6 (20) | 6 (20) | 1 (3) |
| Coexisting diseases | 14 (12) | 2 (7) | 1 (3) | 9 (30) | 2 (7) |
| Location of recruitment | 46 (38) | 9 (30) | 17 (57) | 17 (57) | 3 (10) |
| Setting of recruitment | 40 (33) | 7 (23) | 16 (53) | 16 (53) | 1 (3) |
| Country where trial took place | 25 (21) | 8 (27) | 6 (20) | 10 (33) | 1 (3) |
| Number of centers | 54 (45) | 19 (63) | 16 (53) | 15 (50) | 4 (13) |
| Details about centers | 24 (20) | 3 (10) | 10 (33) | 9 (30) | 2 (7) |
| Number of patients recruited in each center | 0 | 0 | 0 | 0 | 0 |
| Details of care provider | 35 (29) | 2 (7) | 6 (20) | 10 (33) | 17 (57) |
| Number of care providers | 33 (28) | 2 (7) | 4 (13) | 7 (23) | 20 (67) |

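Table 3. Reporting of details of the intervention, by category of treatment. Values are the number (%) of the 30 reports in each category.

| Reporting of | Reports, n (%) |
|---|---|
| Pharmacologic treatment (n = 30) | |
| Mode of administration | 30 (100) |
| Duration of treatment | 30 (100) |
| Frequency of treatment | 30 (100) |
| Compliance of patients | 10 (33) |
| Surgery (n = 30) | |
| Type of anesthesia | 4 (13) |
| Preoperative care | 2 (7) |
| Postoperative care | 15 (50) |
| Technical procedure | 30 (100) |
| Compliance of care providers | 0 |
| Rehabilitation (n = 30) | |
| Number of sessions | 29 (97) |
| Timing of sessions | 26 (87) |
| Duration of each session | 24 (80) |
| Content of each session | 28 (93) |
| Mode of delivery | 27 (90) |
| Supervision or not | 25 (83) |
| Compliance of patients | 15 (50) |
| Nonimplantable devices (n = 30) | |
| Description of the device | 28 (93) |
| Compliance of patients | 9 (30) |
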
The method of recruitment was described in 43 (36%) of the reports. When described, this method relied on referral in 29 (67%) reports and self-selection in 14 (33%) (Table 2). The duration of recruitment was described in 56 (47%) reports; reporting was better in articles about rehabilitation. When described, the median duration of recruitment, expressed per 10 patients recruited, was 0.4 months (IQR 0.2–0.8) for pharmacologic trials, 0.8 months (IQR 0.3–1.9) for device trials, 1.2 months (IQR 0.9–2.7) for rehabilitation trials, and 2.5 months (IQR 1.1–4.4) for surgical trials.
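The normalization behind these durations is not spelled out in the text. One plausible reading, assumed here purely for illustration, is that each trial's recruitment duration is scaled to the time needed to recruit 10 patients. The function name and arguments are hypothetical.

```python
def months_per_10_patients(recruitment_months, n_patients):
    """Scale a trial's recruitment duration to months per 10 patients.

    Assumed reading of the normalization: total recruitment time divided
    by the number of patients recruited, multiplied by 10.
    """
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    return 10.0 * recruitment_months / n_patients
```

Under this reading, a trial that recruited 100 patients over 12 months would contribute a value of 1.2 months per 10 patients, which makes trials of very different sizes comparable.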
Participant inclusion criteria were described in almost all reports (118 [98%]) and exclusion criteria in 106 (88%) reports (Table 2). Exclusion criteria focused on age in 64 (53%) reports, medical comorbidities in 79 (66%), sex in 17 (14%), medication in 57 (48%), socioeconomic status in 3 (2%), and participation in another trial in 6 (5%). On average, 23% of the exclusion criteria in each report were poorly justified. These rates did not differ by category of treatment.
A flow diagram of participants through the trial was given in 48 (40%) reports. Data related to the number of eligible participants and the number of participants not meeting inclusion criteria or those refusing participation were reported in less than 50% of the reports, but reporting was better for rehabilitation trials. When given, the mean rates of participants not meeting inclusion criteria or refusing to participate were 22.5 (30%) and 19.2 (16%), respectively.
The baseline data of excluded participants were given in only 1 report. The baseline clinical characteristics of randomized participants were described in 109 (91%) reports. Characteristics concerned age and sex in more than 80% of reports, weight or BMI in 62%, and severity of disease (i.e., duration of the disease, pain, function, radiographic evidence of damage) in less than half. Patients' comorbidities were provided in only 12% of reports.
The interventions were described according to the CONSORT recommendations in all reports of pharmacologic trials and in most reports of rehabilitation trials, but descriptions were incomplete in reports of device and surgery trials (Table 3). In the reports of medical device trials, a description of the device was given in 28 (93%) reports, but the manufacturer was stated in only 9 (30%). In the reports of surgical intervention trials, the technical procedure was given in all reports, but the type of anesthesia was reported in only 4 (13%), preoperative care in 2 (7%), and postoperative care in 15 (50%). Control treatment was described in most reports (117 [98%]). Descriptions of cointerventions were lacking in 28 (23%) reports, mainly reports of pharmacologic trials.
The setting was described in 40 (33%) reports and the number of centers in 54 (45%) (Table 2). The country where the trial took place was clearly reported in only 25 (21%). Details of centers were given in 24 (20%) reports. Other details such as center sources, organization, and expertise were never reported. The number of participants recruited in each center was never reported. Details on the care providers were given in 35 (29%) reports.
Information related to external validity was provided in the abstract of reports as follows: 5 (4%) articles described the country where the trial took place, 18 (15%) the setting, 14 (12%) the number of centers, 2 (2%) the number of eligible patients, 110 (92%) the number of patients randomized, 6 (5%) the length of recruitment, 98 (82%) the length of followup, and 2 (2%) data on care providers. External validity was discussed in the discussion section of 11 (9%) articles.
The global assessment of each component of external validity by category of treatment is highlighted in Figure 1. Reporting of essential baseline characteristics items was lower in reports of surgical trials (median [IQR] of 30% [30–40] of the essential items reported) than in those of trials of pharmacologic treatments, nonimplantable devices, and rehabilitation (median [IQR] of 50% [40–60], 50% [30–60], and 45% [30–60] of the essential items reported, respectively; P = 0.006).
The reporting of the intervention was better in reports of trials of pharmacologic treatments and rehabilitation (median 80% [IQR 80–100] and 86% [IQR 71–100] of the essential items reported, respectively) than in those of trials of nonimplantable devices and surgery (median 33% [IQR 33–67] and 40% [IQR 20–40] of the essential items reported, respectively; P < 0.001).
The items dedicated to the context of the trial were poorly reported for trials of all treatments, especially pharmacologic treatments and surgery (median 12% [IQR 12–25] and 25% [IQR 12–25] of the essential items reported, respectively; P = 0.016).
This study assessed the reporting of external validity in a sample of 120 RCTs assessing pharmacologic and nonpharmacologic treatments for hip or knee OA during a 5-year period. Our results highlight the lack of data related to external validity in published reports of RCTs. Methods for recruiting patients were described in one-third of the reports; 22.9% of the exclusion criteria were poorly justified; important baseline data of patients were lacking; and setting, centers, and care providers were described in less than one-third of articles. Further, the reporting of external validity differed depending on the category of treatment. Reports of trials assessing rehabilitation provided more adequate data related to recruitment, participants, setting and centers, and intervention. Reports of trials assessing surgical procedures lacked such data, even though the reporting of some items, such as the setting, the number of centers, and center volume, is particularly important in this field. In reports of pharmacologic trials and trials assessing nonimplantable devices, the reporting was of varying quality. In reports of pharmacologic trials, the reporting of the method of recruitment and of data related to centers and care providers was poor, but the reporting of the intervention was good.
To our knowledge, this is the first study that has systematically appraised the reporting of data related to external validity from trials assessing pharmacologic and nonpharmacologic treatments. Most recent efforts of researchers and editors to improve the reporting of results of RCTs, such as the CONSORT initiative, have mainly focused on internal validity (1, 9). Nevertheless, external validity is also essential and needs to be emphasized (25, 26). The results of RCTs and systematic reviews cannot be relevant to all patients and all settings. Consequently, reporting the results of RCTs should allow clinicians to judge to whom and in which context these results could reasonably be applied.
The setting, care providers, and centers have obvious implications for external validity (5, 27). In fact, the applicability of results of trials performed in secondary or tertiary settings applied to primary settings is often a concern (5). Further, differences between health care systems can affect the applicability of results, especially regarding organization of care or reimbursement for the cost of care (5). These issues are crucial in trials assessing nonpharmacologic treatments such as surgery or technical interventions. In fact, hospital and care providers' volume and outcome are related (28–33). A surgical procedure might be found to be safe and effective in an RCT performed in high-volume centers by high-volume care providers, but applying these results to low-volume centers might result in very different results (27, 34, 35). Surprisingly, the reporting of data on care providers and centers was far less than optimal in our study, especially for trials assessing surgical procedures.
The representativeness of the patients included in an RCT is also a major issue for external validity. The inclusion and exclusion criteria are among the greatest challenges in achieving representativeness of participants. Highly selective eligibility criteria can considerably reduce the applicability of the trial results. In our sample, exclusion criteria were not reported in 12% of the trial reports, and 23% of the reported exclusion criteria were poorly justified. These results are consistent with those of a systematic review of RCTs published in high impact factor journals between 1994 and 2006 (23). Exclusion criteria reported in our articles concerned mainly elderly patients, those with medical comorbidities, or those treated with specific categories of treatments. The exclusion of these specific categories of participants is problematic because it limits the representativeness of the patients.
The representativeness of the participants is also problematic because those agreeing to participate in RCTs often differ from those who do not participate (36–39). Consequently, the number of eligible nonrandomized patients, as well as the number of participants who were invited to participate but declined, is important for adequate appraisal of the external validity of a trial (22). However, these data were reported in only one-third and one-quarter of our reports, respectively, which is consistent with previous results (40).
Reporting the baseline clinical characteristics of participants included in RCTs should allow clinicians and others to assess external validity by comparison with their patients. Although baseline characteristics were described in almost all of our reports, some important data were missing: weight or BMI, while essential, was given in only 62% of the selected articles. Ethnicity, comorbidities, and severity and activity of the disease (pain, function, radiographic evidence of damage), which also predict response to and influence the generalizability of treatment, were also inadequately reported (41–44).
External validity could also be affected if trials have treatment protocols that differ from usual clinical practice, or have overly stringent limitations on the use of cointerventions. To be able to adequately apply the results of the trial in clinical practice, the treatments should be described in detail to allow for adequate reproducibility. Our results highlight the lack of descriptions of nontrial treatments in two-thirds of the reports of pharmacologic trials, and the lack of descriptions of all the components of nonpharmacologic trials, especially in reports of surgery (45). Finally, despite a specific item of the CONSORT statement dedicated to external validity, very few articles considered this issue in the discussion section.
Our study has several limitations. First, we focused on the reporting of the trial, not its conduct. Consequently, these results highlight the lack of adequate reporting of external validity criteria and do not provide information on the applicability of the results of the trial. Second, the results related to the rate of poorly justified exclusion criteria might be underestimated. Some researchers have highlighted the inadequate reporting of eligibility criteria when comparing the published article with the protocol (46); among an average of 31 eligibility criteria, only 63% were described in the main trial reports. Third, we focused on RCTs assessing hip and knee OA, and these results should be confirmed in other medical areas. However, we chose this disease because it is frequent and involves a wide range of pharmacologic and nonpharmacologic treatments. Further, the authors had some expertise in rheumatology and orthopedics and could therefore adequately evaluate the context of the trials.
In conclusion, this study highlights the lack of consideration of external validity in published reports of RCTs. Much attention is paid to the internal validity of clinical trials; however, even results of well-designed clinical trials are of limited use to clinicians if they have poor external validity and are not applicable to the patients for whom the intervention is designed. Recently, the CONSORT group developed an extension of the CONSORT statements for pragmatic trials. This extension increases the focus on data related to external validity. This initiative should help improve the consideration of external validity.
Dr. Boutron had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study design. Ahmad, Boutron, Moher, Ravaud.
Acquisition of data. Ahmad, Pitrou.
Analysis and interpretation of data. Ahmad, Boutron, Moher, Pitrou, Roy, Ravaud.
Manuscript preparation. Ahmad, Boutron, Moher, Ravaud.
Statistical analysis. Ahmad, Roy.