Investigating the impact of open label design on patient‐reported outcome results in prostate cancer randomized controlled trials

Abstract Background While open‐label randomized controlled trials (RCT) are common in oncology, some concerns have been expressed with regard to Patient‐Reported Outcomes (PRO)‐based claims stemming from these studies. We aimed to investigate the impact of open‐label design in the context of prostate cancer (PCa) RCTs with PRO data. Methods Randomized controlled trials of PCa with a PRO endpoint published between 2004 and 2018 were considered. RCTs were systematically evaluated on the basis of previously defined criteria, including international PRO reporting quality standards and the Cochrane Collaboration's tool for assessing Risk of Bias. The rate of concordance was estimated and compared between traditional clinical outcomes (eg, survival or tumor response) and PRO in open and blinded RCTs. Results We identified 110 RCTs published between 2004 and 2018, of which 62% (n = 68) were open‐label. The general characteristics of PCa RCTs were not different according to their design (open‐label vs blinded). The proportion of PCa RCTs with high‐quality PRO reporting was not different between open‐label RCTs and blinded RCTs (41.2% vs 38.1%; P = .75). No statistically significant difference was found between PRO results and concordance with traditional clinical outcomes according to the study design. Conclusion Our findings suggest that there is no evidence of significant bias for PROs due to the absence of blinding in the context of PCa RCTs. Further analyses should be conducted in other cancer disease sites.


| INTRODUCTION
As the most common cancer in males, 1 prostate cancer (PCa) unsurprisingly has a severe impact on the burden of disease. Its various treatments (eg, radical prostatectomy, androgen deprivation therapy, chemotherapy) come with a number of potential side effects [2][3][4] and hence affect health-related quality of life (HRQoL). [5][6][7] HRQoL is therefore also an important factor when treatment choices have to be made. [8][9][10][11] Both the US Food and Drug Administration (FDA) and the European Medicines Agency (EMA) strongly endorse the use of patient-reported outcomes (PROs) in this context by requiring the integration of the patients' perspective through better reporting of adverse events and HRQoL in randomized controlled trials (RCTs). 12,13 However, several systematic reviews have highlighted that a high proportion of RCTs including PROs report poorly on these measurements, with missing information being very common. [14][15][16] Another important methodological issue with the reporting of PROs in RCTs is the open-label setting. 17,18 Accordingly, the FDA rarely considers open-label RCTs adequate for PRO-based claims. [19][20][21] Nonblinded patients may report symptoms and adverse events differently compared to blinded patients. 22 Moreover, open-labeling may result in patients assigned to the control group being more likely to drop out, whereas patients in the experimental group are more likely to complete their PRO monitoring. [23][24][25] Some concerns with respect to PRO reporting have also been expressed for RCTs with unintentional unblinding when treatments have specific toxicities. 22,26 Nevertheless, open-labeling is common in oncology RCTs due to practical restrictions; 20,27 hence, it may be a challenge to integrate PRO measurement in oncology clinical trials and meet regulators' requirements. 12,28
To the best of our knowledge, there are no published data systematically investigating the impact of open-labeling in the context of PCa RCTs with PRO data. We therefore aimed to compare the proportion of concordance and discordance between traditional clinical outcomes and PROs in open-label and blinded PCa RCTs.

| Data selection
The analysis reported here was based on data collected from the large Patient-Reported Outcome Measurements Over Time In ONcology (PROMOTION) database. 29 This registry (promotion.gimema.it) includes all cancer RCTs published since 2004 that have included at least one PRO, either as a primary or secondary/exploratory study endpoint, and that were identified through systematic literature searches in electronic databases (eg, PubMed/MEDLINE). The registry intends to facilitate the evaluation of the quality of RCT-based PRO assessment methodology, instruments, statistical analysis and reporting. 29 For this analysis, all RCTs of PCa published between January 2004 and June 2018 were considered.
Details of inclusion criteria and methodology to evaluate studies have been described previously. 30 Briefly, all RCTs comparing different conventional medical treatment modalities and symptom management enrolling at least 50 patients with PCa (combined arms) were studied. Studies assessing prevention or screening programs, complementary or alternative medicine or psychosocial intervention were excluded. The search was restricted to English language articles. If a selected study had multiple publications, we incorporated all relevant papers in the analysis. More specifically for this update, four reviewers independently reviewed all identified studies, and a fifth reviewer was consulted in case of disagreement.
We specifically collected information on "blinding of participant" using the Cochrane Risk of Bias tool, which provides a framework for assessing risk of bias in studies included in a systematic review. 31 The tool covers six domains of bias: selection bias (random sequence generation; allocation concealment), performance bias (blinding of participants and personnel), detection bias (blinding of outcome assessment), attrition bias (incomplete outcome data), reporting bias (selective reporting), and other possible bias. Performance bias focuses on blinding of participants and personnel and is rated as "low", "high" or "unclear". This review classified the RCTs into two groups: (a) "open-label trial" because of a high risk of performance bias and (b) "blinded trial" because of a low risk of performance bias. RCTs with performance bias rated as "unclear" were reviewed again by two of the reviewers (GM and AA) and reclassified as "open-label" or "blinded".

| Concordance between PRO and traditional endpoint results
For each trial, we calculated concordance between PROs and more traditional clinical outcomes. For the purpose of this review, "clinical outcomes" refers to any type of non-PRO assessment (eg, survival outcomes, adverse events or tumor response) used as an endpoint in the RCT considered. The PRO results of each trial were rated as "better", "no difference", or "worse" for the experimental arm compared with the control arm. For example, if more than half of the statistically significant PRO dimensions were in favor of the experimental arm, the PRO results were considered "better". If none of the PRO dimensions were statistically significant, or if half of the PRO dimensions were in favor of each treatment arm, the PRO results were classified as "no difference". Trials reporting only descriptive results for PRO endpoints were therefore excluded from this analysis. The same classification was then applied to the clinical outcomes. Finally, we calculated the rate of concordance between clinical outcomes and PROs in open-label and blinded RCTs.
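The classification rule above can be sketched in a few lines of Python. This is an illustrative reading of the rule, not the code used for the analysis: the function names, the input format (counts of statistically significant PRO dimensions favoring each arm), and the example trials are all assumptions. The "worse" branch is inferred by symmetry with the "better" rule stated in the text.

```python
def classify_pro(favors_experimental: int, favors_control: int) -> str:
    """Rate a trial's PRO results from counts of statistically significant
    PRO dimensions favoring each arm (hypothetical input format)."""
    if favors_experimental > favors_control:
        # More than half of the significant dimensions favor the experimental arm.
        return "better"
    if favors_control > favors_experimental:
        # Assumed mirror image of the "better" rule.
        return "worse"
    # No significant dimension, or an even split between the arms.
    return "no difference"


def concordance_rate(trials: list[tuple[str, str]]) -> float:
    """Share of trials whose PRO rating equals the clinical-outcome rating."""
    matches = sum(1 for pro, clinical in trials if pro == clinical)
    return matches / len(trials)


# Hypothetical trials as (PRO rating, clinical-outcome rating) pairs.
example_trials = [
    (classify_pro(3, 1), "better"),
    (classify_pro(0, 0), "better"),
    (classify_pro(1, 4), "worse"),
    (classify_pro(2, 2), "no difference"),
]
print(concordance_rate(example_trials))  # 3 of 4 pairs agree
```

Comparing counts of significant dimensions is equivalent to the "more than half" wording as long as every significant dimension favors exactly one arm.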
In addition, we evaluated the quality of PRO reporting in open-label vs blinded RCTs according to the International Society for Quality of Life Research (ISOQOL) recommended PRO criteria, 30,32 which laid the groundwork for the subsequent development of the CONSORT-PRO extension. 33 Studies were categorized as "high quality of PRO reporting" if at least 20 of the 29 criteria were satisfied when the PRO was a primary endpoint, or at least 12 of the 18 criteria when it was a secondary endpoint. Differences in reporting between open-label and blinded RCTs were then tested using the chi-square test at a significance level of 5%.
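As a minimal sketch, the threshold rule for "high quality of PRO reporting" can be written as follows. The function name and arguments are hypothetical; only the cut-offs (20 of 29 for primary endpoints, 12 of 18 for secondary endpoints) come from the text.

```python
def is_high_quality(criteria_met: int, pro_is_primary: bool) -> bool:
    """Apply the ISOQOL-based cut-off described in the text."""
    total, threshold = (29, 20) if pro_is_primary else (18, 12)
    if not 0 <= criteria_met <= total:
        raise ValueError(f"criteria_met must be between 0 and {total}")
    return criteria_met >= threshold


# Hypothetical trials: 21 criteria met with a primary PRO endpoint,
# 11 criteria met with a secondary PRO endpoint.
print(is_high_quality(21, pro_is_primary=True))   # True
print(is_high_quality(11, pro_is_primary=False))  # False
```

Both cut-offs correspond to roughly two thirds of the applicable criteria (20/29 and 12/18).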
Qualitative variables are described as absolute and relative frequencies. The chi-square test or Fisher's exact test was used to compare qualitative variables. All tests were two-sided at a significance level of 5%. All analyses were conducted with SAS software version 9.4 (SAS Institute Inc).

| General results
A total of 110 RCTs were identified according to our predefined selection criteria among 2,952 records screened between January 2004 and June 2018. Figure 1 shows the flowchart for the inclusion and exclusion of PCa RCTs. Among the 110 RCTs analyzed, 68 (61.8%) were nonblinded/open-label studies and 42 (38.2%) were blinded, at least for the patients (Table 1). A total of 66 (60.0%) RCTs had an overall sample size > 200 patients, 45 (40.9%) were conducted in more than one country, and 65 (59.1%) were supported by industry.
A large part of the RCTs included patients with locoregional PCa (42.7%, n = 47), and hormonal treatment was most frequently used (40.9%, n = 45). A statistically significant difference between treatment arms in the clinical primary endpoint was found in 51 (56.0%) RCTs.
With respect to the PRO components, 38 RCTs (34.6%) had a PRO measure as primary endpoint. PRO results were detailed in a secondary paper for 31 RCTs (28.2%). The general characteristics of PCa RCTs were not statistically different according to their design (open-label vs blinded) except the disease stage and that a majority of blinded RCTs were industry supported (80.9%, n = 34; P < .001).

| Impact of blinding on PRO results
Analysis of concordance with clinical outcomes was conducted on 98 RCTs (37 blinded RCTs and 61 open-label RCTs), since 12 studies only reporting descriptive PRO results were excluded. The proportion of RCTs reporting a difference between treatment arms in the primary endpoint was not different between blinded and open-label RCTs.
Among the 55 RCTs reporting better clinical outcomes in favor of the experimental arm, 56.4% (n = 31) reported better PRO, 25.4% (n = 14) reported PRO equivalence and 18.2% (n = 10) reported worse PRO in the experimental arm. Of the 36 RCTs reporting clinical outcomes not different between arms, PROs were reported to be better in the experimental arm in 36.1% (n = 13) of the RCTs, not different in 55.6% (n = 20), and worse in 8.3% (n = 3).
Finally, no statistically significant difference was found between PRO results and concordance with clinical outcomes according to the blinding status of the study (ie, whether or not patients were blinded) (Table 2). For the RCTs reporting equivalent clinical outcomes or no difference between arms, better PROs were reported in 35.0% (n = 7) of the open-label trials and 37.5% (n = 6) of the blinded RCTs. The proportions of RCTs which reported no difference in PRO among those reporting no difference in clinical outcomes were also consistent across subgroups, with 55.6% of all RCTs, 55.0% of open-label RCTs, and 56.2% of blinded RCTs.

RCT designs
The quality of reporting was globally equivalent between open-label and blinded RCTs (Table 3). However, the rationale for the choice of the PRO instrument was more frequently provided in open-label RCTs (66.2% vs 42.9%, P = .02). Conversely, additional details regarding the hypothesis of PRO analysis and post hoc analyses were found in a higher proportion of blinded RCTs (8.7% vs 33.3%, P = .09 and 27.9% vs 57.1%, P < .01, respectively).
The status of PRO as either a primary or secondary endpoint was stated more frequently in blinded RCTs, although this difference was not statistically significant (73.5% vs 90.5%, P = .07). The extent of missing data was stated in 73.5% and 66.7% of the open-label and blinded trials, respectively, while the statistical approaches for dealing with missing data were less frequently reported (25% and 28.6%). Overall, the proportion of PCa RCTs with high-quality reporting was not different between open-label RCTs and blinded RCTs (41.2% vs 38.1%; P = .75) (Table 4).
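The overall comparison of high-quality reporting (41.2% vs 38.1%; P = .75) can be checked with a short, standard-library Python sketch of the Pearson chi-square test for a 2 x 2 table. The counts are reconstructed from the reported percentages and group sizes (41.2% of 68 gives 28 and 38.1% of 42 gives 16) and are therefore assumptions, as is the choice of the uncorrected Pearson statistic; the original analysis was run in SAS.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Reconstructed counts (assumption): high-quality reporting in 28 of 68
# open-label RCTs and 16 of 42 blinded RCTs.
open_high, open_total = 28, 68
blind_high, blind_total = 16, 42

chi2, p = chi_square_2x2(
    open_high, open_total - open_high,
    blind_high, blind_total - blind_high,
)
print(f"chi2 = {chi2:.3f}, P = {p:.2f}")  # prints: chi2 = 0.103, P = 0.75
```

Under these assumptions the sketch returns the same P value as reported, which suggests that the published figure corresponds to an uncorrected chi-square test on these counts.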

| DISCUSSION
To compare concordance between traditional clinical outcomes and PROs in open-label and blinded RCTs for PCa, we analyzed 110 RCTs published between 2004 and 2018. The majority of published trials were open-label (62%), and concordance between PRO and clinical outcomes was not different between the two types of RCT study design.
In oncology clinical research, PROs complement other clinical outcomes, such as survival and physician-assessed adverse events, and allow the patient experience to be incorporated into the development of new drugs. A recent review of PRO labeling for oncology drugs approved by the FDA and the EMA highlighted that, among 49 oncology drugs approved between 2012 and 2016, no FDA PRO labeling was identified. While various reasons were noted, a key reason was related to the open-label design of RCTs. 21 Biases such as observer bias and disappointment bias may occur in open-label trials. [34][35][36][37] According to the FDA, patients who are aware of the treatment they received may be prone to provide biased reports of their own symptoms, which may lead to an overestimation of the treatment difference observed between the two treatment arms. Disappointment bias may affect dropout and missing data when patients are assigned to the control group. 23 In two recent publications, Roydhouse and colleagues explored PRO completion rates between study arms in randomized open-label and double-blind cancer trials submitted to the FDA. 20,38 Their work underlined that differences favoring the experimental arm were seen only in four RCTs in which substantial between-arm completion rate differences were observed. However, completion rates were high and comparable between arms in a majority of open-label RCTs. 20,38 Because open-label designs are rather frequent in RCTs, some recommendations have been proposed to help PRO results impact labeling decisions in open-label research settings: a well-designed RCT, well-defined and adequate PRO measures, optimized PRO questionnaire completion rates, minimization of missing data, documentation of missing data, demonstration of a large magnitude of effect, and possible consideration of follow-up studies with PROs. 18,39
In our systematic review, the proportion of open-label PCa RCTs is comparable to those generally observed in previous reviews. 18,20 We found that the results of the PROs were not consistently in favor of the experimental arm in open-label RCTs. Another review by Atkinson and colleagues identified five double-blind negative RCTs that reported no significant difference in PROs between study arms despite imbalances in multiple toxic effects. 40 The authors concluded that these results might suggest that bias is not sufficient to affect PROs between arms. Therefore, taken together with our findings, current evidence-based data do not support previous concerns expressed with regard to the negative impact of open-label design on the overall quality of PRO findings.
Systematically treating PRO results from open-label trials with suspicion risks a global devaluation of the relevance of PROs. Recent reviews pointed out that PRO reporting is far from the high-quality standards emphasized by regulatory stakeholders and expert panel recommendations. 16 Only 30% of the trials submitted by the sponsor to the FDA reported PRO compliance. 38 Furthermore, in our analysis, the quality of PRO reporting according to ISOQOL recommendations was globally equivalent between open-label and blinded RCTs, although the overall quality of the reports is far from what would be expected, as highlighted in a recent review. 16 Nevertheless, it is difficult to provide a definitive answer on the actual role of the open-label design on PROs in RCT settings. To explore this further, a case-control study or meta-analysis including RCTs that evaluate the same treatment in open-label and blinded settings and use the same PRO questionnaires could provide additional insights. Recent large international initiatives have been set up to provide guidelines to help standardize the analysis of HRQoL and other PRO measures in cancer RCTs, as well as to help design PRO components in trial protocols. [41][42][43] These recommendations emphasize the need to reach high methodological quality in PRO research.

Title and abstract: The PRO should be identified as an outcome in the abstract.
Outcomes: The mode of administration of the PRO tool and the methods of collecting data should be described.
Participant flow: A flow diagram or a description of the allocation of participants and those lost to follow-up should be provided for PRO specifically.
Baseline data: The study patients' characteristics should be described, including baseline PRO scores.
These items were originally combined in the ISOQOL recommended standards but have been split in this report to better investigate possible discrepancies between documentation of PRO missing data (ie, reporting how many patients did not complete a given questionnaire at any given time point) versus actual reporting of statistical methods to address this issue.
b These items were not included in the ISOQOL recommended standards but have been evaluated in our study and reported in this

Our study has limitations that should be noted. Our analysis was exploratory and may not have had sufficient statistical power to detect a difference. Also, we could not examine each RCT in detail to explain why PRO results were better or worse. Furthermore, RCTs included in the analysis were heterogeneous in terms of therapies and setting (localized vs metastatic castration-resistant PCa), which can have a different impact on PROs. Future work should focus on a specific disease state to confirm these results. Finally, the impact of open-label design on compliance could not be assessed in our systematic review since we did not collect data about the rate of missing data. This study also has strengths. To the best of our knowledge, it is the first to systematically examine the risk of bias in open-label RCTs in PCa, and analyses were based on a large number of studies published over the last several years. Also, our evaluation was based on internationally endorsed state-of-the-art PRO reporting quality criteria.
To conclude, our findings suggest that there is no evidence of significant bias for PROs due to the absence of blinding in the context of PCa RCTs. Since the research question addressed in our work is not only relevant to PCa RCTs, further analyses should also be conducted to evaluate whether these results may extend to RCTs conducted in other cancer disease sites.