The frequency of assessment of progression in randomized oncology clinical trials

Abstract
Background: Tumor progression is often detected at a follow-up assessment rather than when progression actually occurs, which can bias progression-free survival (PFS) outcomes.
Aim: We sought to evaluate the frequency of tumor assessment scans in clinical trials of anti-cancer interventions and to compare this to recommended (National Comprehensive Cancer Network) and real-world frequencies of tumor assessments.
Methods: In a cross-sectional analysis, we searched for articles published in the three top oncology journals between July 2017 and June 2020. We included articles that were RCTs of patients who had unresectable or metastatic solid tumors and that used an intervention designed to be anti-tumor. We abstracted median PFS for each group, the PFS hazard ratio, frequency of tumor assessment scans, tumor type, intervention type, and information regarding the study.
Results: In the 182 comparisons (163 articles), less frequent tumor assessment (more than 9 weeks between assessments) was associated with higher median PFS values for both the intervention group (p < .0001) and the control group (p < .0001). PFS hazard ratios for studies scanning for tumors every 10 or more weeks were no different from those for studies scanning more frequently (p = .88). Data on the frequency of tumor assessments in the real world are sparse.
Conclusion: Less frequent tumor assessment was associated with longer median PFS in both intervention and control groups of clinical oncology trials but was not associated with differences in PFS hazard ratios. Future research is needed to compare real-world and trial assessment frequencies.


| BACKGROUND
Progression-free survival (PFS) is one of the most common primary endpoints in oncology clinical trials, 1 and is a composite of death or tumor growth after treatment. Because the measurement of PFS primarily relies on tumor assessment, it can be biased by variables such as the timing and methodology of tumor assessment. As such, the increasing use of PFS in place of other well-established outcomes, such as overall survival, in oncology trials should rely on measurements with minimal bias and consistent protocols.
Previous authors have made several recommendations for designing and executing clinical trials to reduce bias in the measurement of PFS outcomes, including assessment bias (related to the frequency of tumor assessments) and evaluation bias (treatment arms receiving tumor assessments at different frequencies). 2 The timing of tumor assessments is especially influential on PFS outcomes because progression is often detected at a follow-up appointment rather than when progression actually occurs; 3 thus, a longer interval between tumor assessments can overestimate PFS.
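The overestimation described above can be illustrated with a minimal sketch. It assumes a simplified model in which progression is recorded at the first scheduled scan on or after the true progression time; the function names, the 48-week horizon, and the scan intervals are hypothetical and are not taken from any included trial.

```python
import math

def detected_week(true_week: int, interval_weeks: int) -> int:
    """Week of the first scheduled scan on or after true progression.

    Assumes scans occur at fixed multiples of interval_weeks (hypothetical).
    """
    return math.ceil(true_week / interval_weeks) * interval_weeks

def mean_detection_delay(interval_weeks: int, horizon_weeks: int = 48) -> float:
    """Average gap between true and recorded progression when true
    progression is equally likely in each week 1..horizon_weeks."""
    delays = [
        detected_week(week, interval_weeks) - week
        for week in range(1, horizon_weeks + 1)
    ]
    return sum(delays) / len(delays)

# Longer scan intervals inflate the recorded progression time on average.
for interval in (6, 8, 12):
    print(interval, mean_detection_delay(interval))
```

Under these assumptions the average delay grows with the scan interval, which is the mechanism by which less frequent assessment can lengthen recorded PFS in both arms without necessarily changing the comparison between arms.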
An additional consideration is whether tumor assessment in clinical trials is representative of real-world practice, or at least of the practices recommended for the real world. The US National Comprehensive Cancer Network (NCCN) has issued recommended scan frequencies for some cancers. These vary by cancer type and treatment type: every 2-6 months for women with breast cancer treated with endocrine therapy, every 6-12 weeks for women treated with cytotoxic chemotherapy, and as frequently as every 6-16 weeks for kidney cancer. 4 It is with this background that we sought to evaluate the frequency of tumor assessment scans in clinical trials of anti-cancer interventions for solid tumors and to compare this to recommended and real-world frequencies of tumor assessments. We further sought to assess the association between tumor assessment frequency and PFS measurements.

| METHODS
We sought to characterize the frequency of tumor assessments in the literature and to see if there is an association between tumor assessment frequency and either PFS or overall survival indices in oncology studies.

| Article inclusion and data abstraction
Articles published in the three top oncology journals (Lancet Oncology, JAMA Oncology, and Journal of Clinical Oncology) between July 2017 and June 2020 were considered for inclusion. We included articles that were RCTs of patients with unresectable or metastatic solid tumors and that used an intervention designed to be anti-tumor. We excluded studies that did not report PFS hazard ratio, median PFS, or tumor assessment frequency; pooled analyses; dose-optimization studies (no comparator); and studies in which the intervention was intended to prevent cancer. For studies that had more than two arms, we analyzed each comparison separately.
We abstracted median PFS for each group, the PFS hazard ratio, frequency of tumor assessment scans, tumor type, intervention type, whether the tumor assessments were blinded (double blind or masked tumor assessment vs. open and unmasked tumor assessments), and information regarding the study (e.g., publication date and journal). Because a number of studies did not report a PFS hazard ratio, we calculated a risk ratio from the reported median PFS values of the control and intervention groups for all studies. We used the risk ratio and the PFS hazard ratio as separate outcomes. For the three studies where median PFS was not reached, we used the time the study participants were followed in place of the median PFS. Most studies assessed the tumor response at regular intervals; for those with varying frequencies, we used the first frequency that was maintained for 6 months or longer. We then categorized tumor assessment frequency into three groups: <8 weeks, 8-9 weeks, and >9 weeks.
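The abstraction rules in this paragraph can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the text does not give the exact formula for the ratio derived from median PFS values, so the direction used here (control median divided by intervention median, which approximates a hazard ratio if survival times are exponentially distributed) is an assumption, and all function names and example values are hypothetical.

```python
from typing import Optional, Sequence, Tuple

def representative_interval(
    schedule: Sequence[Tuple[float, float]]
) -> Optional[float]:
    """Pick the first scan interval (weeks) maintained for >= 6 months.

    schedule holds (interval_weeks, duration_months) pairs in trial order.
    Returns None when no phase lasts 6 months (handling assumed here).
    """
    for interval_weeks, duration_months in schedule:
        if duration_months >= 6:
            return interval_weeks
    return None

def frequency_category(interval_weeks: float) -> str:
    """Assign the three categories described in the text."""
    if interval_weeks < 8:
        return "<8 weeks"
    if interval_weeks <= 9:
        return "8-9 weeks"
    return ">9 weeks"

def median_pfs_ratio(median_control: float, median_intervention: float) -> float:
    """Ratio of median PFS values (ASSUMED direction: control / intervention).

    Under exponential survival this equals the intervention-vs-control
    hazard ratio; the source does not state its exact formula.
    """
    return median_control / median_intervention

# Hypothetical trial: scans every 6 weeks for 3 months, then every 9 weeks.
interval = representative_interval([(6, 3), (9, 12)])
print(interval, frequency_category(interval), median_pfs_ratio(4.0, 8.0))
```

A ratio below 1 in this sketch would favor the intervention, matching the usual reading of a hazard ratio.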
We then compared the tumor assessment frequency with guideline recommendations (National Comprehensive Cancer Network) for the frequency of tumor assessments. We also searched Google Scholar and PubMed for studies that reported on the frequency of tumor assessment in real-world clinical practice. For search terms, we used the cancer type (for each of the five most common cancers) and "frequency of tumor assessment real world". For this search, we did not include articles that reported frequency in clinical trials.

| Statistical analysis
We calculated descriptive statistics for included studies. We calculated differences in PFS and overall survival indices by tumor assessment frequency category using analysis of covariance. We ran separate models for PFS hazard ratio, PFS risk ratio, and overall survival hazard ratio. To examine the effects of blinding and tumor type on PFS outcomes, we calculated differences in PFS indices by tumor assessment frequency, stratified by whether the study was blinded or not and by the most common tumor types. We did not include one study in the model analyses because it was found to be an outlier when we checked model residuals. We checked model assumptions by using a QQ plot for normality and the residuals versus fits plot for homogeneity of variance. All data were publicly available and nonidentifiable to patients or study participants, so no institutional review board approval was required. All analyses were done using R statistical software.
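As a rough illustration of the group comparison described here, the sketch below computes a one-way F statistic across frequency categories. It is a simplification: the analysis in the text used analysis of covariance in R, the covariates are not listed, and the data below are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from statistics import mean
from typing import Sequence, Tuple

def one_way_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way comparison of means."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (value - mean(group)) ** 2 for group in groups for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical median PFS values (months) for three frequency categories.
under_8_weeks = [3, 4, 5]
eight_to_9_weeks = [5, 6, 7]
over_9_weeks = [9, 10, 11]
print(one_way_f([under_8_weeks, eight_to_9_weeks, over_9_weeks]))
```

A large F indicates that category means differ by more than the within-category spread would suggest; converting F to a p value requires the F distribution with the returned degrees of freedom, which a statistics package would supply.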

| RESULTS
We reviewed 1484 articles. Among excluded articles, 947 were not RCTs; 171 did not include patients with unresectable or metastatic cancer; 54 did not include PFS outcomes; 55 used interventions that were not anti-tumor; 40 involved non-solid tumors; 26 were subgroup or secondary analyses; 17 were pooled analyses; 6 had interventions designed to prevent (not treat) cancer; and one was a dose-finding study. We further excluded two articles: one was retracted, and the other reported that the median PFS was not reached in either arm and did not report a hazard ratio, so we had no numbers to use in the analysis. We then excluded two additional studies because they did not report a tumor assessment frequency. The remaining 163 articles were included in the data analysis. Sixteen articles had multiple arms, which resulted in 182 total comparisons.
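The article flow reported in this paragraph can be checked with simple arithmetic. The short sketch below uses only the counts stated in the text; the dictionary labels are shorthand introduced for illustration.

```python
reviewed = 1484

initial_exclusions = {
    "not an RCT": 947,
    "not unresectable or metastatic": 171,
    "no PFS outcomes": 54,
    "intervention not anti-tumor": 55,
    "non-solid tumors": 40,
    "subgroup or secondary analysis": 26,
    "pooled analysis": 17,
    "prevention intervention": 6,
    "dose-finding study": 1,
}

further_exclusions = {
    "retracted": 1,
    "median PFS not reached and no hazard ratio": 1,
    "no tumor assessment frequency": 2,
}

included_articles = (
    reviewed
    - sum(initial_exclusions.values())
    - sum(further_exclusions.values())
)

# Comparisons beyond one per article come from the multi-arm trials.
total_comparisons = 182
extra_comparisons = total_comparisons - included_articles

print(included_articles, extra_comparisons)
```

The counts reconcile to the 163 included articles, and the 19 additional comparisons imply that some of the sixteen multi-arm articles contributed more than two comparisons.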
The percentage of studies with blinded tumor assessments, by tumor assessment frequency category, was as follows: 38% of studies with tumor assessments every 12 weeks or longer; 36% of studies with tumor assessments every 8 weeks; and 45% of studies with tumor assessments at intervals shorter than 8 weeks (chi-square = 0.14; p = .93).

FIGURE 1 Frequency of tumor scans in oncology studies assessing progression-free survival in all tumor types combined (overall) and the 3 most common cancer types encountered in studies published in the top 3 oncology journals, July 2017 through June 2020. NSCLC, non-small cell lung cancer.

TABLE 1 Mean values (and 95% confidence intervals) for progression-free hazard ratios, median progression-free survival for the intervention group, median progression-free survival for the control group, and overall survival hazard ratio, by progression scan frequency for randomized, metastatic oncology studies.

The median scan frequency for the studies was every 8 weeks. We found that less frequent tumor assessment (occurring more than 9 weeks between assessments) was associated with higher median PFS values for both the intervention group (F = 21.76; p < .0001; Table 1 and Figure 2) and the control group (F = 25.24; p < .0001). Median overall survival times were higher for studies that had tumor assessment every 10 or more weeks for both the intervention group (F = 7.91; p < .0001; Table 1 and Figure 2) and the control group (F = 9.00; p = .0002). Both PFS (F = 0.13; p = .88) and overall survival (F = 1.54; p = .22) hazard ratios for studies scanning for tumors every 10 or more weeks were numerically higher than for studies scanning for tumors more frequently, but the differences were not significant (Table 1). Model assumptions for normality and homogeneity of variance were met in these models.
We found that PFS hazard ratios were numerically higher in studies with unblinded assessments than in studies with blinded assessments, but the difference was not significant (p = .16). We did not find any differences in PFS hazard ratio between categories of tumor assessment frequency in either blinded studies (p = .62) or unblinded studies (p = .50). When looking at the association between tumor scan frequency and PFS hazard ratio in each of the three most common cancer types, there was no association (NSCLC p = .20; breast p = .61; colorectal p = .94; data not shown).
In the NCCN recommendations for the most common cancers (Table 2), the recommended frequency varied by tumor type and could be as infrequent as every 2-6 months for breast cancer treated with hormone therapy and every 12 months for melanoma.
Data on the frequency of tumor assessments in the real world are sparse. The real-world frequency was higher than the frequency suggested by NCCN guidelines, but studies reporting these frequencies were conducted prior to the recent NCCN re-evaluations of their guidelines for the most frequent cancer types.

| DISCUSSION
In our examination of tumor scan frequency in oncology studies, we found that studies that assessed progression less frequently (i.e., longer intervals between progression scans) more often occurred in tumor types and settings where PFS was longer, either due to the natural biology of the disease or the treatment. This did not translate into significantly different PFS hazard ratios between categories of tumor assessment frequency. Several authors have suggested the possibility of biased PFS outcomes due to the timing of tumor assessment, 3,10 but in the studies that we included in our analysis, PFS outcomes of cases relative to controls were not differentially assessed between the tumor assessment frequency categories.
We should be careful to say that our study cannot exclude the possibility that the frequency of scanning can result in varying PFS hazard ratios. In all these trials, the sponsor and investigators chose the frequency with knowledge of the underlying biology and some prediction of the putative efficacy (this is required for a power calculation). That knowledge may lead to the choice of PFS assessment interval. Put another way, the PFS assessment interval is not randomly selected. As such, we cannot draw a firm causal conclusion that alternative intervals would not alter the observed hazard ratios. Instead, we merely observe that, as conducted, our study failed to find such differences.
We also found that median overall survival was longer in studies that assessed tumors less often, which may be a result of less aggressive or slower-growing tumors needing assessments less frequently. Because survival is a hard outcome, survival status should not be biased by treatment status in the way that softer outcomes, such as PFS, might be. One possible explanation for this is attrition bias or informative censoring from incomplete follow-up when there are longer intervals between tumor assessments. 11,12 Conversely, studies that had a longer interval between assessments may have been those with an intervention that was less impactful on survival outcomes.
We were not able to find much data on the frequency of tumor assessments in the real world, and it was difficult to determine for the longest duration, and the one in which progression was most likely to occur, as these were studies on metastatic tumors.

| CONCLUSION
In conclusion, we document the frequency of scans in a range of contemporary randomized controlled trials. We found that less frequent tumor assessment was associated with longer median PFS in both intervention and control groups of clinical oncology trials, but was not associated with differences in PFS hazard ratios. This may be explained by deliberate choices made by investigators to assess progression less frequently based on lower event rates. Future research is needed to compare real-world and trial assessment frequencies.

DATA AVAILABILITY STATEMENT
These data were derived from journal websites that are public domain.

ETHICS STATEMENT
All data were publicly available and non-identifiable to patients or study participants, so no institutional review board approval was required, nor was individual informed consent.