Strength of clinical evidence leading to approval of novel cancer medicines in Europe: A systematic review and data synthesis

Abstract We aimed to evaluate the quality of clinical evidence that substantiated approval of cancer medicines by the European Medicines Agency (EMA) in the last decade. We performed a systematic review and data synthesis of EMA documents in agreement with PRISMA guidelines. We included the European Public Assessment Reports, Summaries of Product Characteristics, and published randomized controlled trials (RCTs) on anti-cancer drugs approved by the EMA from 2010 to 2019, and excluded non-innovative treatments and drugs not indicated for solid or hematological tumors. We synthesized frequencies of approvals, differentiating between unblinded and blinded RCTs with and without overall survival (OS) as a predefined primary outcome measure. We also assessed the frequency of post-approval RCTs for indications without at least one RCT at the time of approval. Of 199 approvals, 159 (80%) were supported by at least one RCT, 63 (32%) by at least one RCT having OS as the primary or co-primary endpoint, 74 (37%) by at least one blinded RCT, and 30 (15%) by at least one blinded RCT having OS as the primary or co-primary endpoint. Forty approvals (20%) were not supported by any RCT; of those, 9 (22%) were followed by a post-approval RCT. While the majority of cancer medicine approvals by the EMA were supported by at least one RCT, we noted substantial methodological heterogeneity among the studies. Clinical trial registration: PROSPERO registration number CRD42020206669.


| INTRODUCTION
Cancer is the second leading cause of death in the world. 1 This huge unmet medical need has translated into the approval of numerous new drugs by regulatory agencies over the past decade. 2 Current evidence shows that several cancer indications have been approved without randomized controlled trials (RCTs) and without overall survival (OS) as a primary endpoint. [3][4][5][6][7][8][9] These data are still missing even years after approval, and a comprehensive overview of the approvals granted in the last decade is lacking.
We hypothesized that the urgency of bringing new therapies to the market may be accompanied by poor clinical evidence, in terms of the quality of study design and of the endpoints used to measure efficacy. Therefore, the objective of our research is to evaluate, in terms of study design and outcomes, the quality of clinical evidence supporting approval of cancer medicines by the EMA.
Randomized controlled trials are commonly recognized as the gold standard for the evaluation of new therapies, providing the strongest level of evidence and proof of a cause-effect relationship thanks to their high internal validity. 10,11 One of the key strengths of RCTs is the provision of comparative evidence (CE) versus placebo, active treatment, or standard of care. CE is a fundamental asset for the optimal use of a drug at its entry into clinical practice: despite existing methodological complexities, comparative efficacy evidence should have a formal role in drug licensing decisions. 12 At the time of first approval, the comparative benefit-risk profile safeguards public health by preventing the use of treatments that are potentially unsafe or inferior to the options already on the market. From an economic point of view, CE allows health technology assessment organizations and payers to make better decisions. From the clinical side, it offers doctors and patients the opportunity to understand which is the safest and most effective treatment.
In the field of oncology, key treatment goals should be an improvement of clinically relevant endpoints such as OS, quality of life (QoL), or both. 13 Other survival measures, including the widely used progression-free survival (PFS), represent surrogate endpoints. PFS may be biased by difficulty in measuring progression, by non-standardized measurement procedures, by informative censoring when patients leave the study without documentation of progression, or by assessors' expectations in open-label studies. The correlation of PFS with OS may be poor in some settings. If survival after progression is long, for example, longer than 12 months, it may be difficult to show a benefit in OS, and the use of PFS may be preferable. 14 However, if a new treatment offers a clear advantage in terms of OS or QoL, surrogate endpoints are not necessary or should be used as a primary endpoint only in the early stages of clinical development.
The usefulness of a treatment that has only demonstrated positive effects on a surrogate endpoint not clearly correlated with OS is questionable. Moreover, surrogate endpoints often overestimate benefit and may lead to approval of medicines that provide only a marginal benefit in a real-world setting. 13 In addition, PFS does not directly measure how a patient feels or lives; it provides information on the effects of the intervention on tumor burden. Therefore, a significant effect on PFS is not enough to provide reliable evidence of clinical benefit. The real need of cancer patients is to achieve clinically meaningful beneficial effects on disease-related symptoms, on the ability to carry out normal activities, and on OS. 15 Thus, it is crucial that new cancer drugs also show their capability to increase OS, QoL, or ideally both. 16
On the other hand, there are limitations to conducting RCTs and to using OS as the primary endpoint, which in some cases can make them unfeasible. For example, RCTs may be limited by economic factors or by the inaccessibility of rare populations. 17 Using OS as a primary endpoint requires larger sample sizes and longer follow-up. 18 In comparison to OS, PFS and response rate as primary endpoints were associated with an 11-month (95% CI, 5-17 months) and a 19-month (95% CI, 13-25 months) reduction in study duration, respectively. 20
We aimed to evaluate the quality of study design and outcomes reported by studies supporting EMA approval of new drugs and/or indications for the treatment of cancer in the decade 2010-2019.
Therefore, we assessed the frequency of new cancer indications supported by at least one randomized and controlled trial having OS as a primary or co-primary endpoint.

| Protocol and registration
We performed a systematic review and data synthesis. Given the study design, informed patient consent and ethical and regulatory approval were not applicable. The present work was conducted in agreement with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, 21 and the protocol was registered in the PROSPERO international prospective register of systematic reviews (CRD42020206669).

| Eligibility criteria
We included all new drugs and/or new indications for the treatment of cancer approved by the EMA from 2010 to 2019. For each drug, the "initial marketing-authorization documents" and "changes since initial authorization of medicine" in the Assessment history section of each product page were accessed.

| Information sources and literature search
The initial marketing-authorization documents report the data used to request the first authorization; the changes since initial authorization report the data used to request authorization of further indications, if any. Data on studies were obtained from the "main studies" section reported in the clinical efficacy chapter of the "assessment reports." Only the clinical data reported in the afore-

| Study selection
All kinds of clinical studies reported in the main studies section of EPARs were eligible, since the aim of this study was to describe the evidence supporting new cancer indications approved by EMA. For indications without RCTs at the time of first approval, only post-approval RCTs were selected.

| Data extraction
Extraction of data from assessment reports and full-text articles was performed by two independent reviewers (AF and FM), and data were entered into a standardized database form (Excel; Microsoft). In case of inconsistencies, the data were double-checked by both researchers and disagreements were resolved by consensus.
For each indication meeting the inclusion and exclusion criteria, the following data were collected: complete indications, their classification as a solid or hematological tumor indication, and subsequent general classification based on the type of tumor (e.g., breast cancer, prostate cancer, lymphoma, etc.); presence of orphan drug conditions or conditional approval. For studies reported as "main studies" in the clinical efficacy section of "assessment reports" the following data were collected: number; identification code; randomization and control (yes/no); blinding (yes/no); phase; number of patients enrolled; ongoing status (yes/no); narrative description of control if applicable; OS as a primary or co-primary endpoint (yes/no); non-OS as a primary or co-primary endpoint (yes/no); OS as a secondary endpoint (yes/no); non-OS as a secondary endpoint (yes/no); presence of the study as a main study for other drugs (e.g., in case of combination therapy).

| Quantitative data synthesis and statistical analyses
Descriptive statistics of the drugs and indications included in the study were computed using Microsoft Excel. When multiple studies were reported for a single indication, the strongest/cumulative value of each variable was also extracted: the indication was considered to be supported by an RCT if at least one RCT was present. The same criterion was adopted for blinded RCTs, RCTs having OS as a primary endpoint, and blinded RCTs having OS as a primary endpoint. Similarly, the highest study phase was extracted. The status of studies was reported as ongoing if at least one ongoing study was present. Survival endpoints were classified either as overall survival (OS) or as non-OS, the latter comprising all other parameters used to measure survival.
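The per-indication aggregation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study itself used Microsoft Excel, and the `MainStudy` record, its field names, the integer coding of study phase, and the `summarize_indication` helper are all hypothetical. The sketch also assumes that "blinded RCT having OS as a primary endpoint" means a single trial that is both blinded and has OS as a primary or co-primary endpoint.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MainStudy:
    """One 'main study' row; field names are hypothetical, not the authors' schema."""
    study_id: str
    randomized_controlled: bool
    blinded: bool
    os_primary: bool  # OS as primary or co-primary endpoint
    phase: int        # assumption: phase coded as an integer (1, 2, 3)
    ongoing: bool


def summarize_indication(studies: Iterable[MainStudy]) -> dict:
    """Collapse all main studies of one indication into its strongest values."""
    studies = list(studies)
    # Blinding and OS endpoints only count toward the RCT categories
    # when the study is itself a randomized controlled trial.
    rcts = [s for s in studies if s.randomized_controlled]
    return {
        "any_rct": bool(rcts),
        "any_blinded_rct": any(s.blinded for s in rcts),
        "any_rct_os_primary": any(s.os_primary for s in rcts),
        "any_blinded_rct_os_primary": any(s.blinded and s.os_primary for s in rcts),
        "highest_phase": max((s.phase for s in studies), default=None),
        "ongoing": any(s.ongoing for s in studies),
    }


# Hypothetical indication: one single-arm phase 2 study and one open-label
# phase 3 RCT with OS as a primary endpoint.
example = summarize_indication([
    MainStudy("STUDY-A", False, False, False, 2, False),
    MainStudy("STUDY-B", True, False, True, 3, True),
])
print(example)
```

In this example the indication counts as supported by an RCT and by an RCT with OS as a primary endpoint, but not by a blinded RCT, which mirrors the nested categories reported in the results.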
The present analysis did not differentiate between the types of treatment (e.g., neo-adjuvant or adjuvant) or between the stages (e.g., early or metastatic) or variants of the disease.
For post-approval studies searched through literature and SPC, the following items were collected: time from approval to current search; presence/absence of RCT in the target indication; presence or absence of OS as a primary endpoint.
The risk of bias in individual studies was not evaluated. Descriptive statistics of the collected data were computed, using the mean or median where appropriate to summarize continuous parameters and percentage frequency for categorical parameters.

| Indications included in the study
Overall, of the 257 products with ATC L classification, 93 were included, for a total of 199 indications and 228 studies (Figure 1).
Reasons for exclusion were a date beyond predetermined limits, non-innovative products (biosimilars, generics, established use) or products not indicated for cancer. For one product only, the EPAR report was not available and it was excluded from the analysis.

| Products characteristics
The 93 products included in the study had overall 199 indications

| Indications characteristics
Of the 199 approvals in the past decade included in our analysis, 68% referred to solid tumors and 32% to hematological tumors. The most frequent diagnoses were lung cancer, leukemia, skin cancer, lymphoma, breast cancer, multiple myeloma, renal cell carcinoma, prostate cancer, and colorectal cancer.

| Key results
Overall, 159 (80%) of the 199 approved indications were supported by at least one RCT, 63 (32%) by at least one RCT having OS as the primary or co-primary endpoint, 74 (37%) by at least one blinded RCT, and 30 (15%) by at least one blinded RCT having OS as the primary or co-primary endpoint. Solid and hematological tumors differed markedly in the frequency of RCTs (85.2% vs. 67.2%, respectively) and of RCTs having OS as a primary or co-primary endpoint (42.2% vs. 9.4%, respectively; Figure 2A,B). All values decreased further when study blinding and blinding with OS as primary or co-primary endpoint were considered.
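The headline proportions can be recomputed directly from the counts reported above. The short Python sketch below is illustrative only (the variable names and the `percent` helper are our own); it uses the total of 199 indications as the denominator for every category.

```python
TOTAL_INDICATIONS = 199

# Counts of approved indications, as reported in the text.
counts = {
    "at least one RCT": 159,
    "at least one RCT with OS as primary/co-primary endpoint": 63,
    "at least one blinded RCT": 74,
    "at least one blinded RCT with OS as primary/co-primary endpoint": 30,
}


def percent(count: int, total: int) -> float:
    """Percentage frequency of a category among all indications."""
    return 100.0 * count / total


shares = {label: percent(n, TOTAL_INDICATIONS) for label, n in counts.items()}
for label, share in shares.items():
    print(f"{label}: {counts[label]}/{TOTAL_INDICATIONS} = {share:.1f}%")

# Indications not supported by any RCT at the time of approval.
no_rct = TOTAL_INDICATIONS - counts["at least one RCT"]
print(f"no RCT: {no_rct}/{TOTAL_INDICATIONS} = {percent(no_rct, TOTAL_INDICATIONS):.1f}%")
```

To one decimal place the four shares are 79.9%, 31.7%, 37.2%, and 15.1%, and the remaining 40 indications (20.1%) had no supporting RCT.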
The nine most frequent indications were supported by at least one RCT with different frequencies, ranging from 53% for lymphoma to 100% for breast cancer, renal cell carcinoma, prostate cancer, and colorectal cancer. Indications supported by at least one RCT having OS as a primary or co-primary endpoint ranged from 0% for lymphoma, multiple myeloma, and renal cell carcinoma to 67% for colorectal cancer (Figure 3A,B).
cases, and a post-approval RCT having OS as a primary endpoint in 5% of cases (Table 2).

| DISCUSSION
Our systematic review and quantitative synthesis comprising data from the EMA database shows that overall 79.9% of new approvals were supported by at least 1 RCT reported as the main study.
This value dropped to 31.7% when considering RCTs with OS as the primary or co-primary endpoint. However, in most cases, OS was included among the secondary endpoints. We observed relevant differences when considering solid and hematologic cancer as two
and 19 RCTs (49%) were judged to be at high risk of bias for their primary outcome. Trials that evaluated OS were at a lower risk of bias than those that evaluated surrogate outcomes. 4 The different result
Our results show that about one in five approvals was based on uncontrolled studies and that only 25% of these cases had at least one randomized study after an average time of about 3.6 years from approval. This represents a major concern, given that RCTs still represent the highest level of evidence in evidence-based medicine and are the gold standard when the aim of the research is to evaluate the intended effect of an intervention. 23
brilliantly discussed by Naci et al., who proposed a set of five principles to promote the production of high-quality CE to support decision making, briefly: head-to-head comparisons to be routinely reported on product labels; more selective use of expedited programs, including well-designed evidence-generation plans to be conducted in the post-marketing period; more routine use of active-comparator RCTs; network meta-analyses to be performed within each therapeutic area, and greater harmonization in the methods of registration studies; CE data to be a crucial factor in pricing and payment decisions. 29
The agenda for further research to clarify the open points is related to the limitations of our study; in particular, it will be useful to systematically evaluate the magnitude of the clinical effect in relation to the specific conditions and the risk of bias of the individual studies. In addition, we propose further standardization and harmonization in the reporting of study results by EMA.
Indeed, this research was conducted by manually consulting the specific EPAR product pages and extracting the data of interest. This is time-consuming and prone to human error. Therefore, an aggregated database would be useful to further foster research in this field and make the clinical evidence supporting drug use in the EU even more transparent.