## 1 Introduction

Bringing a drug to market is a long, expensive process [1] requiring multiple clinical trials of various sizes. Generally, clinical trials can be divided into ‘screening’ and ‘confirmatory’ studies. The purpose of screening trials is to gain information about a treatment, such as its side-effects or efficacy, and to decide whether further investigation is warranted. The purpose of a confirmatory trial is to prove to an independent observer that the treatment should be licensed. A question of interest is whether changing the way screening trials are conducted could improve the efficiency of the drug development process as a whole.

Stallard [2] discusses how screening trials of a single new treatment should be designed to maximize the efficiency of a drug development program. The problem is considered from the perspective of a large funder of clinical trials, for whom many treatments are available for testing. Each treatment is tested in a screening trial and, if the screening trial is successful, in a confirmatory trial. Treatments are tested sequentially until one succeeds in its confirmatory trial. When the efficiency of the drug development process as a whole is considered, conducting small screening trials on a larger number of treatments has been found to be efficient [3-5].
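To make the objective of this sequential process concrete, the sketch below uses a deliberately simplified, hypothetical model rather than the model analysed in this article: every treatment costs `n_s` screening patients, passes screening with probability `p_s`, and, if it passes, costs `n_c` further patients and is confirmed with probability `p_c`. Because treatments are tested independently until the first confirmed success, the number of treatments tested is geometric with success probability `p_s * p_c`, and Wald's identity gives the expected total number of patients as the expected cost per treatment divided by that probability. In practice `p_s` and `p_c` would depend on the screening sample size, the threshold and the distribution of treatment effects; all names and values here are illustrative assumptions.

```python
import random


def expected_patients(n_s, n_c, p_s, p_c):
    """Closed form via Wald's identity (hypothetical simplified model):
    E[total patients] = E[patients per treatment] / P(treatment is confirmed)."""
    cost_per_treatment = n_s + p_s * n_c   # screening always, confirmation only if screening passes
    p_confirmed = p_s * p_c                # probability a single treatment ends the search
    if p_confirmed <= 0:
        raise ValueError("the search never ends if no treatment can be confirmed")
    return cost_per_treatment / p_confirmed


def simulate_patients(n_s, n_c, p_s, p_c, rng):
    """Test treatments one at a time until one is confirmed; return patients used."""
    total = 0
    while True:
        total += n_s                   # every treatment gets a screening trial
        if rng.random() < p_s:         # screening trial is 'successful'
            total += n_c               # so a confirmatory trial is run
            if rng.random() < p_c:     # confirmatory trial succeeds: stop searching
                return total


rng = random.Random(1)
n_s, n_c, p_s, p_c = 50, 300, 0.2, 0.5   # hypothetical design values
exact = expected_patients(n_s, n_c, p_s, p_c)   # 1100 patients for these values
reps = 20000
mc = sum(simulate_patients(n_s, n_c, p_s, p_c, rng) for _ in range(reps)) / reps
print(exact, round(mc))
```

The Monte Carlo estimate should agree with the closed form up to simulation error. Under this toy model, shrinking `n_s` lowers the cost of each treatment but, through `p_s` and `p_c`, also changes how many treatments must be tested; that trade-off is what an optimal screening design has to balance.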

In this article, we extend the methods of Stallard [2] to investigate how screening trials that test multiple new treatments should be designed. Whitehead [3] recommends that, when the number of patients is limited and several treatments are available, allocating a small number of patients to each of a large number of treatments will generally increase the probability of finding a successful treatment. We investigate this idea further by exploring the optimal design of screening trials to minimize the expected number of patients recruited before a successful treatment is confirmed. We also explore how multi-arm screening trials and multi-arm multi-stage (MAMS) confirmatory trials should be designed to maximize the efficiency of the drug development process as a whole, including the important question of how many arms a multi-arm screening trial should include.

MAMS trials are ones in which multiple experimental treatments are tested within the same trial with a common control group. Efficiency is gained from the shared control group as well as from interim analyses allowing early dropping of ineffective arms. There are several examples of real MAMS trials, including the Medical Research Council Systemic Therapy in Advancing or Metastatic Prostate cancer: Evaluation of Drug Efficacy trial (STAMPEDE) [6], which uses the methodology described in Royston *et al.* [7], and the TelmisArtan and InsuLin Resistance in HIV (TAILoR) trial, using the methodology described in Magirr *et al.* [8]. In addition, there is a large literature on adaptive trials that start with multiple treatments and select treatments to continue with at an interim analysis [9-15]. Unlike these designs, we consider separate trials for the selection stage and the confirmatory stage, with focus on designing the selection trial to optimize the drug development process as a whole.

We investigate two types of screening design for comparing multiple treatments: (1) the *top-treatment design*, in which the treatment with the largest test statistic at the screening stage proceeds to a confirmatory trial against a control treatment, provided that its test statistic exceeds a threshold; and (2) the *all-interesting-treatments design*, in which all treatments with test statistics above the threshold proceed to a multi-arm confirmatory trial. We consider the optimal design of the screening trial in terms of the number of treatments tested simultaneously, the sample size per treatment, and the test-statistic threshold above which a confirmatory trial is conducted. We also consider how the optimal screening design changes when the confirmatory trial is multi-stage, allowing early stopping for futility or efficacy.
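The two selection rules can be sketched as follows, assuming that the screening stage yields one test statistic per experimental arm (each comparing that arm with control) and a single common threshold `c`. The function names and the example statistics are hypothetical, and the sketch says nothing about how the threshold, the number of arms or the sample sizes should be chosen; that is the design problem considered in this article.

```python
from typing import List, Sequence


def top_treatment(z: Sequence[float], threshold: float) -> List[int]:
    """Top-treatment design: at most one arm proceeds, namely the arm with the
    largest screening statistic, and only if that statistic exceeds the threshold."""
    if not z:
        return []
    best = max(range(len(z)), key=lambda k: z[k])
    return [best] if z[best] > threshold else []


def all_interesting(z: Sequence[float], threshold: float) -> List[int]:
    """All-interesting-treatments design: every arm whose screening statistic
    exceeds the threshold proceeds to a multi-arm confirmatory trial."""
    return [k for k, zk in enumerate(z) if zk > threshold]


z = [0.4, 2.1, -0.3, 1.7]   # hypothetical screening statistics for four arms
c = 1.5                     # hypothetical threshold
print(top_treatment(z, c))    # prints [1]
print(all_interesting(z, c))  # prints [1, 3]
```

With a single experimental arm the two rules coincide; they differ only in how many arms can proceed, which is why the all-interesting-treatments design pairs naturally with a multi-arm (possibly multi-stage) confirmatory trial.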

As discussed further in Section 2, the setup used in this article assumes that an inexhaustible number of treatments is available and that each treatment has an independent treatment effect. A single pharmaceutical company would rarely have several distinct treatments available for testing in the same indication. However, a publicly funded trial such as STAMPEDE [6] may be in a position to compare distinct treatments from several companies. There are also other scenarios in which the setup used in this article is applicable. Firstly, the Cocaine Rapid Efficacy Screening Trial (CREST), described in Leiderman *et al.* [16], is a highly relevant example that we examine in Section 4. Secondly, the ‘treatments’ may represent different doses of the same treatment and, thirdly, they may represent different treatment combinations. In both of these cases, the number of treatments available for testing would be much higher than the number of distinct drugs. With different doses, the treatment effects would generally be correlated because of a dose–response relationship; however, in some settings, such as the TAILoR trial, a monotone dose–response relationship may be considered implausible, and where higher doses are less well tolerated there may be no dose–response relationship at all. In such cases, the assumption of independent treatment effects may be realistic. With treatment combinations, there may be some correlation between combinations that contain the same treatment, but again an assumption of independent treatment effects would likely be reasonable. Tuberculosis is an example of a disease area in which a large number of possible treatment combinations are available for testing [17]. A fourth scenario in which these assumptions may be realistic is non-drug trials, where there may be a virtually unlimited number of potential policy interventions awaiting testing. In all these scenarios, the results described in this article would be useful.