The threshold model revisited

Abstract
Background: The threshold model represents one of the most significant advances in the field of medical decision-making, yet it often does not apply to the most common class of clinical problems, those in which health outcomes are part of the definition of disease. In addition, the original threshold model did not take a decision-maker's values and preferences explicitly into account.
Methods: We reformulated the threshold model by (1) applying it to clinical scenarios that define disease according to the outcomes that treatment is designed to affect and (2) taking into account a decision-maker's values.
Results: We showed that when outcomes (eg, morbidity) are an integral part of the definition of disease, the classic threshold model does not apply, because outcomes are then double counted in the probability and utility branches of the model. To avoid double counting, the model can be analysed by assuming that the diagnosis is certain ($P = 1$). This yields a different threshold: the threshold for the outcome of disease ($M_t$), rather than the threshold for the probability of disease ($P_t$), above which the benefits of treatment outweigh its harms. We found that $M_t \le P_t$, which may explain differences between normative models and actual behaviour in practice. When a decision-maker values outcomes related to benefits and harms differently, the new threshold model generates decision thresholds that could be descriptively more accurate.
Conclusions: Calculation of the threshold depends on careful definitions of disease versus utility and on a decision-maker's values and preferences.

treatment has typically been considered implausibly different from the actual or descriptive decision-making thresholds seen in practice. 2 One possible reason for the difference between behaviour and the threshold model guidance is that people often violate expected utility theory (EUT), which served as the theoretical framework for derivation of the original threshold model. 3,4 To address this discrepancy between normative and observed behaviour, subsequent threshold models have been formulated from non-EUT perspectives such as regret, dual processing, and hybrid theoretical stances. 2
Additional explanations for the differences between normative and descriptive estimates generated by the threshold models may, however, exist: (1) descriptive inaccuracy may be a consequence of misapplying the original threshold model, which was described for situations in which the diagnosis is not certain when the patient is first seen, to clinical situations in which treatment is offered to patients with previously confirmed disease to prevent subsequent events, or in which disease is defined by the very outcomes treatment is designed to prevent; (2) the original threshold model and its subsequent reformulations may not have taken decision-makers' values and preferences (V&P) explicitly into account. [5][6][7][8]
In this paper, we revisit the EUT threshold model by taking these considerations into account to modify the calculation of the treatment action thresholds. We focus on clinical situations in which no further diagnostic tests are available to a decision-maker to decrease diagnostic uncertainty, 1 although, as argued below, similar considerations can be applied to clinical settings in which such a diagnostic test is available. 9 We also discuss some common pitfalls in the interpretation of the threshold model, aiming to provide needed clarifications for the wider clinical application of this simple but powerful model.
In what follows, terms such as disease events, outcomes, (dis)utilities, and risks of morbidity are often used interchangeably (see Appendix for definitions).

| A brief overview of the threshold model
The threshold model (Figure 1) was originally developed to answer the question: "At which probability of disease ($P$) are we indifferent between the expected utility (EU) of administering treatment versus not?" 1 If $P$ is above the threshold probability ($P_t$), a decision-maker (physician or patient) should choose treatment (Rx); otherwise not (NoRx).
In the classic Pauker & Kassirer (P&K) model 1 (Figure 1), the expected utilities of each decision are given as follows:
$$EU(\mathrm{Rx}) = P \cdot U_1 + (1 - P) \cdot U_2$$
$$EU(\mathrm{NoRx}) = P \cdot U_3 + (1 - P) \cdot U_4$$
Pauker and Kassirer 1 defined the net benefit of treatment as $B = U_1 - U_3$, ie, the difference in the utility of the outcomes if the diseased patient were treated or were not treated, and net harms as $H = U_4 - U_2$, ie, the difference in the utility of the outcomes if a patient without the disease were not treated versus treated. Solving the tree, the threshold probability ($P_t$) above which we should treat is calculated as follows:
$$P_t = \frac{H}{B + H} = \frac{1}{1 + B/H} \qquad (1)$$

2.2 | Assessing if the probabilities and outcomes (utilities) in the threshold model are independent of each other
Figure 1A shows that the key structural ingredients of the threshold decision model consist of the probability of disease ($P$) and the consequences of our decisions, expressed via health outcomes or the values that we assign to these outcomes (utilities). 1,9 Each branch in the model is characterized by its probability and by the outcome (utility, $U$) with which it is associated. 1,9 Importantly, probabilities and utilities are assumed to be independent of each other. 10
This creates a problem of double counting and violates independence

| Two thresholds
The preceding discussion indicates that the threshold model can generate two thresholds answering two questions: (1) when the probability of disease and probability of outcomes are independent of each other (classic threshold model) and (2) when the probability of disease and probability of outcomes are not independent of each other (an alternative model presented in this paper).
The classic threshold equation 1,9 (Equation 1) answers the question "At which probability of diagnosis/disease should we administer a treatment with given benefit and harms?" This equation should be used when the probability of disease and the probability of outcomes are independent of each other (Figure 1A). The model in Figure 1A (Equation 1) provides the answer without taking test results into consideration.
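As a minimal sketch of how the classic threshold is used, the snippet below computes the expected utilities of the two decisions and the threshold probability in its standard form, $P_t = H/(B + H)$ with $B = U_1 - U_3$ and $H = U_4 - U_2$. The function names and the numerical utilities are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def expected_utilities(p, u1, u2, u3, u4):
    """Return (EU of treating, EU of not treating) for disease probability p.

    u1: diseased, treated      u2: not diseased, treated
    u3: diseased, untreated    u4: not diseased, untreated
    """
    eu_rx = p * u1 + (1 - p) * u2
    eu_norx = p * u3 + (1 - p) * u4
    return eu_rx, eu_norx


def classic_threshold(u1, u2, u3, u4):
    """Threshold probability P_t = H / (B + H), with B = U1 - U3 and H = U4 - U2."""
    benefit = u1 - u3
    harm = u4 - u2
    if benefit + harm <= 0:
        raise ValueError("B + H must be positive for a threshold to exist")
    return harm / (benefit + harm)


# Hypothetical utilities on a 0-1 scale, for illustration only.
u1, u2, u3, u4 = 0.90, 0.95, 0.60, 1.00
p_t = classic_threshold(u1, u2, u3, u4)
eu_rx, eu_norx = expected_utilities(p_t, u1, u2, u3, u4)
print(f"P_t = {p_t:.3f}")
print(f"EU(Rx) = {eu_rx:.4f}, EU(NoRx) = {eu_norx:.4f}")
```

At the threshold the two expected utilities coincide, which is the indifference condition the model is built on; for any probability above the threshold, treating has the higher expected utility.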
When the risk of disease and the risk of outcomes are not independent, one way to avoid double counting is simply to set the probability of disease to 1. Thus, from a modelling perspective, we assume that disease is certain. In the previous examples, prior DVT establishes a predisposition to recurrence, and this predisposition constitutes the "disease." After a certain age, we all have some degree of coronary artery disease and are thus at risk of coronary events. This also agrees with clinical and biological logic, as the patient cannot develop a disease outcome unless he/she has a diagnosis or a condition that causes the given outcomes. At the same time, we can never be sure which individual patients will have the disease outcome 13 ; that is, the mere presence of the condition or predisposition (which one might call the "disease") does not indicate certainty that the outcome will occur 13 (Figure 1B).
Thus, in situations when probabilities and outcomes are not independent, we can calculate the threshold that relates to the question: "For which values of the parameters of both probabilities and utilities should we administer a treatment when we are certain of the diagnosis/disease?" This will occur under generic definitions of net benefits and harms when $B \ge 0$. Expressed in specific terms, when morbidity due to disease ($M$) is used to define disutilities, the solution of the model shown in Figure 1B generates the following simple equation (see below and Appendix):
$$M_t = \frac{H_{rx}}{RRR} \qquad (2)$$
where $M_t$ is the threshold for morbidity (outcome or event) in the absence of treatment, above which treatment should be given and below which it should not; $H_{rx}$ denotes treatment-related harms; and RRR denotes the efficacy (relative risk reduction) of treatment. Note that the model in Figure 1B (Equation 2) assumes that the test that defines disease (the disease being a predisposition to subsequent morbid or mortal events) has already been performed (see Clinical Application below).
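The step from the certain-diagnosis tree to the morbidity threshold can be sketched as follows. The sketch rests on an assumption that is not spelled out in the text above: an untreated patient faces disutility $M$, a treated patient faces the residual morbidity $M(1 - RRR)$ plus the treatment harm $H_{rx}$, and benefits and harms are weighed equally.

```latex
\begin{align*}
\text{treat if}\quad M\,(1 - RRR) + H_{rx} &< M \\
\Longleftrightarrow\quad H_{rx} &< M \cdot RRR \\
\Longleftrightarrow\quad M &> \frac{H_{rx}}{RRR} = M_t
\end{align*}
```

Under these assumptions the threshold depends only on the ratio of treatment harm to treatment efficacy, with no probability of disease in the expression.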
Therefore, we can use a decision model shown in Figure 1

| Integration of a decision-maker's preferences into the threshold model
The models outlined above assume that a decision-maker weighs equally outcomes related to benefits and harms of treatments. Clinically, this is often not true. In the Clinical Application section, we will illustrate the calculation of the thresholds when a decision-maker (eg, patient) weighs benefits and harms of treatments equally and differently.
To illustrate the role of the patient's preferences, we will express generic net benefits and harms through disutilities related to morbidity or mortality ($M$) associated with treating versus not treating. We will also assume that most medical interventions exert constant (relative) effects over the range of predicted absolute risk (often termed the "baseline risk") and depend-
We also introduce a variable $RV_H$ to represent the patient's (or decision-maker's) preferences, expressed as the relative value of harms of treatment with respect to the consequences of disease outcome $M$ (when outcomes are equally valued, $RV_H = 1$). If we now solve generic Equation 1 (applicable to situations when probabilities and utilities are independent) using these specific definitions of benefits and harms (see Appendix for details), we obtain the following equation under EUT for the patient's threshold:
$$P_t = \frac{RV_H \cdot H_{rx}}{RRR \cdot M} \qquad (3)$$
This equation gives the threshold for the probability of diagnosis of disease at which a rational patient with preferences expressed as $RV_H$ will be indifferent between accepting treatment and not; that is, the patient will accept treatment if the estimated probability of disease $P > P_t$.
For decisions in which probabilities and outcomes are not independent of each other, as in Equation 2, we substitute $P = 1$ in the expressions above and solve for the parameter of interest. For example, we should administer treatment if the risk of mortality or morbidity ($M$) without treatment is larger than the threshold:
$$M_t = \frac{RV_H \cdot H_{rx}}{RRR} \qquad (4)$$
Note that it is impossible to know the values of $H_{rx}$ and RRR in any individual patient, as these events will in each case occur in the future. Hence, a decision in individual patients has to be based on group (trial) data, ideally using multivariable risk prediction models. 16 Note also that it is always the case that $M_t \le P_t$ (from Equations 3 and 4, $P_t = M_t / M$ with $M \le 1$), which is likely one of the reasons why actual decision-making behaviour has been observed to differ from the postulated normative behaviour (see Discussion).
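The relationship between the two thresholds can be checked numerically. The sketch below takes Equations 3 and 4 in the forms $P_t = RV_H \cdot H_{rx} / (RRR \cdot M)$ and $M_t = RV_H \cdot H_{rx} / RRR$; these forms are reconstructed from the definitions given in the Appendix and should be treated as an assumption, as should the function names and the input values, which are hypothetical.

```python
def probability_threshold(h_rx, rrr, m, rv_h=1.0):
    """P_t = RV_H * H_rx / (RRR * M): probabilities and utilities independent."""
    if not (0 < rrr <= 1 and 0 < m <= 1):
        raise ValueError("RRR and M must lie in (0, 1]")
    return rv_h * h_rx / (rrr * m)


def morbidity_threshold(h_rx, rrr, rv_h=1.0):
    """M_t = RV_H * H_rx / RRR: diagnosis taken as certain (P = 1)."""
    if not 0 < rrr <= 1:
        raise ValueError("RRR must lie in (0, 1]")
    return rv_h * h_rx / rrr


# Hypothetical inputs, for illustration only.
h_rx, rrr, m = 0.01, 0.5, 0.10
for rv_h in (0.5, 1.0, 2.0):
    p_t = probability_threshold(h_rx, rrr, m, rv_h)
    m_t = morbidity_threshold(h_rx, rrr, rv_h)
    # M_t never exceeds P_t because P_t = M_t / M and M <= 1.
    print(f"RV_H = {rv_h}: P_t = {p_t:.3f}, M_t = {m_t:.3f}")
```

A value of `probability_threshold` above 1 would mean that, under these assumptions, no probability of disease justifies treatment; the sketch leaves such values uncapped so that this case stays visible.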

| APPLICATIONS
We illustrate the application of the revised threshold model to a common medical problem: recurrent VTE (rVTE). That is, the disease of interest is VTE, which is associated with substantial morbidity in terms of recurrent thromboembolic outcomes or events (thus, the percentages of recurrent thromboembolic outcomes are the morbidities or disutilities used in our model). Note, however, that the diagnosis of rVTE is made only after imaging shows the presence of a new clot in the deep veins or lungs. 17 Another way to express this is that the "disease" is the predisposition to recurrent DVT that exists when a patient has experienced a prior DVT. Thus, in this case (and in the vast class of diagnostic problems), if one considers the diagnosis to be equal to the VTE event (rather than the predisposition to having the event), the result would be double counting. As previously discussed, to avoid it, we set $P = 1$ and calculate the threshold for the rVTE outcome (see Equations 2 and 4).
We will analyse two clinical problems:

| Data
The EINSTEIN investigators tested the efficacy of rivaroxaban (a drug that belongs to the class of direct oral anticoagulants [DOAC]) versus placebo for secondary prevention of VTE. 25 The decision regarding treatment was made after patients who were eligible for the trial had objectively confirmed, symptomatic DVT or PE using ultrasound or lung imaging 25 ; that is, patients with confirmed symptomatic DVT or PE who had been treated for 6 or 12 months with a vitamin K antagonist or rivaroxaban were then randomly assigned to receive continued treatment with rivaroxaban or placebo. 25 The primary efficacy outcome was symptomatic rVTE, defined as the composite of DVT or nonfatal or fatal PE. 25 We model the situation in which no further testing is possible and do not model the clinical suspicion of whether a test should be done; the latter is the purview of the classic threshold model. 1,9 The EINSTEIN investigators found that over 6 to 12 months treatment duration, in the placebo group, the proportion of patients
These implausibly high thresholds, however, are not observed in practice and are not recommended by guideline panels. For the reasons discussed in this paper, this answer is also not normatively correct 12 because "disease" was defined as the outcome of VTE rather than as the predisposition to rVTE created by the first VTE. 25 A normatively accurate answer could be obtained if we considered the outcome of mortality rather than rVTE. For example, the EINSTEIN investigators reported death due to VTE and bleeding of 1/602 (0.2%) and 0/602 (0%), respectively. 25 Using these values to calculate the threshold probability for diagnosis of VTE ($T_{dxVTE}$) above which we should give rivaroxaban (vs placebo), we obtain the following: which is normatively but not descriptively correct (as no patients would choose to use rivaroxaban to prevent death when the probability of dying without rivaroxaban over the relevant time period was zero). [19][20][21][22]
2. Calculation of the probability of recurrence of VTE ($M$) at which we should administer rivaroxaban: If there is dependence between probability and outcomes, indicating that we are certain of the diagnosis/disease (ie, we define diagnosis/disease as the predisposition to recurrence created by the first event), the threshold for the value of morbidity or mortality is given by formula (4). Figure 2 illustrates $M_t$ as a function of bleeding risk.  (Figure 3). 19-22
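The dependence of the recurrence threshold on bleeding risk and on the patient's relative valuation of harm can be tabulated with a few lines of code. This is only a sketch: it assumes formula (4) has the form $M_t = RV_H \cdot H_{rx} / RRR$ with $H_{rx}$ taken as the bleeding risk, and every number in it is hypothetical rather than an EINSTEIN trial estimate.

```python
def recurrence_threshold(bleeding_risk, rrr, rv_h=1.0):
    """M_t = RV_H * H_rx / RRR, with H_rx taken here as the bleeding risk."""
    if not 0 < rrr <= 1:
        raise ValueError("RRR must lie in (0, 1]")
    return rv_h * bleeding_risk / rrr


def threshold_table(bleeding_risks, rrr, rv_values):
    """Map each RV_H to the list of M_t values across the bleeding risks."""
    return {
        rv_h: [recurrence_threshold(h, rrr, rv_h) for h in bleeding_risks]
        for rv_h in rv_values
    }


# Hypothetical inputs for illustration; not the EINSTEIN trial estimates.
RRR = 0.8
BLEEDING_RISKS = [0.005, 0.01, 0.02, 0.04]
RV_VALUES = [0.5, 1.0, 2.0]

table = threshold_table(BLEEDING_RISKS, RRR, RV_VALUES)
print("bleeding risk: " + "".join(f"{100 * h:8.1f}%" for h in BLEEDING_RISKS))
for rv_h, row in table.items():
    print(f"RV_H = {rv_h:<4}   " + "".join(f"{100 * m_t:8.2f}%" for m_t in row))
```

Each row rises linearly with bleeding risk, and a patient who weighs bleeding more heavily than recurrence (larger $RV_H$) requires a proportionally higher recurrence risk before treatment is favoured.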

| DISCUSSION
The threshold model is probably one of the most important advances in the field of medical decision-making, 1,9 linking evidence (which exists on a continuum of credibility) with decision-making (which is a categorical exercise: we decide to act or not to act). 2 The model is not, however, widely used by clinicians at the bedside, even though it addresses extremely common classes of medical problems and nomograms to facilitate its use were published more than two decades ago. 28 The model has also not attracted the attention it deserves, even though it is likely that physicians act according to their different thresholds, which can probably explain, at least in part, the tremendous variation observed in today's practice of medicine. 2,29
One of the common explanations for the lack of popularity of threshold models among practitioners is that the original models were based on EUT, which is known to be widely violated by both lay people and physicians 2,3,5 ; that is, physicians do not act at the EUT-prescribed thresholds, but rather below or above them, 5,[30][31][32] depending on the context and on the theoretical approach that can better explain the observed behaviour. 3,4 This has prompted reformulation of the threshold model using different theoretical frameworks such as regret and dual-processing theory. 2,3,5,30,33,34 Indeed, some empirical evidence shows that the thresholds at which physicians act are more consistent with regret and dual-processing theory models than with the EUT model. 5,35
Our second explanation of the differences between normative and descriptive findings is that the original threshold model did not take a decision-maker's V&P explicitly into account. 6 However, when the effect of V&P is incorporated in the threshold model, it can actually be descriptively correct (Figure 3).
Previous criticism of the classic threshold model revolved around descriptively unrealistic low thresholds, prompting reformulation of the threshold equations from non-EUT theoretical frameworks. 2,31,34,35 However, we found that $M_t \le P_t$, ie, that the new thresholds are even lower than those based on the original threshold model. Yet, in the context of our VTE example, taking V&P into account aligns the threshold model quite well with current practice guidelines, which recommend administration of treatment when the VTE recurrence risk (at 5 years) exceeds 30%, 15%, and 3% for high, intermediate, and low risk of VTE recurrence, respectively. [19][20][21][22] This makes descriptive sense because when physicians and patients perceive that the diagnosis is certain, they are more inclined to act than when it is not. As clear as these recommendations about the threshold are, guideline panels have typically not explicitly taken patients' V&P into the calculation of the thresholds, although there is at least one notable exception: the ninth iteration of the American College of Chest Physicians Antithrombotic Guidelines (which conducted a systematic review of V&P for antithrombotic therapy 36 and assigned equal importance to serious bleeding and rVTE). 37 Nevertheless, despite typically neglecting to show how thresholds are calculated, guideline panels do routinely make such recommendations. [19][20][21][22] Thus, under some considerations, EUT appears to be descriptively realistic. In this context, Felder and Mayrhofer 38 argued that the descriptive power of EUT can be further augmented if treatment effects are modelled directly on the probability of disease 14,15 rather than on utilities as in the original threshold model. 1,9 In a separate (forthcoming) paper, we showed that regardless of how the effect of treatment is modelled, the model yields identical results under EUT but not under regret theory. [30][31][32]
However, we also consider the question of which decision theory (EUT vs non-EUT) is more descriptively accurate to be an empirical one. 5 We have recently proposed that "one size does not fit all," ie, there cannot be one theory of rationality that meets all our needs in all contexts. 4 Hence, we should abandon the debate about whether EUT is superior to non-EUT theories or vice versa; rather, we should define the circumstances in which one theory is more suitable than the other.
Finally, many consider inadequate decision-making one of the major culprits for today's suboptimal patient outcomes, [39][40][41][42] to the point that some decision scientists have suggested that personal decisions are the leading cause of death. 43 Promoting the threshold model, which was introduced more than 40 years ago 1 and which offers a simple, powerful, yet neglected tool, may considerably improve the inadequate decision-making often seen in today's practice. 43 As illustrated in this paper, however, clinicians and guideline developers have to ensure that they apply the threshold model appropriately.
The problems we discussed in this paper have arisen from misunderstanding one of the key aspects of the P&K 1 model to which we draw attention in this article: the threshold calculation depends on the definition of disease. 6 We demonstrate here that the key to proper appli-  (number of patients who need to be exposed to treatment in order for one patient to be harmed) (NNT/NNH), illustrates this distinction. 6
Obviously, calculation of the threshold depends on the accuracy of the parameters that are used to populate the model. Ideally, the data for calculation of the threshold should be based on well-done systematic reviews/meta-analyses; this is where decision analysis meets evidence-based medicine. 6 Here, it is important to note that even though our model is meant to help individualize treatment decisions, the data to populate the model are ultimately based on average, group estimates. Indeed, risk is a group phenomenon and is knowable and accurately measured only as a population-based measure. 13,44 We can never say with perfect certainty which individual patient will develop the outcome of interest 13,44 ; that is, risk in any individual patient remains ultimately unknowable. 44 However, we simply have no better way to individualize our treatments than to rely on risk information from groups. 13,44,45 In fact, one can argue that the entire goal of personalized and precision medicine is to reliably reduce the population to smaller groups in which risk can still be assessed with high accuracy and may be better applicable to individuals. 44 From the perspective of this paper, more systematic assessment

In the classic P&K model 1 (Figure 1), the expected utilities of each decision are given as follows:
$$EU(\mathrm{Rx}) = P \cdot U_1 + (1 - P) \cdot U_2$$
$$EU(\mathrm{NoRx}) = P \cdot U_3 + (1 - P) \cdot U_4$$
Following Pauker and Kassirer, 1 we define the net benefit of treatment as $B = U_1 - U_3$, ie, the difference in the utility of the outcomes if the diseased patient was treated or was not treated, and net harms as $H = U_4 - U_2$, ie, the difference in the utility of the outcomes if a patient without the disease was not treated versus treated.
Solving the tree, the threshold probability ($P_t$) above which we should treat is calculated as follows:
$$P_t = \frac{H}{B + H} = \frac{1}{1 + B/H} \qquad (1)$$

Equation 1 shows a generic version of the threshold equation based on the generic definitions of net benefits and harms given above.
To obtain a specific version of the threshold equation, we express net benefits and harms through disutilities related to morbidity or mortality ($M$) associated with treating versus not treating (or using one treatment vs another). We distinguish $M$ (morbidity or mortality that occurs in the absence of treatment) from $M_{rx}$ (morbidity or mortality that occurs while on treatment); we also define $H_{rx}$ as harms that occur due to treatment.
We also assume that most medical interventions have constant (relative) effects over the range of predicted absolute risk and are conveniently modelled in decision analyses as RR or $RRR = 1 - RR$. 14,15 This allows an intuitive interpretation of the treatment effect, $P \cdot (1 - RRR)$: $RRR = 1$ means that the occurrence of the outcome of interest is completely preventable (as $P \cdot [1 - RRR] = 0$), whereas $RRR = 0$ means that treatment does not affect the underlying risk ($P \cdot [1 - RRR] = P$). 14,15 We also introduce a variable $RV_H$ to represent the patient's (or decision-maker's) preferences, expressed as the relative value of avoiding harms of treatment, $H_{rx}$, with respect to avoiding the disease outcome, $M$.
Thus, we express (dis)utilities in the following way:
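One set of utility definitions consistent with the surrounding text is sketched below. It is an assumption offered for orientation, not a reproduction of the original definitions: utilities are placed on a 0 to 1 scale, disutilities are subtracted additively, and the product of disease morbidity and treatment harm is neglected.

```latex
\begin{align*}
U_1 &= 1 - M\,(1 - RRR) - RV_H \cdot H_{rx} && \text{(disease, treated)} \\
U_2 &= 1 - RV_H \cdot H_{rx}                && \text{(no disease, treated)} \\
U_3 &= 1 - M                                && \text{(disease, untreated)} \\
U_4 &= 1                                    && \text{(no disease, untreated)} \\[4pt]
B &= U_1 - U_3 = M \cdot RRR - RV_H \cdot H_{rx} \\
H &= U_4 - U_2 = RV_H \cdot H_{rx} \\
P_t &= \frac{H}{B + H} = \frac{RV_H \cdot H_{rx}}{RRR \cdot M}
\end{align*}
```

Setting $P = 1$ in the same tree and requiring $U_1 > U_3$ gives $M > RV_H \cdot H_{rx} / RRR$, which is the morbidity threshold used when probabilities and utilities are not independent.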

Treatment 1 versus treatment 2
If we want to calculate the threshold at which we select one treatment over another, we define utilities somewhat differently 6 :
Note that we assume that the patient will value harms of treatment relative to morbidity of disease ($RV_H$) in the same way regardless of whether they were caused by treatment 1 or treatment 2 (it is, of course, possible to assume different $RV_H$, but we consider this outside the scope of this paper).
Solving these equations for the threshold $P_t$ (ie, assuming that the probability of disease and utilities are independent), we get
If the assumption of independence between utilities and probabilities is violated, similar to above, we obtain
*Mathematically, a more correct representation of this utility has the following form: but we assumed that the simultaneous occurrence of the effect of disease and harms of treatment is clinically rare and mathematically negligible, which is the reason we did not include this product in the definition.