## 1. Introduction

Many fields rely on the elicitation of preferences. However, direct questioning methods, such as Likert scales, suffer from well-established drawbacks due to subjectivity (for a summary see Paulhus, 1991). Discrete choice—for example, choosing a single preferred product from a set of presented options—provides more reliable and valid measurement of preference in areas such as health care (Ryan & Farrar, 2000; Szeinbach, Barnes, McGhan, Murawski, & Corey, 1999), personality measurement (Lee, Soutar, & Louviere, 2008), and marketing (Mueller, Lockshin, & Louviere, 2010). More efficient and richer discrete-choice elicitation is provided by best–worst scaling, in which respondents select both the best option and the worst option from a set of alternatives. For example, a respondent presented with six bottles of wine might be asked to report her most and least preferred bottles. Best–worst scaling has been used increasingly for data collection, particularly in studies of consumer preference for goods or services (Collins & Rose, 2011; Flynn, Louviere, Peters, & Coast, 2007, 2008; Lee et al., 2008; Louviere & Flynn, 2010; Louviere & Islam, 2008; Marley & Pihlens, 2012; Szeinbach et al., 1999).

In applied fields, best–worst data are often analyzed using conditional logit (also called multinomial logit, MNL) models; the basic model is also known in cognitive science as the Luce choice model (Luce, 1959). These models assume that each option has a utility (*u*, also called “valence” or “preference strength”) and that choice probabilities are simple (logit) functions of those utilities (Finn & Louviere, 1992; Marley & Pihlens, 2012).¹ MNL models provide compact descriptions of data and can be interpreted in terms of (random) utility maximization but afford limited insight into the cognitive processes underpinning the choices made. MNL models also do not address choice response time,² a measure that has become easy to obtain since data collection was computerized.
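As a concrete sketch, the logit choice rule, and its standard maxdiff extension to best–worst pairs in the spirit of Marley and Pihlens (2012), can be written as follows. The utility values in the usage example are arbitrary placeholders, not estimates from any of the data sets discussed here:

```python
import math

def best_probs(utilities):
    """MNL/Luce choice rule: P(choose i) = exp(u_i) / sum_j exp(u_j)."""
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def best_worst_probs(utilities):
    """Maxdiff rule for best-worst choice: the probability of picking
    option i as best and option j as worst (i != j) is proportional
    to exp(u_i - u_j), normalized over all ordered pairs."""
    n = len(utilities)
    weights = {(i, j): math.exp(utilities[i] - utilities[j])
               for i in range(n) for j in range(n) if i != j}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

# Example with placeholder utilities: the highest-utility option is the
# most probable "best" choice, and the (highest, lowest) pair is the
# most probable best-worst response.
p = best_probs([1.0, 0.0, -1.0])
bw = best_worst_probs([1.0, 0.0, -1.0])
```

Note that both rules depend only on utility differences, which is why MNL utilities are identified only up to an additive constant.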

We explore the application of modern, comprehensive evidence accumulation models—typically employed to explain simple perceptual choice tasks—to the complex decisions involved in best–worst choice between multi-attribute options. Our application bridges a divide between relatively independent advances in theoretical cognitive science (computational models of the cognitions underlying simple decisions) and applied psychology (best–worst scaling to elicit maximal preference information from respondents). The result is the best of both worlds: a more detailed understanding of the cognitions underlying preference, without loss in the statistical properties of measurement and estimation.

As summarized in the next section, previous work in this direction has been hampered by computational and statistical limitations. We show that these issues can be overcome using the recently developed linear ballistic accumulator (LBA: Brown & Heathcote, 2008) model. We do so by applying mathematically tractable LBA-based models to two best–worst scaling data sets: one involving patient preferences for dermatology appointments (Coast et al., 2006), and another involving preference for aspects of mobile phones (Marley & Pihlens, 2012). In these applications, chosen to demonstrate the applicability of our methodology to diverse fields and measurement tasks, we show that previously published MNL utility estimates are almost exactly linearly related to the logarithms of the estimated rates of evidence accumulation in the LBA model; this is the relation that might be expected from the role of the corresponding measures in the two types of models. We follow this demonstration with an application to a perceptual judgment task that uses the best–worst response procedure, with precise response time measurements, to demonstrate the benefit of response time information in understanding the extra decision processes involved in best–worst choice.
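To make the LBA's mechanics concrete, here is a minimal simulation of a single race. In the LBA, each response option has an accumulator that starts at a random point, gathers evidence at a linear rate drawn from a normal distribution, and the first accumulator to reach threshold determines the choice and its time. The parameter values below (threshold `b`, start-point range `A`, drift noise `s`, non-decision time `t0`) are illustrative assumptions, not estimates from the data sets discussed in this article:

```python
import random

def lba_trial(drift_means, b=1.0, A=0.5, s=0.25, t0=0.2):
    """Simulate one LBA race (Brown & Heathcote, 2008).

    Each accumulator i starts at a point drawn from Uniform(0, A),
    draws a drift rate from Normal(drift_means[i], s), and rises
    linearly until it hits threshold b. The first accumulator to
    reach b wins. Returns (winning option index, response time);
    if no drift is positive, returns (None, inf).
    """
    best_t, best_i = float("inf"), None
    for i, v in enumerate(drift_means):
        d = random.gauss(v, s)
        if d <= 0:  # non-positive drifts never reach threshold
            continue
        t = (b - random.uniform(0, A)) / d
        if t < best_t:
            best_t, best_i = t, i
    return best_i, t0 + best_t

# Example: the option with the larger mean drift rate tends to win,
# and to win faster, mirroring how higher utility maps onto higher
# accumulation rates in the models developed below.
choice, rt = lba_trial([1.5, 0.5])
```

Because accumulation is linear and noise enters only through the start points and drift rates, the finishing-time distributions have closed forms, which is the tractability that the LBA-based best–worst models exploit.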

In the first section of this article, we describe evidence accumulation models, and the LBA model in particular. We then develop three LBA-based models for best–worst choice that are motivated by assumptions paralleling those previously used in corresponding random utility models of best–worst choice. We show that those LBA models describe the best–worst choice probabilities at least as well as the random utility models. However, in the second section, we show that the three earlier LBA models are falsified by the response time data from the best–worst perceptual task. We then modify one of those LBA models to account for all features of the response time, and hence the choice, data. We conclude that response time data further our understanding of the decision processes in best–worst choice tasks, and that the LBA models remedy a problem with the random utility models by providing a plausible cognitive interpretation.