Model‐based and model‐free mechanisms in methamphetamine use disorder

Abstract People with methamphetamine use disorder (MUD) struggle to shift their behaviour from methamphetamine‐orientated habits to goal‐oriented choices. The model‐based/model‐free framework is well suited to understand this difficulty by unpacking the computational mechanisms that support experienced‐based (model‐free) and goal‐directed (model‐based) choices. We aimed to examine whether 1) participants with MUD differed from controls on behavioural proxies and/or computational mechanisms of model‐based/model‐free choices; 2) model‐based/model‐free decision‐making correlated with MUD symptoms; and 3) model‐based/model‐free deficits improved over six weeks in the group with MUD. Participants with MUD and controls with similar age, IQ and socioeconomic status completed the Two‐Step Task at treatment commencement (MUD n = 30, Controls n = 31) and six weeks later (MUD n = 23, Controls n = 26). We examined behavioural proxies of model‐based/model‐free decisions using mixed logistic regression, and their underlying mechanisms using computational modelling. At a behavioural level, participants with MUD were more likely to switch their choices following rewarded actions, although this pattern improved at follow up. At a computational level, groups were similar in their use of model‐based mechanisms, but participants with MUD were less likely to apply model‐free mechanisms and less likely to repeat rewarded actions. We did not find evidence that individual differences in model‐based or model‐free parameters were associated with greater severity of methamphetamine dependence, nor did we find that group differences in computational parameters changed between baseline and follow‐up assessment. Decision‐making challenges in people with MUD are likely related to difficulties in pursuing choices previously associated with positive outcomes.


| INTRODUCTION
Methamphetamine use disorder (MUD) is a global health challenge 1 that continues to increase in prevalence, [2][3][4] non-fatal harms, 5,6 and mortality. 2,7,8People with MUD struggle to shift their behaviour from entrenched methamphetamine-orientated actions to actions that align with long-term goals. 9,10Computational decision-making frameworks are well suited to unpack this conflict, as they allow the identification of specific mechanisms that drive maladaptive decisions. 11,12In particular, the model-based/model-free framework of decision-making 13 investigates whether people make decisions by considering future goals (i.e., recovery-orientated actions) or simply repeating previously rewarding actions (i.e., methamphetamine use).
The model-based/model-free framework 13 theorises that there are two decision-making strategies.The first is the model-based system, which learns 1) how environmental 'states' link to one another in a cognitive map and 2) the value of different actions in each state. 14is system allows prospective consideration of how specific actions lead to particular outcomes, and thus enables behavioural adaptations when outcome values change (i.e., shift away from methamphetamine use when it starts to harm). 11Conversely, in the model-free system decisions are informed only by the value of past actions, without taking into account how aspects of the environmental may prospectively link to one another. 14As such, a model-free approach does not allow a nuanced understanding of how a decision-making environment is structured but instead enables quick and computationally efficient decisions. 13,15,16ly two studies have investigated model-based/model-free decision-making in MUD.The first investigated participants in residential rehabilitation, 17 who had been abstinent for up to one year.
The main model-based/model-free outcome measure was a weighting parameter between the two systems (ω parameter).Results showed that participants with MUD were biased towards model-free over model-based decision-making, relative to controls.However, a limitation of this approach is that it could not ascertain the integrity of each model-free and model-based system.Furthermore, many control participants in this study were drawn from a different cultural demographic, relative to the group with MUD.The second study was a rodent model, 18 wherein rats were tested prior to methamphetamine administration, and again after five days of abstinence.The main outcomes were two individual parameters of model-based (β MB ) and model-free learning (β MB ), which allowed an assessment of the overall strength of these systems, rather than simply the ratio of each system being used compared with one another.Similar to the human study, methamphetamine use was associated with reduced model-based decision-making.However, methamphetamine-exposed rats also had deficits in model-free decision-making, driven by difficulties in switching behaviour after unrewarded trials.Furthermore, greater modelfree deficits prior to methamphetamine administration led to more frequent methamphetamine use.Together, these two studies highlight the importance of quantifying the strength of both model-based and model-free mechanisms. 18o other key aspects of model-based/model-free decisionmaking remain unexplored in MUD.The first is whether model-based/ model-free outcomes correlate with patterns of methamphetamine use and MUD symptomology.For example, functional imaging studies have shown that weaker model-based and stronger model-free brain activations during decision-making can predict increased severity of alcohol dependence in young men. 19The second consideration is whether people with MUD can show improvement in model-based/ model-free deficits across recovery.Understanding longitudinal changes (or lack of thereof) may allow a finer understanding of the link between decision-making and clinical prognosis. 20re, we examined the decision-making profile of people with MUD by comparing their model-based and model-free behaviour and computational parameters with a group of well-matched controls.We used an exploratory approach to investigate three specific aims: 1) to identify if there were differences in model-based/model-free decision-making between people with MUD who had recently engaged with treatment and controls with similar sociodemographic

| Design and aims
We compared model-based/model-free decision-making in participants with MUD and controls at two timepoints: upon first contact with the research team during treatment and six weeks afterwards.
We chose follow-up at six weeks as the first months of recovery in MUD are associated with the greatest risk of relapse. 21,22

| Participants
Thirty participants with MUD (M Age = 34.07,SD = 9.42; 24 males) were compared with 31 controls without current substance use or a history of substance use disorders (M Age = 31.36,SD = 8.53; 24 males).Participants with MUD were recruited from public and private drug and alcohol treatment organisations across Melbourne, Australia.Participants with MUD were eligible if they had 1) recently engaged in treatment; 2) reported methamphetamine as their primary drug of concern; and 3) used methamphetamines at least weekly in the month before engaging in treatment.Table 1 summarises methamphetamine use patterns, severity of dependence and treatment modality.We did not exclude participants who had concurrent use of other substances (see Table S1).Participants with MUD were required to be abstinent from methamphetamine for at least 48 h and no more than six months based on self-report at study intake.Controls were recruited using online advertisements and selected to have similar sociodemographic characteristics to the group with MUD (i.e., age, gender, education, socioeconomic status, Table 1).
Nonetheless, controls reported lower levels of depression 23  included if they had used illicit substances less than 10 times in the past five years and did not have a history of substance use disorders (self-report).We also cross-validated and confirmed self-report using hair toxicology including methamphetamine, other stimulants, opioids, cannabis and cannabinoids.Exclusion criteria for both groups included a history of bipolar, schizophrenia or acquired brain injury (selfreported), and IQ estimates that could impact the understanding of the tasks (score <80 on the National Adult Reading Test [NART]). 24seline data from approximately half this sample has been reported for another decision-making task. 25enty-three ($76%) participants with MUD returned for followup.Seven did not return: two could not be contacted, two reported scheduling difficulties, two entered inpatient treatment with restricted access and one was incarcerated.Participants with MUD who returned to follow-up reported a reduction in their frequency of methamphetamine use by an average of 12.73 (SD = 11.93)days (comparing the past four weeks of methamphetamine use at followup with the past four weeks in active use prior to treatment), and only two participants had not reduced their methamphetamine use.In the control group, follow-up data was available from 26 participants ($84%).Four participants could not attend the follow-up session (two reported scheduling difficulties and two were unavailable because of COVID-19 restrictions).Furthermore, one control reported using MDMA between assessments, thus their follow-up data was excluded.Table S2 reports demographics for participants who completed both sessions.

| Procedure
The Eastern Health Human Research Ethics Committee approved the study (E52/1213).Assessments occurred at treatment facilities or nearby community libraries.Participants were initially screened for eligibility and then underwent a standardised assessment session (1.5 h).
Participants returned for the follow-up session, six weeks later.Upon completion, participants were reimbursed with a $40 (AUD) gift-card and a report summarising their decision-making performance.

| Sociodemographic characteristics
Verbal IQ was assessed by the NART.Sociodemographic status was estimated by participant's postcodes and the Australian Bureau of Statistics, Socio-Economic Indexes for Areas tool. 26A B L E 1 Descriptive statistics of sociodemographic group comparisons and methamphetamine use characteristics in people with MUD and drug-free controls at baseline.Frequency of methamphetamine use in the four weeks before treatment was measured by the Timeline Follow Back Interview (TLFB). 27Severity of methamphetamine dependence was measured using the Severity of Dependence Scale (SDS). 28,29Lifetime patterns of substance use (i.e., average dose, years of methamphetamine use, preferred form and secondary substance use) were estimated by the Interview for Research on Addictive Behaviours (IRAB). 30

| Biological measure of substance use
We collected hair samples at follow-up from all participants to confirm accurate self-report from participants with MUD and to confirm that controls were not using any illicit substances.
These samples were analysed using gas chromatography-mass spectrometry. 31There was a strong correlation between selfreported methamphetamine use on the Timeline Follow Back and methamphetamine concentrations in hair (ng/mg; rho = 0.74, p < 0.001).

| Two-step decision-making task
This computer task requires participants to make two consecutive choices to obtain fictive in-game rewards (see Figure 1). 13 strategy more than a model-free strategy. 32As such, performance likely reflects participants' 'default' decision-making style.We provide test-retest reliability data for our main outcomes in the Supplementary Materials.

| Overview
For hypothesis testing, we used a frequentist approach with α = 0.05, incorporating Bayesian statistics where possible.For Bayesian statistics, the alternate hypothesis was that participants with MUD would differ from controls (two tailed; BF > 3 implies evidence in favour of the alternate hypothesis; BF < 1/3 implies evidence in favour of the null). 33Sample size was based on previous addiction studies of the Two-Step Task. 17,34,355.2 | Aim 1: baseline differences in model-based/ model-free decision-making between people with MUD and controls

Data cleaning
We applied previously established criteria to identify suboptimal engagement in the Two-Step Task. 36This included missing >10% of Step Task.Note: At stage 1, participants choose between two spaceships.The blue spaceship has a 70% chance to take them to the red planet (common transition) and a 30% chance to take them to the purple planet (rare transition).
The opposite probabilities apply for the green spaceship.Upon arriving on either planet, the participant then must choose between two different aliens, which have different probabilities of providing reward across time trials, responding with the same key on more than 95% of trials across both stages and having reaction times that were more than two standard deviations faster than other participants.As such, we excluded one participant with MUD who omitted responses on 19.5% of trials at the initial assessment.

Overt behaviour
Two-Step Task choices were analysed with a mixed generalised logistic regression. 13,17,36The outcome variable was 'Stay' (whether participants repeated their first stage choice, [0, 1]).We ran this logistic regression using the glmer function from the lme4 package 37 (Version 1.1-23), a logit link, and 'bobyqa' optimiser in R. 38 Post-hoc tests for significant 'Group' interactions involved calculating estimated marginal means using the emmeans package 39 (Version 1.7.5), and applying a Tukey adjustment for multiple comparisons.

Computational mechanisms of model-based/model-free decisionmaking
To understand the underlying mechanisms driving model-based/ model-free behaviour, we implemented a well-validated computational model of model-based/model-free behaviour. 40This approach has improved parameter estimation and group level differences in clinical populations, compared with previous approaches. 40In this model, every action at stage 1 (blue or green spaceship) and stage 2 (aliens on each planet) has a value, Q.Higher Qs reflect higher action values.On each trial, the Q values of these actions are updated based on feedback received at each stage (Equations (1 and (2).
Spaceship Basic Value: In these equations, t is the trial number In comparison, the model-free values are updated by applying the reinforcement learning that occurred at the second stage (i.e., Equation (2) to the spaceship that was chosen on that trial.This allows the first stage choices to acquire value independent of the transition likelihoods (Equation (4).

MF First Stage Value:
Next, the stage 2 softmax computes the probabilities for each second stage choice at the visited planet.This is based on each alien's value, and β consistency measures how consistently the participant chooses the most rewarding Q stage 2 values (Equation (6).
Stage 2 Softmax: Finally, at the end of each trial, the Q value of all unchosen spaceships and aliens is weighted by a 'forgetting' or decay function, which is the same parameter as the learning rate (1 À α).This means that the participant learns and forgets information at the same rate (Equations 7 and 8): Forgetting Unchosen Aliens: In summary, this model provides five key parameters of computational decision-making mechanisms: α, β consistency , β MB, β MF, and β rep (Table 2).We fit the parameters for each session jointly, which allowed us to determine maximum a posteriori estimates for parameters in each session by using empirical priors fit across both sessions. 41This has been shown to be a superior approach to estimating reliability for longitudinal behavioural data. 41We fit the model 500 times using 10 different starting values, and selected the bestfitting parameter values with negative log-likelihood computed by the emfit package implemented in Matlab R2020b.
We used an exponential transform to ensure model-based and model-free parameter values were positive, and alpha was inverse log-transformed to constrain it between 0 and 1 (see Table 2).We performed between-group comparisons of each of the five free parameters using Bayesian Mann-Whitney U t-tests (as the parameter distributions violated the assumption of normality).Bayesian statistics were conducted in JASP Version 0.16.2 42 using default priors.
We ensured that each participant's choices were not random by comparing each participant's model fit with that of a control model, in which choices at each stage were random.We confirmed that the main model described above had a better fit than the chance model for the majority participants using a likelihood ratio test, which showed all but two MUD participants had a better model fit than the chance model (p < 0.05).We excluded those participants from analyses involving the model parameters, and the outcomes of the overt behavioural analyses are the same regardless of whether these participants are included.

| Aim 2: relationship between computational mechanisms of model-based/model-free decisions and methamphetamine use patterns
We examined the relationship between methamphetamine use patterns and the model-based/model-free computational parameters in the initial assessment.We first calculated Spearman's rho between all variables (see Figure S2 for all correlations), followed by multiple regressions for each methamphetamine use pattern variable with the computational mechanisms as predictors.We included days since last use as a covariate for all regressions.We removed one participant from these analyses who had an extreme number of days since last use (151 days) and another participant who had an extreme average dosage score for the corresponding analysis (Dosage = 4 g.).All outcome variables were mean-centred regressions.There were moderate-to-strong relationships between the computational parameters; however, variance inflation factor estimates for each parameter within each fitted model suggested that there was little risk of multicollinearity impacting the regression coefficients (Table S3).Greater values reflect a greater use of model-free strategy.
Positive/negative values mean a bias towards/ against repetition.
Beta consistency (β consistency ) Consistency in selecting alien with the highest value.
Supplementary Tables S8 and S9 include comprehensive statistics for each group.

| DISCUSSION
We found that 1) participants with MUD showed weakened modelfree choices, 2) increased switching after reward improved during early recovery and 3) there was no evidence that people with MUD  Indeed, reduced win/stay behaviour has also been identified using other tasks of decision-making in MUD. 25,44,45This is further supported by the group differences in alpha and beta consistency values, which aligns with previous computational studies that have observed MUD-related deficits in updating outcome values, 46 as well as consistently selecting high reward actions. 45,47e behavioural analysis showed that increased switching after reward improved over time in the group with MUD.While earlier studies have identified that decision-making can improve over time during early recovery from MUD, 22,48 ours is the first to provide a specific behavioural outcome that can explain why this is occurring (i.e., greater use of rewarding feedback to guide next choice).This finding has clinical significance as it highlights that while learning adaptive rewardrelated responses can be challenging for people with MUD (e.g., attending therapy, forming new social relationships), this deficit can improve.This finding also indicates the importance of incorporating therapeutic approaches that can treat deficits in action-outcome learning processes during the first vulnerable months of early recovery. 49e lack of differences in model-based decisions between the MUD and control groups contrasts with some (but not all) 17,46 previous findings in MUD. 17 One explanation of this discrepancy is that earlier research had controls with higher intelligence and recruited from different geographical locations.In comparison, we closely matched groups for age, education, verbal IQ and socioeconomic status.This is important because model-based decision-making varies across age, 50,51 is greater in people with higher intelligence 52 and impacted by psychosocial factors. 53The fact that our control group, who were

characteristics; 2 )
to identify if specific mechanisms of model-based/ model-free decision-making correlated with patterns of methamphetamine use; and 3) to examine if people with MUD exhibit changes in model-based/model-free decision-making across the early months of recovery.
[M (SD) = 11.29 (9.32)], compared with participants with MUD [M (SD) = 18.90 (9.47), U = 228, p < 0.001, BF 10 = 19.06].Controls were At the first choice, participants decide between two spaceships (blue or green), each with a fixed 70/30% probability of taking them to the red/purple planet, and vice-versa.At the second choice, participants choose between two aliens on the planet they land on, with each alien having different likelihoods of rewarding the participant with one alien treasure.The likelihood of each aliens' reward changes independently over time, via four pre-determined Gaussian walks.These walks were counterbalanced within the MUD participants at baseline and remained the same for each participant at follow-up.The control group received counterbalancing that matched the group with MUD.Before commencing the task, participants completed a short tutorial that included instructions and 50 practice trials (10 response training trials, where they learned how to select aliens; 20 second stage alien choice practice trials, where they learnt how to select between two aliens with likelihoods of providing reward; and 20 full practice trials).They then completed 200 trials.It is important to note that this version of the Two-Step Taskthe most common version used in the addiction literaturedoes not necessarily reward a model-based Fixed predictors included Previous Win (whether participants were rewarded on the previous trial, [À1, 1]); Previous Transition (whether the transition to the planet on the last trial was rare or common, [À1, 1]); the interaction between Previous Win * Previous Transition; Group (MUD or control); and the interactions between Group and Previous Win/Previous Transition/Previous Win * Previous Transition.Random effects included Participant.A significant effect of Group indicates one group was repeating first stage choices more than another.A significant effect of Previous Win indicates model-free behaviour (e.g., a difference towards stay probability in rewarded trials, relative to non-rewarded trials) as participants are repeating their first choice because they were rewarded, regardless of transition type.Therefore, a Group * Previous Win interaction implies group differences in this behavioural proxy of modelfree decision-making.A significant Previous Win * Previous Transition interaction implies a behavioural pattern consistent with model-based decision-making (i.e., participants are considering both reward and transition type).Therefore, a Group * Previous Win * Previous Transition interaction suggests group differences in this behavioural proxy of model-based decision-making.
(1-200), c 1 /c 2 is the first/ second stage choice made on that trial (left or right selection of spaceship at stage 1 or alien at stage 2), s is the state visited at the second stage (red or purple planet) and r is the reward outcome at that stage.An individual's learning rate at each stage is determined by the alpha parameter (α), where smaller values imply faster learning from feedback.At stage 1, participants do not receive reward (they simply move to the red or purple planet; r = 0).At stage 2, the reward, r, is either 1 (reward) or 0 (no reward), which then updates the value of each option.Next, the Q values are used to update the value of model-based choices (Q MB ) and model-free choices (Q MF ) at the first stage.For the model-based values, we take the maximum value of the two second stage choices from the common transition as the value for each first stage choice (Equation (3): MB First Stage Value:

Þþr t ð4Þ
First and second stage choice probabilities are then computed by softmax functions.At stage 1, the respective values of modelbased and model-free choices are linked to parameters that measure their individual weighting on each choice (β MB and β MF ; Equation (5).I is also included as a binary parameter to indicate whether each first stage choice is a repetition of the previous first stage choice.I interacts with β repa parameter that identifies how biased participants are to repeat first stage choices, regardless of value.Stage 1 Softmax:

2. 5 . 4 |
Aim 3: longitudinal analyses of changes in model-based/model-free decision-making We examined how the behavioural proxies and mechanistic indicators of model-based/model-free decision-making changed over six weeks.We excluded follow-up data from one control participant who responded with a left action on 96.75% of choices, and the MUD participant with the very abnormal β MF value at baseline.To investigate changes in the behavioural proxies of model-based and model-free decision-making, we repeated the mixed logistic regression from the initial assessment and added a fixed effect of Time, together with its interactions with Group, Previous Win and Previous Win * Previous Transition.We included Time and Participant T A B L E 2 Description of parameters in computational model.forgetting/decay rate of value.Lower values reflect faster learning from reinforcement and faster forgetting of value in unselected options.Beta MB (β MB ) β MB [0, ∞) Weight of model-based decision-making at the first stage.Greater values reflect a greater use of modelbased strategy.Beta MF (β MF ) β MF [0, ∞) Weight of model-free decision-making at the first stage.

| RESULTS 3 . 1 |
as random effects.An interaction of Time * Group * Previous Win would indicate groups differed in how their behavioural proxy of model-free decision-making changed over time.An interaction of Time * Group * Previous Win * Previous Transition would indicate groups differed in how their behavioural proxy of model-based decision-making changed over time.To investigate any changes in computational mechanisms of model-based/model-free decision-making, we conducted 2 (Group) by 2 (Time) ANOVAs for each computational parameter.The Bayesian and frequentist analyses were conducted in JASP using default priors and estimating Bayes Factors across matched models.43    3 Aim 1: baseline differences between groups 3.1.1| Overt behaviour

Figure 2
Figure 2 presents a visual comparison of participants' behaviour, compared with pure model-free and model-based behaviour.Overall, participants with MUD were less likely to repeat actions (Group; illustrates, the group with MUD had a smaller effect for repeating rewarded actions, relative to nonrewarded actions, compared with the controls.Visually, this is seen in overlap between the confidence intervals of likelihood to repeat rewarded (OR = 3.36, 95% CI [2.19, 5.14]) and non-rewarded actions (OR = 2.11, 95% CI [1.38, 3.23] in the group with MUD but not the control group (rewarded OR = 9.88, 95% CI [6.46, 15.12] and unrewarded OR = 4.27, 95% CI [2.81, 6.48]).Post-hoc analyses also identified that the group with MUD was less likely to repeat their first stage choices after being rewarded on the previous trial (OR = 0.58, 95% CI [0.43, 0.79], p < 0.001), and after receiving no reward on the previous trial (OR = 0.70, 95% CI [0.52, 0.95], p = 0.008).There was no significant difference in the combined use of the transition type and feedback from the previous trialthe behavioural proxy of model-based decision-making (Group * Previous Win * Previous Transition; Comparison of overall behaviour at baseline and follow-up, compared with pure model-free and modelbased behaviour.Note: These figures present the observed likelihood of participants to repeat an action after reward (common, then rare transition) and non-reward (common, then rare transition).The first row presents hypothetical examples of someone who is acting in a manner that encapsulates pure model-free or model-based overt behaviour.The second and third rows present observed data from the baseline and follow-up sessions strong behavioural proxies of model-based behaviour at baseline (Previous Win * Previous Transition; OR = 1.08, 95% CI [1.00, 1.17], p = 0.056).3.1.2| Computational mechanisms of modelbased/model-free decision-making Consistent with the behavioural data, there was weak evidence of a null group difference in the model-based parameter (β MB , BF 10 = 0.57).There was moderate evidence to support group differences in the model-free parameter (β MF ; BF 10 = 3.53), whereby people with MUD showed lower model-free use compared with matched controls.There was weak evidence that controls learned/ updated more rapidly compared with people with MUD (α, BF 10 = 1.70), and moderate evidence that people with MUD showed a more consistent pattern of choosing the most subjectively rewarding stage 2 choice (β consistency , BF 10 = 3.11).Finally, there was moderate evidence that matched controls were more likely to repeat previously rewarded actions, (β rep , BF 10 = 3.25).Figure 4 presents each parameters' value between groups.

3. 2 |
Aim 2: relationship between computational mechanisms of model-based/model-free decisions and methamphetamine use patterns There was a positive relationship between recent frequency of methamphetamine use consumption and consistency of choice at stage 2 (β rep ; ρ = 0.39, p = 0.047).The regression model predicting frequency of methamphetamine use was significant (F [6, 19] = 2.94, p = 0.03, R 2 = 0.48).However, there was no evidence for any predictors significantly explaining variance in frequency of use.We report the full regression analyses in the Supplementary Results.

3. 3 |
Aim 3: longitudinal analyses of changes in MBMF decision-making 3.3.1 | Overt behaviour (stay/switch after reward and transition) over time The group with MUD showed a significant improvement in their behavioural proxy of model-free decision-making across time compared with controls (Time * Group * Previous Win; OR = 1.19, 95% CI [1.02, 1.39], p = 0.029).Furthermore, post-hoc analyses found that participants with MUD significantly improved their repetition after reward (OR = 1.25, 95% CI [1.03, 1.51], p = 0.009), though the group with MUD still had marginal differences in this behaviour compared with controls at follow-up (OR = 0.76, T A B L E 3 Logistic regression predicting overt model-based/model-free behaviour at baseline.

F I G U R E 4
Estimated values for each parameter separated by group.Note: Dots represent individual participants, boxplot represents interquartile range (Q1-Q3), error bars represent to minimum/maximum.Any dots beyond are minor outliers (1.5 multiplied by IQR).For specific descriptions of each parameter, see Table 2

F I G U R E 5
Predicted likelihood to repeat choice after different feedback across sessions.Note: This figure highlights how the group with MUD begin to substantially improve their repetition of rewarded actions between baseline and follow-up differed from controls in model-based decision-making.These findings suggest that MUD-related deficits in goal-directed behaviour are likely driven by difficulties in learning the value of actions, rather than a dominant habitual system.As indicated by both behavioural and computational analyses, participants with MUD made less use of the previous trial's outcome to inform the next trial's first stage choice.However, we reason that people with MUD may have more generalised difficulties in learning from previous trial outcomes, rather than having a specific deficit in the mechanism developing habitual/model-free behaviour.This argument was supported by our behavioural and computational data showing that participants with MUD were less likely to repeat rewarded first stage choices (i.e., reduced win/stay behaviour)indicating an underlying difficulty in learning from positive outcomes.
matched in several socioeconomic factors to the group with MUD, exhibited only modest behavioural proxies of model-based decisionmaking at baseline appears to be further evidence of this.Our findings should be considered in the context of limitations.Our version of the Two-Step Task does not provide incentives for model-based decision-making, which may have led to underutilization of this system in both groups.This might have contributed to the null group differences in model-based computational mechanisms, as there was limited evidence of model-based decision-making in the overt behaviour analysis.Future work might rectify this challenge by employing a two-stage task that encourages model-based decisionmaking32 .Our task also did not allow investigation of whether modelbased and model-free systems were acting complementarily, or if model-based learning was not only acting prospectively but also to retrospectively inform the reward value of uncertain options, as suggested by recent computational neuroscience work.54Furthermore, model-based deficits and model-free dominance might only occur in MUD for methamphetamine-related stimuli.As such, our fictive ingame rewards may not have sufficiently engaged this dysfunctional T A B L E 4 Logistic regression predicting likelihood of participants to repeat their first choice across the two timepoints (initial assessment and six-week follow-up).