Metacognition and the effect of incentive motivation in two compulsive disorders: Gambling disorder and obsessive–compulsive disorder

Aims Compulsivity is a common phenotype among psychiatric disorders such as obsessive–compulsive disorder (OCD) and gambling disorder (GD). Deficiencies in metacognition, such as an inability to estimate one's own performance via confidence judgments, could contribute to pathological decision‐making. Earlier research has shown that patients with OCD exhibit underconfidence, whereas patients with GD exhibit overconfidence. Moreover, motivational states (e.g. monetary incentives) are known to influence metacognition: gain prospects increase confidence and loss prospects decrease it. Here, we reasoned that OCD and GD symptoms might reflect an exacerbation of this interaction between metacognition and motivation. Methods We hypothesized that GD patients' overconfidence would be exaggerated under gain prospects and that OCD patients' underconfidence would be worsened in a loss context, and we expected these effects to be represented in ventromedial prefrontal cortex (VMPFC) blood‐oxygen‐level‐dependent (BOLD) activity. We tested these hypotheses in a task‐based functional magnetic resonance imaging (fMRI) design (27 patients with GD, 28 patients with OCD, 55 controls). The trial is registered in the Dutch Trial Register (NL6171). Results Patients with GD showed higher confidence than patients with OCD, which could partly be explained by sex and IQ. Although our primary analyses did not support the hypothesized interaction between incentives and groups, exploratory analyses did show increased confidence in patients with GD specifically in the gain context. fMRI analyses confirmed a central role for the VMPFC in the processing of confidence and incentives, but revealed no differences between the groups. Conclusion Patients with OCD and those with GD reside at opposite ends of the confidence spectrum; however, we found neither an interaction with incentives nor group differences in the neural processing of confidence.

sufficient (i.e., <nbins) to estimate the model, since the distribution of their confidence judgments was very skewed. This resulted in β2 being inestimable. Therefore, we excluded those two subjects (only) from the analyses of the last model when testing at the population level, whilst including them for the other models.

Figure S1: Properties of Confidence Judgments
A: observed performance (% correct choices) as a function of reported confidence. B: reported confidence as a function of evidence for correct (green) and incorrect (red) choices. C: observed performance (% correct choices) as a function of evidence, for high (gray) and low (black) confidence trials. The insets on the side of each graph depict the results of the population-level analyses on the correlation coefficients (A) or on the regression coefficients (B and C). Error bars indicate inter-subject standard errors of the mean. *: P<.05; **: P<.01; ***: P<.001.

Table S1: Results of properties of confidence judgments

Behavioral Analyses: Properties of Early Certainty
Here we provide further details about the computation and properties of the early certainty variable.
To verify that our model of early certainty is an appropriate proxy for confidence judgments, we checked whether the three main properties of confidence judgments also hold for our early certainty variable. We performed identical behavioral analyses, substituting early certainty values for subjective confidence judgments.
Our results show that early certainty and performance are highly correlated (R = 0.73 ± 0.0362; Figure S2A, Table S2). Early certainty is also positively associated with evidence for correct trials, and negatively for incorrect trials (Figure S2B, Table S2). Finally, the relationship between performance and evidence is indeed stronger in trials with high versus low early certainty (Figure S2C, Table S2). Nota bene: when inspecting the data for the last model, we observed that the regression model was inestimable for four subjects. This was due to the median split of trials into high and low early certainty: these four subjects had an average performance of 100% in the high early certainty trials, making β1 inestimable. Therefore, we excluded those four subjects from the analyses of the last model when testing at the population level, whilst including them for the other models.

… and evidence (z-scored) and interactions between incentive and group, as well as two-way and three-way interactions between evidence, accuracy and group. All models included trial-by-trial data, a random intercept per subject and a random slope of incentive per subject.
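The inestimability described above can be detected before any model is fitted. The sketch below uses a hypothetical helper, `median_split_accuracy` (not taken from the original analysis code), that median-splits one subject's trials on early certainty and flags any half in which accuracy is 0% or 100%; in such a half, a logistic slope of performance on evidence has no finite estimate. It assumes that trials tied with the median are assigned to the low half.

```python
import numpy as np


def median_split_accuracy(certainty, correct):
    """Median-split one subject's trials on early certainty and report the
    accuracy in each half. A half with 0% or 100% accuracy shows complete
    separation, so a logistic regression of performance on evidence has no
    finite estimate in that half."""
    certainty = np.asarray(certainty, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    # Trials exactly at the median go to the low half (an assumption).
    high = certainty > np.median(certainty)
    result = {}
    for label, mask in (("high", high), ("low", ~high)):
        if not mask.any():
            result[label] = {"accuracy": float("nan"), "inestimable": True}
            continue
        accuracy = float(correct[mask].mean())
        result[label] = {"accuracy": accuracy,
                         "inestimable": accuracy in (0.0, 1.0)}
    return result


# Made-up example: every high-certainty trial is correct.
split = median_split_accuracy([0.2, 0.4, 0.7, 0.9], [0, 1, 1, 1])
print(split["high"])  # {'accuracy': 1.0, 'inestimable': True}
```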

Behavioral Analyses: Integration of Evidence in Confidence Judgments
Theoretical models of confidence formation suggest that confidence builds, at least partly, on the integration of the noisy perceptual evidence used for decision-making 7,8. A resulting signature of confidence is its statistical dependence on an interaction of accuracy and perceptual evidence, which is typically illustrated as an 'X-pattern' in which confidence increases with increasing evidence for correct decisions and decreases with increasing evidence for incorrect decisions. To study whether GD and OCD patients show aberrant integration of evidence in confidence signals, we included a three-way interaction term between evidence, accuracy and group in Model 1. Post-hoc testing was performed by comparing the groups on the slopes of evidence integration in confidence separately for correct and incorrect trials using the emtrends() function.
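The 'X-pattern' can be illustrated by fitting a separate straight line of confidence on evidence for correct and for incorrect trials. This is a simplified per-subject sketch with made-up, noise-free data, not the mixed model with a three-way interaction used in Model 1; `evidence_slopes` is a hypothetical helper.

```python
import numpy as np


def evidence_slopes(evidence, confidence, correct):
    """Least-squares slope of confidence on evidence, separately for correct
    and incorrect trials. The X-pattern predicts a positive slope for correct
    trials and a negative slope for incorrect trials."""
    evidence = np.asarray(evidence, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    slopes = {}
    for label, mask in (("correct", correct), ("incorrect", ~correct)):
        slope, _intercept = np.polyfit(evidence[mask], confidence[mask], deg=1)
        slopes[label] = float(slope)
    return slopes


# Made-up example data: 10 correct and 10 incorrect trials.
ev = np.linspace(0.0, 1.0, 10)
evidence = np.concatenate([ev, ev])
confidence = np.concatenate([60 + 30 * ev, 60 - 20 * ev])
correct = np.concatenate([np.ones(10, dtype=bool), np.zeros(10, dtype=bool)])
slopes = evidence_slopes(evidence, confidence, correct)
print(slopes)  # correct slope ≈ 30, incorrect slope ≈ -20
```

A group difference in the 'correct' slope, such as the one reported for GD patients in the results, would show up here as a flatter positive line.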

Behavioral Analyses: Confidence Calibration
Confidence calibration, also known as confidence bias, is the difference between average confidence and average performance per subject. A positive value indicates overconfidence, whereas a negative value indicates underconfidence. We calculated confidence calibration for each subject per incentive condition. We then performed a mixed ANOVA, implemented in the afex package, to test for main effects of incentive condition and group, and for their interaction. When a main effect was significant, we performed post-hoc testing using the emmeans package, correcting for multiple comparisons using Tukey's method.
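The calibration score per subject and incentive condition is a simple difference of means. The sketch below assumes that confidence has been rescaled to the same 0–1 range as accuracy (the confidence scale is not specified in this section); `confidence_calibration` is a hypothetical helper, and the subsequent mixed ANOVA (afex) is not reproduced.

```python
import numpy as np


def confidence_calibration(subject, incentive, confidence, correct):
    """Mean confidence minus mean accuracy per (subject, incentive) cell.
    Assumes confidence is rescaled to the 0-1 range of accuracy.
    Positive values indicate overconfidence, negative values underconfidence."""
    subject = np.asarray(subject)
    incentive = np.asarray(incentive)
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    calibration = {}
    for s in sorted(set(subject.tolist())):
        for c in sorted(set(incentive.tolist())):
            cell = (subject == s) & (incentive == c)
            if cell.any():
                calibration[(s, c)] = float(confidence[cell].mean()
                                            - correct[cell].mean())
    return calibration


# Made-up example for one subject.
cal = confidence_calibration(
    subject=["s1"] * 4,
    incentive=["gain", "gain", "loss", "loss"],
    confidence=[0.9, 0.8, 0.6, 0.6],
    correct=[1, 0, 1, 1],
)
# gain: 0.85 - 0.50 ≈ +0.35 (overconfident); loss: 0.60 - 1.00 ≈ -0.40 (underconfident)
```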

Behavioral Analyses: Metacognitive Sensitivity
Metacognitive sensitivity is a measure that indicates how well one's confidence judgments discriminate between one's correct and incorrect answers. One of the metrics used to express metacognitive sensitivity is discrimination. Discrimination is calculated as the difference between one's average confidence in their correct answers and their incorrect answers. The higher this metric, the more sensitive one's metacognitive abilities are. Another metric for sensitivity is meta-d', which represents how much information in signal-to-noise units is available for the formation of confidence judgments 9 .
The higher meta-d', the higher the metacognitive sensitivity.
We calculated discrimination for each subject per incentive condition. Moreover, we computed meta-d' per incentive condition and group using a hierarchical Bayesian framework 10. We then performed two mixed ANOVAs, implemented in the afex package, to test for main effects of incentive condition and group, and for their interaction, on discrimination and meta-d' separately.
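Discrimination can be computed directly from trial data, as sketched below with a hypothetical helper and made-up values. Meta-d' requires fitting the hierarchical Bayesian model cited above and is not reproduced here. The sketch returns NaN when a condition contains only correct or only incorrect trials, because the difference is then undefined.

```python
import numpy as np


def discrimination(confidence, correct):
    """Mean confidence on correct trials minus mean confidence on incorrect
    trials, for one subject in one incentive condition. Returns NaN when all
    trials are correct or all are incorrect."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    if correct.all() or not correct.any():
        return float("nan")
    return float(confidence[correct].mean() - confidence[~correct].mean())


d = discrimination([0.9, 0.7, 0.4, 0.6], [1, 1, 0, 0])
print(d)  # 0.8 - 0.5 ≈ 0.3: higher confidence when correct
```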

fMRI Analyses: Acquisition & Preprocessing
All our analyses were performed using MATLAB with SPM12 software (Wellcome Department of Cognitive Neurology, London, UK). Raw multi-echo functional scans were weighted and combined into 570 single volumes per scan session (64), using the first 30 dummy scans to calculate the optimal weighting of echo times for each voxel with a PAID-weight algorithm. During the combining process, realignment was performed on the functional data using linear interpolation to the first volume. Subsequently, the functional images were co-registered, segmented for normalization to MNI space and smoothed. To reduce motion-related artifacts, the Art-Repair toolbox 11 was used to detect large volume-to-volume movement and repair outlier volumes. Outliers were detected using a threshold for the variation of the mean intensity of the BOLD signal and a volume-to-volume motion threshold. A threshold of 1.5% variation from the mean intensity was used to detect volume outliers, which were repaired by interpolating from the adjacent volumes.
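The echo combination and the intensity-based outlier repair were done with existing toolboxes; the NumPy sketch below only illustrates the two ideas and is not their code. It assumes the common description of PAID, in which each echo is weighted per voxel by its echo time multiplied by its temporal SNR (estimated here from the first 30 volumes, which are then dropped), and it reduces the Art-Repair step to the 1.5% global-intensity criterion only (no motion threshold). Function names and echo times are hypothetical.

```python
import numpy as np


def paid_combine(data, echo_times, n_weight_vols=30):
    """Combine multi-echo data with PAID-style weights.

    data: array (n_echoes, n_volumes, n_voxels). The first n_weight_vols
    volumes are used only to estimate the weights and are dropped from the
    output. Weight per echo and voxel = TE * tSNR, normalised over echoes.
    """
    data = np.asarray(data, dtype=float)
    te = np.asarray(echo_times, dtype=float)[:, None]        # (n_echoes, 1)
    ref = data[:, :n_weight_vols, :]
    tsnr = ref.mean(axis=1) / ref.std(axis=1)                # (n_echoes, n_voxels)
    weights = te * tsnr
    weights /= weights.sum(axis=0, keepdims=True)
    combined = np.einsum("ev,etv->tv", weights, data[:, n_weight_vols:, :])
    return combined, weights


def repair_intensity_outliers(volumes, threshold=0.015):
    """Flag volumes whose global mean intensity deviates more than `threshold`
    from the session mean, and replace them by linear interpolation between
    the nearest non-outlier volumes. volumes: (n_volumes, n_voxels)."""
    volumes = np.asarray(volumes, dtype=float).copy()
    global_mean = volumes.mean(axis=1)
    outlier = np.abs(global_mean - global_mean.mean()) / global_mean.mean() > threshold
    good = np.flatnonzero(~outlier)
    t = np.arange(volumes.shape[0])
    for v in range(volumes.shape[1]):
        volumes[outlier, v] = np.interp(t[outlier], good, volumes[good, v])
    return volumes, outlier


# Identical echoes have equal tSNR, so the weights follow TE alone (0.25, 0.75).
rng = np.random.default_rng(0)
x = rng.normal(100.0, 5.0, size=(40, 3))
combined, w = paid_combine(np.stack([x, x]), echo_times=[10.0, 30.0])

# A 10% intensity spike in volume 7 is flagged and interpolated away.
vols = np.full((20, 2), 100.0)
vols[7] = 110.0
repaired, flagged = repair_intensity_outliers(vols)
```

The repair assumes at least one non-outlier volume exists; `np.interp` cannot interpolate from an empty set.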

fMRI Analyses: General Linear Models
GLM 1 consisted of three regressors, one for each time point ('choice', 'incentive/rating' and 'feedback'), to which parametric modulators (pmods) were added. All regressors were specified as stick functions time-locked to the onset of the respective events. The choice regressor was modulated by two pmods: early certainty (z-scored on subject level) and button press (left or right) to control for motor-related activation. The incentive/rating regressor was modulated by two pmods: incentive value ([-1,0,1]) and confidence rating (z-scored on subject level). The feedback regressor was additionally modulated by a pmod representing choice accuracy.
GLM 2 consisted of regressors for each of two time points (choice moment and incentive/rating moment) and three incentive conditions, as well as a single regressor at feedback moment, resulting in a total of seven regressors. All regressors at choice moment were modulated by a pmod of button press (left/right) and signed evidence: a variable that signifies the interaction between evidence and accuracy. Signed evidence was calculated as the absolute value of evidence in case of correct answers and the negative absolute evidence (i.e. -abs(evidence)) in case of incorrect answers. All regressors at rating moment were modulated by a pmod of confidence, and the feedback regressor was modulated by a pmod of accuracy. Thus, for all these events we could examine both baseline activity and regression slopes relating to their respective pmod.
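The signed evidence modulator and the subject-level z-scoring of pmods described above can be sketched as follows. The helper names are hypothetical, and the use of the population standard deviation in `zscore` is an assumption, since the SD convention is not stated.

```python
import numpy as np


def signed_evidence(evidence, correct):
    """|evidence| on correct trials and -|evidence| on incorrect trials."""
    magnitude = np.abs(np.asarray(evidence, dtype=float))
    return np.where(np.asarray(correct, dtype=bool), magnitude, -magnitude)


def zscore(x):
    """Subject-level z-scoring of a parametric modulator (population SD)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


se = signed_evidence([0.5, -0.3, 0.2], [1, 1, 0])  # -> 0.5, 0.3, -0.2
conf_pmod = zscore([40.0, 55.0, 70.0, 85.0])       # made-up confidence ratings
```

Because the sign encodes accuracy and the magnitude encodes evidence, a single positive regression slope on this modulator corresponds to the 'X-pattern' of the behavioral analyses.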
For both GLMs, pmods were not orthogonalized and thus competed to explain variance. We included six motion parameters as nuisance regressors. Regressors were modeled separately for each scanning session, and constants were included to account for between-session differences in mean activation. All events were modeled by convolving a series of delta functions with the canonical hemodynamic response function (HRF) at the onset of each event and were linearly regressed onto the functional BOLD-response signal. Low-frequency noise was removed with a high-pass filter with a cut-off of 128 seconds. We controlled for the number of sessions when computing the first-level contrasts. All contrasts were computed at subject level and then taken to group-level analyses. For GLM 1, we assessed group differences by performing a one-way ANOVA on our contrasts of interest, using an F-contrast to test for any group differences (i.e. [1 -1 0; 0 1 -1]). In addition, to obtain a complete picture of the areas involved in our contrasts of interest, we pooled all subjects and performed one-sample t-tests against 0.
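SPM's high-pass filter removes a set of low-frequency discrete cosine (DCT) regressors from the data and design. The sketch below mirrors that construction as we understand it from SPM12's `spm_filter` (number of basis functions `fix(2*(N*TR)/cutoff + 1)`, with the constant term excluded); it is an illustration, not SPM code, and the TR value is hypothetical because the repetition time is not given in this section.

```python
import numpy as np


def dct_highpass(y, tr, cutoff=128.0):
    """Remove low-frequency drifts from y, shaped (n_scans,) or
    (n_scans, n_voxels), by regressing out DCT basis functions with periods
    longer than roughly `cutoff` seconds."""
    y = np.asarray(y, dtype=float)
    n_scans = y.shape[0]
    n_basis = int(np.fix(2 * (n_scans * tr) / cutoff + 1))
    t = np.arange(n_scans)[:, None]
    k = np.arange(1, n_basis)[None, :]          # skip k = 0 (the constant)
    X0 = np.sqrt(2.0 / n_scans) * np.cos(np.pi * (2 * t + 1) * k / (2 * n_scans))
    beta, *_ = np.linalg.lstsq(X0, y, rcond=None)
    return y - X0 @ beta


n_scans, tr = 570, 2.0                           # TR is a made-up value
time_idx = np.arange(n_scans)
drift = 3.0 * np.cos(np.pi * (2 * time_idx + 1) / (2 * n_scans))  # slowest DCT term
filtered = dct_highpass(5.0 + drift, tr)
# The drift is removed; the mean (5.0) survives because no constant is regressed out.
```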

Behavioral Descriptive Results
Here we show the descriptive results that are depicted in Figure 2.

Behavioral Analyses: Integration of Evidence in Confidence Judgments
The evidence integration effect differed per group, as signaled by a significant three-way interaction between accuracy, evidence and group (F(2,15094) = 3.0533, p = 0.04723) (Figure 3, Supplementary Table 3). Post hoc, we compared the groups on the slopes of evidence integration in confidence separately for correct and incorrect trials using the emtrends() function. The slope of evidence integration into confidence was less steep for correct answers in GD patients than in both HCs (GD - HC = -1.712 ± 0.283, Z-ratio = -6.057, p < 0.001) and OCD patients (GD - OCD = -2.110 ± 0.357, Z-ratio = -5.912, p < 0.001). This indicates that GD patients' confidence ratings were less influenced by the perceptual evidence when they made a correct choice. No differences between OCD patients and HCs were found regarding evidence integration effects.

Behavioral Analyses: Confidence Calibration
We found a significant main effect of group (F(2,107) = 4.40, p = 0.015), but no effect of incentive and no interaction between group and incentive. Post-hoc tests showed that GD patients had higher calibration scores than OCD patients (t(107) = -2.967, p = 0.0103), whereas neither patient group differed from HC subjects. This indicates that GD patients are more overconfident than OCD patients.

Behavioral Analyses: Metacognitive Sensitivity
We did not find a significant main effect of group or incentive, nor an interaction effect between group and incentive, both for discrimination and meta-d'. Average discrimination values were positive and average meta-d' was close to 1, indicating sensitive metacognition.

Behavioral Analyses: Clinical Correlations
We performed additional correlational analyses to explore whether subjects' mean confidence levels correlate with various clinical questionnaires of interest, separately for OCD and GD patients. In OCD patients there were no significant correlations with severity of OCD symptoms as measured with the YBOCS (p > .5) or with obsessive beliefs measured with the OBQ-44 (p > .5). In GD patients there was also no significant correlation with symptom severity measured using the PGSI (p > .4), but there was a significant positive correlation between confidence level and BAS (Behavioral Approach System) scores (r = 0.4608, p = 0.01784).

Behavioral Analyses: Evidence Across Conditions
Due to a technical bug, perceptual evidence was not equal across incentive conditions. A mixed ANOVA with within-subject factor incentive and between-subject factor group showed that evidence differed significantly across incentive conditions (F(2,205) = 39.94, p < .001), but not across groups (F(2,107) = 0.94, p > .3), and no interaction between incentive and group was found (F(3.83,205) = 0.82, p > .5). Post-hoc t-tests revealed that evidence was highest in the neutral condition, followed by gain, followed by loss (neutral versus loss: t-ratio = 7.844, p < .001; neutral versus gain: t-ratio = 3.306, p = 0.001; gain versus loss: t-ratio = 5.537, p < .001). Since evidence did not differ between the groups, it cannot account for any group differences we find in our data. Importantly, there were no effects of incentive on performance. Moreover, the difference in evidence across incentive conditions does not drive our incentive-induced confidence bias, since we find a parametric increase in confidence with incentive value, with a significant difference between all pairs. Confidence is thus higher in gain than in neutral conditions, even though evidence was significantly higher in neutral than in gain conditions: although trials were easier in the neutral condition, participants were still more confident when they could gain points.

Behavioral Analyses: Clinical Groups With Their Own Control Group
To explore whether the main behavioral analyses would yield similar results with control groups better matched to the demographics of each clinical group, we selected two subsets from our larger sample of HCs (OCD control group N = 31, GD control group N = 32, with a slight overlap of N = 8) to compare with the two clinical groups.
Even though the groups were better matched and did not significantly differ from the clinical groups … These additional analyses thus show that, even when using better matched control groups, we find no evidence for abnormalities in confidence level in either OCD or GD patients. For GD patients, however, we do replicate the lower slope of evidence integration in confidence for correct answers compared to HCs.

fMRI: Interaction Between Metacognition and Incentives in VS (GLM 2)
We performed an ROI analysis by leveraging our factorial design. We extracted VS activations for both time points (choice and rating), all incentives (loss, neutral and gain), all groups (HC, OCD and GD), for both baseline activity and a regression slope with (1) signed evidence and (2) confidence judgments for all these events.
First, one-sample t-tests showed that, overall, VS baseline activation did not differ from 0 at choice moment (t(100) = -0.317, p > 0.75), whereas it was positive at rating moment (t(100) = 8.238, p < 0.001). The correlation between VS activity and signed evidence at choice moment was significantly positive (t(100) = 4.985, p < 0.001). However, the correlation between VS activity and confidence at rating moment did not differ from 0 (t(100) = 1.664, p = 0.099) (Figure S3). This implies that VS activity is related to incentive presentation, but also to signed evidence (i.e. the interaction between accuracy and evidence): VS activity was lowest when one had a lot of evidence but was incorrect, and highest when one had a lot of evidence and was in fact correct. We then examined whether incentive condition and group modulated this general signal. As expected, at choice moment there were no effects of incentive condition on VS baseline activity, nor on its correlation with the signed evidence signal (i.e. slope) (Figure S3, Table S4). Moreover, we found no group or interaction effects on either baseline VS activity or its correlation with signed evidence at choice moment. At rating moment, however, incentive condition had a significant effect on both baseline VS activity and its correlation with confidence.
Post-hoc testing showed that baseline VS activity was highest during gain, followed by loss, and lowest during neutral (loss versus gain: t(196) = -4.590, p < 0.001; neutral versus gain: t(196) = -7.710, p < 0.001; loss versus neutral: t(196) = 3.119, p = 0.006). The correlation of VS activity with confidence was significantly higher (i.e. a steeper slope) in gain than in neutral (t(196) = -2.607, p = 0.0265), while no differences between gain and loss, or between neutral and loss, were found. Moreover, there was a significant group effect on VS baseline activity at rating moment. This effect did not survive post-hoc testing, however: GD subjects showed only a non-significant trend towards decreased activity compared with HCs, averaged over incentive conditions (t(98) = -2.272, p = 0.0646). No interaction effects between group and incentive were found on baseline activity or its correlation with confidence at rating moment.
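The tests of VS baselines and slopes against 0 reported above are ordinary one-sample t-tests on the extracted ROI values, one value per subject. A NumPy-only sketch of the statistic, using a hypothetical helper and made-up values (in practice a statistics library would also supply the p-value):

```python
import numpy as np


def one_sample_t(values, popmean=0.0):
    """One-sample t statistic and degrees of freedom for H0: mean == popmean."""
    x = np.asarray(values, dtype=float)
    n = x.size
    se = x.std(ddof=1) / np.sqrt(n)   # standard error from the sample SD
    return float((x.mean() - popmean) / se), n - 1


t_stat, df = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0])
# mean 3, SD sqrt(2.5), SE sqrt(0.5): t ≈ 4.243 with df = 4
```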