
Figure 1
Illustration of a single trial from the reinforcement learning task. Stimuli were presented for a maximum of 3 s, during which participants were free to make their selection. The selection was then highlighted for 500 ms, followed by a jitter of variable duration (2–6 s). Reward feedback was then presented for 3 s, followed by another jitter of variable duration (2–6 s). Stimuli consisted of two pairs of abstract fractal images (80% vs. 20% reinforcement rate), which were presented in randomized order, and participants completed 30 trials per pair.
Table 1
Overview of priors for group means.
| PARAMETER | GROUP-LEVEL PRIOR (μ) | GROUP-LEVEL PRIOR (σ) |
|---|---|---|
| α0 | Uniform (.01, 5) | Uniform (.0001, 2) |
| αexp | Uniform (–3, 3) | Uniform (.0001, 2) |
| τ0 | Uniform (0.1, 2) | Uniform (.0001, 2) |
| τexp | Uniform (–3, 3) | Uniform (.0001, 2) |
| vcoeff | Uniform (–100, 100) | Uniform (.0001, 10) |
| η+, η– | Uniform (–3, 3) | Uniform (.0001, 4) |

Figure 2
Response time distributions (RT, in seconds) in the control group (a, blue) and the gambling disorder group (b, red) with choices of the suboptimal options coded as negative RTs. c: Accuracy per group (chance level is 0.5). d: Total rewards earned per group. e: Median RTs per group.
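
The sign coding used in panels a and b can be sketched as follows. This is a minimal illustration, assuming per-trial arrays of RTs and of whether the optimal option was chosen; the function name `signed_rts` and the example values are hypothetical and are not taken from the study.

```python
import numpy as np


def signed_rts(rt, chose_optimal):
    """Return RTs with choices of the suboptimal option coded as negative."""
    rt = np.asarray(rt, dtype=float)
    chose_optimal = np.asarray(chose_optimal, dtype=bool)
    return np.where(chose_optimal, rt, -rt)


# Hypothetical example trials (seconds); not data from the study.
rt = [0.82, 1.10, 0.65, 1.43]
chose_optimal = [True, False, True, False]

signed = signed_rts(rt, chose_optimal)
accuracy = float(np.mean(signed > 0))         # proportion of optimal choices
median_rt = float(np.median(np.abs(signed)))  # median RT ignores the sign
```

With this coding, a single histogram of `signed` shows both the RT distribution and the choice proportions, which is what panels a and b display.
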
Table 2
Model comparison results, separately per group. We examined reinforcement learning drift diffusion models (RLDDMs) with single vs. dual learning rates (η) and fixed vs. modulated non-decision times (τ) and decision thresholds (α), as well as a null model without learning (DDM0). Model comparison used the estimated log pointwise predictive density (-elpd; Vehtari et al., 2017). We also report the 95% CI of the difference in -elpd between each model and the best-fitting model (-elpddiff).
| MODEL | η | τ | α | CONTROLS: -elpd | CONTROLS: -elpddiff | CONTROLS: RANK | GAMBLERS: -elpd | GAMBLERS: -elpddiff | GAMBLERS: RANK |
|---|---|---|---|---|---|---|---|---|---|
| DDM0 | – | Fixed | Fixed | 800.2 | 215.0 [171.6, 258.4] | 9 | 1115.9 | 107.6 [78.9, 136.3] | 9 |
| RLDDM1 | 1 | Fixed | Fixed | 658.1 | 72.8 [48.6, 97.1] | 8 | 1055.5 | 47.3 [27.3, 67.2] | 8 |
| RLDDM2 | 1 | Fixed | Power | 634.3 | 49.0 [29.3, 80.7] | 6 | 1021.4 | 13.1 [.4, 25.8] | 4 |
| RLDDM3 | 1 | Power | Fixed | 644.0 | 58.8 [36.8, 97.1] | 7 | 1027.1 | 18.8 [3.1, 34.4] | 6 |
| RLDDM4 | 1 | Power | Power | 628.3 | 43.1 [24.1, 68.7] | 5 | 1010.3 | 2.0 [–9.1, 13.2] | 2 |
| RLDDM5 | 2 | Fixed | Fixed | 615.0 | 29.7 [15.9, 43.5] | 4 | 1049.0 | 40.7 [23.8, 57.6] | 7 |
| RLDDM6 | 2 | Fixed | Power | 591.4 | 6.1 [–.1, 12.3] | 2 | 1019.3 | 11.1 [4.5, 17.6] | 3 |
| RLDDM7 | 2 | Power | Fixed | 599.8 | 14.5 [4.4, 24.6] | 3 | 1022.4 | 14.1 [3.4, 24.9] | 5 |
| RLDDM8 | 2 | Power | Power | 585.3 | 0.0 | 1 | 1008.3 | 0.0 | 1 |
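
The ranking and the difference intervals in Table 2 can be illustrated with a short sketch. The ranking uses the control-group -elpd values from the table. The interval function is an assumption: it uses the common normal approximation (difference ± 1.96 standard errors, with the standard error derived from pointwise differences), which the caption does not confirm as the procedure used here. The function name `neg_elpd_diff` and the simulated pointwise values are hypothetical.

```python
import numpy as np

# -elpd values for the control group, copied from Table 2.
neg_elpd_controls = {
    "DDM0": 800.2, "RLDDM1": 658.1, "RLDDM2": 634.3, "RLDDM3": 644.0,
    "RLDDM4": 628.3, "RLDDM5": 615.0, "RLDDM6": 591.4, "RLDDM7": 599.8,
    "RLDDM8": 585.3,
}

# Rank models: the smallest -elpd is the best fit (rank 1).
ordered = sorted(neg_elpd_controls, key=neg_elpd_controls.get)
ranks = {name: i + 1 for i, name in enumerate(ordered)}


def neg_elpd_diff(pointwise_model, pointwise_best, z=1.96):
    """Difference in -elpd (model minus best) with a normal-approximation CI.

    Both inputs are assumed to be arrays of pointwise elpd contributions,
    one value per observation, in the same order.
    """
    model = np.asarray(pointwise_model, dtype=float)
    best = np.asarray(pointwise_best, dtype=float)
    d = best - model  # pointwise contribution to (-elpd_model) - (-elpd_best)
    diff = d.sum()
    se = np.sqrt(d.size * d.var(ddof=1))  # standard error of the summed difference
    return diff, (diff - z * se, diff + z * se)


# Simulated pointwise values for illustration only (not study data).
rng = np.random.default_rng(0)
best = rng.normal(-1.0, 0.5, size=600)
model = best - rng.normal(0.05, 0.3, size=600)
diff, (lo, hi) = neg_elpd_diff(model, best)
```

An interval that excludes zero, as for most rows in Table 2, indicates that the model fits reliably worse than the best-fitting model under this approximation.
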

Figure 3
Posterior predictive checks in the control group. Top row: observed RTs over time (black lines) and model-predicted RTs (solid blue lines: means, dashed lines: +/– 95% percentiles). Bottom row: observed accuracies over time (black lines) and model-predicted accuracies (solid blue lines: means, dashed lines: +/– 95% percentiles). a) DDM0 without reinforcement learning. b) RLDDM1 with a single learning rate, fixed non-decision time and fixed decision threshold. c) RLDDM8 with dual learning rates, modulated non-decision time and modulated decision threshold.

Figure 4
Posterior predictive checks in the gambling disorder group. Top row: observed RTs over time (black lines) and model-predicted RTs (solid red lines: means, dashed lines: +/– 95% percentiles). Bottom row: observed accuracies over time (black lines) and model-predicted accuracies (solid red lines: means, dashed lines: +/– 95% percentiles). a) DDM0 without reinforcement learning. b) RLDDM1 with a single learning rate, fixed non-decision time and fixed decision threshold. c) RLDDM8 with dual learning rates, modulated non-decision time and modulated decision threshold.

Figure 5
Posterior distributions for RLDDM8 parameters. Upper panels: posterior distributions of parameter group means for the control group (blue) and the gambling group (red). Lower panels: posterior group differences per parameter (control group – gambling disorder group). Solid (thin) horizontal lines in the lower panels denote 85% (95%) highest posterior density intervals.
Table 3
Group differences and within-group effects for all RLDDM8 parameters. Mdiff: mean posterior group difference. P(group diff. > 0): posterior probability that the group difference in a parameter is > 0. dBF (group difference): directional Bayes Factors comparing the evidence for a group difference > 0 to the evidence for a group difference < 0. Within group comparisons: P(effect): posterior probability for an effect (for αexp, τexp and vcoeff, the comparison is vs. 0). dBF: directional Bayes Factors comparing the evidence for a parameter value > 0 to the evidence for a parameter value < 0.
| PARAMETER | GROUP DIFFERENCES: Mdiff | GROUP DIFFERENCES: P (group diff. > 0) | GROUP DIFFERENCES: dBF | CONTROL GROUP: P (effect) | CONTROL GROUP: dBF | GAMBLING GROUP: P (effect) | GAMBLING GROUP: dBF |
|---|---|---|---|---|---|---|---|
| α0 | –.167 | 18.25% | .29 | – | – | – | – |
| αexp | .061 | 96.39% | 27.60 | 69.29% | .43 | 99.98% | .00024 |
| τ0 | .128 | 96.91% | 27.09 | – | – | – | – |
| τexp | .002 | 52.05% | 1.12 | 99.71% | .003 | 96.10% | .045 |
| vcoeff | 1.79 | 96.40% | 25.79 | >99.99% | 15860 | >99.99% | 15828 |
| η+ | .147 | 62.44% | 1.64 | – | – | – | – |
| η– | .239 | 63.09% | 1.66 | – | – | – | – |
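
The directional Bayes factors in Table 3 compare the posterior evidence for a value above zero with the evidence for a value below zero. A minimal sketch, assuming this can be approximated by the ratio of posterior sample proportions: the function name `directional_bf` and the example draws are hypothetical, and because the exact estimation procedure is not given in the caption (a density-based estimate would differ slightly), tabled values need not be reproduced exactly by this ratio.

```python
import numpy as np


def directional_bf(samples):
    """Return P(value > 0) and the directional Bayes factor P(> 0) / P(< 0)."""
    s = np.asarray(samples, dtype=float)
    p_pos = float(np.mean(s > 0))
    p_neg = float(np.mean(s < 0))
    dbf = float("inf") if p_neg == 0 else p_pos / p_neg
    return p_pos, dbf


# Tiny deterministic example: three of four draws are positive.
p_pos, dbf = directional_bf([1.0, 2.0, 3.0, -1.0])

# Hypothetical posterior draws of a group-mean parameter (not study data).
rng = np.random.default_rng(1)
control = rng.normal(0.10, 0.05, size=4000)
gambling = rng.normal(0.04, 0.05, size=4000)
p_diff, dbf_diff = directional_bf(control - gambling)  # control minus gambling
```

A dBF well above 1 favors a positive difference (control group > gambling disorder group), and a dBF well below 1 favors a negative one.
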

Figure 6
Parametric analyses of model-based average Q-values (GLM2) revealed a robust main effect across groups in the ventro-medial prefrontal cortex (a). Parameter estimates at the peak voxel from (a) are shown in (b).
Table 4
Replication analyses for model-derived measures (main effects across groups): average Q-value across options, chosen – unchosen Q-value, and model-derived prediction error. Small volume correction for multiple comparisons (SVC) used a single region of interest mask across two meta-analyses (Bartra et al., 2013; Clithero & Rangel, 2014) of reward value effects (see methods section).
| CONTRAST/REGION | COORDINATES (x) | COORDINATES (y) | COORDINATES (z) | PEAK T-VALUE | p(FWE)SVC |
|---|---|---|---|---|---|
| Average Q-value | | | | | |
| vmPFC | –4 | 38 | 6 | 4.73 | .002 |
| Chosen-unchosen value | | | | | |
| No significant effects in ROI | | | | | |
| Reward prediction error | | | | | |
| Left ventral striatum | –10 | 6 | –10 | 5.69 | <.001 |
| Right ventral striatum | 12 | 10 | –12 | 6.77 | <.001 |
| vmPFC | –4 | 56 | –4 | 6.26 | <.001 |
| Posterior Cingulate Cortex | 0 | –36 | –36 | 4.49 | .012 |

Figure 7
Parametric analysis of model-based reward prediction error (GLM1) revealed a robust main effect across groups in bilateral ventral striatum (a). Parameter estimates at peak voxels in (a) were then extracted from GLM3 to illustrate effects of positive (+) vs. negative (–) prediction errors in each group in both left and right ventral striatum (b).
Table 5
Inclusion Bayes Factors (BFincl) from Bayesian repeated measures ANOVAs at ventral striatal peak voxels showing main effects of model-based prediction error (PE) across groups (see Table 4).
| EFFECTS | LEFT VENTRAL STRIATUM [–10 6 –10] | RIGHT VENTRAL STRIATUM [10 12 –12] |
|---|---|---|
| PE sign | 1060.335 | 7059.337 |
| Group | 0.878 | 0.719 |
| Group * PE Sign | 1.966 | 1.759 |
