
Decomposition of Reinforcement Learning Deficits in Disordered Gambling via Drift Diffusion Modeling and Functional Magnetic Resonance Imaging

By: Antonius Wiehler and Jan Peters
Open Access | Mar 2024

Figures & Tables

cpsy-8-1-104-g1.png
Figure 1

Illustration of a single trial from the reinforcement learning task. Stimuli were presented for a maximum of 3 s, during which participants were free to make their selection. The selection was then highlighted for 500 ms, followed by a jitter of variable duration (2–6 s). Reward feedback was then presented for 3 s, followed by another jitter of variable duration (2–6 s). Stimuli consisted of two pairs of abstract fractal images (80% vs. 20% reinforcement rate), which were presented in randomized order, and participants completed 30 trials per pair.
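The trial structure in this caption can be sketched as a small schedule generator. This is only an illustration: the function name `make_schedule`, the uniform jitter distribution, and the independent reward draws for the two options are assumptions, not details given in the caption.

```python
import random

def make_schedule(n_per_pair=30, p_good=0.8, p_bad=0.2, seed=0):
    """Build a hypothetical trial list: two stimulus pairs, randomized order."""
    rng = random.Random(seed)
    pair_order = [0] * n_per_pair + [1] * n_per_pair
    rng.shuffle(pair_order)  # pairs are interleaved in random order
    trials = []
    for pair in pair_order:
        trials.append({
            "pair": pair,
            "max_stimulus_s": 3.0,                 # response window (upper bound)
            "highlight_s": 0.5,                    # selection highlighted for 500 ms
            "jitter1_s": rng.uniform(2.0, 6.0),    # assumed uniform on 2-6 s
            "reward_if_optimal": rng.random() < p_good,     # 80% option
            "reward_if_suboptimal": rng.random() < p_bad,   # 20% option
            "feedback_s": 3.0,
            "jitter2_s": rng.uniform(2.0, 6.0),    # assumed uniform on 2-6 s
        })
    return trials

def max_trial_duration(trial):
    """Upper bound on trial length, reached if the full response window is used."""
    return (trial["max_stimulus_s"] + trial["highlight_s"] + trial["jitter1_s"]
            + trial["feedback_s"] + trial["jitter2_s"])

schedule = make_schedule()
```

Under these assumptions a trial lasts at most 3 + 0.5 + 6 + 3 + 6 = 18.5 s, and faster responses shorten it.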

Table 1

Overview of priors for group means.

| PARAMETER | GROUP-LEVEL PRIOR (μ) | GROUP-LEVEL PRIOR (σ) |
| --- | --- | --- |
| α0 | Uniform (.01, 5) | Uniform (.0001, 2) |
| αexp | Uniform (–3, 3) | Uniform (.0001, 2) |
| τ0 | Uniform (0.1, 2) | Uniform (.0001, 2) |
| τexp | Uniform (–3, 3) | Uniform (.0001, 2) |
| vcoeff | Uniform (–100, 100) | Uniform (.0001, 10) |
| η+, η– | Uniform (–3, 3) | Uniform (.0001, 4) |
cpsy-8-1-104-g2.png
Figure 2

Response time distributions (RT, in seconds) in the control group (a, blue) and the gambling disorder group (b, red) with choices of the suboptimal options coded as negative RTs. c: Accuracy per group (chance level is 0.5). d: Total rewards earned per group. e: Median RTs per group.

Table 2

Model comparison results, separately per group. We examined reinforcement learning drift diffusion models (RLDDMs) with single vs. dual learning rates (η) and fixed vs. modulated non-decision times (τ) and decision thresholds (α), as well as a null model without learning (DDM0). Model comparison used the estimated log pointwise predictive density (-elpd; Vehtari et al., 2017). We also report the 95% CI of the difference in -elpd between each model and the best-fitting model (-elpddiff).

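| MODEL | η | τ | α | CONTROLS: -elpd | CONTROLS: -elpddiff [95% CI] | CONTROLS: RANK | GAMBLERS: -elpd | GAMBLERS: -elpddiff [95% CI] | GAMBLERS: RANK |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DDM0 | | Fixed | Fixed | 800.2 | 215.0 [171.6, 258.4] | 9 | 1115.9 | 107.6 [78.9, 136.3] | 9 |
| RLDDM1 | 1 | Fixed | Fixed | 658.1 | 72.8 [48.6, 97.1] | 8 | 1055.5 | 47.3 [27.3, 67.2] | 8 |
| RLDDM2 | 1 | Fixed | Power | 634.3 | 49.0 [29.3, 80.7] | 6 | 1021.4 | 13.1 [.4, 25.8] | 4 |
| RLDDM3 | 1 | Power | Fixed | 644.0 | 58.8 [36.8, 97.1] | 7 | 1027.1 | 18.8 [3.1, 34.4] | 6 |
| RLDDM4 | 1 | Power | Power | 628.3 | 43.1 [24.1, 68.7] | 5 | 1010.3 | 2.0 [–9.1, 13.2] | 2 |
| RLDDM5 | 2 | Fixed | Fixed | 615.0 | 29.7 [15.9, 43.5] | 4 | 1049.0 | 40.7 [23.8, 57.6] | 7 |
| RLDDM6 | 2 | Fixed | Power | 591.4 | 6.1 [–.1, 12.3] | 2 | 1019.3 | 11.1 [4.5, 17.6] | 3 |
| RLDDM7 | 2 | Power | Fixed | 599.8 | 14.5 [4.4, 24.6] | 3 | 1022.4 | 14.1 [3.4, 24.9] | 5 |
| RLDDM8 | 2 | Power | Power | 585.3 | 0.0 | 1 | 1008.3 | 0.0 | 1 |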
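As a quick consistency check on Table 2, the differences to the best model and the ranks can be recomputed from the reported -elpd values. The sketch below only uses the point estimates transcribed from the table; the helper name `rank_models` is hypothetical, and the 95% CIs cannot be reproduced this way because they depend on pointwise values not listed here.

```python
# -elpd point estimates transcribed from Table 2 (lower is better).
NEG_ELPD = {
    "controls": {
        "DDM0": 800.2, "RLDDM1": 658.1, "RLDDM2": 634.3, "RLDDM3": 644.0,
        "RLDDM4": 628.3, "RLDDM5": 615.0, "RLDDM6": 591.4, "RLDDM7": 599.8,
        "RLDDM8": 585.3,
    },
    "gamblers": {
        "DDM0": 1115.9, "RLDDM1": 1055.5, "RLDDM2": 1021.4, "RLDDM3": 1027.1,
        "RLDDM4": 1010.3, "RLDDM5": 1049.0, "RLDDM6": 1019.3, "RLDDM7": 1022.4,
        "RLDDM8": 1008.3,
    },
}

def rank_models(scores):
    """Return {model: (difference to best model, rank)}; rank 1 = lowest -elpd."""
    best = min(scores.values())
    ordered = sorted(scores, key=scores.get)
    return {m: (round(scores[m] - best, 1), ordered.index(m) + 1) for m in scores}

results = {group: rank_models(scores) for group, scores in NEG_ELPD.items()}

for group, table in results.items():
    for model, (diff, rank) in sorted(table.items(), key=lambda kv: kv[1][1]):
        print(f"{group:9s} {model:7s} diff={diff:6.1f} rank={rank}")
```

Because the table entries are rounded to one decimal, recomputed differences can deviate from the printed -elpddiff values by about 0.1.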
cpsy-8-1-104-g3.png
Figure 3

Posterior predictive checks in the control group. Top row: observed RTs over time (black lines) and model predicted RTs (solid blue lines: means, dashed lines: +/– 95% percentiles). Bottom row: observed accuracies over time (black lines) and model predicted accuracies (solid blue lines: means, dashed lines: +/– 95% percentiles). a) DDM0 without reinforcement learning. b) RLDDM1 with a single learning rate, fixed non-decision time and fixed decision threshold. c) RLDDM8 with dual learning rates, modulated non-decision time and modulated decision threshold.

cpsy-8-1-104-g4.png
Figure 4

Posterior predictive checks in the gambling disorder group. Top row: observed RTs over time (black lines) and model predicted RTs (solid red lines: means, dashed lines: +/– 95% percentiles). Bottom row: observed accuracies over time (black lines) and model predicted accuracies (solid red lines: means, dashed lines: +/– 95% percentiles). a) DDM0 without reinforcement learning. b) RLDDM1 with a single learning rate, fixed non-decision time and fixed decision threshold. c) RLDDM8 with dual learning rates, modulated non-decision time and modulated decision threshold.
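To make the RLDDM8 ingredients named in these captions concrete, the following sketch computes trial-wise drift rate, decision threshold, and non-decision time for one stimulus pair. It is not the authors' implementation. The functional forms are assumptions: a delta-rule update with separate learning rates for positive and negative prediction errors, a drift rate that scales linearly with the Q-value difference, and a power function of trial number for threshold and non-decision time. Learning rates are taken directly on the [0, 1] scale, and the initial Q-value of 0.5 is also assumed.

```python
def rlddm_trialwise(choices, rewards, eta_pos, eta_neg, v_coeff,
                    alpha0, alpha_exp, tau0, tau_exp, q_init=0.5):
    """Hypothetical sketch of trial-wise RLDDM quantities for one stimulus pair.

    choices: 1 if the optimal option was chosen, 0 otherwise.
    rewards: 1 if the choice was rewarded, 0 otherwise.
    Returns (per-trial list of dicts, final Q-values).
    """
    q = [q_init, q_init]  # q[0]: suboptimal option, q[1]: optimal option
    trials = []
    for t, (choice, reward) in enumerate(zip(choices, rewards), start=1):
        trials.append({
            # Assumed linear mapping from Q-value difference to drift rate.
            "drift": v_coeff * (q[1] - q[0]),
            # Assumed power-function modulation over trials.
            "threshold": alpha0 * t ** alpha_exp,
            "non_decision_time": tau0 * t ** tau_exp,
        })
        # Dual learning rates: eta_pos after positive prediction errors,
        # eta_neg otherwise.
        prediction_error = reward - q[choice]
        eta = eta_pos if prediction_error > 0 else eta_neg
        q[choice] += eta * prediction_error
    return trials, q

demo_trials, demo_q = rlddm_trialwise(
    choices=[1, 1], rewards=[1, 0],
    eta_pos=0.5, eta_neg=0.25, v_coeff=2.0,
    alpha0=1.5, alpha_exp=-0.1, tau0=0.3, tau_exp=0.0,
)
```

Setting both exponents to 0 gives fixed threshold and non-decision time, and setting `eta_pos == eta_neg` gives a single learning rate, which under these assumptions mirrors the model family compared in Table 2.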

cpsy-8-1-104-g5.png
Figure 5

Posterior distributions for RLDDM8 parameters. Upper panels: posterior distributions of parameter group means for the control group (blue) and the gambling group (red). Lower panels: posterior group differences per parameter (control group – gambling disorder group). Solid (thin) horizontal lines in the lower panels denote 85% (95%) highest posterior density intervals.
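A highest posterior density interval of the kind shown in the lower panels can be estimated from posterior samples as the narrowest window containing the requested share of the sorted draws. The sketch below is a generic estimator, not necessarily the one used for the figure; the function name `hdi` and the simulated draws are illustrative assumptions.

```python
import math
import random

def hdi(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of draws the interval must cover
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

# Simulated stand-in for posterior draws of a group difference (not real data).
rng = random.Random(1)
draws = [rng.gauss(0.1, 0.05) for _ in range(4000)]
hdi85 = hdi(draws, 0.85)
hdi95 = hdi(draws, 0.95)
```

If the interval excludes 0, the corresponding group difference is credibly non-zero at that level; the 85% interval is never wider than the 95% interval.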

Table 3

Group differences and within-group effects for all RLDDM8 parameters. Mdiff: mean posterior group difference. P(group diff. > 0): posterior probability that the group difference in a parameter is > 0. dBF (group difference): directional Bayes Factors comparing the evidence for a group difference > 0 to the evidence for a group difference < 0. Within group comparisons: P(effect): posterior probability for an effect (for αexp, τexp and vcoeff, the comparison is vs. 0). dBF: directional Bayes Factors comparing the evidence for a parameter value > 0 to the evidence for a parameter value < 0.

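| PARAMETER | GROUP DIFFERENCES: Mdiff | GROUP DIFFERENCES: P (group diff. > 0) | GROUP DIFFERENCES: dBF | CONTROL GROUP: P (effect) | CONTROL GROUP: dBF | GAMBLING GROUP: P (effect) | GAMBLING GROUP: dBF |
| --- | --- | --- | --- | --- | --- | --- | --- |
| α0 | –.167 | 18.25% | .29 | | | | |
| αexp | .061 | 96.39% | 27.60 | 69.29% | .43 | 99.98% | .00024 |
| τ0 | .128 | 96.91% | 27.09 | | | | |
| τexp | .002 | 52.05% | 1.12 | 99.71% | .003 | 96.10% | .045 |
| vcoeff | 1.79 | 96.40% | 25.79 | >99.99% | 15860 | >99.99% | 15828 |
| η+ | .147 | 62.44% | 1.64 | | | | |
| η– | .239 | 63.09% | 1.66 | | | | |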
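The two summary statistics in Table 3 can be approximated directly from posterior samples of a difference: the share of draws above zero, and the ratio of posterior mass above zero to mass below zero. The sketch below uses simple sample counts. The authors' exact estimator is not stated in this section (a density-based estimate would give slightly different values), so the function name `directional_bf` and the counting approach are assumptions.

```python
def directional_bf(samples):
    """Return (P(value > 0), dBF) from posterior draws.

    dBF is the ratio of posterior mass above zero to mass below zero,
    estimated here by counting draws on each side.
    """
    above = sum(1 for s in samples if s > 0)
    below = sum(1 for s in samples if s < 0)
    p_above = above / len(samples)
    dbf = above / below if below else float("inf")
    return p_above, dbf

# Toy draws for illustration only (not data from the study).
p_gt_zero, dbf = directional_bf([0.3, 0.1, 0.2, -0.1])
```

A dBF above 1 favors a positive difference and a dBF below 1 favors a negative one; values far from 1 in either direction indicate stronger directional evidence.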
cpsy-8-1-104-g6.png
Figure 6

Parametric analyses of model-based average Q-values (GLM2) revealed a robust main effect across groups in the ventro-medial prefrontal cortex (a). Parameter estimates at the peak voxel from (a) are shown in (b).

Table 4

Replication analyses for model-derived measures (main effects across groups): average Q-value across options, chosen – unchosen Q-value, and model-derived prediction error. Small volume correction for multiple comparisons (SVC) used a single region of interest mask across two meta-analyses (Bartra et al., 2013; Clithero & Rangel, 2014) of reward value effects (see methods section).

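| CONTRAST/REGION | COORDINATES | PEAK T-VALUE | p(FWE) SVC |
| --- | --- | --- | --- |
| Average Q-value | | | |
| vmPFC | –4, 38, 6 | 4.73 | .002 |
| Chosen-unchosen value | | | |
| No significant effects in ROI | | | |
| Reward prediction error | | | |
| Left ventral striatum | –10, 6, –10 | 5.69 | <.001 |
| Right ventral striatum | 12, 10, –12 | 6.77 | <.001 |
| vmPFC | –4, 56, –4 | 6.26 | <.001 |
| Posterior Cingulate Cortex | 0, –36, –36 | 4.49 | .012 |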
cpsy-8-1-104-g7.png
Figure 7

Parametric analysis of model-based reward prediction error (GLM1) revealed a robust main effect across groups in bilateral ventral striatum (a). Parameter estimates at peak voxels in (a) were then extracted from GLM3 to illustrate effects of positive (+) vs. negative (–) prediction errors in each group in both left and right ventral striatum (b).

Table 5

Inclusion Bayes Factors (BFincl) from Bayesian repeated measures ANOVAs at ventral striatal peak voxels showing main effects of model-based prediction error (PE) across groups (see Table 4).

| EFFECTS | LEFT VENTRAL STRIATUM [–10 6 –10] | RIGHT VENTRAL STRIATUM [10 12 –12] |
| --- | --- | --- |
| PE sign | 1060.335 | 7059.337 |
| Group | 0.878 | 0.719 |
| Group * PE Sign | 1.966 | 1.759 |
DOI: https://doi.org/10.5334/cpsy.104 | Journal eISSN: 2379-6227
Language: English
Submitted on: Oct 7, 2023
Accepted on: Mar 7, 2024
Published on: Mar 20, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Antonius Wiehler, Jan Peters, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.