
Figure 1
Structure of the modified 3-Arm Bandit (3AB) Task. (a) Visual representation of the modified 3-arm bandit task. Subjects choose between three options (arms), each associated with distinct reward and punishment probabilities. (b) Any arm selection can result in one of four possible outcomes: nothing, a win token (green only), a loss token (red only), or both. The task was designed to reduce cognitive load compared with the traditional 4-arm bandit task by limiting the number of choices and increasing the probability of both reward and punishment outcomes across trials. (c) Outcome probabilities of the win and loss events.
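The outcome structure in panel (b) amounts to two independent draws per trial, one for the win event and one for the loss event. A minimal sketch is below; the probabilities used in the example are illustrative placeholders, not the task's actual reward schedule.

```python
import random

def draw_outcome(p_win, p_loss, rng=random):
    """Draw one trial outcome for a chosen arm.

    Win and loss events are sampled independently, so a single
    choice can yield nothing, a win token, a loss token, or both.
    """
    win = rng.random() < p_win
    loss = rng.random() < p_loss
    if win and loss:
        return "both"
    if win:
        return "win"
    if loss:
        return "loss"
    return "nothing"

# Illustrative probabilities only -- not the task's actual schedule.
outcome = draw_outcome(p_win=0.6, p_loss=0.3)
```

Because the two events are independent, raising both probabilities (as the caption describes) increases the frequency of informative trials where reward and punishment co-occur.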

Figure 2
Validation of the 3AB Task: Correlational Analysis of 4AB and 3AB Model Parameters. Scatter plots illustrating the correlations between corresponding model parameters from the old 4AB task and the new 3AB task, using data from 111 subjects. Each subplot displays the Pearson correlation coefficient (r) and p-value, assessing the consistency of individual performance across reward learning rate (Arew), punishment learning rate (Apun), reward sensitivity (R), and punishment sensitivity (P). These plots aim to evaluate the validity of the new 3AB task by demonstrating whether similar patterns of behaviour are observed across both tasks.

Figure 3
Group-level mean win-stay and lose-shift percentages are shown for each task version among participants who completed both tasks (N = 111). Error bars represent the standard error of the mean. Strategy use was highly similar across tasks: win-stay behaviour averaged 82.0% (SD = 27.5) for the 3AB task and 81.0% (SD = 26.6) for the 4AB task; lose-shift behaviour averaged 68.7% (SD = 22.4) for 3AB and 71.7% (SD = 22.2) for 4AB. Individual-level behaviour was strongly correlated across tasks (win-stay: r = 0.81, p < .001; lose-shift: r = 0.70, p < .001), indicating consistent application of learning strategies across the two task structures.
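The win-stay and lose-shift percentages reported here follow the standard definitions: among trials that were rewarded, the proportion on which the same arm was chosen again; among unrewarded trials, the proportion on which the next choice switched arms. A minimal sketch (variable names are my own, not the authors' code):

```python
def win_stay_lose_shift(choices, wins):
    """Compute win-stay and lose-shift percentages.

    choices: sequence of chosen arms per trial.
    wins: sequence of booleans, True if trial t was rewarded.
    """
    win_stay = [choices[t + 1] == choices[t]
                for t in range(len(choices) - 1) if wins[t]]
    lose_shift = [choices[t + 1] != choices[t]
                  for t in range(len(choices) - 1) if not wins[t]]

    def pct(xs):
        return 100.0 * sum(xs) / len(xs) if xs else float("nan")

    return pct(win_stay), pct(lose_shift)

# Toy sequence: stay after every win, shift after every loss.
ws, ls = win_stay_lose_shift([0, 0, 1, 1, 2], [True, False, True, False])
```

With this toy input both percentages are 100%, since the simulated agent always stays after a win and shifts after a loss.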

Figure 4
Scatter plots illustrating the relationships between scores on the Dimensional Anhedonia Rating Scale (DARS), Generalized Anxiety Disorder Assessment 7-item (GAD-7), Snaith-Hamilton Pleasure Scale (SHAPS), and Zung Self-Rating Depression Scale (ZUNG). Each subplot presents the Pearson correlation coefficient (r) for the respective pairwise comparisons. The data points are represented as teal circular markers, with a salmon-coloured trend line indicating the linear relationship between variables. The sample size (N = 935) is consistent across all plots, reflecting complete cases across all four questionnaire totals. These relationships highlight the varying degrees of association between measures of anhedonia, anxiety, and depression, with strong correlations observed between DARS and SHAPS, and moderate correlations between GAD and SHAPS.

Figure 5
Pairwise Pearson correlations between questionnaire scores in the final task sample (N = 206). Scatter plots show individual subject data and Pearson correlation coefficients (r). Panels use pairwise-complete observations. DARS = Dimensional Anhedonia Rating Scale; SHAPS = Snaith-Hamilton Pleasure Scale; GAD = Generalized Anxiety Disorder scale; ZUNG = Zung Depression Scale.

Figure 6
Parameter Recovery for Learning Rates and Sensitivity. Scatter plots display the parameter recovery results for reward and punishment learning rates, as well as for reward and punishment sensitivity, across anhedonic and non-anhedonic groups. Each scatter plot compares the original parameters with the recovered parameters, with an accompanying trend line and Pearson correlation coefficient (r). Arew: r = 0.79 (anhedonic), r = 0.85 (non-anhedonic); Apun: r = 0.67 (anhedonic), r = 0.74 (non-anhedonic); R: r = 0.96 (anhedonic), r = 0.97 (non-anhedonic); P: r = 0.82 (anhedonic), r = 0.92 (non-anhedonic).
Table 1
Reward and Punishment Learning Parameters by Anhedonia Status.
| MODEL PARAMETER | GROUP | MEAN | SD | t-STATISTIC | P-VALUE | DEGREES OF FREEDOM (df) | BF01 (EVIDENCE FOR NULL) |
|---|---|---|---|---|---|---|---|
| Arew | Anhedonic | 0.48 | 0.18 | 1.19 | 0.24 | 192.0 | 3.36 |
|  | Non-Anhedonic | 0.45 | 0.20 |  |  |  |  |
| Apun | Anhedonic | 0.34 | 0.14 | –0.72 | 0.48 | 186.7 | 5.14 |
|  | Non-Anhedonic | 0.36 | 0.16 |  |  |  |  |
| R | Anhedonic | 5.89 | 3.13 | 0.77 | 0.44 | 191.9 | 4.97 |
|  | Non-Anhedonic | 5.53 | 3.45 |  |  |  |  |
| P | Anhedonic | 4.07 | 2.48 | 0.45 | 0.65 | 181.9 | 5.96 |
|  | Non-Anhedonic | 3.90 | 3.02 |  |  |  |  |

Figure 7
Posterior Distributions and 95% Highest Density Intervals (HDIs) for Group-Level Model Parameters. Posterior densities are shown for each group-level parameter estimated via hierarchical Bayesian modelling, separately for Anhedonic and Non-Anhedonic participants. Black horizontal bars represent the 95% HDIs. Group difference scores were computed as Non-Anhedonic minus Anhedonic, such that negative values indicate higher estimates in the Anhedonic group. Across all parameters, the 95% HDIs of the group differences included zero, suggesting no credible differences between groups in learning rate or sensitivity parameters.
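The 95% highest density interval used in this figure is the narrowest interval containing 95% of the posterior mass, which for a sample-based posterior can be found by scanning sorted draws. A minimal sketch (my own helper, not the authors' implementation, which would typically come from a package such as ArviZ):

```python
import math

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, round(mass * n))  # number of draws the interval must cover
    # Slide a window of k consecutive sorted draws; keep the narrowest.
    widths = [(xs[i + k - 1] - xs[i], i) for i in range(n - k + 1)]
    _, i = min(widths)
    return xs[i], xs[i + k - 1]
```

For a group-difference parameter, checking whether zero lies inside `hdi(diff_samples)` is exactly the "HDI includes zero" criterion the caption applies.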

Figure 8
Win-Stay and Lose-Shift Strategies Between Anhedonic and Non-Anhedonic Groups. Bar plots represent the average win-stay and lose-shift strategy adoption percentages for the anhedonic and non-anhedonic groups. Error bars represent the standard error of the mean.

Figure 9
Win-stay and lose-shift strategy rates in real vs simulated data across groups. Bars show group means for Anhedonic and Non-Anhedonic participants, with separate bars for Real (darker) and Simulated (lighter) data; error bars indicate standard error of the mean (SE). Simulations slightly underestimated win-stay (Δ (Sim–Real) ≈ –3.5 to –4.9 percentage points) and overestimated lose-shift (Δ ≈ +12.8 to +13.2 points); paired within-group comparisons were significant (see Results). Subject-wise Real–Sim correlations were high for win-stay and moderate for lose-shift, indicating preservation of individual-difference structure.

Figure 10
Predicted Action Probabilities and Actual Choices Across Trials. Choices and modelled action probabilities for three representative subjects. Solid markers indicate actual choices made by the subjects on each trial (at y = 1), colour-coded by arm. Lines show trial-by-trial predicted probabilities generated by the hierarchical Bayesian model, with the same colour-coding.

Figure 11
Distribution of Model Prediction Accuracy Across Subjects. Histogram showing the distribution of model-predicted choice accuracy for each subject (N = 206). Accuracy was computed as the proportion of trials where the model’s highest predicted action probability (Pa) matched the participant’s actual choice. The vertical dashed line indicates the group mean accuracy (59.60%).
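The accuracy measure in this figure is a simple argmax match: on each trial, the model "predicts" the arm with the highest posterior action probability, and accuracy is the proportion of trials where that arm equals the actual choice. A sketch of this computation (illustrative values, not the study's data):

```python
def choice_accuracy(pred_probs, choices):
    """Proportion of trials where the arm with the highest predicted
    action probability matches the participant's actual choice."""
    hits = sum(
        max(range(len(p)), key=p.__getitem__) == c
        for p, c in zip(pred_probs, choices)
    )
    return hits / len(choices)

# Three toy trials with per-arm probabilities over a 3-arm task.
probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
acc = choice_accuracy(probs, [0, 1, 0])  # matches on trials 1 and 2 only
```

Note that chance level under this metric is 1/3 for a 3-arm task, so the reported group mean of 59.60% sits well above chance.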

Figure 12
Group Comparisons of Model Parameters and Self-Reported Anhedonia Measures (DARS and SHAPS). This figure illustrates the relationship between self-reported anhedonia and computational model parameters of reward and punishment learning. Scatter plots display the distribution of scores for the two questionnaire measures—DARS (daily activity and reward engagement) and SHAPS (hedonic capacity)—in relation to four computational model parameters: Reward Learning Rate, Punishment Learning Rate, Reward Sensitivity, and Punishment Sensitivity. Each point is colour-coded by group (anhedonic or non-anhedonic). Although correlation values are presented, these should be interpreted with caution due to the extreme groups design, which limits variability in one group and affects the generalizability of linear relationships. The figure primarily highlights that there is no strong link between subjective anhedonia group status and performance-based measures of reward and punishment processing.
Table 2
Exploratory Correlations Between DARS Subscale Scores and Computational Model Parameters.
| DARS SUBSCALE | PARAMETER | CORRELATION | P-VALUE |
|---|---|---|---|
| Hobbies | Arew | –0.06 | 0.40 |
|  | Apun | 0.06 | 0.39 |
|  | R | –0.09 | 0.22 |
|  | P | 0.01 | 0.92 |
| Food/Drink | Arew | –0.09 | 0.19 |
|  | Apun | 0.04 | 0.61 |
|  | R | –0.07 | 0.29 |
|  | P | –0.02 | 0.80 |
| Social Interaction | Arew | –0.08 | 0.27 |
|  | Apun | 0.09 | 0.19 |
|  | R | –0.05 | 0.46 |
|  | P | –0.03 | 0.65 |
| Sensory Experiences | Arew | –0.12 | 0.09 |
|  | Apun | –0.03 | 0.62 |
|  | R | 0.00 | 0.98 |
|  | P | –0.01 | 0.92 |
Table 3
Demographics of Selected Participants (N = 206, Age = 18–60).
| METRIC | VALUE | COUNT | PERCENTAGE |
|---|---|---|---|
| Age Range | 18–60 | — | — |
| Mean Age (SD) | 39 (11) | — | — |
| Sex |  |  |  |
| Male |  | 78 | 38% |
| Female |  | 126 | 61% |
| Other |  | 2 | 1% |
| Ethnicity |  |  |  |
| White |  | 188 | 91% |
| Asian |  | 8 | 4% |
| Black |  | 4 | 2% |
| Two or More |  | 4 | 2% |
| Other |  | 2 | 1% |
| Education |  |  |  |
| High school graduate |  | 90 | 44% |
| Bachelor’s degree |  | 84 | 41% |
| Master’s degree |  | 13 | 6% |
| Doctorate degree |  | 7 | 3% |
| Professional degree |  | 5 | 2% |
| Some high school – no diploma |  | 5 | 2% |
| Other |  | 2 | 1% |

Figure 13
Model Comparison Results Based on Relative LOOIC Values. Bars show each model’s difference in LOOIC relative to the best-fitting model (relative LOOIC = 0; lower is better). In this dataset, banditNarm_lapse achieved the lowest LOOIC, with banditNarm_singleA_lapse and banditNarm_4par performing very similarly. We retained banditNarm_4par as the primary model for hypothesis testing because it directly targets valence-specific learning and sensitivity parameters central to our anhedonia-related hypotheses, while lapse models were treated as robustness checks.
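The relative LOOIC plotted here is just each model's LOOIC minus the minimum across models, so the best-fitting model sits at zero and lower is better. A sketch of that transformation (the LOOIC values below are illustrative placeholders, not the fitted results):

```python
def relative_looic(looic):
    """LOOIC differences relative to the best (lowest-LOOIC) model."""
    best = min(looic.values())
    return {model: value - best for model, value in looic.items()}

# Placeholder values for illustration only.
rel = relative_looic({
    "banditNarm_lapse": 5000.0,
    "banditNarm_singleA_lapse": 5004.0,
    "banditNarm_4par": 5012.0,
})
```

In practice the LOOIC inputs would come from a package such as ArviZ or Stan's loo; this helper only handles the final differencing step shown in the figure.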
Table 4
Model Variants Tested. Comparison of reinforcement learning models tested on the 3-arm bandit task. Each model varies in included parameters and computational assumptions. Arew = reward learning rate; Apun = punishment learning rate; A = shared learning rate; R = reward sensitivity; P = punishment sensitivity; ξ = lapse parameter (choice noise); τ = inverse temperature (softmax); β = inverse temperature/precision parameter in the choice rule; λ, θ, s0, sD = Kalman filter parameters governing mean reversion and uncertainty dynamics (see Supplementary for full definitions); decay = Q-value decay rate.
| MODEL | PARAMETERS INCLUDED | KEY FEATURES |
|---|---|---|
| banditNarm_2par_lapse | Arew, Apun, ξ | Reduced lapse model with separate reward and punishment learning rates (Arew, Apun) but no sensitivity parameters; includes lapse ξ capturing stimulus-independent random responding |
| banditNarm_4par | Arew, Apun, R, P | Valence-specific learning rates and separate reward/punishment sensitivities (R, P); no lapse or temperature term; focus model for testing anhedonia-related hypotheses. |
| banditNarm_delta | A (shared), τ | Simple Rescorla–Wagner model with a single shared learning rate A across reward and punishment and an inverse temperature τ; no separate sensitivity parameters. |
| banditNarm_kalman_filter | λ, θ, s0, sD, β | Kalman filter over option values with mean reversion (λ, θ), uncertainty tracking (s0, sD) and inverse temperature β; no explicit learning-rate parameter. |
| banditNarm_lapse | Arew, Apun, R, P, ξ | Same valence-specific learning rates and sensitivities as banditNarm_4par, plus a lapse parameter ξ that mixes softmax choice with uniform random responding. |
| banditNarm_lapse_decay | Arew, Apun, R, P, ξ, decay | As banditNarm_lapse but with an additional decay term on Q-values; tests combined effects of sensitivity, ξ, and slow forgetting of option values over time. |
| banditNarm_singleA_lapse | A (shared), R, P, ξ | Single shared learning rate A with separate reward and punishment sensitivities (R, P) and lapse ξ; directly probes whether a shared vs valence-specific learning rate better explains behaviour. |
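Based on the table's description of banditNarm_4par (valence-specific learning rates, sensitivity-scaled outcomes, no explicit temperature term), one trial of its update rule can be sketched roughly as below. This is my reconstruction from the parameter definitions, not the study's Stan code, and the numeric values are illustrative.

```python
import math

def update_and_choose(q_rew, q_pun, choice, rew, pun,
                      a_rew, a_pun, r_sens, p_sens):
    """One trial of a valence-specific Rescorla-Wagner update.

    Reward and punishment values are tracked separately, outcomes are
    scaled by the sensitivities R and P, and choice probabilities come
    from a softmax over the summed values (no temperature parameter,
    consistent with the banditNarm_4par description above).
    """
    # Prediction errors against sensitivity-scaled outcomes.
    q_rew[choice] += a_rew * (r_sens * rew - q_rew[choice])
    q_pun[choice] += a_pun * (p_sens * -pun - q_pun[choice])
    q_sum = [qr + qp for qr, qp in zip(q_rew, q_pun)]
    z = sum(math.exp(q) for q in q_sum)
    return [math.exp(q) / z for q in q_sum]

# A rewarded choice of arm 0 raises its choice probability next trial.
probs = update_and_choose([0.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                          choice=0, rew=1, pun=0,
                          a_rew=0.5, a_pun=0.3, r_sens=4.0, p_sens=3.0)
```

Under this formulation the sensitivities R and P effectively set the softmax's slope, which is why a separate inverse-temperature parameter is redundant in the 4-parameter model.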