
Figure 1
Structure of the modified 3-Arm Bandit (3AB) Task. (a) Visual representation of the modified 3-arm bandit task. Subjects choose between three options (arms), each associated with distinct reward and punishment probabilities. (b) Any arm selection can result in one of four possible outcomes: nothing, a win token (green only), a loss token (red only), or both. The task was designed to reduce cognitive load compared with the traditional 4-arm bandit task by limiting the number of choices and increasing the probability of both reward and punishment outcomes across trials. (c) Outcome probabilities of the win and loss events.
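The outcome structure in panel (b) amounts to two independent draws per trial, one for the win event and one for the loss event. A minimal sketch is below; the probabilities used in the example are illustrative placeholders, not the task's actual reward schedule.

```python
import random

def draw_outcome(p_win, p_loss, rng=random):
    """Draw one trial outcome for a chosen arm.

    Win and loss events are sampled independently, so a single
    choice can yield nothing, a win token, a loss token, or both.
    """
    win = rng.random() < p_win
    loss = rng.random() < p_loss
    if win and loss:
        return "both"
    if win:
        return "win"
    if loss:
        return "loss"
    return "nothing"

# Illustrative probabilities only -- not the task's actual schedule.
outcome = draw_outcome(p_win=0.6, p_loss=0.3)
```

Because the two events are independent, raising both probabilities (as the caption describes) increases the frequency of informative trials where reward and punishment co-occur.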

Figure 2
Validation of the 3AB Task: Correlational Analysis of 4AB and 3AB Model Parameters. Scatter plots illustrating the correlations between corresponding model parameters from the old 4AB task and the new 3AB task, using data from 111 subjects. Each subplot displays the Pearson correlation coefficient (r) and p-value, assessing the consistency of individual performance across reward learning rate (Arew), punishment learning rate (Apun), reward sensitivity (R), and punishment sensitivity (P). These plots aim to evaluate the validity of the new 3AB task by demonstrating whether similar patterns of behaviour are observed across both tasks.

Figure 3
Group-level mean win-stay and lose-shift percentages are shown for each task version among participants who completed both tasks (N = 111). Error bars represent the standard error of the mean. Strategy use was highly similar across tasks: win-stay behaviour averaged 82.0% (SD = 27.5) for the 3AB task and 81.0% (SD = 26.6) for the 4AB task; lose-shift behaviour averaged 68.7% (SD = 22.4) for 3AB and 71.7% (SD = 22.2) for 4AB. Individual-level behaviour was strongly correlated across tasks (win-stay: r = 0.81, p < .001; lose-shift: r = 0.70, p < .001), indicating consistent application of learning strategies across the two task structures.
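The win-stay and lose-shift percentages reported here follow the standard definitions: among trials that were rewarded, the proportion on which the same arm was chosen again; among unrewarded trials, the proportion on which the next choice switched arms. A minimal sketch (variable names are my own, not the authors' code):

```python
def win_stay_lose_shift(choices, wins):
    """Compute win-stay and lose-shift percentages.

    choices: sequence of chosen arms per trial.
    wins: sequence of booleans, True if trial t was rewarded.
    """
    win_stay = [choices[t + 1] == choices[t]
                for t in range(len(choices) - 1) if wins[t]]
    lose_shift = [choices[t + 1] != choices[t]
                  for t in range(len(choices) - 1) if not wins[t]]

    def pct(xs):
        return 100.0 * sum(xs) / len(xs) if xs else float("nan")

    return pct(win_stay), pct(lose_shift)

# Toy sequence: stay after every win, shift after every loss.
ws, ls = win_stay_lose_shift([0, 0, 1, 1, 2], [True, False, True, False])
```

With this toy input both percentages are 100%, since the simulated agent always stays after a win and shifts after a loss.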

Figure 4
Scatter plots illustrating the relationships between scores on the Dimensional Anhedonia Rating Scale (DARS), Generalized Anxiety Disorder Assessment 7-item (GAD-7), Snaith-Hamilton Pleasure Scale (SHAPS), and Zung Self-Rating Depression Scale (ZUNG). Each subplot presents the Pearson correlation coefficient (r) for the respective pairwise comparisons. The data points are represented as teal circular markers, with a salmon-coloured trend line indicating the linear relationship between variables. The sample size (N = 935) is consistent across all plots, reflecting complete cases across all four questionnaire totals. These relationships highlight the varying degrees of association between measures of anhedonia, anxiety, and depression, with strong correlations observed between DARS and SHAPS, and moderate correlations between GAD and SHAPS.

Figure 5
Pairwise Pearson correlations between questionnaire scores in the final task sample (N = 206). Scatter plots show individual subject data and Pearson correlation coefficients (r). Panels use pairwise-complete observations. DARS = Dimensional Anhedonia Rating Scale; SHAPS = Snaith-Hamilton Pleasure Scale; GAD = Generalized Anxiety Disorder scale; ZUNG = Zung Depression Scale.

Figure 6
Parameter Recovery for Learning Rates and Sensitivity. Scatter plots display the parameter recovery results for reward and punishment learning rates, as well as for reward and punishment sensitivity, across anhedonic and non-anhedonic groups. Each scatter plot compares the original parameters with the recovered parameters, with an accompanying trend line and Pearson correlation coefficient (r). Arew: r = 0.79 (anhedonic), r = 0.85 (non-anhedonic); Apun: r = 0.67 (anhedonic), r = 0.74 (non-anhedonic); R: r = 0.96 (anhedonic), r = 0.97 (non-anhedonic); P: r = 0.82 (anhedonic), r = 0.92 (non-anhedonic).
Table 1
Reward and Punishment Learning Parameters by Anhedonia Status.
| MODEL PARAMETER | GROUP | MEAN | SD | t-STATISTIC | P-VALUE | DEGREES OF FREEDOM (df) | BF01 (EVIDENCE FOR NULL) |
|---|---|---|---|---|---|---|---|
| Arew | Anhedonic | 0.48 | 0.18 | 1.19 | 0.24 | 192.0 | 3.36 |
|  | Non-Anhedonic | 0.45 | 0.20 |  |  |  |  |
| Apun | Anhedonic | 0.34 | 0.14 | –0.72 | 0.48 | 186.7 | 5.14 |
|  | Non-Anhedonic | 0.36 | 0.16 |  |  |  |  |
| R | Anhedonic | 5.89 | 3.13 | 0.77 | 0.44 | 191.9 | 4.97 |
|  | Non-Anhedonic | 5.53 | 3.45 |  |  |  |  |
| P | Anhedonic | 4.07 | 2.48 | 0.45 | 0.65 | 181.9 | 5.96 |
|  | Non-Anhedonic | 3.90 | 3.02 |  |  |  |  |

Figure 7
Posterior Distributions and 95% Highest Density Intervals (HDIs) for Group-Level Model Parameters. Posterior densities are shown for each group-level parameter estimated via hierarchical Bayesian modelling, separately for Anhedonic and Non-Anhedonic participants. Black horizontal bars represent the 95% HDIs. Group difference scores were computed as Non-Anhedonic minus Anhedonic, such that negative values indicate higher estimates in the Anhedonic group. Across all parameters, the 95% HDIs of the group differences included zero, suggesting no credible differences between groups in learning rate or sensitivity parameters.
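The 95% highest density interval used in this figure is the narrowest interval containing 95% of the posterior mass, which for a sample-based posterior can be found by scanning sorted draws. A minimal sketch (my own helper, not the authors' implementation, which would typically come from a package such as ArviZ):

```python
import math

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, round(mass * n))  # number of draws the interval must cover
    # Slide a window of k consecutive sorted draws; keep the narrowest.
    widths = [(xs[i + k - 1] - xs[i], i) for i in range(n - k + 1)]
    _, i = min(widths)
    return xs[i], xs[i + k - 1]
```

For a group-difference parameter, checking whether zero lies inside `hdi(diff_samples)` is exactly the "HDI includes zero" criterion the caption applies.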

Figure 8
Win-Stay and Lose-Shift Strategies Between Anhedonic and Non-Anhedonic Groups. Bar plots represent the average win-stay and lose-shift strategy adoption percentages for the anhedonic and non-anhedonic groups. Error bars represent the standard error of the mean.

Figure 9
Win-stay and lose-shift strategy rates in real vs simulated data across groups. Bars show group means for Anhedonic and Non-Anhedonic participants, with separate bars for Real (darker) and Simulated (lighter) data; error bars indicate standard error of the mean (SE). Simulations slightly underestimated win-stay (Δ (Sim–Real) ≈ –3.5 to –4.9 percentage points) and overestimated lose-shift (Δ ≈ +12.8 to +13.2 points); paired within-group comparisons were significant (see Results). Subject-wise Real–Sim correlations were high for win-stay and moderate for lose-shift, indicating preservation of individual-difference structure.

Figure 10
Predicted Action Probabilities and Actual Choices Across Trials. Choices and modelled action probabilities for three representative subjects. Solid markers indicate actual choices made by the subjects on each trial (at y = 1), colour-coded by arm. Lines show trial-by-trial predicted probabilities generated by the hierarchical Bayesian model, with the same colour-coding.

Figure 11
Distribution of Model Prediction Accuracy Across Subjects. Histogram showing the distribution of model-predicted choice accuracy for each subject (N = 206). Accuracy was computed as the proportion of trials where the model’s highest predicted action probability (Pa) matched the participant’s actual choice. The vertical dashed line indicates the group mean accuracy (59.60%).
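The accuracy measure in this figure is a simple argmax match: on each trial, the model "predicts" the arm with the highest posterior action probability, and accuracy is the proportion of trials where that arm equals the actual choice. A sketch of this computation (illustrative values, not the study's data):

```python
def choice_accuracy(pred_probs, choices):
    """Proportion of trials where the arm with the highest predicted
    action probability matches the participant's actual choice."""
    hits = sum(
        max(range(len(p)), key=p.__getitem__) == c
        for p, c in zip(pred_probs, choices)
    )
    return hits / len(choices)

# Three toy trials with per-arm probabilities over a 3-arm task.
probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
acc = choice_accuracy(probs, [0, 1, 0])  # matches on trials 1 and 2 only
```

Note that chance level under this metric is 1/3 for a 3-arm task, so the reported group mean of 59.60% sits well above chance.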

Figure 12
Group Comparisons of Model Parameters and Self-Reported Anhedonia Measures (DARS and SHAPS). This figure illustrates the relationship between self-reported anhedonia and computational model parameters of reward and punishment learning. Scatter plots display the distribution of scores for the two questionnaire measures—DARS (daily activity and reward engagement) and SHAPS (hedonic capacity)—in relation to four computational model parameters: Reward Learning Rate, Punishment Learning Rate, Reward Sensitivity, and Punishment Sensitivity. Each point is colour-coded by group (anhedonic or non-anhedonic). Although correlation values are presented, these should be interpreted with caution due to the extreme groups design, which limits variability in one group and affects the generalizability of linear relationships. The figure primarily highlights that there is no strong link between subjective anhedonia group status and performance-based measures of reward and punishment processing.
Table 2
Exploratory Correlations Between DARS Subscale Scores and Computational Model Parameters.
| DARS SUBSCALE | PARAMETER | CORRELATION | P-VALUE |
|---|---|---|---|
| Hobbies | Arew | –0.06 | 0.40 |
|  | Apun | 0.06 | 0.39 |
|  | R | –0.09 | 0.22 |
|  | P | 0.01 | 0.92 |
| Food/Drink | Arew | –0.09 | 0.19 |
|  | Apun | 0.04 | 0.61 |
|  | R | –0.07 | 0.29 |
|  | P | –0.02 | 0.80 |
| Social Interaction | Arew | –0.08 | 0.27 |
|  | Apun | 0.09 | 0.19 |
|  | R | –0.05 | 0.46 |
|  | P | –0.03 | 0.65 |
| Sensory Experiences | Arew | –0.12 | 0.09 |
|  | Apun | –0.03 | 0.62 |
|  | R | 0.00 | 0.98 |
|  | P | –0.01 | 0.92 |
Table 3
Demographics of Selected Participants (N = 206, Age = 18–60).
| METRIC | VALUE | COUNT | PERCENTAGE |
|---|---|---|---|
| Age Range | 18–60 | — | — |
| Mean Age (SD) | 39 (11) | — | — |
| Sex |  |  |  |
| Male |  | 78 | 38% |
| Female |  | 126 | 61% |
| Other |  | 2 | 1% |
| Ethnicity |  |  |  |
| White |  | 188 | 91% |
| Asian |  | 8 | 4% |
| Black |  | 4 | 2% |
| Two or More |  | 4 | 2% |
| Other |  | 2 | 1% |
| Education |  |  |  |
| High school graduate |  | 90 | 44% |
| Bachelor’s degree |  | 84 | 41% |
| Master’s degree |  | 13 | 6% |
| Doctorate degree |  | 7 | 3% |
| Professional degree |  | 5 | 2% |
| Some high school – no diploma |  | 5 | 2% |
| Other |  | 2 | 1% |

Figure 13
Model Comparison Results Based on Relative LOOIC Values. Bars show each model’s difference in LOOIC relative to the best-fitting model (relative LOOIC = 0; lower is better). In this dataset, banditNarm_lapse achieved the lowest LOOIC, with banditNarm_singleA_lapse and banditNarm_4par performing very similarly. We retained banditNarm_4par as the primary model for hypothesis testing because it directly targets valence-specific learning and sensitivity parameters central to our anhedonia-related hypotheses, while lapse models were treated as robustness checks.
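The relative LOOIC plotted here is just each model's LOOIC minus the minimum across models, so the best-fitting model sits at zero and lower is better. A sketch of that transformation (the LOOIC values below are illustrative placeholders, not the fitted results):

```python
def relative_looic(looic):
    """LOOIC differences relative to the best (lowest-LOOIC) model."""
    best = min(looic.values())
    return {model: value - best for model, value in looic.items()}

# Placeholder values for illustration only.
rel = relative_looic({
    "banditNarm_lapse": 5000.0,
    "banditNarm_singleA_lapse": 5004.0,
    "banditNarm_4par": 5012.0,
})
```

In practice the LOOIC inputs would come from a package such as ArviZ or Stan's loo; this helper only handles the final differencing step shown in the figure.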
Table 4
Model Variants Tested. Comparison of reinforcement learning models tested on the 3-arm bandit task. Each model varies in included parameters and computational assumptions. Arew = reward learning rate; Apun = punishment learning rate; A = shared learning rate; R = reward sensitivity; P = punishment sensitivity; ξ = lapse parameter (choice noise); τ = inverse temperature (softmax); β = inverse temperature/precision parameter in the choice rule; λ, θ, s0, sD = Kalman filter parameters governing mean reversion and uncertainty dynamics (see Supplementary for full definitions); decay = Q-value decay rate.
| MODEL | PARAMETERS INCLUDED | KEY FEATURES |
|---|---|---|
| banditNarm_2par_lapse | Arew, Apun, ξ | Reduced lapse model with separate reward and punishment learning rates (Arew, Apun) but no sensitivity parameters; includes lapse ξ capturing stimulus-independent random responding |
| banditNarm_4par | Arew, Apun, R, P | Valence-specific learning rates and separate reward/punishment sensitivities (R, P); no lapse or temperature term; focus model for testing anhedonia-related hypotheses. |
| banditNarm_delta | A (shared), τ | Simple Rescorla–Wagner model with a single shared learning rate A across reward and punishment and an inverse temperature τ; no separate sensitivity parameters. |
| banditNarm_kalman_filter | λ, θ, s0, sD, β | Kalman filter over option values with mean reversion (λ, θ), uncertainty tracking (s0, sD) and inverse temperature β; no explicit learning-rate parameter. |
| banditNarm_lapse | Arew, Apun, R, P, ξ | Same valence-specific learning rates and sensitivities as banditNarm_4par, plus a lapse parameter ξ that mixes softmax choice with uniform random responding. |
| banditNarm_lapse_decay | Arew, Apun, R, P, ξ, decay | As banditNarm_lapse but with an additional decay term on Q-values; tests combined effects of sensitivity, ξ, and slow forgetting of option values over time. |
| banditNarm_singleA_lapse | A (shared), R, P, ξ | Single shared learning rate A with separate reward and punishment sensitivities (R, P) and lapse ξ; directly probes whether a shared vs valence-specific learning rate better explains behaviour. |
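Based on the table's description of banditNarm_4par (valence-specific learning rates, sensitivity-scaled outcomes, no explicit temperature term), one trial of its update rule can be sketched roughly as below. This is my reconstruction from the parameter definitions, not the study's Stan code, and the numeric values are illustrative.

```python
import math

def update_and_choose(q_rew, q_pun, choice, rew, pun,
                      a_rew, a_pun, r_sens, p_sens):
    """One trial of a valence-specific Rescorla-Wagner update.

    Reward and punishment values are tracked separately, outcomes are
    scaled by the sensitivities R and P, and choice probabilities come
    from a softmax over the summed values (no temperature parameter,
    consistent with the banditNarm_4par description above).
    """
    # Prediction errors against sensitivity-scaled outcomes.
    q_rew[choice] += a_rew * (r_sens * rew - q_rew[choice])
    q_pun[choice] += a_pun * (p_sens * -pun - q_pun[choice])
    q_sum = [qr + qp for qr, qp in zip(q_rew, q_pun)]
    z = sum(math.exp(q) for q in q_sum)
    return [math.exp(q) / z for q in q_sum]

# A rewarded choice of arm 0 raises its choice probability next trial.
probs = update_and_choose([0.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                          choice=0, rew=1, pun=0,
                          a_rew=0.5, a_pun=0.3, r_sens=4.0, p_sens=3.0)
```

Under this formulation the sensitivities R and P effectively set the softmax's slope, which is why a separate inverse-temperature parameter is redundant in the 4-parameter model.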