
Figure 1
Four-armed bandit and gambling task. Example trial of the four-armed bandit task (a). On each trial, participants chose one of four bandits and received one of four possible outcomes: reward (green token), punishment (red token), neither reward nor punishment (empty box), or both reward and punishment (red and green tokens). An example of the win and loss probabilities fluctuating independently over time within one of the boxes (b). On each gambling task trial, participants chose between a 50–50 gamble and a sure option (a guaranteed number of points) (c). Trials were either mixed gambles (a 50–50 chance of winning or losing points versus a sure option of 0 points) or gain-only trials (a 50–50 chance of winning points or receiving nothing versus a sure gain).
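
The outcome structure described in panels (a) and (b) can be illustrated with a minimal sketch. It assumes that win and loss are drawn independently on each trial from the chosen bandit's current probabilities, which yields the four outcome types listed above. The `drift` helper, its Gaussian random-walk form, the step size, and the starting probabilities are hypothetical choices for illustration only; the caption does not specify how the probabilities were generated.

```python
import random

# Map (win occurred, loss occurred) onto the four outcome types in the caption.
OUTCOMES = {
    (True, False): "reward",
    (False, True): "punishment",
    (False, False): "neither",
    (True, True): "both",
}

def sample_outcome(p_win, p_loss, rng):
    """Draw win and loss independently for the chosen bandit."""
    win = rng.random() < p_win
    loss = rng.random() < p_loss
    return OUTCOMES[(win, loss)]

def drift(p, sd, rng):
    """Hypothetical Gaussian random-walk step, clipped to [0, 1]."""
    return min(1.0, max(0.0, p + rng.gauss(0.0, sd)))

rng = random.Random(0)
p_win, p_loss = 0.5, 0.5  # assumed starting probabilities for one bandit
trial_outcomes = []
for _ in range(10):
    trial_outcomes.append(sample_outcome(p_win, p_loss, rng))
    # Win and loss probabilities fluctuate independently of each other.
    p_win, p_loss = drift(p_win, 0.05, rng), drift(p_loss, 0.05, rng)
```

Because the two draws are independent, a single trial can deliver both a reward and a punishment, which is what distinguishes this design from a task with a single signed outcome.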

Figure 2
Basic behaviour, practice effects, and test-retest reliability of model-agnostic measures on the four-armed bandit task. Boxplots show the probability to stay after each outcome type in sessions 1 and 2 (a). The probability to stay differed significantly between outcome types (Loss < Neither < Win), but no clear practice effect was evident. Scatter plots of the model-agnostic measures compare behaviour across the two testing sessions, approximately 2 weeks apart (b). Lightly shaded regions in Figure 2a represent within-subjects standard error of the mean (SEM). * p < 0.001.
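
As an illustration of the model-agnostic measure, the following minimal sketch computes the probability to stay conditional on the previous outcome. It assumes that "stay" means repeating the previous trial's choice; the function name, the outcome labels, and the toy data are hypothetical and not taken from the study.

```python
from collections import defaultdict

def p_stay_by_outcome(choices, outcomes):
    """Proportion of trials on which the previous choice was repeated,
    grouped by the outcome received on the previous trial."""
    stays = defaultdict(int)
    counts = defaultdict(int)
    for t in range(len(choices) - 1):
        counts[outcomes[t]] += 1
        if choices[t + 1] == choices[t]:
            stays[outcomes[t]] += 1
    return {outcome: stays[outcome] / counts[outcome] for outcome in counts}

# Toy data for one participant: chosen bandit and outcome on each trial.
choices = [0, 0, 1, 1, 1, 2]
outcomes = ["win", "loss", "win", "neither", "loss", "win"]
p_stay = p_stay_by_outcome(choices, outcomes)
```

The final trial contributes no observation because there is no subsequent choice to compare it with.
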
Table 1
Reliability of model-agnostic and computational measures of the four-armed bandit task. All measures but the lapse parameter are significant at p < 0.05. Brackets represent the 95% confidence interval.
| MODEL-AGNOSTIC P(STAY) MEASURES (N = 50) | ICC(A,1) | ICC(1) | PEARSON’S R |
|---|---|---|---|
| Summary statistics (Figure 2) | | | |
| Win | 0.46 (0.21–0.65) | 0.46 (0.21–0.65) | 0.46 (0.20–0.65) |
| Loss | 0.54 (0.32–0.71) | 0.54 (0.31–0.71) | 0.55 (0.32–0.72) |
| Neither | 0.66 (0.48–0.79) | 0.67 (0.48–0.80) | 0.67 (0.48–0.80) |
| Model-calculated reliability from joint hierarchical logistic regression | | | |
| Win | 0.63 | | |
| Loss | 0.63 | | |
| Neither | 0.71 | | |
| REINFORCEMENT LEARNING MODEL (N = 47) | ICC(A,1) | ICC(1) | PEARSON’S R |
| Model estimated separately per session (Figure 3) | | | |
| Reward learning rate | 0.60 (0.38–0.75) | 0.60 (0.38–0.75) | 0.60 (0.38–0.76) |
| Punishment learning rate | 0.63 (0.42–0.77) | 0.62 (0.41–0.77) | 0.64 (0.43–0.78) |
| Reward sensitivity | 0.52 (0.26–0.70) | 0.50 (0.25–0.69) | 0.56 (0.33–0.73) |
| Punishment sensitivity | 0.45 (0.20–0.65) | 0.45 (0.19–0.65) | 0.46 (0.19–0.66) |
| Lapse | 0.01 (–0.08–0.14) | –0.43 (–0.64– –0.17) | 0.05 (–0.24–0.33) |
| Model-calculated reliability from joint hierarchical Bayesian model | | | |
| Reward learning rate | 0.71 (0.53–0.84) | | |
| Punishment learning rate | 0.85 (0.69–0.95) | | |
| Reward sensitivity | 0.68 (0.48–0.84) | | |
| Punishment sensitivity | 0.64 (0.37–0.85) | | |
| Lapse | –0.01 (–0.65–0.68) | | |
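
To make the three column headings concrete, the sketch below computes ICC(A,1), ICC(1), and Pearson's r for one measure observed in two sessions, using the standard ANOVA mean-square formulas for single-measure intraclass correlations. It is an illustration under stated assumptions rather than the analysis code behind Table 1: the function name and the toy scores are hypothetical, and confidence intervals are not computed.

```python
from math import sqrt

def reliability(s1, s2):
    """Return ICC(A,1), ICC(1) and Pearson's r for two sessions of scores.

    s1 and s2 hold one score per participant, in the same order.
    """
    n, k = len(s1), 2  # n participants, k sessions
    rows = list(zip(s1, s2))
    grand = (sum(s1) + sum(s2)) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(s1) / n, sum(s2) / n]

    # Sums of squares: total, between participants, between sessions.
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    ms_within = (ss_total - ss_rows) / (n * (k - 1))

    # One-way random effects, single measure.
    icc_1 = (ms_rows - ms_within) / (ms_rows + (k - 1) * ms_within)
    # Two-way, absolute agreement, single measure.
    icc_a1 = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + (k / n) * (ms_cols - ms_err)
    )

    m1, m2 = col_means
    cov = sum((a - m1) * (b - m2) for a, b in rows)
    r = cov / sqrt(sum((a - m1) ** 2 for a in s1) * sum((b - m2) ** 2 for b in s2))
    return {"ICC(A,1)": icc_a1, "ICC(1)": icc_1, "pearson_r": r}

# Toy example: every participant scores exactly one point higher in session 2.
result = reliability([1, 2, 3, 4], [2, 3, 4, 5])
```

In this toy example Pearson's r is 1 because the rank order and spacing are preserved, whereas both ICCs fall below 1 (about 0.77 and 0.74) because they penalise the systematic shift between sessions. This is one reason the three columns can diverge for the same measure.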

Figure 3
Practice effects and test-retest reliability of the winning reinforcement learning model parameters derived from the four-armed bandit task. Boxplots show point estimates of the bandit4arm_lapse model parameters in sessions 1 and 2, fit under separate priors (a). Scatter plots of the bandit4arm_lapse model parameters across sessions 1 and 2 are presented (b). SEM: standard error of the mean. * p < 0.05.
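
For orientation, the following sketch shows one way the five parameters reported in Table 1 (reward and punishment learning rates, reward and punishment sensitivities, and lapse) could enter a trial-by-trial learner. It is a simplified illustration, not the fitted model: the update rule, the coding of losses as -1, the absence of any update for unchosen bandits, and all function and variable names are assumptions, since the model equations are not given in this section.

```python
from math import exp

def choice_probs(q_reward, q_punish, lapse):
    """Softmax over summed values, mixed with a uniform lapse component."""
    q_sum = [r + p for r, p in zip(q_reward, q_punish)]
    q_max = max(q_sum)  # subtract the maximum for numerical stability
    weights = [exp(q - q_max) for q in q_sum]
    total = sum(weights)
    n = len(q_sum)
    return [(1 - lapse) * w / total + lapse / n for w in weights]

def update(q_reward, q_punish, choice, win, loss,
           lr_reward, lr_punish, sens_reward, sens_punish):
    """Update the chosen bandit in place.

    win is 1 or 0; loss is -1 or 0 (an assumed coding).
    """
    q_reward[choice] += lr_reward * (sens_reward * win - q_reward[choice])
    q_punish[choice] += lr_punish * (sens_punish * loss - q_punish[choice])

# Toy example: four bandits, a single rewarded choice of bandit 0.
q_reward, q_punish = [0.0] * 4, [0.0] * 4
update(q_reward, q_punish, choice=0, win=1, loss=0,
       lr_reward=0.5, lr_punish=0.5, sens_reward=2.0, sens_punish=2.0)
probs = choice_probs(q_reward, q_punish, lapse=0.1)
```

In this sketch the lapse parameter sets a floor of `lapse / 4` on the probability of choosing any bandit, so it captures choices that the learned values cannot explain.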

Figure 4
Posterior predictive performance of the winning reinforcement learning model derived from the four-armed bandit task. Boxplots depict the accuracy of the bandit4arm_lapse model in predicting choices (a). Model estimates from session 1 (S1) predicted future session 2 (S2) behaviour above chance (black boxplot). Both S1 and S2 model estimates also predicted behaviour in the same session significantly above chance (blue and red boxplots). Predicting future performance (session 2 data) using a participant’s own S1 model parameter estimates was significantly better than using other participants’ S1 model parameter estimates (b), but not significantly better than using the mean S1 model priors (c). SEM: standard error of the mean. * p < 0.01.

Figure 5
Basic behaviour, practice effects, and test-retest reliability of model-agnostic measures on the gambling task. Boxplots show the probability to gamble by trial type in sessions 1 and 2, with no significant session effects (a). Scatter plots of the model-agnostic measures across sessions 1 and 2 (b). Lightly shaded regions in Figure 5a represent within-subjects standard error of the mean (SEM). * p < 0.001.
Table 2
Reliability of model-agnostic and computational measures of the gambling task. All measures are significant at p < 0.05. Brackets represent the 95% confidence interval.
| MODEL-AGNOSTIC P(GAMBLE) MEASURES | ICC(A,1) | ICC(1) | PEARSON’S R |
|---|---|---|---|
| Summary statistics (Figure 5) | | | |
| Mixed trials | 0.63 (0.43–0.78) | 0.63 (0.43–0.78) | 0.63 (0.42–0.77) |
| Gain-only trials | 0.59 (0.38–0.75) | 0.59 (0.38–0.75) | 0.60 (0.39–0.76) |
| Model-calculated reliability from joint hierarchical logistic regression | | | |
| Mixed trials | 0.73 | | |
| Gain-only trials | 0.72 | | |
| PROSPECT THEORY MODEL | ICC(A,1) | ICC(1) | PEARSON’S R |
| Model estimated separately per session (Figure 6) | | | |
| Loss aversion | 0.68 (0.50–0.81) | 0.68 (0.50–0.81) | 0.72 (0.55–0.83) |
| Risk aversion | 0.78 (0.55–0.89) | 0.78 (0.64–0.87) | 0.83 (0.71–0.90) |
| Inverse temperature | 0.80 (0.64–0.89) | 0.80 (0.67–0.88) | 0.84 (0.74–0.91) |
| Model-calculated reliability from joint hierarchical Bayesian model | | | |
| Loss aversion | 0.87 (0.77–0.94) | | |
| Risk aversion | 0.90 (0.83–0.95) | | |
| Inverse temperature | 0.91 (0.85–0.96) | | |

Figure 6
Practice effects and test-retest reliability of the prospect theory model derived from the gambling task. Boxplots show point estimates of the prospect theory model parameters in sessions 1 and 2, fit under separate priors (a). Scatter plots of the prospect theory model parameters across sessions 1 and 2 are presented (b). SEM: standard error of the mean. * p < 0.05.
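
The roles of the three parameters in Table 2 can be illustrated with a commonly used prospect theory parameterisation for 50–50 gambles. The sketch below is an assumption about the functional form, which is not stated in this section: risk aversion is taken as the exponent of a power utility function, loss aversion as a multiplier on losses, and inverse temperature as the slope of a logistic choice rule. The function and argument names are hypothetical.

```python
from math import exp

def p_gamble(gain, loss, sure, loss_aversion, risk_aversion, inv_temp):
    """Probability of accepting a 50-50 gamble over a sure option.

    gain and sure are non-negative point values; loss is zero or negative.
    """
    # Expected subjective utility of the gamble under power utility.
    u_gamble = (0.5 * gain ** risk_aversion
                - 0.5 * loss_aversion * abs(loss) ** risk_aversion)
    u_sure = sure ** risk_aversion
    # Logistic choice rule on the utility difference.
    return 1.0 / (1.0 + exp(-inv_temp * (u_gamble - u_sure)))

# Mixed trial: win 10 or lose 10 versus a sure 0.
p_neutral = p_gamble(10, -10, 0, loss_aversion=1.0, risk_aversion=1.0, inv_temp=1.0)
p_averse = p_gamble(10, -10, 0, loss_aversion=2.0, risk_aversion=1.0, inv_temp=1.0)
# Gain-only trial: win 10 or nothing versus a sure 5.
p_gain_only = p_gamble(10, 0, 5, loss_aversion=2.0, risk_aversion=1.0, inv_temp=1.0)
```

Under this form, loss aversion affects only mixed trials, whereas the utility exponent also shapes choices on gain-only trials, which is why the two trial types together are informative about both parameters.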

Figure 7
Posterior predictive performance of the prospect theory model derived from the gambling task. Boxplots depict the accuracy of the prospect theory model in predicting choices (a). Session 1 (S1) model estimates predicted S1 behaviour significantly above chance (blue boxplot), as did session 2 (S2) model estimates on S2 data (red boxplot). Importantly, model parameter estimates from S1 predicted task performance in S2 above chance (black boxplot). Predicting future S2 performance using a participant’s own S1 model parameter estimates was significantly better than using other participants’ S1 model parameter estimates (b) or the mean S1 model priors (c). SEM: standard error of the mean. * p < 0.001.
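
One simple way to score predictive accuracy of the kind shown here is the proportion of trials on which the model's most probable option matches the observed choice. The sketch below assumes that scoring rule and a uniform-guessing chance level; the exact procedure used for the figure is not described in this section, and the function name and toy values are hypothetical.

```python
def predictive_accuracy(pred_probs, choices):
    """Proportion of trials where the most probable option was chosen.

    pred_probs holds one list of option probabilities per trial;
    choices holds the index of the option actually chosen.
    """
    hits = 0
    for probs, choice in zip(pred_probs, choices):
        predicted = max(range(len(probs)), key=lambda i: probs[i])
        hits += predicted == choice
    return hits / len(choices)

# Toy gambling-task example: option 0 = gamble, option 1 = sure option.
pred_probs = [[0.7, 0.3], [0.4, 0.6], [0.8, 0.2], [0.55, 0.45]]
choices = [0, 1, 1, 0]
accuracy = predictive_accuracy(pred_probs, choices)
chance = 1 / len(pred_probs[0])  # uniform guessing between the options
```

Predicting session 2 choices from session 1 estimates then amounts to generating `pred_probs` with S1 parameters while scoring against S2 `choices`. Under the same uniform-guessing assumption, the chance level would be 0.25 for the four-armed bandit task.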
