Reliability of Decision-Making and Reinforcement Learning Computational Parameters

Open Access | Feb 2023

Figures & Tables

Figure 1

Four-armed bandit and gambling task. Example trial of the four-armed bandit task (a). On each trial, participants chose one out of four bandits and received one out of four possible outcomes: reward (green token), punishment (red token), neither reward nor punishment (empty box) or both reward and punishment (red and green token). An example of the win and loss probabilities fluctuating independently over time within one of the boxes (b). On each gambling task trial, participants chose between a 50–50 gamble and a sure (guaranteed amount of points) option (c). Trials were either mixed gambles (50–50 chance of winning or losing points or sure option of 0 points) or gain-only trials (50–50 chance of winning or receiving nothing or sure gain).
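The outcome structure described in the caption can be illustrated with a short simulation. This is a minimal sketch, not the task code used in the study: the Gaussian random-walk drift, its step size, the starting probabilities, and the random choice policy are all assumptions made for illustration, and the function names are hypothetical.

```python
import random


def step_probability(p, rng, sd=0.05):
    """Move a probability by a small Gaussian step, clipped to [0, 1] (assumed drift rule)."""
    return min(1.0, max(0.0, p + rng.gauss(0.0, sd)))


def outcome_label(win, loss):
    """Map the two independent draws onto the four outcomes named in the caption."""
    if win and loss:
        return "both"
    if win:
        return "win"
    if loss:
        return "loss"
    return "neither"


def simulate_bandit(n_trials=200, n_bandits=4, seed=0):
    """Simulate a random chooser; returns a list of (choice, outcome label) pairs."""
    rng = random.Random(seed)
    # Each bandit has its own win and loss probability (hypothetical starting values).
    p_win = [rng.random() for _ in range(n_bandits)]
    p_loss = [rng.random() for _ in range(n_bandits)]
    trials = []
    for _ in range(n_trials):
        choice = rng.randrange(n_bandits)  # placeholder policy: uniform random choice
        win = rng.random() < p_win[choice]    # reward token drawn independently...
        loss = rng.random() < p_loss[choice]  # ...of the punishment token
        trials.append((choice, outcome_label(win, loss)))
        # Win and loss probabilities drift independently of each other.
        p_win = [step_probability(p, rng) for p in p_win]
        p_loss = [step_probability(p, rng) for p in p_loss]
    return trials
```

Because the win and loss draws are independent, a single trial can produce a reward, a punishment, both, or neither, which is what yields the four outcome types.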

Figure 2

Basic behaviour, practice effects, and test-retest reliability of model-agnostic measures on the four-armed bandit task. Boxplots of the four-armed bandit task showing probability to stay after a certain outcome in session 1 and 2 (a). The probability to stay was significantly different after each outcome type (Loss<Neither<Win) but no clear practice effect was evident. Scatter plots of the model-agnostic measures comparing behaviour on two testing sessions approximately 2 weeks apart (b). Lightly shaded regions in Figure 2a represent within-subjects standard error of the mean (SEM). * p < 0.001.
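The three reliability indices reported for these measures in Table 1 can be sketched for the two-session case as follows. This assumes the labels ICC(A,1) and ICC(1) follow the McGraw and Wong convention (two-way absolute-agreement and one-way single-measure intraclass correlations); the function names and the example data are hypothetical, and confidence intervals are not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_two_sessions(s1, s2):
    """Return (ICC(A,1), ICC(1)) for one measure observed in two sessions."""
    s1, s2 = list(s1), list(s2)
    n, k = len(s1), 2  # n participants, k sessions
    grand = (sum(s1) + sum(s2)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(s1, s2)]  # per participant
    col_means = [sum(s1) / n, sum(s2) / n]             # per session
    ss_total = sum((v - grand) ** 2 for v in s1 + s2)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ms_rows = ss_rows / (n - 1)                                   # between participants
    ms_cols = ss_cols / (k - 1)                                   # between sessions
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # two-way residual
    ms_within = (ss_total - ss_rows) / (n * (k - 1))               # one-way within
    icc_a1 = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + (k / n) * (ms_cols - ms_err)
    )
    icc_1 = (ms_rows - ms_within) / (ms_rows + (k - 1) * ms_within)
    return icc_a1, icc_1


# Hypothetical data: session 2 equals session 1 plus a constant shift.
session1 = [1.0, 2.0, 3.0, 4.0]
session2 = [2.0, 3.0, 4.0, 5.0]
r = pearson_r(session1, session2)
icc_a1, icc_1 = icc_two_sessions(session1, session2)
```

In this toy example the rank order is perfectly preserved, so Pearson's r is 1, while both intraclass correlations fall below 1 because they penalise the systematic shift between sessions.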

Table 1

Reliability of model-agnostic and computational measures of the four-armed bandit task. All measures but the lapse parameter are significant at p < 0.05. Brackets represent the 95% confidence interval.

Model-agnostic P(stay) measures (N = 50)

Summary statistics (Figure 2)

| Measure | ICC(A,1) | ICC(1) | Pearson’s r |
|---|---|---|---|
| Win | 0.46 (0.21–0.65) | 0.46 (0.21–0.65) | 0.46 (0.20–0.65) |
| Loss | 0.54 (0.32–0.71) | 0.54 (0.31–0.71) | 0.55 (0.32–0.72) |
| Neither | 0.66 (0.48–0.79) | 0.67 (0.48–0.80) | 0.67 (0.48–0.80) |

Model-calculated reliability from joint hierarchical logistic regression

| Measure | Model-calculated reliability |
|---|---|
| Win | 0.63 |
| Loss | 0.63 |
| Neither | 0.71 |

Reinforcement learning model (N = 47)

Model estimated separately per session (Figure 3)

| Parameter | ICC(A,1) | ICC(1) | Pearson’s r |
|---|---|---|---|
| Reward learning rate | 0.60 (0.38–0.75) | 0.60 (0.38–0.75) | 0.60 (0.38–0.76) |
| Punishment learning rate | 0.63 (0.42–0.77) | 0.62 (0.41–0.77) | 0.64 (0.43–0.78) |
| Reward sensitivity | 0.52 (0.26–0.70) | 0.50 (0.25–0.69) | 0.56 (0.33–0.73) |
| Punishment sensitivity | 0.45 (0.20–0.65) | 0.45 (0.19–0.65) | 0.46 (0.19–0.66) |
| Lapse | 0.01 (–0.08–0.14) | –0.43 (–0.64– –0.17) | 0.05 (–0.24–0.33) |

Model-calculated reliability from joint hierarchical Bayesian model

| Parameter | Model-calculated reliability |
|---|---|
| Reward learning rate | 0.71 (0.53–0.84) |
| Punishment learning rate | 0.85 (0.69–0.95) |
| Reward sensitivity | 0.68 (0.48–0.84) |
| Punishment sensitivity | 0.64 (0.37–0.85) |
| Lapse | –0.01 (–0.65–0.68) |

Figure 3

Practice effects and test-retest reliability of the winning reinforcement learning model parameters derived from the four-armed bandit task. Boxplots show point estimates of the bandit4arm_lapse model parameters in session 1 and 2, fit under separate priors (a). Scatter plots of the bandit4arm_lapse model parameters over session 1 and 2 are presented (b). SEM: standard error of the mean. * p < 0.05.
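To make the five parameters listed in Table 1 concrete, the following is a simplified sketch of how a learner with separate reward and punishment learning rates, reward and punishment sensitivities, and a lapse rate could be written. It is an assumption-laden illustration, not the fitted model: this page does not give the model equations, the update rule below only changes the chosen bandit, and all function and argument names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of values."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def choice_probabilities(q_rew, q_pun, lapse):
    """Softmax over summed reward and punishment values, mixed with random choice.

    With probability `lapse` the agent is assumed to pick a bandit uniformly at random.
    """
    n = len(q_rew)
    probs = softmax([r + p for r, p in zip(q_rew, q_pun)])
    return [(1.0 - lapse) * p + lapse / n for p in probs]


def update(q_rew, q_pun, choice, win, loss, lr_rew, lr_pun, sens_rew, sens_pun):
    """Update the chosen bandit's values in place after one outcome.

    `win` and `loss` are 0 or 1. The sensitivities scale the subjective size of
    the outcome; the learning rates set how fast each value tracks it.
    """
    q_rew[choice] += lr_rew * (sens_rew * win - q_rew[choice])
    q_pun[choice] += lr_pun * (-sens_pun * loss - q_pun[choice])


# Hypothetical parameter values, for illustration only.
q_rew, q_pun = [0.0] * 4, [0.0] * 4
update(q_rew, q_pun, choice=2, win=1, loss=0,
       lr_rew=0.3, lr_pun=0.2, sens_rew=5.0, sens_pun=4.0)
probs = choice_probabilities(q_rew, q_pun, lapse=0.05)
```

After a single win on bandit 2, that bandit's reward value rises and its choice probability exceeds the others, while the lapse term keeps every bandit's probability above zero.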

Figure 4

Posterior predictive performance of the winning reinforcement learning model derived from the four-armed bandit task. Boxplots depict the accuracy of the bandit4arm_lapse model in predicting choices (a). Model estimates from session 1 (S1) predicted future session 2 (S2) behaviour above chance (black boxplot). Both S1 and S2 model estimates also predicted behaviour on the same session significantly above chance (blue and red boxplots). Predicting future performance (S2 data) using a participant’s own S1 model parameter estimates was significantly better than using other participants’ S1 model parameter estimates (b) but not when comparing against the mean S1 model priors (c). SEM: standard error of the mean. * p < 0.01.

Figure 5

Basic behaviour, practice effects, and test-retest reliability of model-agnostic measures on the gambling task. Boxplots show the probability to gamble based on the trial type in session 1 and 2, with no significant session effects (a). Scatter plots of the model-agnostic measures over session 1 and 2 (b). Lightly shaded regions in Figure 5a represent within-subjects standard error of the mean (SEM). * p < 0.001.

Table 2

Reliability of model-agnostic and computational measures of the gambling task. All measures are significant at p < 0.05. Brackets represent the 95% confidence interval.

Model-agnostic P(gamble) measures

Summary statistics (Figure 5)

| Measure | ICC(A,1) | ICC(1) | Pearson’s r |
|---|---|---|---|
| Mixed trials | 0.63 (0.43–0.78) | 0.63 (0.43–0.78) | 0.63 (0.42–0.77) |
| Gain-only trials | 0.59 (0.38–0.75) | 0.59 (0.38–0.75) | 0.60 (0.39–0.76) |

Model-calculated reliability from joint hierarchical logistic regression

| Measure | Model-calculated reliability |
|---|---|
| Mixed trials | 0.73 |
| Gain-only trials | 0.72 |

Prospect theory model

Model estimated separately per session (Figure 6)

| Parameter | ICC(A,1) | ICC(1) | Pearson’s r |
|---|---|---|---|
| Loss aversion | 0.68 (0.50–0.81) | 0.68 (0.50–0.81) | 0.72 (0.55–0.83) |
| Risk aversion | 0.78 (0.55–0.89) | 0.78 (0.64–0.87) | 0.83 (0.71–0.90) |
| Inverse temperature | 0.80 (0.64–0.89) | 0.80 (0.67–0.88) | 0.84 (0.74–0.91) |

Model-calculated reliability from joint hierarchical Bayesian model

| Parameter | Model-calculated reliability |
|---|---|
| Loss aversion | 0.87 (0.77–0.94) |
| Risk aversion | 0.90 (0.83–0.95) |
| Inverse temperature | 0.91 (0.85–0.96) |

Figure 6

Practice effects and test-retest reliability of the prospect theory model derived from the gambling task. Boxplots show point estimates of the prospect theory model parameters in session 1 and 2, fit under separate priors (a). Scatter plots of the prospect theory model parameters over session 1 and 2 are presented (b). SEM: standard error of the mean. * p < 0.05.
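The roles of the three prospect theory parameters can be illustrated with a commonly used parameterisation: a power utility function with a loss-aversion multiplier and a logistic choice rule. This page does not state the model equations, so the functional form below is an assumption, and the function names, argument names, and point values are hypothetical.

```python
import math


def utility(x, loss_aversion, risk_aversion):
    """Subjective value of x points: power utility, with losses scaled by loss aversion."""
    if x >= 0:
        return x ** risk_aversion
    return -loss_aversion * (-x) ** risk_aversion


def p_gamble(gain, loss, sure, loss_aversion, risk_aversion, inv_temp):
    """Probability of choosing a 50-50 gamble over a sure option.

    `loss` is zero or negative (zero on gain-only trials); `sure` is the
    guaranteed amount (zero on mixed trials).
    """
    u_gamble = 0.5 * utility(gain, loss_aversion, risk_aversion) \
        + 0.5 * utility(loss, loss_aversion, risk_aversion)
    u_sure = utility(sure, loss_aversion, risk_aversion)
    # Inverse temperature controls how deterministically utility differences drive choice.
    return 1.0 / (1.0 + math.exp(-inv_temp * (u_gamble - u_sure)))


# Mixed trial: win 10 or lose 10, versus a sure 0 (hypothetical amounts).
p_neutral = p_gamble(10, -10, 0, loss_aversion=1.0, risk_aversion=1.0, inv_temp=1.0)
p_averse = p_gamble(10, -10, 0, loss_aversion=2.0, risk_aversion=1.0, inv_temp=1.0)

# Gain-only trial: win 10 or nothing, versus a sure 5 (hypothetical amounts).
p_risk_averse = p_gamble(10, 0, 5, loss_aversion=1.0, risk_aversion=0.8, inv_temp=1.0)
```

Under this form, a loss-aversion value above 1 lowers gambling on mixed trials, and a risk-aversion exponent below 1 lowers gambling on gain-only trials, which is why the two trial types help separate the two parameters.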

Figure 7

Posterior predictive performance of the prospect theory model derived from the gambling task. Boxplots depict the accuracy of the prospect theory model in predicting choices (a). Session 1 (S1) model estimates predicted S1 behaviour significantly above chance (blue boxplot), as did session 2 (S2) model estimates on S2 data (red boxplot). Importantly, model parameter estimates from S1 predicted task performance in S2 above chance (black boxplot). Predicting future S2 performance using a participant’s own S1 model parameter estimates was significantly better than using other participants’ S1 model parameter estimates (b) or the mean S1 model priors (c). SEM: standard error of the mean. * p < 0.001.

DOI: https://doi.org/10.5334/cpsy.86 | Journal eISSN: 2379-6227
Language: English
Submitted on: Nov 15, 2021
Accepted on: Jan 23, 2023
Published on: Feb 8, 2023
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2023 Anahit Mkrtchian, Vincent Valton, Jonathan P. Roiser, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.