
Figure 1
(A) Schematic of the Pavlovian go/no-go task. On each trial, a robot entered the ‘scanner’ from the left of the screen, prompting a response (go or no-go) from the participant during a response window (Experiment 1: 1.5 seconds; Experiment 2: 1.3 seconds). The outcome (number of points won or lost) was then presented on the scanner display (Experiment 1: 1.0 seconds; Experiment 2: 1.2 seconds), followed by an inter-trial interval animation (1 second) in which the conveyor belt carried the old robot out of view and a new robot into the scanner. The color of the scanner light denoted the outcome domain (e.g., blue denoting reward and red denoting punishment). (B) The four trial types, produced by a factorial combination of outcome domain (rewarding, punishing) and correct action (go, no-go). (C) Outcome probabilities for each outcome domain following a correct or incorrect response. Correct responses yielded the better of the two possible outcomes with 80% probability. (D) Trial composition. In Experiment 1, participants saw 8 total robots (2 of each trial type), each presented for 30 trials (240 total trials). In Experiment 2, participants saw 24 total robots (6 of each trial type), each presented for 8, 10, or 12 trials (240 total trials).
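
The trial composition in panel (D) can be checked with a short sketch. The record layout and the function names `build_robots` and `total_trials` are hypothetical. The caption does not say how the 8-, 10-, and 12-trial robots in Experiment 2 were distributed, so the even split assumed here (two robots of each length per trial type) is only one arrangement that reproduces the stated total of 240 trials.

```python
from itertools import product

# The two factors that define the four trial types in panel (B).
DOMAINS = ("reward", "punishment")
ACTIONS = ("go", "no-go")


def build_robots(trials_per_robot):
    """Return one robot record per (trial type, trial count) pair.

    trials_per_robot lists the trial counts for the robots of ONE trial
    type; the same list is reused for each of the four trial types.
    """
    return [
        {"domain": domain, "correct_action": action, "n_trials": n}
        for domain, action in product(DOMAINS, ACTIONS)
        for n in trials_per_robot
    ]


def total_trials(robots):
    """Sum the number of presentations across all robots."""
    return sum(robot["n_trials"] for robot in robots)


# Experiment 1: two robots per trial type, 30 trials each.
exp1_robots = build_robots([30, 30])

# Experiment 2: six robots per trial type. The even 2/2/2 split across
# 8, 10, and 12 trials is an ASSUMPTION made for this sketch.
exp2_robots = build_robots([8, 8, 10, 10, 12, 12])

print(len(exp1_robots), total_trials(exp1_robots))  # 8 240
print(len(exp2_robots), total_trials(exp2_robots))  # 24 240
```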

Figure 2
Large practice effects on the standard Pavlovian go/no-go task in Experiment 1. (A) Group-averaged learning curves for each trial type and session. Shaded regions indicate 95% bootstrapped confidence intervals. (B) Group-averaged performance for each session. Performance measures from left to right: Correct responses, or overall accuracy; Go bias, or difference in accuracy between Go and No-Go trials; Congruence effect, or difference in accuracy between congruent (GW, NGAL) and incongruent (NGW, GAL) trials; and Feedback sensitivity, or the difference in accuracy on trials following veridical and sham feedback. Behavior in the first session differed significantly from all other sessions on all measures. ** Denotes significant pairwise difference (p<0.05, corrected for multiple comparisons). (C) Distribution of correct responses across sessions by trial type. Percentage of participants, for each session and trial type, exhibiting at- or below-chance performance ( response accuracy; grey), intermediate performance ( response accuracy; light blue), or near-perfect performance ( response accuracy; dark blue). Across sessions, performance improved on all trial types that were not already close to ceiling in the first session.
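
The four performance measures in panel (B) can be illustrated with a minimal sketch. The trial record fields (`type`, `correct`, `prev_feedback`) and the function names are hypothetical, and the sketch assumes that each trial already carries a label for whether the preceding feedback was veridical or sham. It follows the verbal definitions in the caption and is not the authors' analysis code.

```python
GO_TYPES = {"GW", "GAL"}      # trial types whose correct action is Go
CONGRUENT = {"GW", "NGAL"}    # Pavlovian-congruent trial types


def accuracy(trials):
    """Proportion of correct responses; NaN if there are no trials."""
    if not trials:
        return float("nan")
    return sum(int(t["correct"]) for t in trials) / len(trials)


def performance_measures(trials):
    """Compute the four summary measures for one participant and session."""
    go = [t for t in trials if t["type"] in GO_TYPES]
    nogo = [t for t in trials if t["type"] not in GO_TYPES]
    congruent = [t for t in trials if t["type"] in CONGRUENT]
    incongruent = [t for t in trials if t["type"] not in CONGRUENT]
    # Trials with no preceding feedback (prev_feedback is None) are skipped.
    after_veridical = [t for t in trials if t["prev_feedback"] == "veridical"]
    after_sham = [t for t in trials if t["prev_feedback"] == "sham"]
    return {
        "correct_responses": accuracy(trials),
        "go_bias": accuracy(go) - accuracy(nogo),
        "congruence_effect": accuracy(congruent) - accuracy(incongruent),
        "feedback_sensitivity": accuracy(after_veridical) - accuracy(after_sham),
    }


# Toy data, invented purely for illustration.
toy_trials = [
    {"type": "GW", "correct": True, "prev_feedback": None},
    {"type": "GW", "correct": True, "prev_feedback": "veridical"},
    {"type": "NGW", "correct": False, "prev_feedback": "sham"},
    {"type": "NGW", "correct": True, "prev_feedback": "veridical"},
    {"type": "GAL", "correct": False, "prev_feedback": "sham"},
    {"type": "GAL", "correct": True, "prev_feedback": "veridical"},
    {"type": "NGAL", "correct": True, "prev_feedback": "veridical"},
    {"type": "NGAL", "correct": True, "prev_feedback": "sham"},
]

measures = performance_measures(toy_trials)
print(measures)
```

In this toy example, accuracy is 0.75, the Go bias is 0, the congruence effect is 0.5, and feedback sensitivity is about 0.67.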
Table 1
Model comparison collapsing across sessions. Accuracy = trial-level choice prediction accuracy between observed and model-predicted Go responses. PSIS-LOO = approximate leave-one-out cross-validation scores presented in deviance scale (smaller numbers indicate better fit). ΔPSIS-LOO = difference in PSIS-LOO values between each model and the best-fitting model (M7).
| MODEL | PARAMETERS | ACCURACY | PSIS-LOO | ΔPSIS-LOO (se) |
|---|---|---|---|---|
| M1 | β,η | 87.5% | –151457.9 | –5602.6 (68.3) |
| M2 | | 89.0% | –154011.9 | –3048.6 (51.2) |
| M3 | | 89.8% | –155817.8 | –1242.7 (31.3) |
| M4 | | 89.8% | –156261.6 | –798.8 (22.6) |
| M5 | | 89.9% | –156265.9 | –794.6 (20.7) |
| M6 | | 89.9% | –156401.8 | –658.6 (18.8) |
| M7 | | 90.1% | –157060.5 | – |
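
The ΔPSIS-LOO column can be recovered from the PSIS-LOO column as a simple difference. The sketch below uses the Table 1 values; the function name `delta_psis_loo` is hypothetical, and the sign convention (best-fitting model minus each model) is inferred from the tabled values rather than stated in the caption. Because the inputs are rounded to one decimal place, recomputed differences can deviate from the table by about 0.1. The standard errors cannot be reproduced this way, since they depend on pointwise scores that are not reported here.

```python
# PSIS-LOO scores from Table 1 (deviance scale; smaller is better).
psis_loo = {
    "M1": -151457.9,
    "M2": -154011.9,
    "M3": -155817.8,
    "M4": -156261.6,
    "M5": -156265.9,
    "M6": -156401.8,
    "M7": -157060.5,
}


def delta_psis_loo(scores):
    """Return the best-fitting model and each model's difference from it."""
    best_model = min(scores, key=scores.get)  # smallest deviance wins
    best_score = scores[best_model]
    deltas = {model: best_score - score for model, score in scores.items()}
    return best_model, deltas


best_model, deltas = delta_psis_loo(psis_loo)
print(best_model)
for model, delta in deltas.items():
    print(model, round(delta, 1))
```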

Figure 3
Reinforcement learning model parameters in Experiment 1 show evidence of practice effects and low reliability. (A) Group-level model parameters for each session. Error bars indicate 95% Bayesian confidence intervals (CIs). ** Denotes pairwise comparison where 95% CI of the difference excludes zero. (B) Test-retest reliability estimates for each model parameter. Dotted lines indicate average across pairs of sessions. Shaded region indicates conventional range of acceptable reliability (ρ ≥ 0.7). (C) Test-retest reliability estimates for each model parameter using ICC. Dotted lines indicate average across the three sessions. Shaded region indicates conventional range of good reliability (rICC ≥ 0.6).

Figure 4
Smaller or no practice effects on the modified Pavlovian go/no-go task in Experiment 2. (A) Group-averaged learning curves for each trial type and session. Shaded regions indicate 95% bootstrapped confidence intervals. (B) Group-averaged performance for each session. Performance measures from left to right: Correct responses, or overall accuracy; Go bias, or difference in accuracy between Go and No-Go trials; Congruence effect, or difference in accuracy between Pavlovian congruent (GW, NGAL) and incongruent (NGW, GAL) trials; and Feedback sensitivity, or the difference in accuracy on trials following veridical and sham feedback. ** Denotes significant pairwise difference (p<0.05, corrected for multiple comparisons). (C) The percentage of participants, for each session and trial type, exhibiting at- or below-chance performance ( response accuracy; grey), intermediate performance ( and response accuracy; light blue), or near-perfect performance ( response accuracy; dark blue).
Table 2
Model comparison collapsing across sessions. Accuracy = trial-level choice prediction accuracy between observed and model-predicted Go responses. PSIS-LOO = approximate leave-one-out cross-validation presented in deviance scale (smaller numbers indicate better fit). ΔPSIS-LOO = difference in PSIS-LOO values between each model and the best-fitting model (M7).
| MODEL | PARAMETERS | ACCURACY | PSIS-LOO | ΔPSIS-LOO (se) |
|---|---|---|---|---|
| M1 | β,η | 72.9% | –95806.3 | –6205.2 (73.2) |
| M2 | | 76.5% | –99616.0 | –2395.5 (48.9) |
| M3 | | 77.6% | –101283.0 | –728.5 (28.2) |
| M4 | | 77.5% | –101422.4 | –589.0 (21.1) |
| M5 | | 77.7% | –101519.0 | –492.4 (19.1) |
| M6 | | 77.8% | –101548.7 | –462.7 (17.2) |
| M7 | | 78.1% | –102011.4 | – |

Figure 5
Reinforcement learning model parameters in Experiment 2 show improved stability and reliability. (A) Group-level model parameters for each session. Error bars indicate 95% Bayesian confidence intervals (CIs). ** Denotes pairwise comparison where 95% CI of the difference excludes zero. (B) Test-retest reliability estimates for each model parameter. Filled circles denote estimates for Experiment 2; open circles denote estimates from Experiment 1, for comparison. Grey vertical lines show the change in reliability across experiments. Dotted lines indicate average reliability for Experiment 2. Shaded region indicates conventional range of acceptable reliability (ρ ≥ 0.7). (C) Test-retest reliability estimates for each model parameter using ICC. Dotted lines indicate average across pairs of sessions. Shaded region indicates conventional range of good reliability (rICC ≥ 0.6).
