Signatures of Perseveration and Heuristic-Based Directed Exploration in Two-Step Sequential Decision Task Behaviour

Open Access | Feb 2025

Figures & Tables

Figure 1

Outline of the two-step task (TST). Transition probabilities from the first stage to the second stage are the same in both versions of the task. The second stage with a green frame depicts the modified task version employed in data set data1 (Mathar et al., 2022): after making an S2 choice, subjects receive feedback in the form of continuous reward magnitudes (rounded to the next integer). The lower S2 stage (orange frame) depicts the classic version (used in data set data2; Gillan et al., 2016), in which the S2 feedback is binary (rewarded vs. unrewarded, based on fluctuating reward probabilities).

Table 1

Free and fixed parameters for all models.

| MODEL | FREE PARAMETERS |
| --- | --- |
| Q | α, α2, α3, βMB, βMF, βpersev, β2 |
| Q + BANDIT | α, α2, α3, βMB, βMF, βpersev, β2, φ |
| Q + TRIAL | α, α2, α3, βMB, βMF, βpersev, β2, φ |
| Q + HOP | α, α2, α3, αHOP, βMB, βMF, βpersev, β2 |
| Q + BANDIT + HOP | α, α2, α3, αHOP, βMB, βMF, βpersev, β2, φ |
| Q + TRIAL + HOP | α, α2, α3, αHOP, βMB, βMF, βpersev, β2, φ |

[i] Note. Q refers to the basic hybrid model with a first-order perseveration (FOP) term. BANDIT/TRIAL: added first-stage exploration bonus based on the respective counter heuristic (cf. Computational Models section); φ: parameter that scales the exploration bonus. This parameter is the same for both exploration bonus variants, regardless of how uncertainty estimates are formalised in a given model.
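The roles of the choice parameters in Table 1 can be illustrated with a minimal sketch. It assumes that first-stage choices follow a softmax over model-based values, model-free values and a perseveration term, and that higher-order perseveration (HOP) tracks an exponentially weighted choice history with step size αHOP. Both functional forms and all function names are illustrative assumptions rather than the authors' implementation; the exploration bonus scaled by φ is omitted because its formalisation is not given here.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def s1_choice_probs(q_mb, q_mf, persev, beta_mb, beta_mf, beta_persev):
    """Assumed S1 choice rule: softmax over weighted MB, MF and perseveration terms."""
    logits = [
        beta_mb * mb + beta_mf * mf + beta_persev * p
        for mb, mf, p in zip(q_mb, q_mf, persev)
    ]
    return softmax(logits)


def fop_indicator(prev_choice, n_actions=2):
    """First-order perseveration: 1 for the action chosen on the previous trial."""
    return [1.0 if a == prev_choice else 0.0 for a in range(n_actions)]


def hop_update(habit, choice, alpha_hop):
    """Assumed HOP update: move each habit strength towards the current choice indicator."""
    return [
        h + alpha_hop * ((1.0 if a == choice else 0.0) - h)
        for a, h in enumerate(habit)
    ]


# Hypothetical usage: three S1 choices (0, 0, 1) update the habit trace,
# which then enters the choice rule in place of the FOP indicator.
habit = [0.0, 0.0]
for choice in [0, 0, 1]:
    habit = hop_update(habit, choice, alpha_hop=0.5)
print(s1_choice_probs([0.4, 0.6], [0.5, 0.5], habit, 2.0, 1.0, 1.5))
```

Under this assumed update rule, a step size of 1 makes the habit trace identical to the FOP indicator, so the FOP models would be a special case of the HOP models.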

Table 2

Results from regression analyses of S1 choice repetition probability.

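| DATA SET | PREDICTOR | ESTIMATE | 95% CI | z-VALUE | p-VALUE |
| --- | --- | --- | --- | --- | --- |
| data1 | Intercept | 1.35 | [1.12; 1.58] | 11.48 | <.01 |
| data1 | Reward | 0.11 | [0.05; 0.18] | 3.47 | <.01 |
| data1 | Transition | –0.07 | [–0.14; –0.01] | –2.14 | .03 |
| data1 | Reward*Transition | 0.47 | [0.36; 0.59] | 8.10 | <.01 |
| data2 | Intercept | 1.73 | [1.53; 1.94] | 16.68 | <.01 |
| data2 | Reward | 0.64 | [0.51; 0.77] | 9.89 | <.01 |
| data2 | Transition | 0.02 | [–0.03; 0.08] | 0.81 | .42 |
| data2 | Reward*Transition | 0.16 | [0.07; 0.24] | 3.76 | <.01 |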

[i] Note. Reward: main effect of reward type (unrewarded vs. rewarded), commonly interpreted as an indicator for MF control; Transition: main effect of transition type (rare vs. common); Reward*Transition: interaction of Reward and Transition type, commonly interpreted as an indicator for MB control.

Figure 2

Stay probabilities of S1 choices and difference scores. Upper panel: MB and MF difference scores as defined by Eppinger et al. (2013); bar heights depict mean scores over all participants, and error bars show the standard error. Lower panel: probabilities of S1 choice repetition as a function of reward (rew+: rewarded; rew-: unrewarded) and transition type (common/rare) of the preceding trial. The left plots (green, A) show results from data1; the right plots (orange) show results from data2.
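The quantities in this figure can be computed from a single participant's trial sequence, as in the following sketch. The difference-score formulas follow a common formulation attributed to Eppinger et al. (2013) and should be read as an assumption here, as should the tuple layout and function names. For the continuous rewards of data1, the caller would have to supply a binarised reward flag; how that was done is not stated in this section.

```python
from collections import defaultdict


def stay_probabilities(trials):
    """Return P(stay) keyed by (rewarded, common) of the preceding trial.

    trials: list of (s1_choice, rewarded, common) tuples in trial order
    for one participant (hypothetical layout).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [n_stay, n_total]
    for prev, curr in zip(trials, trials[1:]):
        key = (bool(prev[1]), bool(prev[2]))
        counts[key][0] += int(curr[0] == prev[0])
        counts[key][1] += 1
    return {key: stay / total for key, (stay, total) in counts.items()}


def difference_scores(p):
    """Assumed MB and MF difference scores from the four stay probabilities.

    Raises KeyError if one of the four reward-by-transition cells is empty.
    """
    mf = (p[(True, True)] + p[(True, False)]) - (p[(False, True)] + p[(False, False)])
    mb = (p[(True, True)] + p[(False, False)]) - (p[(True, False)] + p[(False, True)])
    return mb, mf


# Hypothetical five-trial sequence: (S1 choice, rewarded?, common transition?)
example = [
    (0, True, True),
    (0, False, True),
    (1, True, False),
    (0, False, False),
    (0, True, True),
]
probs = stay_probabilities(example)
print(probs, difference_scores(probs))
```

A purely model-free agent would show a large MF score and an MB score near zero, whereas a purely model-based agent would show the reverse.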

Figure 3

Model Comparison Results via the Widely Applicable Information Criterion (WAIC) for all Q Models (cf. Table 1). The upper and lower panels (green and orange bar plots) refer to data1 and data2, respectively. BANDIT/TRIAL refer to the model variants with an added heuristic-based exploration bonus using stimulus identity/recency, respectively. HOP: model variants with a higher-order perseveration term; all other versions (Q, Q + BANDIT, Q + TRIAL) use a classic FOP term instead.
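For orientation, WAIC can be computed from a matrix of pointwise log-likelihoods, one row per posterior draw and one column per observation. The sketch below uses the standard definition on the deviance scale (lower is better), with the effective number of parameters estimated as the summed posterior variance of the pointwise log-likelihood. The input layout and function names are assumptions for illustration and do not reproduce the authors' pipeline.

```python
import math
import statistics


def log_mean_exp(values):
    """Stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(log_lik):
    """WAIC on the deviance scale.

    log_lik[s][i]: log-likelihood of observation i under posterior draw s
    (at least two draws are required for the variance term).
    """
    n_obs = len(log_lik[0])
    total = 0.0
    for i in range(n_obs):
        column = [draw[i] for draw in log_lik]
        lppd_i = log_mean_exp(column)            # log pointwise predictive density
        p_waic_i = statistics.variance(column)   # pointwise complexity penalty
        total += lppd_i - p_waic_i
    return -2.0 * total


# Hypothetical input: 3 posterior draws, 2 observations.
draws = [[-0.7, -1.2], [-0.6, -1.5], [-0.8, -1.3]]
print(waic(draws))
```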

Table 3

Results from model comparison of QL-models with a HOP extension using leave-one-out cross-validation (LOO).

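| DATA SET | MODEL | elpddiff | sediff | WAIC |
| --- | --- | --- | --- | --- |
| data1 | Q + HOP | –28.9 | 9.6 | 17750.21 |
| data1 | Q + BANDIT + HOP | –4.0 | 6.2 | 17715.03 |
| data1 | Q + TRIAL + HOP | 0.0 | 0.0 | 17714.46 |
| data2 | Q + HOP | –13.8 | 8.3 | 35905.27 |
| data2 | Q + BANDIT + HOP | –11.2 | 6.2 | 35887.17 |
| data2 | Q + TRIAL + HOP | 0.0 | 0.0 | 35871.03 |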

[i] Note. elpddiff: difference in the expected log pointwise predictive density; sediff: standard error of that difference. Both are based on LOO estimates. Each model is compared with the preferred model (Q + TRIAL + HOP); because the best-fitting model does not differ from itself, its elpddiff and sediff values are zero.
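The two comparison statistics in Table 3 can be sketched from pointwise elpd estimates of two models evaluated on the same observations. The computation below follows the usual convention: the difference is the sum of pointwise differences, and its standard error is the square root of n times the sample variance of those differences. The function name and inputs are hypothetical.

```python
import math
import statistics


def elpd_difference(pointwise_model, pointwise_reference):
    """Return (elpd_diff, se_diff) of a model relative to a reference model.

    Both arguments are lists of pointwise elpd estimates for the same
    observations, in the same order.
    """
    diffs = [m - r for m, r in zip(pointwise_model, pointwise_reference)]
    n = len(diffs)
    elpd_diff = sum(diffs)
    se_diff = math.sqrt(n * statistics.variance(diffs))
    return elpd_diff, se_diff


# Hypothetical pointwise values for four observations.
candidate = [-1.0, -2.0, -3.0, -4.0]
reference = [-0.5, -2.5, -2.0, -4.0]
print(elpd_difference(candidate, reference))
```

A negative difference means the candidate predicts worse than the reference; comparing the reference with itself yields zero for both statistics.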

Table 4

Proportion of correct S1 choice predictions by the winning model Q + HOP.

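| DATA SET | MIN | 25th PERCENTILE | MEDIAN | MEAN | 75th PERCENTILE | MAX |
| --- | --- | --- | --- | --- | --- | --- |
| data1 | .519 | .638 | .764 | .748 | .841 | .916 |
| data2 | .505 | .687 | .767 | .754 | .829 | .977 |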

[i] Note. Summary statistics are based on the comparison of individuals’ choices with model predictions, which were pooled and averaged for each data set.
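Summary statistics of this kind can be derived from per-participant prediction accuracies, as sketched below. How a model prediction is defined is an assumption here (for instance, the option with the higher predicted choice probability on each trial), and the function names are illustrative.

```python
import statistics


def proportion_correct(choices, predictions):
    """Fraction of trials on which the predicted S1 option matches the observed choice."""
    matches = sum(int(c == p) for c, p in zip(choices, predictions))
    return matches / len(choices)


def summarize(accuracies):
    """Min, quartiles, mean and max of per-participant accuracies."""
    q25, median, q75 = statistics.quantiles(accuracies, n=4, method="inclusive")
    return {
        "min": min(accuracies),
        "q25": q25,
        "median": median,
        "mean": statistics.fmean(accuracies),
        "q75": q75,
        "max": max(accuracies),
    }


# Hypothetical data: observed and predicted S1 choices for two participants.
observed = [[0, 1, 1, 0], [1, 1, 0, 0]]
predicted = [[0, 1, 0, 0], [1, 1, 0, 1]]
accuracies = [proportion_correct(o, p) for o, p in zip(observed, predicted)]
print(summarize(accuracies))
```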

Figure 4

Posterior Distributions of Group-Level Mean Parameters From Model Q + HOP. Solid gray lines show the 95% highest density interval (HDI) and dots depict the point-estimate of the mean. Panels A and B (green and orange plots) show results on the basis of data sets data1 and data2, respectively.
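A 95% HDI is the narrowest interval that contains 95% of the posterior mass. The following sketch estimates it from a flat list of posterior draws by scanning all windows of sorted draws that cover the requested mass. It assumes a unimodal posterior, and the function name is illustrative rather than taken from the authors' code.

```python
import math


def hdi(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of draws inside the interval
    best_start = min(
        range(n - k + 1),
        key=lambda i: (ordered[i + k - 1] - ordered[i], i),
    )
    return ordered[best_start], ordered[best_start + k - 1]


# Hypothetical draws with one extreme value that the HDI should exclude.
draws = [float(x) for x in range(100)] + [1000.0]
print(hdi(draws))
```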

Figure 5

Probabilities of S1 choice repetition as a function of reward and transition type. Y-axis: stay probabilities for first-stage choices; data: empirical stay probabilities from data sets data1 (panel A; green) and data2 (panel B; orange); simulation: stay probabilities from N = 8000 simulated choice sequences per subject, derived from the winning model (Q + HOP); rew+/–: previous trial was rewarded (+) or unrewarded (–); common/rare: previous trial followed a common/rare transition, respectively. Error bars in the simulation plots depict the 95% HDI over 8000 simulated data sets.

Table 5

Posterior Estimates of Group-Level Parameters from Model Q + HOP.

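| PARAMETER | data1: MEDIANx | data1: 95% HDI | data2: MEDIANx | data2: 95% HDI |
| --- | --- | --- | --- | --- |
| α1 | 0.38 | [0.10, 0.83] | 0.59 | [0.51, 0.67] |
| α2 | 0.82 | [0.64, 0.96] | 0.45 | [0.38, 0.53] |
| α3 | 0.99 | [0.97, 1.00] | 0.76 | [0.68, 0.83] |
| αHOP | 0.57 | [0.35, 0.82] | 0.98 | [0.95, 1.00] |
| βMB | 10.59 | [7.65, 13.33] | 2.80 | [1.44, 4.11] |
| βMF | 1.39 | [0.87, 1.91] | 3.00 | [2.46, 3.52] |
| βHOP | 1.44 | [1.20, 1.65] | 1.82 | [1.58, 2.10] |
| β2 | 9.76 | [8.26, 11.49] | 6.84 | [5.97, 7.72] |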

[i] Note. Posterior point estimates of hyperparameter medians and corresponding 95% highest density intervals (95% HDI) for data1 and data2 from the winning model (Q + HOP), for all subject-level parameters x listed in the first column.

Figure 6

Associations of model-agnostic and model-derived indices of MB and MF control for data1 (A) and data2 (B). Empty tiles (left panels) indicate non-significant associations. βrew, βtrans, βrew:trans: regression weights for the main effects of reward and transition type and for their interaction; MBdiff, MFdiff: difference scores of MB and MF influences on S1 stay probabilities, respectively; βMB, βMF: MB and MF S1 choice parameters from the winning model; βHOP: S1 higher-order perseveration parameter; mean reward: mean reward gained throughout the TST (data1: 300 trials, data2: 200 trials). Right panels: association of the model-derived MB parameter (βMB) and the habit step-size parameter αHOP with mean reward. Circles depict individual participants. Plots in panel A (top row, green) are based on data1; plots in panel B (bottom row, orange) are based on data2.

Figure 7

Posterior Density Estimates Based on the Full Sample of data2. Group-level parameter estimates from model variant Q + HOP, derived from fitting the full sample of the original publication (Gillan et al., 2016; N = 548; Experiment 1). The lower panel of Figure 4 shows the corresponding results based on data2 (N = 100). Grey dots indicate the mean point estimate; bars depict the 95% HDI.

DOI: https://doi.org/10.5334/cpsy.101 | Journal eISSN: 2379-6227
Language: English
Submitted on: Jun 21, 2023
Accepted on: Dec 17, 2024
Published on: Feb 11, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Angela Mariele Brands, David Mathar, Jan Peters, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.