Signatures of Perseveration and Heuristic-Based Directed Exploration in Two-Step Sequential Decision Task Behaviour

Angela Mariele Brands; David Mathar; Jan Peters

doi:10.5334/cpsy.101

Figures & Tables

Figure 1

Outline of the two-step task (TST). Transition probabilities from the first stage (S1) to the second stage (S2) are identical in both versions of the task. The S2 stage with a green frame depicts the modified task version used in data set *data1* (Mathar et al., 2022): after making an S2 choice, subjects receive feedback in the form of continuous reward magnitudes (rounded to the nearest integer). The lower S2 stage (orange frame) depicts the classic version (used in data set *data2*; Gillan et al., 2016), in which S2 feedback is binary (rewarded vs. unrewarded) and depends on fluctuating reward probabilities.
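The trial structure summarised in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' task code: the caption does not state the transition probabilities, so the common-transition probability of 0.7 is an assumption (it is the value typically used in the classic task), and the Gaussian noise model for the continuous reward magnitudes, the index convention for common transitions, and all function names are hypothetical.

```python
import numpy as np

# Assumed common-transition probability (not stated in the caption).
P_COMMON = 0.7


def transition(s1_choice, rng):
    """Map an S1 choice (0 or 1) to an S2 state (0 or 1).

    Hypothetical convention: each S1 option leads 'commonly' to the S2
    state with the same index and 'rarely' to the other one.
    """
    if rng.random() < P_COMMON:
        return s1_choice, "common"
    return 1 - s1_choice, "rare"


def binary_feedback(reward_prob, rng):
    """Classic version (data2): reward delivered with probability reward_prob."""
    return int(rng.random() < reward_prob)


def magnitude_feedback(mean_magnitude, sd, rng):
    """Modified version (data1): continuous magnitude rounded to an integer.

    The Gaussian noise around a drifting mean is a hypothetical stand-in.
    """
    return int(round(rng.normal(mean_magnitude, sd)))


rng = np.random.default_rng(0)
state, kind = transition(0, rng)
```

Binary feedback corresponds to data2 and rounded magnitudes to data1; note that Python's `round` resolves exact halves to the nearest even integer.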

Table 1

Free and fixed parameters for all models.

MODEL	FREE PARAMETERS
Q	α, α₂, α₃, β_MB, β_MF, β_persev, β₂
Q + BANDIT	α, α₂, α₃, β_MB, β_MF, β_persev, β₂, φ
Q + TRIAL	α, α₂, α₃, β_MB, β_MF, β_persev, β₂, φ
Q + HOP	α, α₂, α₃, α_HOP, β_MB, β_MF, β_persev, β₂
Q + BANDIT + HOP	α, α₂, α₃, α_HOP, β_MB, β_MF, β_persev, β₂, φ
Q + TRIAL + HOP	α, α₂, α₃, α_HOP, β_MB, β_MF, β_persev, β₂, φ

Note. Q refers to the basic hybrid model with a first-order perseveration (FOP) term. BANDIT/TRIAL: added first-stage exploration bonus based on the respective counter heuristic (cf. Computational Models section); φ: parameter that scales the exploration bonus. The same parameter φ is used for both exploration-bonus variants, regardless of how uncertainty estimates are formalised in a given model. HOP: higher-order perseveration; α_HOP is its step-size parameter.
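One way to read this note is that the exploration bonus enters the S1 softmax as an additive term scaled by φ, alongside the weighted MB and MF values and the first-order perseveration indicator. The sketch below assumes exactly that. The linear bonus and the "trials since last chosen" counter (a recency-style heuristic) are illustrative stand-ins, since the actual counter definitions are given in the Computational Models section, which is not reproduced here; all function and argument names are hypothetical.

```python
import numpy as np


def s1_choice_probs(q_mb, q_mf, prev_choice, counters,
                    beta_mb, beta_mf, beta_persev, phi):
    """Softmax probabilities for the S1 options (hypothetical sketch).

    q_mb, q_mf  : model-based / model-free values of the S1 options
    prev_choice : index of the previous S1 choice (None on the first trial)
    counters    : heuristic uncertainty counters, one per option
    phi         : scales the exploration bonus (phi = 0 gives the plain Q model)
    """
    q_mb = np.asarray(q_mb, dtype=float)
    q_mf = np.asarray(q_mf, dtype=float)
    rep = np.zeros(q_mb.size)
    if prev_choice is not None:
        rep[prev_choice] = 1.0  # first-order perseveration indicator
    bonus = phi * np.asarray(counters, dtype=float)  # assumed linear bonus
    logits = beta_mb * q_mb + beta_mf * q_mf + beta_persev * rep + bonus
    logits = logits - logits.max()  # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def update_counters(counters, choice):
    """Hypothetical 'trials since last chosen' counter: reset the chosen
    option to 0 and increment all other options by 1."""
    counters = np.asarray(counters, dtype=float) + 1.0
    counters[choice] = 0.0
    return counters
```

Setting φ = 0 recovers the plain Q model; a positive φ shifts choice probability toward options that have gone unchosen for longer, which is the sense in which such a bonus is "directed".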

Table 2

Results from regression analyses of S1 choice repetition probability.

		ESTIMATE	95% CI	z-VALUE	p-VALUE
data1	Intercept	1.35	[1.12; 1.58]	11.48	<.01
	Reward	0.11	[0.05; 0.18]	3.47	<.01
	Transition	–0.07	[–0.14; –0.01]	–2.14	.03
	Reward*Transition	0.47	[0.36; 0.59]	8.10	<.01
data2	Intercept	1.73	[1.53; 1.94]	16.68	<.01
	Reward	0.64	[0.51; 0.77]	9.89	<.01
	Transition	0.02	[–0.03; 0.08]	0.81	.42
	Reward*Transition	0.16	[0.07; 0.24]	3.76	<.01

Note. Reward: main effect of reward type (unrewarded vs. rewarded), commonly interpreted as an indicator of MF control; Transition: main effect of transition type (rare vs. common); Reward*Transition: interaction of reward and transition type, commonly interpreted as an indicator of MB control.
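The regression in Table 2 predicts whether the S1 choice is repeated from the previous trial's reward type, transition type and their interaction. The sketch below fits this as a single-level logistic regression by Newton-Raphson. The ±1 effect coding and the non-hierarchical fit are assumptions (the source does not state the coding scheme or whether participants were modelled hierarchically), and the function names are hypothetical.

```python
import numpy as np


def stay_design(s1_choices, rewarded, common):
    """Build the stay outcome and predictors for trials 2..T.

    Predictors are effect-coded (+1/-1): reward on trial t-1, transition
    on trial t-1 (+1 = common), and their product (interaction term).
    """
    c = np.asarray(s1_choices)
    r = np.where(np.asarray(rewarded)[:-1], 1.0, -1.0)
    tr = np.where(np.asarray(common)[:-1], 1.0, -1.0)
    stay = (c[1:] == c[:-1]).astype(float)
    X = np.column_stack([np.ones_like(r), r, tr, r * tr])
    return X, stay


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - p)                        # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])   # observed information
        w += np.linalg.solve(hess, grad)
    return w  # [intercept, reward, transition, reward*transition]
```

With this coding, a positive Reward*Transition weight means that rewarded-common and unrewarded-rare trials are followed by more stays, the pattern the note associates with MB control.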

Figure 2

Stay probabilities of S1 choices and difference scores. Upper panel: MB and MF difference scores as defined by Eppinger et al. (2013); bar heights depict mean scores across all participants, and error bars show the standard error. Lower panel: probabilities of S1 choice repetition as a function of reward (rew+: rewarded; rew-: unrewarded) and transition type (common/rare) on the preceding trial. The left plots (green, A) show results from data1; the right plots (orange) show results from data2.
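The stay probabilities in the lower panel and the difference scores in the upper panel can be computed from trial sequences as sketched below. The formulas are the commonly used versions of the Eppinger et al. (2013) indices, written out here as an assumption because the caption does not restate them: MF_diff = [P(stay | rew+, common) + P(stay | rew+, rare)] - [P(stay | rew-, common) + P(stay | rew-, rare)] and MB_diff = [P(stay | rew+, common) - P(stay | rew+, rare)] - [P(stay | rew-, common) - P(stay | rew-, rare)]. Function names are hypothetical.

```python
import numpy as np


def stay_probabilities(s1_choices, rewarded, common):
    """P(stay) for each (previous reward, previous transition) cell."""
    c = np.asarray(s1_choices)
    stay = c[1:] == c[:-1]                         # repeat of previous S1 choice
    rew = np.asarray(rewarded, dtype=bool)[:-1]    # outcome of trial t-1
    com = np.asarray(common, dtype=bool)[:-1]      # transition of trial t-1
    probs = {}
    for r_label, r_mask in (("rew+", rew), ("rew-", ~rew)):
        for t_label, t_mask in (("common", com), ("rare", ~com)):
            cell = r_mask & t_mask
            probs[(r_label, t_label)] = stay[cell].mean() if cell.any() else np.nan
    return probs


def difference_scores(p):
    """Return (MB_diff, MF_diff) from the four cell stay probabilities."""
    rc, rr = p[("rew+", "common")], p[("rew+", "rare")]
    uc, ur = p[("rew-", "common")], p[("rew-", "rare")]
    mf = (rc + rr) - (uc + ur)   # main effect of reward
    mb = (rc - rr) - (uc - ur)   # reward x transition interaction
    return mb, mf
```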

Figure 3

*Model Comparison Results via the Widely Applicable Information Criterion (WAIC) for All Q Models (cf. Table 1).* The upper/lower panels (green/orange bar plots) refer to data1 and data2, respectively. *Bandit/Trial* refer to the model variants with an added heuristic-based exploration bonus using stimulus identity/recency, respectively. HOP: model variants with a higher-order perseveration term; all other versions (Q, Q + BANDIT, Q + TRIAL) use a classic FOP term instead.
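This caption contrasts a classic first-order perseveration (FOP) term with a higher-order perseveration (HOP) term. A common way to formalise HOP, assumed here rather than taken from the paper's Computational Models section, is a habit trace that moves toward the one-hot code of each choice with step size α_HOP (the parameter another caption on this page calls the habit step size); the trace would then be weighted by a perseveration weight in the S1 softmax. With α_HOP = 1 the trace equals the previous-choice indicator, so FOP is a special case of this scheme. The function name is hypothetical.

```python
import numpy as np


def habit_trace(choices, alpha_hop, n_options=2):
    """Return the habit trace available before each trial (one row per trial).

    The trace h moves toward the one-hot code of the chosen option with
    step size alpha_hop; alpha_hop = 1 reduces to first-order
    perseveration (h equals the previous-choice indicator).
    """
    h = np.zeros(n_options)
    traces = []
    for c in choices:
        traces.append(h.copy())        # trace used when making choice c
        onehot = np.zeros(n_options)
        onehot[c] = 1.0
        h = h + alpha_hop * (onehot - h)
    return np.array(traces)
```

Smaller values of α_HOP let choices made several trials back keep influencing the current S1 choice, which an FOP term cannot capture.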

Table 3

Results from model comparison of QL-models with a HOP extension using leave-one-out cross-validation (LOO).

DATA SET	MODEL	elpd_diff	se_diff	WAIC
data1	Q + HOP	–28.9	9.6	17750.21
	Q + BANDIT + HOP	–4.0	6.2	17715.03
	Q + TRIAL + HOP	0.0	0.0	17714.46
data2	Q + HOP	–13.8	8.3	35905.27
	Q + BANDIT + HOP	–11.2	6.2	35887.17
	Q + TRIAL + HOP	0.0	0.0	35871.03

Note. elpd_diff: difference in the expected log pointwise predictive density; se_diff: standard error of that difference. These values are based on LOO estimates. Each model is compared to the preferred model (Q + TRIAL + HOP); because the preferred model does not differ from itself, its elpd_diff and se_diff values are zero.
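Table 3 reports elpd differences from LOO together with WAIC. The sketch below shows the standard WAIC computation from an S × N matrix of pointwise log-likelihoods (S posterior draws, N observations), reported on the deviance scale so that lower values are better, as in the table, plus the paired elpd difference and its standard error. For brevity it uses WAIC's pointwise elpd values as a stand-in for the PSIS-LOO estimates behind the elpd_diff column; the function names are hypothetical.

```python
import numpy as np


def waic(log_lik):
    """WAIC from an (S draws x N observations) pointwise log-likelihood matrix.

    Returns the total on the deviance scale (-2 * elpd_waic; lower is
    better) and the pointwise elpd_waic values.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    m = log_lik.max(axis=0)
    # log of the posterior-mean likelihood per observation (stable log-mean-exp)
    lppd_i = m + np.log(np.exp(log_lik - m).sum(axis=0)) - np.log(n_draws)
    p_waic_i = log_lik.var(axis=0, ddof=1)  # effective number of parameters
    elpd_i = lppd_i - p_waic_i
    return -2.0 * elpd_i.sum(), elpd_i


def elpd_difference(elpd_i_model, elpd_i_ref):
    """Summed elpd difference and its standard error (paired over observations)."""
    d = np.asarray(elpd_i_model, dtype=float) - np.asarray(elpd_i_ref, dtype=float)
    return d.sum(), np.sqrt(d.size * d.var(ddof=1))
```

Comparing the preferred model with itself gives d = 0 for every observation, which is why its row shows 0.0 for both elpd_diff and se_diff.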

Table 4

Proportion of correct S1 choice predictions by the winning model Q + HOP.

DATA SET	MIN	25th PERCENTILE	MEDIAN	MEAN	75th PERCENTILE	MAX
data1	.519	.638	.764	.748	.841	.916
data2	.505	.687	.767	.754	.829	.977

Note. Summary statistics are based on the comparison of individuals' choices with model predictions, which were pooled and averaged for each data set.
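The note does not say how a model prediction is scored against a choice. The sketch below assumes the model "predicts" whichever S1 option it assigns a probability above 0.5, computes the proportion of matching trials per participant, and then produces the summary statistics reported in Table 4. Function names are hypothetical.

```python
import numpy as np


def prediction_accuracy(p_choose_1, choices):
    """Proportion of trials on which the option the model rates as more
    likely (p > 0.5 for option 1; ties count as option 0) equals the
    observed S1 choice."""
    predicted = (np.asarray(p_choose_1) > 0.5).astype(int)
    return float(np.mean(predicted == np.asarray(choices)))


def summarise(accuracies):
    """Summary statistics over per-participant accuracies (one value each)."""
    a = np.asarray(accuracies, dtype=float)
    return {
        "min": a.min(),
        "25th percentile": np.percentile(a, 25),
        "median": np.median(a),
        "mean": a.mean(),
        "75th percentile": np.percentile(a, 75),
        "max": a.max(),
    }
```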

Figure 4

Posterior Distributions of Group-Level Mean Parameters From Model Q + HOP. Solid gray lines show the 95% highest density interval (HDI) and dots depict the point estimate of the mean. Panels A and B (green and orange plots) show results based on data sets data1 and data2, respectively.

Figure 5

Probabilities of S1 choice repetition as a function of reward and transition type. Y-axis: stay probabilities for S1 choices; data: empirical stay probabilities from data sets data1 (panel A; green) and data2 (panel B; orange); simulation: stay probabilities from N = 8000 simulated choice sequences per subject, derived from the winning model (Q + HOP); rew+/–: the previous trial was rewarded (+) or unrewarded (–); common/rare: the previous trial followed a common/rare transition, respectively. Error bars in the simulation plots depict the 95% HDI over 8000 simulated data sets.

Table 5

Posterior Estimates of Group-Level Parameters from Model Q + HOP.

PARAMETER	data1 MEDIAN_x	data1 95% HDI	data2 MEDIAN_x	data2 95% HDI
α₁	0.38	[0.10, 0.83]	0.59	[0.51, 0.67]
α₂	0.82	[0.64, 0.96]	0.45	[0.38, 0.53]
α₃	0.99	[0.97, 1.00]	0.76	[0.68, 0.83]
α_HOP	0.57	[0.35, 0.82]	0.98	[0.95, 1.00]
β_MB	10.59	[7.65, 13.33]	2.80	[1.44, 4.11]
β_MF	1.39	[0.87, 1.91]	3.00	[2.46, 3.52]
β_HOP	1.44	[1.20, 1.65]	1.82	[1.58, 2.10]
β₂	9.76	[8.26, 11.49]	6.84	[5.97, 7.72]

Note. Posterior point estimates of hyperparameter medians and corresponding 95% highest density intervals (95% HDI) for data1 and data2 from the winning model (Q + HOP), for all subject-level parameters x listed in the first column.
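The 95% HDIs in Table 5 summarise posterior draws. A common way to compute an HDI from samples, assumed here because the note does not name the method or software, is to sort the draws and take the narrowest window that contains the required share of them; this is appropriate for unimodal posteriors. The function name is hypothetical.

```python
import numpy as np


def hdi(samples, prob=0.95):
    """Narrowest interval containing a fraction `prob` of the posterior draws."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(prob * n))              # draws the interval must contain
    widths = x[k - 1:] - x[: n - k + 1]     # width of every window of k draws
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]
```

Unlike an equal-tailed interval, the HDI can be asymmetric around the median, which matters for bounded parameters such as α₃ in data1, whose interval reaches the upper bound of 1.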

Figure 6

Associations of model-agnostic and model-derived indices of MB and MF control for data1 **(a)** and data2 **(b)**. Empty tiles (left panels) indicate non-significant associations. β_rew, β_trans, β_rew:trans: regression weights for the main effects of reward and transition type and their interaction; *MB_diff*, *MF_diff*: difference scores of MB and MF influences on S1 stay probabilities, respectively; β_MB, β_MF: MB and MF S1 choice parameters from the winning model; β_HOP: S1 higher-order perseveration parameter; mean reward: mean reward gained throughout the TST (data1: 300 trials; data2: 200 trials). Right panel: association of the model-derived MB parameter (β_MB) and the habit step-size parameter α_HOP with mean reward. Circles depict individual participants. Plots in panel A (top row, green) are based on data1; plots in panel B (bottom row, orange) are based on data2.

Figure 7

Posterior Density Estimates Based on the Full Sample of data2. Group-level parameter estimates from model variant Q + HOP, derived from fitting the full sample of the original publication (Gillan et al., 2016; N = 548; Experiment 1). The lower panel of Figure 4 shows the corresponding results based on data2 (N = 100). Grey dots indicate the mean point estimate; bars depict the 95% HDI.
