Table 1
ORL Model and Parameter Computation.
| ORL MODEL PARAMETER | PARAMETER NAME | PARAMETER REPRESENTS | HIGHER VALUES INDICATE | EQUATION | COMPUTATION NOTES |
|---|---|---|---|---|---|
| A+ / A- | Reward/Punishment Learning Rates | The rate at which an individual updates expected value and expected outcome frequency for a given deck following gains or losses, respectively | faster learning/more volatile updating in the gain or loss domain, respectively | | Reward and punishment learning rates are estimated separately and are shared between the EV computation (left) and the EF computation (below). Expected value is updated using the objective outcome amount x(t). |
| βf | Win Frequency Sensitivity | The effect of gain frequency (as opposed to outcome magnitude) on the subjective value for a given deck | greater preference for decks with a higher win frequency over objectively equivalent decks that win less often | | Expected win frequency (EF) is tracked separately from EV. The signum function sgn(x(t)) returns 1, 0, or -1 for positive, zero, or negative outcome values on trial (t), respectively. Expected win frequency is also updated for unchosen decks (j’) on trial (t), where C is the number of possible alternative choices for the chosen deck (j) (here, 3). |
| βp | Perseveration Tendency | The tendency to stick with a previous selection (as opposed to switching among decks), regardless of outcomes | more choice consistency, less switching | | The perseverance weight of the chosen deck (j) is set to 1 on each trial (t), and then the perseverance weights decay exponentially before a choice is made on the next trial. |
| K | Memory Decay | The extent to which an individual forgets their own history of selecting decks | greater forgetting; remembering a shorter (rather than longer) sequence of deck selections | | K is a decay parameter controlling how quickly decision makers forget past deck selections. |
[i] The ORL model assumes expected value (EV), expected frequency (EF), and choice perseverance (PS) signals are integrated linearly to generate a value signal for each deck (j) at time (t) as follows:
To generate choice probabilities, the estimated value above is entered into a softmax function, where D(t) is the chosen deck at trial t as follows:
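A sketch of the two steps described above, assuming the standard ORL formulation in which βf and βp weight the frequency and perseverance signals and the softmax runs over four decks (the exact trial indexing is an assumption):

```latex
V_j(t) = EV_j(t) + \beta_f \, EF_j(t) + \beta_p \, PS_j(t)

P\left[ D(t) = j \right] = \frac{\exp\left( V_j(t) \right)}{\sum_{k=1}^{4} \exp\left( V_k(t) \right)}
```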
The five free parameters are computed as follows:
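The computation notes in Table 1 can be sketched as a single-trial update. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and three details are assumptions taken from the standard ORL formulation rather than from the text above (unchosen decks are updated toward -sgn(x(t))/C with the opposite learning rate, the perseverance decay factor is 1/(1 + K), and EV enters the value signal with unit weight).

```python
import math


def orl_update(ev, ef, ps, chosen, outcome, a_rew, a_pun, k_decay):
    """Return updated (EV, EF, PS) lists after one trial (hypothetical helper)."""
    n_decks = len(ev)
    c = n_decks - 1  # alternatives to the chosen deck (3 for four decks)
    sign = (outcome > 0) - (outcome < 0)  # signum: 1, 0, or -1

    # Chosen deck learns with A+ after gains and A- after losses.
    # Assumption: unchosen decks use the opposite learning rate.
    lr_chosen = a_rew if outcome >= 0 else a_pun
    lr_unchosen = a_pun if outcome >= 0 else a_rew

    ev, ef, ps = list(ev), list(ef), list(ps)
    for j in range(n_decks):
        if j == chosen:
            ev[j] += lr_chosen * (outcome - ev[j])  # objective amount x(t)
            ef[j] += lr_chosen * (sign - ef[j])     # outcome frequency
        else:
            ef[j] += lr_unchosen * (-sign / c - ef[j])

    # Perseverance: chosen deck is set to 1, then all weights decay.
    # Assumption: decay factor 1 / (1 + K).
    ps[chosen] = 1.0
    ps = [p / (1.0 + k_decay) for p in ps]
    return ev, ef, ps


def choice_probs(ev, ef, ps, beta_f, beta_p):
    """Softmax over the linearly integrated value signal."""
    v = [e + beta_f * f + beta_p * p for e, f, p in zip(ev, ef, ps)]
    v_max = max(v)  # subtract the max for numerical stability
    weights = [math.exp(x - v_max) for x in v]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, starting from all-zero signals, `orl_update([0.0] * 4, [0.0] * 4, [0.0] * 4, chosen=0, outcome=100.0, a_rew=0.5, a_pun=0.2, k_decay=1.0)` moves the chosen deck's EV halfway toward the outcome and leaves its perseverance weight at 0.5.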

Figure 1
Overall Modeling Approach and Resulting Four Models. At the person level, Models 1 and 2 used the traditional summary score (proportion good deck selected) to model gross task behavior, and Models 3 and 4 used the ORL computational model to estimate trial-level task behavior in terms of five parameters (Reward Learning Rate (A+), Punishment Learning Rate (A-), Win Frequency Sensitivity (βf), Perseveration Tendency (βp), Memory Decay (K)). At the group level, Models 1 and 3 estimated person-level metrics separately at each testing session and then used these estimates in two-step test-retest correlations. Models 2 and 4 instead used a generative approach, modeling person-level metrics (summary score or ORL parameters, respectively) across both testing sessions while simultaneously estimating, within the same hierarchical model, the test-retest associations between those metrics.
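
The two-step approach used by Model 1 can be illustrated as follows: compute each participant's summary score separately per session, then correlate the scores across sessions. This is a minimal sketch; the function names and the deck labels passed as `good_decks` are hypothetical and not taken from the study.

```python
import math


def proportion_good(choices, good_decks):
    """Summary score: share of trials on which a good deck was selected."""
    return sum(c in good_decks for c in choices) / len(choices)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_step_retest(session1, session2, good_decks):
    """Step 1: score each session separately. Step 2: correlate the scores."""
    scores1 = [proportion_good(c, good_decks) for c in session1]
    scores2 = [proportion_good(c, good_decks) for c in session2]
    return pearson_r(scores1, scores2)
```

Because each session's scores are treated as fixed numbers in the second step, their estimation uncertainty is ignored, which is the contrast the generative models (Models 2 and 4) are meant to address.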

Figure 2
Associations between ORL Parameters and the Summary Score. Scatterplots represent the association between the Model 1 summary score, ‘percentage good deck selected’ (x-axis) and the posterior means for each of the ORL parameters (y-axis; Reward Learning Rate (A+), Punishment Learning Rate (A-), Win Frequency Sensitivity (βf), Perseveration Tendency (βp), and Memory Decay (K)), for Models 3 and 4, for each testing session. Interestingly, the influences of Reward and Punishment Learning Rates on overall performance appeared to be strengthened and attenuated, respectively, for session 2 compared to session 1.

Figure 3
Model 1 versus Model 2 Summary Scores and Test-Retest Reliability. (A) HDI plot showing the posterior distribution of Model 2 estimated test-retest reliability coefficient for θ. The 95% highest density interval of estimates is indicated by the horizontal red line, and the vertical red line indicates the posterior mean for Model 2’s estimated test-retest reliability coefficient (r = .41). The Model 1 two-step test-retest reliability coefficient (Pearson’s r) for the summary score (r = .37) is indicated by the solid black line. (B) The relationship between the Model 1 and Model 2 estimates. Model 1 data points represent observed summary score means (‘percentage good deck selected’) at each of the two testing sessions (two-step approach). Model 2 data points represent the generatively modeled person-level posterior means for θ (‘probability of good deck selection’), modeled jointly across sessions. Grey lines connect Model 1 and Model 2 estimates for each participant, demonstrating the effect of the hierarchical model pooling estimates toward group-level means. The dashed grey line represents a perfect test-retest correlation of r = 1.
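
The highest density interval shown in panel A can be computed from posterior draws as the narrowest interval containing the requested share of samples. The sketch below assumes a unimodal posterior and uses a hypothetical function name; it is one common way to compute an HDI, not necessarily the routine used for the figure.

```python
import math


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples (unimodal case)."""
    ordered = sorted(samples)
    n = len(ordered)
    k = math.ceil(mass * n)  # number of samples the interval must cover
    if k < 1 or k > n:
        raise ValueError("mass must be in (0, 1] and samples non-empty")
    # Slide a window of k consecutive sorted samples; keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]
```

Unlike an equal-tailed interval, this choice follows the bulk of a skewed posterior, which matters for correlation coefficients near the bounds of -1 and 1.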

Figure 4
Model 3 versus Model 4 Metrics and Test-Retest Reliability. (A) HDI plots showing the posterior distributions of the Model 4 estimated test-retest reliability coefficients for each of the five free ORL parameters. The 95% highest density intervals for Model 4 estimates are indicated by horizontal red lines, and vertical red lines indicate posterior means for the Model 4 estimated test-retest reliability coefficients (A+ r = .73; A- r = .67; K r = .78; βf r = .64; βp r = .82). The Model 3 two-step test-retest reliability coefficients (Pearson’s r; A+ r = .39; A- r = .36; K r = .52; βf r = .39; βp r = .65) are indicated by solid black lines. (B) The relationship between the Model 3 and Model 4 estimates. Model 3 data points represent person-level posterior means for the ORL parameter estimates modeled separately at each of the two testing sessions. Model 4 data points represent generatively modeled person-level posterior means for the ORL parameter estimates, modeled jointly across testing sessions (full generative approach). Grey lines connect Model 3 and Model 4 estimates for each participant, demonstrating the effect of the hierarchical model pooling individual estimates toward group-level means. Dashed grey lines represent a perfect test-retest correlation of r = 1.
Table 2
Model 1 and Model 2 Construct Validity. Correlations between self-report measures and Model 1 and Model 2 summary scores. Correlations with 95% BCa CIs that do not include zero are bolded.
| SELF-REPORT COLLECTED AT SAME SESSION | MODEL 1: PERCENTAGE GOOD DECK SELECTED, SESSION 1 | MODEL 1: PERCENTAGE GOOD DECK SELECTED, SESSION 2 | MODEL 2: PROBABILITY OF GOOD DECK SELECTION (θ), SESSION 1 | MODEL 2: PROBABILITY OF GOOD DECK SELECTION (θ), SESSION 2 |
|---|---|---|---|---|
| BAS Total | –.25 [–.53, .05] | .09 [–.17, .36] | –.24 [–.52, .05] | .08 [–.18, .37] |
| BAS Drive | **–.38 [–.66, –.05]** | –.04 [–.34, .27] | **–.37 [–.66, –.03]** | –.05 [–.35, .26] |
| BAS Fun | –.10 [–.40, .20] | .13 [–.16, .40] | –.08 [–.38, .20] | .12 [–.15, .39] |
| BAS Reward Responsivity | –.14 [–.43, .13] | .13 [–.16, .39] | –.14 [–.42, .12] | .12 [–.17, .39] |
| BIS Total | –.20 [–.49, .09] | –.03 [–.34, .25] | –.19 [–.48, .12] | –.04 [–.33, .25] |
| PANAS PA | .02 [–.22, .25] | **–.30 [–.50, –.06]** | .01 [–.25, .24] | **–.29 [–.49, –.06]** |
| PANAS NA | –.13 [–.33, .07] | **–.40 [–.62, –.13]** | –.14 [–.34, .05] | **–.40 [–.61, –.13]** |
| MASQ General Distress Anxious | –.13 [–.39, .21] | | –.14 [–.40, .21] | |
| MASQ Anxious Arousal | –.19 [–.45, .08] | | –.21 [–.46, .07] | |
| MASQ General Distress Depressive | .01 [–.27, .39] | | –.02 [–.29, .36] | |
| MASQ Anhedonic Depression | .11 [–.23, .46] | | .11 [–.25, .47] | |
| SHAPS | .01 [–.24, .27] | .04 [–.21, .31] | .01 [–.24, .26] | .05 [–.21, .30] |
| PROMIS-D | .15 [–.16, .52] | | .12 [–.20, .49] | |
[i] (a) At session 1, the n for MASQ and PROMIS-D correlations is 46; the n for all other session 1 correlations is 48.
(b) At session 2, the n for all correlations is 46.
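
The bias-corrected and accelerated (BCa) bootstrap intervals reported for these correlations can be sketched as below. This is a generic textbook-style implementation with hypothetical function names and defaults (`n_boot`, `seed`); the resampling settings used in the study are not stated here, so none of these values should be read as the authors' choices.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation; raises ZeroDivisionError if a variance is zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bca_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Return (r, lower, upper) with a BCa bootstrap interval for r."""
    x, y = list(x), list(y)
    rng = random.Random(seed)
    norm = NormalDist()
    n = len(x)
    r_hat = pearson_r(x, y)

    # Resample participants (pairs) with replacement.
    boots = []
    while len(boots) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            boots.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples with zero variance
    boots.sort()

    # Bias correction: share of bootstrap estimates below the observed r,
    # clamped away from 0 and 1 so the inverse CDF is defined.
    prop = sum(b < r_hat for b in boots) / n_boot
    prop = min(max(prop, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = norm.inv_cdf(prop)

    # Acceleration from leave-one-out (jackknife) estimates.
    jack = [pearson_r(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    jack_mean = sum(jack) / n
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den else 0.0

    def adjusted(q):
        z = norm.inv_cdf(q)
        return norm.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))

    lo_idx = min(n_boot - 1, max(0, int(adjusted(alpha / 2) * n_boot)))
    hi_idx = min(n_boot - 1, max(0, math.ceil(adjusted(1 - alpha / 2) * n_boot) - 1))
    return r_hat, boots[lo_idx], boots[hi_idx]
```

A correlation would then be flagged (bolded in the tables) when the returned lower and upper bounds fall on the same side of zero.
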
Table 3
Model 3 and Model 4 Construct Validity. Correlations between self-report measures and ORL model estimates. Correlations with 95% BCa CIs that do not include zero are bolded.
| SELF-REPORT COLLECTED AT SAME SESSION | SESSION 1: A+ | SESSION 1: A- | SESSION 1: K | SESSION 1: βf | SESSION 1: βp | SESSION 2: A+ | SESSION 2: A- | SESSION 2: K | SESSION 2: βf | SESSION 2: βp |
|---|---|---|---|---|---|---|---|---|---|---|
| MODEL 3 ORL ESTIMATES (MODELED SEPARATELY AT EACH SESSION) | | | | | | | | | | |
| BAS Total | .03 [–.29, .30] | –.17 [–.52, .17] | .02 [–.35, .33] | –.11 [–.42, .19] | .08 [–.25, .41] | .01 [–.26, .27] | .15 [–.11, .41] | .12 [–.17, .43] | –.03 [–.30, .23] | .24 [–.03, .49] |
| BAS Drive | .02 [–.34, .36] | –.29 [–.60, .07] | –.05 [–.39, .24] | –.07 [–.38, .20] | .01 [–.33, .38] | .08 [–.20, .37] | .10 [–.16, .37] | .10 [–.18, .38] | .00 [–.25, .24] | .21 [–.09, .48] |
| BAS Fun | .05 [–.23, .33] | –.01 [–.36, .28] | .09 [–.25, .36] | –.11 [–.37, .16] | .14 [–.16, .42] | .02 [–.23, .28] | .18 [–.10, .43] | .13 [–.20, .47] | .07 [–.24, .35] | .17 [–.13, .40] |
| BAS Reward Responsivity | .01 [–.26, .26] | –.10 [–.43, .18] | .02 [–.36, .33] | –.10 [–.40, .18] | .06 [–.26, .35] | –.09 [–.36, .16] | .08 [–.18, .33] | .06 [–.24, .33] | –.13 [–.41, .14] | .17 [–.06, .40] |
| BIS Total | .02 [–.26, .29] | –.17 [–.47, .13] | –.32 [–.59, .08] | –.20 [–.51, .09] | –.18 [–.46, .20] | .00 [–.34, .31] | .08 [–.17, .30] | .02 [–.28, .33] | –.21 [–.49, .07] | –.01 [–.27, .25] |
| PANAS PA | –.08 [–.36, .18] | –.20 [–.44, .07] | .07 [–.24, .40] | –.11 [–.37, .20] | .14 [–.15, .42] | **.31 [.03, .50]** | .06 [–.25, .45] | –.18 [–.43, .11] | –.07 [–.29, .18] | .00 [–.33, .37] |
| PANAS NA | .09 [–.17, .31] | –.09 [–.31, .12] | –.04 [–.36, .34] | **–.29 [–.52, –.01]** | .08 [–.20, .37] | **.40 [.16, .60]** | .14 [–.15, .40] | –.05 [–.30, .27] | –.01 [–.37, .26] | .00 [–.24, .29] |
| MASQ General Distress Anxious | .07 [–.23, .34] | .10 [–.20, .36] | –.07 [–.35, .25] | –.16 [–.39, .14] | –.06 [–.35, .24] | |||||
| MASQ Anxious Arousal | .13 [–.14, .38] | .12 [–.12, .37] | –.06 [–.36, .31] | –.11 [–.37, .28] | –.11 [–.40, .19] | |||||
| MASQ General Distress Depressive | .05 [–.25, .31] | **.28 [.02, .60]** | .07 [–.26, .45] | –.06 [–.49, .27] | –.13 [–.43, .21] | |||||
| MASQ Anhedonic Depression | –.08 [–.39, .23] | .18 [–.15, .51] | .03 [–.29, .36] | .01 [–.33, .27] | –.05 [–.37, .27] | |||||
| SHAPS | –.09 [–.35, .20] | –.12 [–.35, .16] | .11 [–.21, .42] | –.10 [–.35, .16] | .27 [–.01, .51] | –.15 [–.41, .14] | –.04 [–.30, .19] | .15 [–.15, .42] | –.21 [–.43, .04] | .24 [–.04, .45] |
| PROMIS-D | –.04 [–.35, .25] | .26 [–.03, .53] | .16 [–.17, .48] | –.21 [–.60, .08] | .07 [–.25, .40] | |||||
| MODEL 4 ORL ESTIMATES (MODELED JOINTLY ACROSS SESSIONS) | SESSION 1: A+ | SESSION 1: A- | SESSION 1: K | SESSION 1: βf | SESSION 1: βp | SESSION 2: A+ | SESSION 2: A- | SESSION 2: K | SESSION 2: βf | SESSION 2: βp |
| BAS Total | .00 [–.33, .31] | –.13 [–.48, .19] | .04 [–.28, .34] | –.12 [–.41, .19] | .10 [–.24, .40] | –.02 [–.30, .24] | .09 [–.19, .38] | .15 [–.19, .46] | –.08 [–.36, .20] | .26 [–.03, .50] |
| BAS Drive | –.01 [–.36, .33] | –.22 [–.53, .12] | .01 [–.30, .31] | –.11 [–.41, .17] | .05 [–.28, .36] | .04 [–.24, .33] | .02 [–.27, .32] | .10 [–.19, .39] | –.02 [–.28, .25] | .22 [–.08, .48] |
| BAS Fun | .07 [–.22, .36] | .03 [–.31, .32] | .09 [–.19, .35] | –.06 [–.33, .21] | .12 [–.17, .38] | –.01 [–.25, .27] | .16 [–.13, .45] | .18 [–.16, .50] | .05 [–.28, .33] | .19 [–.11, .43] |
| BAS Reward Responsivity | –.05 [–.36, .24] | –.13 [–.43, .19] | .00 [–.33, .31] | –.13 [–.43, .15] | .09 [–.21, .37] | –.10 [–.37, .14] | .02 [–.24, .30] | .06 [–.24, .35] | –.20 [–.47, .08] | .19 [–.08, .40] |
| BIS Total | –.07 [–.35, .23] | –.18 [–.46, .12] | –.28 [–.55, .09] | –.24 [–.53, .03] | –.16 [–.41, .18] | .00 [–.34, .32] | .00 [–.24, .24] | –.04 [–.32, .27] | –.28 [–.54, .00] | .00 [–.25, .29] |
| PANAS PA | –.01 [–.27, .27] | –.15 [–.41, .13] | .07 [–.24, .38] | –.14 [–.40, .20] | .15 [–.14, .43] | **.27 [.02, .49]** | .02 [–.28, .34] | –.16 [–.42, .14] | –.13 [–.34, .13] | .01 [–.33, .36] |
| PANAS NA | .24 [–.04, .52] | .05 [–.22, .29] | –.08 [–.37, .27] | –.28 [–.52, .01] | .05 [–.23, .31] | **.41 [.15, .62]** | .11 [–.17, .36] | .01 [–.30, .33] | –.11 [–.49, .15] | .05 [–.20, .34] |
| MASQ General Distress Anxious | .20 [–.10, .46] | .20 [–.10, .44] | –.18 [–.43, .14] | –.12 [–.36, .18] | –.10 [–.36, .18] | |||||
| MASQ Anxious Arousal | .28 [–.01, .55] | .22 [–.06, .44] | –.17 [–.43, .20] | –.08 [–.34, .31] | –.15 [–.40, .14] | |||||
| MASQ General Distress Depressive | .30 [–.02, .56] | **.40 [.17, .63]** | –.06 [–.37, .27] | .01 [–.45, .35] | –.15 [–.44, .19] | |||||
| MASQ Anhedonic Depression | .00 [–.29, .29] | .21 [–.06, .50] | –.01 [–.31, .29] | .05 [–.31, .32] | –.10 [–.40, .21] | |||||
| SHAPS | –.13 [–.41, .25] | –.11 [–.40, .18] | .17 [–.16, .46] | –.13 [–.38, .15] | **.30 [.02, .52]** | –.14 [–.40, .13] | .06 [–.24, .35] | .11 [–.17, .45] | –.24 [–.45, .02] | .22 [–.06, .44] |
| PROMIS-D | .16 [–.18, .45] | **.31 [.05, .54]** | .05 [–.24, .39] | –.17 [–.60, .11] | .00 [–.28, .34] | |||||
[i] (a) At session 1, the n for MASQ and PROMIS-D correlations is 46; the n for all other session 1 correlations is 48.
(b) At session 2, the n for all correlations is 46.
