
Figure 1
CONSORT (Consolidated Standards of Reporting Trials) diagram.

Figure 2
Timeline. The therapeutic interventions are shown in colour. The green bars indicate the timing of the task administrations. The bar below shows the model parameters. From task administration 3 onwards, an additional appetitive Pavlovian bias was included, and from administration 4 onwards an additional aversive Pavlovian bias.

Figure 3
Experimental paradigm for the Orthogonalized Go/No-go task. A) On each trial, subjects see one of four fractals. After a fixation period, a target is shown and subjects have to either go, responding to the target with the correct key (left for left stimulus, right for right stimulus), or no-go. There are 20 practice trials for target detection, followed by 240 trials (60 trials per condition), divided into nine-minute sessions. Completion takes 36 minutes. The total amount that can be earned is USD 24. B) There are four trial types, and the probabilities of the outcomes are shown for both a go and a no-go response for each trial type.
Table 1
Clinical variables.
| DIAGNOSES | | FREQUENCY N (%) |
|---|---|---|
| MINI | MDD current | 13 (100%) |
| | MDD past | 8 (73%) |
| | Bipolar II current | 1 (9%) |
| | Anxiety comorbidity | 5 (46%) |
| SCALES | | MEAN (ST. DEV.) |
| IDS-SR | | 34.5 (8.5) |
| GAD-7 | | 9.9 (3.9) |
| STAI | State | 43.9 (6.9) |
| | Trait | 54.3 (7.6) |
Table 2
Clinical course. IDS-SR = Inventory of Depressive Symptomatology, Self-Report. GAD-7 = Generalized Anxiety Disorder 7-item self-report. STAI (State/Trait) = State-Trait Anxiety Inventory.
| VARIABLE | BASELINE | WEEK 4 | WEEK 7 | WEEK 10 |
|---|---|---|---|---|
| IDS-SR | 34.5 (8.5) | 24.7 (9.7) | 16.3 (7.9) | 10.9 (6.3)* |
| GAD-7 | 9.9 (3.9) | 9.0 (5.3) | 6.1 (5.5) | 3.8 (3.1) |
| STAI State | 43.9 (6.9) | 41.8 (8.2) | 41.1 (8.2) | 35.3 (6.8) |
| STAI Trait | 54.3 (7.6) | 51.3 (8.0) | 48.5 (10.4) | 44.4 (11.5)* |
Table 3
Response rates.
| VARIABLE | DEFINITION | N (%) |
|---|---|---|
| Responder1 | 25% reduction in IDS-SR score (baseline to Week 9) | 12 (92.3%) |
| Responder2 | 50% reduction in IDS-SR score (baseline to Week 9) | 10 (76.9%) |
| Remission | IDS-SR score below 14 at Week 9 | 8 (61.5%) |
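
The definitions in Table 3 can be illustrated with a minimal Python sketch. The function names and the example scores are hypothetical and are not study data. The sketch also assumes that "25% reduction" and "50% reduction" mean a reduction of at least that size, which the table does not state explicitly.

```python
def classify_response(baseline, week9, remission_cutoff=14):
    """Classify one participant from IDS-SR scores.

    Returns (responder_25, responder_50, remission).
    Assumption: thresholds are inclusive ("at least" 25% / 50% reduction).
    """
    reduction = (baseline - week9) / baseline  # fractional drop from baseline
    return reduction >= 0.25, reduction >= 0.50, week9 < remission_cutoff


def response_rates(pairs):
    """Percentage of participants meeting each criterion.

    `pairs` is a list of (baseline, week9) IDS-SR scores.
    """
    flags = [classify_response(b, w) for b, w in pairs]
    n = len(flags)
    return tuple(100.0 * sum(f[i] for f in flags) / n for i in range(3))


# Hypothetical scores for four participants (illustration only).
example_scores = [(40, 10), (30, 20), (20, 18), (36, 13)]
rates = response_rates(example_scores)
print(rates)
```

With N = 13 as in Table 1, the same calculation gives 12/13 = 92.3%, 10/13 = 76.9%, and 8/13 = 61.5%.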

Figure 4
Computational modelling of task data. A: Individual trajectories of depression severity measurements. The grey areas show the three periods of therapy between different task administrations. B: Average probability correct in each of the four task conditions (grey bars). Simulated data from the various models are superimposed in colour. Overall choice accuracy was higher in congruent conditions (i.e., go to reward and no-go to avoid loss) than in incongruent conditions (i.e., go to avoid loss and no-go to reward), t(11) = 3.58, p = 0.004. Learning accuracy was higher for go versus no-go conditions, F(1,11) = 10.91, p = 0.007, and the action-by-valence interaction was significant, F(1,11) = 12.80, p = 0.004. C–F: Learning curves showing average probability of go over the course of 60 trials in each of the four conditions. The data are shown in black, and simulated data from the various models are superimposed in colour. G: Model comparison. The left panel shows how well the data are fitted in terms of average posterior choice probability. The right panel shows the integrated BIC (iBIC) in comparison to the best model. This penalizes models for complexity. The best model is the one with the lowest iBIC score. Here, the most complex model fits the data sufficiently well to warrant the complexity. H–K: Comparison of data and most parsimonious model (2 Pav) over the multiple sessions. Sessions are only concatenated for display purposes.

Figure 5
Relationship between therapeutic response and sustained task parameter changes. A) Mixed effects regression matrix. Temporal improvement in depression scores was modelled as a mixture of an individual mean, a fixed improvement after the appetitive (t3) and after the aversive (t7) session, and a fixed improvement proportional to the change in behaviourally measured positive and negative Pavlovian parameters after the appetitive and aversive intervention. B) Regression weights for both the positive and negative Pavlovian change parameters were significantly negative, suggesting that an increase in both appetitive and aversive Pavlovian parameters promoted symptom reduction.

Figure 6
Reinforcement learning drives learning to evaluate activities and relates to improvements in anhedonia. a) Reward prediction errors predict changes in the amount of reward predicted for engaging with an activity when reward prediction change is computed as the difference between the reward predicted when the activity is planned and the reward predicted immediately after it is completed. Reward prediction errors are defined as the difference between the reward reported upon completing the activity and the reward predicted when the activity was planned. b) Punishment prediction errors predict changes in the amount of punishment predicted for engaging with an activity when punishment prediction change is computed as the difference between the punishment predicted when the activity is planned and the punishment predicted immediately after it is completed. Punishment prediction errors are defined as the difference between the punishment reported upon completing the activity and the punishment predicted when the activity was planned. c) Reward prediction errors predict changes in the amount of reward predicted for engaging with an activity when reward prediction change is computed as the difference between reward predictions for successive times the same activity is planned. d) Punishment prediction errors predict changes in the amount of punishment predicted for engaging with an activity when punishment prediction change is computed as the difference between punishment predictions for successive times the same activity is planned. e,f) The chances of repeating the same activity two days in a row are modulated by both the reward (e) and the punishment (f) reported on the first day. g) The effect of reward prediction errors on immediate reward prediction change is greater in individuals who have larger improvements on item 21 of the IDS-SR, which measures capacity for pleasure or enjoyment. h) The effect of punishment prediction errors on punishment prediction change between successive times the activity is planned is smaller in individuals who have larger improvements on item 19 of the IDS-SR, which measures general interest. a-h) Points display averages of single subjects. For a-f, each colour corresponds to a different subject. For g-h, colour corresponds to whether a subject's item change was greater than the median. Lines show group-level predictions of mixed effects models. Error bars designate 95% intervals.
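
The quantities defined in the Figure 6 caption can be written out as a short Python sketch. The function names, the example ratings, and the sign convention for prediction change (later prediction minus earlier prediction) are assumptions made for illustration; the caption only states that each quantity is a difference. The same functions apply to punishment ratings.

```python
def prediction_error(reported, predicted_at_planning):
    """Outcome reported on completion minus outcome predicted at planning."""
    return reported - predicted_at_planning


def immediate_prediction_change(predicted_at_planning, predicted_after_completion):
    """Change in prediction from planning to immediately after completion.

    Assumed sign convention: positive means the prediction increased.
    """
    return predicted_after_completion - predicted_at_planning


def successive_prediction_change(predictions_at_planning):
    """Change in prediction between successive times the same activity is planned."""
    return [
        later - earlier
        for earlier, later in zip(predictions_at_planning, predictions_at_planning[1:])
    ]


# Hypothetical ratings for one activity (illustration only).
rpe = prediction_error(reported=7, predicted_at_planning=4)
immediate = immediate_prediction_change(predicted_at_planning=4, predicted_after_completion=6)
successive = successive_prediction_change([4, 6, 5])
print(rpe, immediate, successive)
```

In this example a positive prediction error (the activity was more rewarding than expected) is followed by an upward revision of the prediction, which is the pattern panels a and c describe at the group level.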
