
Figure 1.
An example open field environment. A) A simple deterministic gridworld with two terminal states: one rewarding (blue) and one aversive (red). B–D) States colored by their value under different levels of pessimism, with arrows showing an optimal trajectory. In B, for an optimistic agent (w = 1), all states other than the harmful state take on positive value. In C, for a pessimistic agent (w = 0.5), negative value spreads from the aversive source to antecedent states. In D, under full pessimism (w = 0), the spread extends further, and the return-optimizing trajectory becomes more distorted and avoidant. (Parameters: γ = 0.95).
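
A minimal sketch of the valuation rule assumed to generate panels B–D: value iteration in which the pessimism weight w interpolates between best-case (max) and worst-case (min) backups over the available actions. The grid size, terminal locations, and reward magnitudes below are illustrative, not the figure's exact layout.

```python
import numpy as np

# Hypothetical 5x5 deterministic gridworld; layout and rewards are illustrative.
N = 5
GOAL, HARM = (0, N - 1), (N - 1, N - 1)       # rewarding / aversive terminals
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GAMMA = 0.95

def step(state, action):
    """Deterministic transition; moves off the grid leave the state unchanged."""
    i = min(max(state[0] + action[0], 0), N - 1)
    j = min(max(state[1] + action[1], 0), N - 1)
    if (i, j) == GOAL:
        return (i, j), 1.0, True
    if (i, j) == HARM:
        return (i, j), -1.0, True
    return (i, j), 0.0, False

def pessimistic_values(w, sweeps=200):
    """Value iteration with a w-weighted backup:
    V(s) <- w * max_a Q(s, a) + (1 - w) * min_a Q(s, a)."""
    V = np.zeros((N, N))
    for _ in range(sweeps):
        for i in range(N):
            for j in range(N):
                if (i, j) in (GOAL, HARM):
                    continue  # terminal states keep value 0
                q = []
                for a in ACTIONS:
                    nxt, r, done = step((i, j), a)
                    q.append(r + (0.0 if done else GAMMA * V[nxt]))
                V[i, j] = w * max(q) + (1 - w) * min(q)
    return V

for w in (1.0, 0.5, 0.0):  # optimistic, moderate, fully pessimistic (panels B-D)
    print(f"w = {w}:\n{np.round(pessimistic_values(w), 2)}\n")
```

With w = 1 this reduces to standard value iteration and positive value spreads from the goal; as w falls, the min term lets negative value from the aversive state contaminate antecedent states, as in panels C and D.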

Figure 2.
The Balloon Analogue Risk Task (BART; Lejuez et al., 2002). A) The risk of balloon burst (point loss) increases with each pump, and does so earlier for high-risk (red) than for low-risk (blue) balloons. (For the full rules of the task, see Methods.) The optimal policy (number of pumps) under increasingly pessimistic valuation is presented for B) low-risk and C) high-risk balloons. The optimistic agent (w = 1) prefers a policy reflecting the true environmental risk. The moderately (w = 0.6) and strongly (w = 0.2) pessimistic agents cash out earlier, as is observed in anxious individuals. D, E) In the sleeping predator task, the risk of loss is constant, but the cost of loss grows as more rewards are gathered. The value of reward pursuit under increasingly pessimistic valuation is presented for scenarios with D) low and E) high risk of predator awakening. The relative value of approach (vs. avoidance) decreases with loss amount and threat level, and more so under pessimistic assumptions. (Parameters: γ = 1.0).
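
The BART policies in panels B–C can be sketched by backward induction over pump counts under the same w-weighted backup. The burst schedule and payoff scheme below are assumptions for illustration (the task's exact rules are given in Methods).

```python
import numpy as np

def p_burst(k, risk):
    """Assumed burst schedule: burst probability grows linearly with pump
    count, and faster for high-risk balloons (illustrative values)."""
    return min(1.0, risk * k)

def optimal_pumps(w, risk, max_pumps=50):
    """Backward induction with the w-weighted backup
    V(k) = w * max(Q_cash, Q_pump) + (1 - w) * min(Q_cash, Q_pump),
    where cashing out banks k points and a burst forfeits them all."""
    Q_cash = np.arange(max_pumps + 1, dtype=float)  # banked points
    Q_pump = np.zeros(max_pumps + 1)
    V = np.zeros(max_pumps + 1)
    for k in range(max_pumps - 1, -1, -1):
        Q_pump[k] = (1.0 - p_burst(k + 1, risk)) * V[k + 1]  # burst pays 0
        V[k] = w * max(Q_cash[k], Q_pump[k]) + (1 - w) * min(Q_cash[k], Q_pump[k])
    # Policy: keep pumping while pumping is valued above cashing out.
    k = 0
    while k < max_pumps and Q_pump[k] > Q_cash[k]:
        k += 1
    return k

for risk, label in ((0.02, "low-risk"), (0.05, "high-risk")):
    pumps = {w: optimal_pumps(w, risk) for w in (1.0, 0.6, 0.2)}
    print(label, pumps)  # pessimistic agents cash out after fewer pumps
```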

Figure 3.
The decision tree environment (Huys et al., 2012; Lally et al., 2017). A) An optimistic agent (w = 1) prefers the optimal loss-minimizing policy, which passes through the initial large loss (left branch). B) A pessimistic agent (w = 0.5) instead prefers the branch without the large loss: under worst-case assumptions about its own future choices, it does not expect to recoup the large initial loss. One-step rewards (or costs) are shown in each state; the net value Q of each path is shown numerically. (Parameters: γ = 1.0).
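
A sketch of the tree evaluation assumed in panels A–B, with a hypothetical two-branch tree standing in for the published task's transition structure; the reward values below are chosen only to reproduce the qualitative reversal.

```python
# Hypothetical two-branch tree; rewards are illustrative, not the
# published task's values.
TREE = [
    ("large-loss branch", -70, [("recoup", +140), ("slip", -20)]),
    ("safe branch", -20, [("good", +30), ("modest", +10)]),
]

def branch_value(reward, children, w):
    """Pessimistic evaluation of a branch: its one-step reward plus a
    w-weighted mix of the best and worst continuations. The agent's
    immediate choice between branches remains a max over these values."""
    vals = [r for _, r in children]
    return reward + w * max(vals) + (1 - w) * min(vals)

for w in (1.0, 0.5):
    q = {name: branch_value(r, kids, w) for name, r, kids in TREE}
    print(f"w = {w}: {q} -> prefers the {max(q, key=q.get)}")
```

At w = 1 the large-loss branch dominates (it recoups the loss and more); at w = 0.5 the weighted worst-case continuation drags its value below the safe branch, reproducing the preference reversal in panel B.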

Figure 4.
An example of the anxiety–depression transition. A) A simple deterministic gridworld with two terminal states: one rewarding (blue) and one aversive (red). B, C) The development of value expectancies over three steps of learning, for two levels of pessimism. States are colored by their learned values, with arrows showing an optimal trajectory. In B, for an optimistic agent (w = 1), all states other than the harmful state take on positive value with learning. In C, for a pessimistic agent (w = 0.6), negative value spreads from the aversive source to antecedent states. As a result of avoidance, the agent learns that reward is unobtainable and develops anergic symptoms (i.e., it forgoes action). (Parameters: γ = 0.95).
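
Unlike Figures 1–3, the values in panels B–C develop through learning rather than exact evaluation. A minimal sketch of the corresponding temporal-difference update, assuming the same w-weighted bootstrap; the function name and toy usage are illustrative.

```python
import numpy as np

def pessimistic_td_update(Q, s, a, r, s_next, done, w, alpha=0.1, gamma=0.95):
    """One temporal-difference update with a w-weighted bootstrap: an
    optimistic agent (w = 1) bootstraps from the best next action, while
    a pessimistic agent mixes in the worst next action. Avoided states
    are never visited, so their (low) values are never corrected."""
    bootstrap = 0.0 if done else w * Q[s_next].max() + (1 - w) * Q[s_next].min()
    Q[s, a] += alpha * (r + gamma * bootstrap - Q[s, a])

# Toy usage on a 3-state, 2-action table (shapes are illustrative).
Q = np.zeros((3, 2))
pessimistic_td_update(Q, s=1, a=1, r=1.0, s_next=2, done=True, w=0.6)
print(Q)
```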

Figure 5.
The free choice premium task (Leotti & Delgado, 2011, 2014), in which every outcome is R ∈ {+1, −1} with equal probability. A) Nonanxious participants exhibit a preference for the free choice option (blue) despite it conferring no benefit over the fixed choice option (gray). B) Pessimistic agents show an attenuated free choice bias. The fractional preference for the free choice option over the simulated experiment (y-axis) is shown for three populations of agents differing in pessimism w; each dot represents an individual simulated agent, and the smoothed density of the free choice bias is shown for each pessimism level. (Parameters: Q-learning with γ = 1.0, and inverse temperature β increased from 1 to 15 over 100 episodes).
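
A sketch of a simulation consistent with this caption: two-stage Q-learning with an annealed softmax (β from 1 to 15 over 100 episodes, undiscounted as in the caption), in which the free-choice state is valued by the w-weighted mix of its option values. The task structure beyond the caption (two stage-2 options, the learning rate, and the specific w levels) is assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def free_choice_fraction(w, episodes=100, alpha=0.1):
    """One simulated agent: stage 1 chooses free vs. fixed via softmax;
    the free option leads to a choice between two stage-2 options; every
    terminal outcome is +1 or -1 with equal probability."""
    q1 = np.zeros(2)                          # stage 1: 0 = free, 1 = fixed
    q2 = np.zeros(2)                          # options inside the free branch
    betas = np.linspace(1.0, 15.0, episodes)  # annealed inverse temperature
    n_free = 0
    for beta in betas:
        logits = beta * q1
        p = np.exp(logits - logits.max())
        p /= p.sum()
        a1 = rng.choice(2, p=p)
        r = rng.choice([1.0, -1.0])           # equal-chance outcome
        if a1 == 0:                           # free choice
            n_free += 1
            a2 = int(q2.argmax())
            q2[a2] += alpha * (r - q2[a2])
            # The choice state is valued by the w-weighted mix of its best
            # and worst options; sampling noise makes the max (and hence
            # the free option under w = 1) look better on average.
            v_free = w * q2.max() + (1 - w) * q2.min()
            q1[0] += alpha * (v_free - q1[0])
        else:                                 # fixed choice: computer picks
            q1[1] += alpha * (r - q1[1])
    return n_free / episodes

for w in (1.0, 0.5, 0.0):  # pessimism levels are illustrative
    bias = np.mean([free_choice_fraction(w) for _ in range(50)])
    print(f"w = {w}: mean free-choice fraction = {bias:.2f}")
```

Under this sketch the premium arises because an optimistic agent values having options by their max, which sampling noise biases upward; mixing in the min (lower w) attenuates or abolishes the bias, as in panel B.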
