
Figure 1
Illustration of the trial structure of a single Ambiguous trial in the picture selection task. Each trial comprised a 3-sentence auditory narrative. Participants were instructed that whenever pictures appeared on the screen they were to select the picture which “fits best with what [they had] just heard in the story”.
Table 1
Examples of 3-sentence narrative structures in the three conditions.
| CONDITION | SENTENCE | EXAMPLE NARRATIVE |
|---|---|---|
| Ambiguous | Sentence 1 | The shop had some complicated items that needed repair. |
| | Sentence 2 | It would be a difficult job. |
| | Sentence 3 | The expert was careful when he looked at the organ. |
| Unambiguous (item-matched) | Sentence 1 | The shop had some complicated items that needed repair. |
| | Sentence 2 | It would be a difficult job. |
| | Sentence 3 | The expert was careful when he looked at the piano. |
| Unambiguous (set-matched) | Sentence 1 | Gina pinned the piece of cotton onto the doll. |
| | Sentence 2 | It didn’t seem right. |
| | Sentence 3 | She thought it might look better with some leather. |
Table 2
Descriptive statistics. Values are means, with standard deviations in brackets. Frequency is given per million words, based on SUBTLEX-UK (van Heuven et al., 2014). Age of acquisition and familiarity ratings were taken from Scott et al. (2019). Number of syllables was calculated for British pronunciations using the eSpeak speech synthesiser (http://espeak.sourceforge.net/). Narrative and picture information was newly collected (see below).
| | AMBIGUOUS (N = 66) | UNAMBIGUOUS (ITEM-MATCHED) (N = 66) | UNAMBIGUOUS (SET-MATCHED) (N = 66) |
|---|---|---|---|
| Target word characteristics | | | |
| Frequency | 43.72 (74.16) | 34.32 (58.41) | 42.39 (97.59) |
| Age of acquisition | 3.23 (0.87) | 3.48 (1.18) | 3.00 (0.85) |
| Familiarity | 5.65 (0.79) | 5.78 (0.68) | 5.70 (0.75) |
| Number of syllables | 0.97 (0.86) | 1.71 (1.15) | 1.17 (0.82) |
| Narrative characteristics | | | |
| Number of words | 25.04 (3.69) | 25.04 (3.69) | 24.15 (3.79) |
| Narrative naturalness rating | 5.27 (1.66) | 5.27 (1.66) | 5.27 (1.67) |
| Key word fit: LSA score | 0.07 (0.11) | 0.07 (0.09) | 0.08 (0.10) |
| Target picture characteristics | | | |
| Picture representativeness | 4.93 (1.78) | 5.16 (1.78) | 5.74 (1.48) |
Table 3
Summary of analysis aims and statistical models. For simplicity, covariates are not included in this summary. Amb = Ambiguous, Unamb = Unambiguous.
| ANALYSIS | CONDITION COMPARISON | AIM | MAXIMAL MODEL | COMPARISON MODEL |
|---|---|---|---|---|
| Group-level | Amb vs Unamb (item-matched) | Replicate ambiguity effect using Unambiguous (item-matched) | 1 + Condition + List + Condition:List + (1 + Condition\|subjects) + (1 + Condition\|items) | 1 + List + Condition:List + (1 + Condition\|subjects) + (1 + Condition\|items) |
| | Amb vs Unamb (set-matched) | Replicate ambiguity effect using Unambiguous (set-matched) | 1 + Condition + (1 + Condition\|subjects) + (1\|items) | 1 + (1 + Condition\|subjects) + (1\|items) |
| | Unamb (item-matched) vs Unamb (set-matched) | Ensure control conditions are comparable | 1 + Condition + (1 + Condition\|subjects) + (1\|items) | 1 + (1 + Condition\|subjects) + (1\|items) |
| Individual differences | Amb vs Unamb (set-matched) | Assess individual differences in task performance | 1 + Condition + (1\|subjects) + (1\|items) | 1 + Condition + (1\|items) |
| | Amb vs Unamb (set-matched) | Assess individual differences in ambiguity effect | 1 + Condition + (1 + Condition\|subjects) + (1\|items) | 1 + Condition + (1\|subjects) + (1\|items) |
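
Each row of Table 3 pairs a maximal model with a comparison model that omits the single term of interest. The caption does not say how the two were compared. Assuming a likelihood ratio test, the arithmetic could be sketched as below. The function names, the log-likelihood values, and the choice of one degree of freedom are all hypothetical and are not taken from the study.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2 - 1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return exp(-y) * total
    # Odd df: start from the df = 1 case and add the recurrence terms.
    total = erfc(sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += exp(-y) * y ** (i - 0.5) / gamma(i + 0.5)
    return total


def likelihood_ratio_test(loglik_maximal, loglik_comparison, df):
    """Return the test statistic and p-value for two nested models.

    df is the number of parameters present in the maximal model
    but absent from the comparison model.
    """
    statistic = 2.0 * (loglik_maximal - loglik_comparison)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods, for illustration only.
statistic, p_value = likelihood_ratio_test(-1520.4, -1526.9, df=1)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

For the individual-differences rows, the omitted term is a random effect, so the null value of its variance lies on the boundary of the parameter space and a chi-square reference distribution tends to give conservative p-values.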

Figure 2
Mean accuracy (proportion) and response time on correct trials (ms) in each of the three conditions on Sentence 1 and Sentence 3 probes for each participant. Boxplots show the median and quartiles; diamonds show the mean across the sample.
* p < .05; ** p < .01; *** p < .001

Figure 3
Estimates for individual participants, for the intercept and the condition difference in accuracy and response time of picture selection. We used the dotplot() function from the lattice package and the ranef() function from lme4 to plot conditional modes from the maximal model (i.e., “predictions” for means of individual participants, based on the parameter estimates of our model). Each row represents the conditional mode (and standard error) for one participant, in terms of its deviation from the population mean (centred at 0). Individual participants are rank-ordered according to the conditional mode of their intercept, from highest estimate (i.e., positive deviation from the mean) to lowest (i.e., negative deviation from the mean).
Table 4
Reliability estimates for each dependent variable and condition. Estimates are Spearman-Brown corrected mean correlation coefficients (and 95% confidence intervals) based on 5000 random splits of the data (Parsons, 2021).
| | RELIABILITY ESTIMATE: ACCURACY | RELIABILITY ESTIMATE: RESPONSE TIME (LOG-TRANSFORMED) |
|---|---|---|
| Averages | | |
| Ambiguous | 0.67, 95% CI [0.48, 0.81] | 0.92, 95% CI [0.88, 0.95] |
| Unambiguous (item-matched) | 0.69, 95% CI [0.47, 0.83] | 0.93, 95% CI [0.90, 0.95] |
| Unambiguous (set-matched) | 0.67, 95% CI [0.40, 0.85] | 0.93, 95% CI [0.90, 0.96] |
| Difference scores | | |
| Ambiguous – Unambiguous (set-matched) | 0.43, 95% CI [0.07, 0.67] | 0.14, 95% CI [–0.22, 0.47] |
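
The random-split procedure behind estimates like those in Table 4 can be illustrated with a minimal sketch. It is not the code used in the study, which the caption attributes to Parsons (2021). It assumes that each participant's trials are shuffled and halved on every split, that the two half-means are correlated across participants, that each correlation is Spearman-Brown corrected before averaging, and that the interval is taken from the 2.5th and 97.5th percentiles of the corrected estimates. The function names and the simulated data are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r):
    """Correct a half-test correlation to full test length."""
    return 2 * r / (1 + r)


def split_half_reliability(scores, n_splits=5000, seed=0):
    """scores maps each participant to a list of trial-level values.

    Returns the mean corrected estimate and a percentile interval.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_splits):
        half_a, half_b = [], []
        for trials in scores.values():
            shuffled = trials[:]
            rng.shuffle(shuffled)
            mid = len(shuffled) // 2
            half_a.append(sum(shuffled[:mid]) / mid)
            half_b.append(sum(shuffled[mid:]) / (len(shuffled) - mid))
        estimates.append(spearman_brown(pearson(half_a, half_b)))
    estimates.sort()
    mean = sum(estimates) / len(estimates)
    lower = estimates[int(0.025 * (len(estimates) - 1))]
    upper = estimates[int(0.975 * (len(estimates) - 1))]
    return mean, lower, upper


# Simulated data for illustration only: 30 participants, 20 trials each,
# with a stable participant effect plus trial-level noise.
data_rng = random.Random(1)
demo = {}
for participant in range(30):
    true_score = data_rng.gauss(0, 1)
    demo[participant] = [true_score + data_rng.gauss(0, 0.5) for _ in range(20)]

mean_r, ci_lower, ci_upper = split_half_reliability(demo, n_splits=200)
print(f"reliability = {mean_r:.2f}, 95% CI [{ci_lower:.2f}, {ci_upper:.2f}]")
```

The same function could be applied to difference scores by computing, within each half, a participant's mean in one condition minus their mean in the other before correlating the halves.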
