A Distributional Response Time Analysis of the Perceptual Disfluency Effect

Jason Geller; Pablo Gomez; Erin Buchanan; Dominique Makowski

doi:10.5334/joc.469

We live in a world that is, even for adults, “blooming and buzzing with confusion” (James, 1890, p. 488). Yet we can still decipher cursive writing or follow conversations in noisy bars. This ability to cope with a noisy, confusing environment has long been studied at the intersection of education and cognitive psychology. Decades of work show that encoding difficulty can enhance long-term memory. Although people often assume that easier learning is better, many findings demonstrate the opposite: under certain conditions, making learning more effortful can improve retention. This phenomenon, known as the desirable difficulties principle (Bjork & Bjork, 2011), includes robust effects such as spacing study sessions (Carpenter et al., 2022), interleaving concepts rather than blocking them (Rohrer & Taylor, 2007), and generating or retrieving information instead of simply re-reading it (Roediger & Karpicke, 2006; Slamecka & Graf, 1978).

One straightforward example involves altering the perceptual characteristics of study materials to make them harder to process. A growing literature shows that such manipulations can improve memory (e.g., Geller et al., 2018; Geller & Peterson, 2021; Halamish, 2018; Rosner et al., 2015), a benefit referred to as the perceptual disfluency effect (see Geller et al., 2018).

The Perceptual Disfluency Effect

The link between perceptual disfluency and memory dates back to the late 1980s. Nairne (1988), using the term perceptual-interference effect, employed backward masking with hash marks (e.g., ####) to make word encoding more difficult. Since then, a range of manipulations has been shown to elicit similar effects, including high-level blurring (Rosner et al., 2015), word inversion (Sungkhasettee et al., 2011), small text size (Halamish, 2018), handwritten cursive (Geller et al., 2018), and unusual typefaces (Geller & Peterson, 2021; Weissgerber & Reinhard, 2017; Weltman & Eakin, 2014).

Because these manipulations are simple to implement, researchers quickly began touting their educational potential. Interest grew following Diemand-Yauman et al. (2011), who reported that presenting material in disfluent typefaces (e.g., Comic Sans, Bodoni MT, Haettenschweiler, Monotype Corsiva) enhanced memory both in the lab and in high school classrooms across multiple content areas.

However, evidence for the effect has been inconsistent. A striking example is Sans Forgetica, a font designed to promote memory through slanted, gapped letters, forcing individuals to “generate” the missing parts of each word (Earp, 2018). Despite early claims, multiple studies have failed to replicate its benefits, finding no memory advantage over normal fonts (Cushing & Bodner, 2022; Geller et al., 2020; Huff et al., 2022; Roberts et al., 2023; Taylor et al., 2020a; Wetzler et al., 2021). Similar null results have been reported for other perceptual manipulations such as small fonts (Rhodes & Castel, 2008), degraded auditory stimuli (Rhodes & Castel, 2009), minor blurring (Yue et al., 2013), and alternative typefaces (Rummer et al., 2015).

Given these mixed findings, recent work has focused on discovering boundary conditions for the effect. Geller et al. (2018) showed a “Goldilocks” zone: memory benefits emerge only when stimuli are moderately, not excessively, difficult to read (e.g., easy-to-read cursive). Geller and Peterson (2021) further demonstrated that disfluency effects are stronger when test expectancy is low, reasoning that explicit test instructions lead participants to process all items deeply, reducing any added benefit of disfluency. Individual differences also play a role. For instance, Eskenazi and Nix (2021) found that strong spellers gained more from a disfluent font (i.e., Sans Forgetica) than weaker spellers.

Overall, perceptual disfluency can enhance memory in specific contexts but appears limited as an educational intervention, where students are typically aware of upcoming tests. Nonetheless, as Geller and Peterson (2021) argue, disfluency may hold practical value in everyday settings where memory is often incidental. The key challenge is to predict when and where such effects will reliably occur.

Theoretical Accounts of the Disfluency Effect

To apply perceptual disfluency effectively in real-world settings, its underlying mechanisms must be better understood. Several theories have been proposed, with Geller et al. (2018) reviewing two major accounts. The metacognitive account (Alter et al., 2007; Pieger et al., 2016) views disfluency as a cue that prompts greater cognitive control and regulatory processing. Here, disfluency is detected after stimulus identification, and the specific type of disfluency is less important than the learner’s perception that material is difficult, which triggers regulatory processes. The compensatory processing account (Mulligan, 1996), on the other hand, is rooted in the interactive activation model of word recognition (McClelland & Rumelhart, 1981) and argues that disfluency enhances memory by recruiting top-down support from lexical and semantic representations. When input is noisy during encoding (e.g. by masking or blurring the stimulus), higher-level knowledge feeds back to aid recognition, and this deeper processing strengthens memory.

More recently, Ptok and colleagues proposed a limited-capacity, stage-specific model (Ptok et al., 2019; Ptok et al., 2020). They showed that memory benefits from encoding conflict depend on (1) the processing level engaged by the task and (2) metacognitive monitoring and control. Across six experiments, they found improved recognition when target words were paired with incongruent semantic distractors (e.g., Chair – Alive vs. Chair – Inanimate), but not with incongruent response distractors (e.g., Lisa – left/right). Both conditions slowed responses, but only semantic conflict boosted memory, suggesting the effect arises when tasks emphasize meaning. Pupillometry, a measure of cognitive effort (see Mathôt, 2018; van der Wel & Steenbergen, 2018 for reviews), confirmed that both types of conflict increase effort; yet, only semantic conflict translated into memory benefits. Importantly, the effect vanished when endogenous attention was manipulated (e.g., by using a chin-rest), mirroring perceptual disfluency findings. That is, disfluency benefits are eliminated when attention is manipulated by increasing test expectancy—for example, by telling participants they will be tested or by asking them to estimate how likely they are to remember an item on a later test (Besken & Mulligan, 2013; Geller & Peterson, 2021; Rosner et al., 2015).

Together, these accounts highlight different loci for the disfluency effect. The metacognitive account situates it post-lexically, after word recognition. The compensatory account links it directly to recognition, with disfluent words receiving more top-down support. The stage-specific model associates it with semantic-level processing, while also incorporating attentional and control processes that modulate when disfluency effects appear.

Moving Beyond the Mean: Modeling Reaction Time (RT) Distributions

Ex-Gaussian Distribution

To test the stages involved in the perceptual disfluency effect, researchers need methods that provide a finer-grained analysis of encoding. In learning and memory research, differences between fluent and disfluent conditions are typically assessed with mean reaction times (e.g., Geller et al., 2018; Geller & Peterson, 2021; Rosner et al., 2015). Although standard, mean RT analyses have been criticized for obscuring important distributional patterns (Balota & Yap, 2011).

RTs are typically unimodal, positively skewed, and often heteroscedastic, thereby violating assumptions of standard linear models (Wilcox, 1998). Reliance on mean RTs can obscure effects that selectively influence distributional shape—for example, the slow tail, which captures the longest responses, central tendency, or both. Moreover, because RTs reflect a mixture of decisional and non-decisional processes, mean RTs provide only limited leverage for isolating specific cognitive stages (Balota & Yap, 2011).

A widely used alternative is the ex-Gaussian distribution (Balota & Yap, 2011; Ratcliff, 1978), which decomposes RTs into three parameters: μ (mean of the Gaussian component, reflecting typical response speed), σ (its standard deviation), and τ (both the mean and the SD of the exponential component, reflecting the slow tail). The overall mean equals μ + τ, and the SD is $\sqrt{\sigma^{2} + \tau^{2}}$. This decomposition allows researchers to distinguish between manipulations that shift the whole distribution, stretch the tail, or both.
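As a quick check on these identities, the following sketch simulates ex-Gaussian response times as the sum of a Gaussian variate and an independent exponential variate, then compares the sample mean and SD with μ + τ and √(σ² + τ²). The parameter values and the function name are illustrative assumptions, not estimates or code from the present study.

```python
import math
import random
import statistics


def simulate_ex_gaussian(mu, sigma, tau, n, seed=1):
    """Draw n ex-Gaussian variates: Normal(mu, sigma) + Exponential(mean=tau)."""
    rng = random.Random(seed)
    # expovariate takes the rate (1 / mean), so the exponential mean is tau.
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau) for _ in range(n)]


# Hypothetical parameter values in seconds (chosen for illustration only).
mu, sigma, tau = 0.50, 0.05, 0.15
rts = simulate_ex_gaussian(mu, sigma, tau, n=200_000)

theoretical_mean = mu + tau
theoretical_sd = math.sqrt(sigma**2 + tau**2)

print(f"mean: simulated {statistics.fmean(rts):.3f}, theoretical {theoretical_mean:.3f}")
print(f"sd:   simulated {statistics.pstdev(rts):.3f}, theoretical {theoretical_sd:.3f}")
```

Raising only `mu` in this sketch shifts every simulated RT by the same amount, whereas raising only `tau` mostly lengthens the slow tail.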

For example, Heathcote et al. (1991) analyzed Stroop effects with an ex-Gaussian model. They found facilitation and interference effects on μ, interference on σ, and interference on τ. Mean RT analysis revealed only interference, as facilitation on μ and interference on τ canceled out—a finding hidden without distributional modeling.

Exploring effects from a distributional perspective has provided a richer understanding of how different experimental manipulations affect word recognition. Experimental manipulations can produce several distinct patterns. One pattern involves a shift of the entire RT distribution to the right, without increasing the tail or skew. Such a pattern would suggest a general effect and would manifest as an effect on μ, but not τ. As an example, semantic priming effects, where responses are faster to targets preceded by a semantically related prime than by an unrelated prime, can be explained by a simple shift in the RT distribution (Balota et al., 2008). Alternatively, an experimental manipulation could produce a pattern where the RT distribution is skewed or stretched in the slower condition. This result suggests that the manipulation impacts only a subset of trials, and it is discernible as an increase in τ.

An example of an effect that only impacts τ is the transposed letter effect in visual word recognition (Johnson et al., 2012). The transposed letter (TL) effect involves misidentification of orthographically similar stimuli with transposed internal letters, such as mistaking “JUGDE” for “JUDGE” (Perea & Lupker, 2003). Finally, an experimental manipulation could change both μ and τ, which would both shift and stretch the RT distribution. Recognizing low-frequency words has been shown not only to shift the RT distribution but also to stretch it (Andrews & Heathcote, 2001; Balota & Spieler, 1999; Staub, 2010).

Although largely descriptive, the model has been used to link parameters to processing stages. For instance, μ and σ have been tied to early, automatic processes such as spreading activation in semantic priming (Balota et al., 2008; Wit & Kinoshita, 2015). Conversely, τ has been linked to later, controlled processes involving attention and working memory (Balota & Spieler, 1999; Fitousi, 2020a; Kane & Engle, 2003). For example, τ differences in the transposed-letter effect have been attributed to post-lexical checking on a subset of trials (Johnson et al., 2012). Still, mapping distributional parameters onto cognitive processes remains debated and should be interpreted carefully (Heathcote et al., 1991; Matzke & Wagenmakers, 2009).

Goals of the Present Experiments

In the present experiments, we pursued two aims related to perceptual disfluency. The first was to examine the replicability of the perceptual disfluency effect. To maximize the likelihood of observing this effect, we employed a manipulation that has previously been shown to enhance memory: perceptual blurring. Rosner et al. (2015) demonstrated across several studies that high blur, but not low blur, can boost performance on a recognition memory test. Thus, not all disfluency manipulations are created equal (see Geller et al., 2018). Different perceptual manipulations affect processing in distinct ways, making it critical to identify which manipulations reliably produce disfluency effects and at what stage of processing. Following Rosner et al. (2015), we presented participants with clear words (no blur), low-blurred words, and high-blurred words.

The second, more pivotal aim was to expand the methodological toolkit for investigating perceptual disfluency during encoding. To this end, we applied the ex-Gaussian distribution, which allows RT distributions to be decomposed into parameters reflecting different stages of processing. This approach offers a richer perspective beyond what mean response times alone can reveal. The ex-Gaussian model is widely used in the word recognition field (Balota et al., 2008), and its parameters are both interpretable and straightforward to implement. By using a distributional approach coupled with varying levels of perceptual disfluency, we aim to clarify the specific processing stages at which perceptual disfluency affects encoding, thereby providing a more mechanistic account of when disfluency enhances memory and when it does not.

Predictions

Table 1 summarizes each theoretical account of perceptual disfluency and their predicted outcomes. Some of these accounts are articulated verbally and can be formalized in different ways. We made a good-faith effort to translate these verbal descriptions into models, while recognizing that reasonable researchers may make alternative modeling choices—an unavoidable reality of scientific inference (see McElreath, 2020). The ex-Gaussian distribution provides a descriptive framework for assessing how disfluency manipulations affect encoding. Each account makes specific predictions about the loci of the perceptual disfluency effect, which can be mapped onto model parameters:

If the metacognitive account (e.g., Alter et al., 2007; Pieger et al., 2016) holds, and the effect arises primarily at a post-lexical stage, one would expect a lengthening of the distribution tail (increases in τ) for blurred relative to clear words. Importantly, this perspective suggests that memory performance may not differ between high- and low-blurred words, given that perceptual disfluency is assumed to be largely subjective in nature.
In contrast, the compensatory processing account (Mulligan, 1996) would predict a shift in the distribution (increase in μ) for high-blurred words compared to low- and no-blurred words and better memory. Memory effects arising in this account are thought to be purely lexical/semantic. This expectation is in line with findings from Rosner et al. (2015), who reported that highly blurred words are associated with longer latencies, increased error rates, and better recognition memory.
If the disfluency effect reflects both early and late processing, the stage-specific account (Ptok et al., 2019; Ptok et al., 2020) predicts that high-blur (vs. clear/low-blur) will increase both μ (overall rightward shift) and τ (heavier tail). Similar encoding patterns have been observed with hard-to-read handwriting (e.g., Perea et al., 2016; Vergara-Martínez et al., 2021). Because low-blur is unlikely to recruit substantial post-lexical control, the account predicts no reliable change in either parameter for low-blur items.

Table 1

Mapping model predictions to theoretical constructs.

ACCOUNT	DESCRIPTION	LOCI	CONTRAST	EX-GAUSSIAN PREDICTIONS	QUANTILE PLOTS	RECOGNITION MEMORY PREDICTIONS
Meta-cognitive	Perceptual disfluency affects meta-cognitive processes via increased system 2 processing	Post-lexical	High-blur vs. Low-blur/Clear	μ: × β/τ: ↑	Late difference	High > Low/Clear
			Low-blur vs. Clear	μ: × β/τ: ↑	Late difference	Low > Clear
Compensatory-processing	Perceptual disfluency affects word recognition	Lexical/semantic	High-blur vs. Low-blur/Clear	μ: ↑ β/τ: ×	Complete shift	High > Low/Clear
			Low-blur vs. Clear	μ: × β/τ: ×	No difference	Low = Clear
Stage-specific	Disfluency effects rely on (1) the stage or level of processing tapped by the task and (2) monitoring and control processes	Lexical/semantic and Post-lexical	High-blur vs. Low-blur/Clear	μ: ↑ β/τ: ↑	Complete shift + late differences	High > Low/Clear
			Low-blur vs. Clear	μ: × β/τ: ×	No difference	Low = Clear

Note. ↑ = higher estimate; ↓ = lower estimate; × = no effect on parameter of interest.

Experiment 1A: Context Reinstatement

In Experiment 1A, we collected RTs from a lexical decision task (LDT) during encoding followed by a surprise recognition memory test. Using a two-choice task, like the LDT, allowed us to examine how perceptual disfluency affects encoding processes using mathematical models. Based on previous work (Geller & Peterson, 2021), there was no mention of the recognition test when participants signed up for the study to give us the best chance of observing a disfluency effect.

Method

Transparency and Openness

This study complies with transparency and openness guidelines. The preregistered analysis plan for this experiment can be found here: https://osf.io/q3fjn. All raw and summary data, materials, and R scripts for pre-processing, analysis, and plotting can be found at https://osf.io/6sy7k/.¹ All deviations and changes from the preregistration are noted herein.

Participants

All participants were recruited through the Rutgers University subject pool (SONA system). We preregistered a sample size of 216 participants. A design of this size provides at least 90% power to detect effect sizes of δ ≥ 0.20, assuming a one-sided test with α = 0.05. A total of 263 participants completed the study. Per our exclusion criteria, 15 participants were removed for completing the experiment more than once, and 16 were removed for accuracy below 80%. No participants were excluded for being non-native English speakers or under 18 years of age. To account for oversampling and to ensure equal numbers across lists, we randomly selected 36 participants from each list, yielding a final sample of 216 participants. The study protocol was reviewed and approved by the Rutgers University Institutional Review Board.
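The stated power can be sanity-checked with a normal-theory approximation. The sketch below assumes a one-sample (within-participant) standardized effect δ tested one-sided with a z approximation; the preregistered calculation may have used a different procedure (for example, simulation), so this is only a plausibility check, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def one_sided_power(delta, n, alpha=0.05):
    """Approximate power of a one-sided, one-sample z-test for a standardized effect."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)
    # Under the alternative, the test statistic is centered at delta * sqrt(n).
    return 1 - z.cdf(z_crit - delta * math.sqrt(n))


power = one_sided_power(delta=0.20, n=216)
print(f"Approximate power with n = 216: {power:.3f}")
```

Under these assumptions the approximation lands just above .90, in line with the stated target.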

Apparatus and stimuli

The experiment was run using PsychoPy software and hosted on Pavlovia (www.pavlovia.org). You can see an example of the experiment by navigating to this website: https://run.pavlovia.org/Jgeller112/ldt_dd_l1_jol_context.

We used 84 words and 84 nonwords for the LDT. Words were obtained from the LexOPS package (Taylor et al., 2020b) and were matched on several lexical dimensions. All words were nouns, were 4–6 letters in length, had a known word recognition rate of 90–100%, had low neighborhood density (OLD20 score between 1 and 2), and were high in concreteness, imageability, and word frequency. Our nonwords were created using the English Lexicon Project (Balota et al., 2007). Stimuli can be found at our OSF project page cited above.

Blurring. Blurred stimuli were processed through the {imager} package (Barthelme, 2023) and a personal script (https://osf.io/gr5qv). Each image was processed through a high-blur filter (Gaussian blur of 15) and a low-blur filter (Gaussian blur of 10). These pictures were then imported into PsychoPy as picture files. See Figure 1 for examples of how clear, low-blurred, and high-blurred words appeared in the experiment.

Clear (left), low-blur (10% blur) (center), and high-blur (15% blur) (right) examples.

Design

We created two lists: (1) one list (84 words; 28 clear, 28 low-blur, and 28 high-blur) served as the study (old) list for the LDT, and (2) the other list (84 words; 28 clear, 28 low-blur, and 28 high-blur) served as the test (new) list for the recognition memory test that followed the LDT. We counterbalanced the lists so that each word served as both an old word and a new word and was presented as clear, low-blurred, and high-blurred across participants. This counterbalancing resulted in six lists. Lists were assigned so that, across participants, each word occurred equally often in the six possible conditions: clear old, low-blur old, high-blur old, clear new, low-blur new, and high-blur new. For the LDT, we used a set of 84 legal nonwords obtained from the English Lexicon Project. These 84 nonwords were used across all six lists.
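One way to build such a counterbalancing scheme is to cross which half of the word pool is studied (2 options) with a rotation of blur levels over thirds of each half (3 options). The sketch below is an assumed construction for illustration; the actual list-building script may differ, and the word labels are placeholders.

```python
from collections import Counter
from itertools import product

BLUR_LEVELS = ["clear", "low-blur", "high-blur"]


def build_lists(words):
    """Return six counterbalanced lists mapping each word to (status, blur)."""
    half = len(words) // 2
    halves = [words[:half], words[half:]]
    lists = []
    for old_half, rotation in product(range(2), range(3)):
        assignment = {}
        for h, half_words in enumerate(halves):
            status = "old" if h == old_half else "new"
            third = len(half_words) // 3
            for i, word in enumerate(half_words):
                subset = i // third  # 0, 1, or 2: which third of the half
                assignment[word] = (status, BLUR_LEVELS[(subset + rotation) % 3])
        lists.append(assignment)
    return lists


# Placeholder labels stand in for the 168 real words (84 per half).
words = [f"word{i:03d}" for i in range(168)]
lists = build_lists(words)

# Across the six lists, each word should fall once in each of the six conditions.
conditions_per_word = {w: Counter(lst[w] for lst in lists) for w in words}
cell_counts = Counter(lists[0].values())
print(len(lists), dict(cell_counts))
```

Within any single list, each of the six status-by-blur cells holds 28 words, matching the 28/28/28 split described above.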

Procedure

The experiment consisted of two phases: an encoding phase (LDT) and a test phase. During the encoding phase, a fixation cross appeared at the center of the screen for 500 ms. The fixation cross was immediately replaced by a letter string in the same location. To continue to the next trial, participants had to decide if the letter string presented on screen was a word or not by either pressing designated keys on the keyboard (“m” or “z”) or by tapping on designated areas on the screen (word vs. nonword) if they were using a cell phone/tablet. After the encoding phase, participants were given a surprise old/new recognition memory test. During the test phase, a word appeared in the center of the screen that either had been presented during study (“old”) or had not been presented during study (“new”). Old words occurred in their original typeface, and following the counterbalancing procedure, each of the new words was presented as clear, low-blurred, or high-blurred. All words were individually randomized for each participant during both the study and test phases and progress was self-paced. After the experiment, participants were debriefed. The entire experiment lasted approximately 15 minutes.

Data Analysis Plan

All models were fit in R (v. 4.5.1; R Core Team, 2025) using the Stan modeling language (Grant et al., 2017) via the {brms} package (Bürkner, 2017). We used maximal random-effects structures justified by the design (Barr et al., 2013).

We ran four chains of 5,000 MCMC iterations (1,000 warm-up), totaling 16,000 post-warm-up samples (except for the diffusion model, which used 2,000 iterations to reduce computation time). Model quality was checked via prior and posterior predictive checks. Convergence was assessed using R̂ (target ≤ 1.01) and effective sample size (ESS ≥ 1000; Bürkner, 2017; Vehtari et al., 2021). Default (non-informative) priors were used for most parameters. Weakly informative priors were used for population-level parameters to enable Bayes factor (Evidence Ratio; ER) calculations for two-sided hypotheses against a point null. Full prior specifications are available in the Quarto source file on OSF: https://osf.io/6vew2.

We report posterior means and 90% credible intervals (CrIs) for one-sided hypotheses (preregistered differences), and 95% CrIs for two-sided hypotheses (against zero). Estimated marginal means were extracted using a combination of {emmeans} (Lenth, 2023) and {brms} (Bürkner, 2017). Additionally, we report the posterior probability that an effect lies in a particular direction and ER, which is a generalization of the Bayes factor for directional hypotheses.² An ER > 3 indicates moderate to strong evidence for the hypothesis; ER < 0.3 indicates support for the alternative; and ER values between 0.3 and 3 are considered inconclusive. ERs were also used to assess point-null hypotheses (δ = 0). Hypotheses were considered supported if zero was excluded from the CrI, the posterior probability approached 1, and ER was > 3.
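For directional hypotheses, the posterior probability and ER can be computed directly from posterior draws as the proportion of draws consistent with the hypothesis and the ratio of consistent to inconsistent draws. The sketch below illustrates this with simulated draws standing in for model output; it covers only the directional case (point-null ERs are computed differently), and the function name is hypothetical.

```python
import math
import random


def directional_evidence(draws):
    """Posterior probability and evidence ratio for the hypothesis 'effect > 0'."""
    n_support = sum(d > 0 for d in draws)
    n_against = len(draws) - n_support
    post_prob = n_support / len(draws)
    # When no draw contradicts the hypothesis, the ratio is unbounded (shown as Inf).
    er = math.inf if n_against == 0 else n_support / n_against
    return post_prob, er


# Simulated stand-in for 16,000 posterior draws of a contrast (not real model output).
rng = random.Random(2024)
draws = [rng.gauss(0.05, 0.04) for _ in range(16_000)]

post_prob, er = directional_evidence(draws)
print(f"P(effect > 0) = {post_prob:.3f}, ER = {er:.2f}")
```

Under this computation, an ER of 7999 with 16,000 draws would correspond to all but two draws falling on the hypothesized side, and ER = Inf to every draw doing so.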

For all models, we applied ANOVA-style (effects) coding using contrast variables. For the Blur factor in Experiments 1A, 1B, and 2, we defined two orthogonal contrasts to capture the primary comparisons of interest. Contrast 1 compared high-blur against the average of clear and low-blur, coding high-blur as 0.5 and both clear and low-blur as –0.5. Contrast 2 isolated the difference between low-blur and clear, with low-blur coded as 0.5, clear as –0.5, and high-blur as 0. In Experiment 2, we also included a Frequency factor, with high-frequency coded as 0.5 and low-frequency as –0.5. Although these contrasts deviate from our preregistered comparisons, we believe they offer a more targeted test of our hypotheses. For transparency, we provide all pairwise comparisons in the accompanying visualizations.
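To see what these codes estimate, consider a simple linear predictor, mean = b0 + b1·C1 + b2·C2, applied to three condition means. The sketch below solves for the coefficients and shows that b1 equals the high-blur mean minus the average of the clear and low-blur means, while b2 equals the low-blur minus clear difference. The condition means are hypothetical, and the sketch ignores random effects and link functions.

```python
# Contrast codes for (clear, low-blur, high-blur), as described in the text.
CONTRAST_1 = {"clear": -0.5, "low-blur": -0.5, "high-blur": 0.5}
CONTRAST_2 = {"clear": -0.5, "low-blur": 0.5, "high-blur": 0.0}


def coefficients_from_means(means):
    """Solve mean = b0 + b1 * c1 + b2 * c2 exactly for three condition means."""
    b2 = means["low-blur"] - means["clear"]
    b1 = means["high-blur"] - (means["clear"] + means["low-blur"]) / 2
    b0 = means["high-blur"] - 0.5 * b1
    return b0, b1, b2


def predicted_mean(condition, b0, b1, b2):
    """Reconstruct a condition mean from the coefficients and contrast codes."""
    return b0 + b1 * CONTRAST_1[condition] + b2 * CONTRAST_2[condition]


# Hypothetical condition means in seconds (illustration only).
means = {"clear": 0.60, "low-blur": 0.62, "high-blur": 0.72}
b0, b1, b2 = coefficients_from_means(means)
for condition, observed in means.items():
    print(condition, round(predicted_mean(condition, b0, b1, b2), 3), observed)
```

Because the codes differ by one unit between the compared groups, each coefficient reads directly as a difference on the model's scale.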

Accuracy. Accuracy (coded as correct [1] vs. incorrect [0]) was modeled using a Bayesian logistic regression with a Bernoulli distribution.

Ex-Gaussian. We modeled response times (in seconds) with an ex-Gaussian distribution,³ allowing the Gaussian mean/location (μ), the Gaussian standard deviation (σ) and the exponential scale (β = 1/λ) to vary by condition. Please note that when we refer to β we are referring to τ (β/τ). When fitting the ex-Gaussian distribution we use the identity link for μ, and the log link for σ and (β/τ).
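Because σ and β/τ are modeled with a log link, contrasts on those parameters are presumably expressed on the log scale (an assumption here, since the reporting scale is not stated explicitly). Under that assumption, exponentiating a contrast gives a multiplicative change rather than a difference in seconds, as the short sketch below illustrates using the Experiment 1A point estimate for the high-blur contrast on β/τ.

```python
import math


def ratio_from_log_contrast(b):
    """Multiplicative change implied by a contrast estimated on the log scale."""
    return math.exp(b)


# Point estimate reported for the high-blur contrast on β/τ in Experiment 1A.
b_tau = 0.427
ratio = ratio_from_log_contrast(b_tau)
print(f"exp({b_tau}) = {ratio:.2f}")  # roughly a 1.5-fold difference
```

Contrasts on μ use the identity link, so they can be read directly in seconds.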

Quantile and Delta Plots. In addition to ex-Gaussian analyses, we provide a graphical description of changes to the RT distribution using quantile and delta plots (Balota et al., 2008; De Jong et al., 1994). The process of visualization through quantile analysis can be broken down into four distinct steps:

1. Sorting and plotting: For correct trials, RTs are arranged in ascending order within each condition. We then plot the average of the specified quantiles (e.g., .1, .2, .3, .4, .5, .9).
2. Quantile averaging across participants: The individual quantiles for each participant are averaged, a concept reminiscent of Vincentiles.
3. Between-condition quantile averaging: The average for each quantile is computed between the conditions.
4. Difference calculation: We determine the difference between the conditions, ensuring the sign of the difference remains unchanged.

Typically, there are four observable patterns in the graphical depiction. No observable difference occurs when the conditions do not show any noticeable distinction. Late differences emerge when increasing differences appear later in the sequence, suggesting that the conditions diverge over time. A complete shift indicates a consistent difference across all quantiles, signaling an overall shift in the distribution. Finally, early differences reveal distinctions early in the reaction time distribution, suggesting an initial divergence between conditions.
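The quantile and delta computations described above can be sketched as follows. This is a minimal illustration on simulated ex-Gaussian RTs, assuming the .1, .3, .5, .7, and .9 quantiles and hypothetical parameter values; the function names are not from the analysis scripts.

```python
import random
import statistics

PROBS = [0.1, 0.3, 0.5, 0.7, 0.9]


def participant_quantiles(rts, probs=PROBS):
    """Quantiles of one participant's correct RTs in one condition."""
    # statistics.quantiles with n=10 returns the nine cut points .1, .2, ..., .9.
    deciles = statistics.quantiles(rts, n=10, method="inclusive")
    return [deciles[round(p * 10) - 1] for p in probs]


def group_quantiles(rts_by_participant):
    """Average each quantile across participants (Vincentile-style)."""
    per_person = [participant_quantiles(rts) for rts in rts_by_participant]
    return [statistics.fmean(q) for q in zip(*per_person)]


def delta_points(slow, fast):
    """Delta plot coordinates: (mean of the two conditions, slow - fast) per quantile."""
    return [((s + f) / 2, s - f) for s, f in zip(slow, fast)]


def simulate_rts(rng, mu, sigma, tau, n=200):
    """Simulated ex-Gaussian RTs in seconds for one hypothetical participant."""
    return [rng.gauss(mu, sigma) + rng.expovariate(1 / tau) for _ in range(n)]


# Twenty hypothetical participants; the slower condition has larger mu and tau.
rng = random.Random(7)
clear = group_quantiles([simulate_rts(rng, 0.50, 0.05, 0.10) for _ in range(20)])
high_blur = group_quantiles([simulate_rts(rng, 0.60, 0.05, 0.15) for _ in range(20)])
deltas = delta_points(high_blur, clear)
for x, d in deltas:
    print(f"mean quantile {x:.3f}  difference {d:.3f}")
```

With both μ and τ larger in the slower condition, the differences are positive at every quantile and grow toward the slow tail, which is the shift-plus-late-difference pattern.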

Recognition Memory. Following recent trends (see Zloteanu & Vuorre, 2024), recognition memory was analyzed using a Bayesian generalized linear multilevel model (GLMM; a Bernoulli distribution with a probit link). Here the response of the participant (“say old” vs. “say new”) is modeled as a function of item status (“is old” vs. “is new”) and condition.

Bayesian GLMMs provide a more precise and flexible approach than traditional signal detection theory analyses. Following Signal Detection Theory (SDT; Green & Swets, 1966), participant responses can be classified as hits, correct rejections, misses, or false alarms, depending on the item status (“old” vs. “new”). In the probit regression framework, the interaction between item status and a predictor of interest corresponds directly to d′, while the main effects reflect response criterion (DeCarlo, 1998; for a detailed discussion of Bayesian SDT modeling see Zloteanu & Vuorre, 2024). Note that the model parameterization reflects –c (i.e., reversed sign) and this facet is what is reported in the paper. For visualization purposes, we use the conventional parameterization: positive values indicate more conservative responding, and negative values indicate a more liberal bias.
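The correspondence between the probit model and SDT can be illustrated in the simplest single-condition case. In the sketch below, item status is coded –0.5 (new) and 0.5 (old), which is an assumption made for illustration; under that coding the slope for item status equals d′ and the intercept equals –c. When a condition predictor is added, its interaction with item status captures how d′ changes across conditions. The hit and false-alarm rates are hypothetical.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, fa_rate):
    """Equal-variance SDT: d' = z(H) - z(FA), c = -(z(H) + z(FA)) / 2."""
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion


def probit_coefficients(hit_rate, fa_rate):
    """Probit model P('old') = Phi(b0 + b1 * is_old), with is_old coded -0.5 / 0.5."""
    z = NormalDist().inv_cdf
    b1 = z(hit_rate) - z(fa_rate)          # slope for item status equals d'
    b0 = 0.5 * (z(hit_rate) + z(fa_rate))  # intercept equals -c
    return b0, b1


# Hypothetical hit and false-alarm rates (illustration only).
d_prime, criterion = sdt_measures(hit_rate=0.75, fa_rate=0.30)
b0, b1 = probit_coefficients(hit_rate=0.75, fa_rate=0.30)
print(f"d' = {d_prime:.3f}, c = {criterion:.3f}, b0 = {b0:.3f}, b1 = {b1:.3f}")
```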

Results

All models presented no divergences, and all chains mixed well and produced comparable estimates ($\hat{R} < 1.01$ and ESS > 1000).

Accuracy

The analysis of accuracy is based on 17,873 data points, after removing fast (< .2 s) and slow (> 2.5 s) RTs (2%). Model estimates can be found in Table 2. High-blur words had lower accuracy compared to clear and low-blurred words, b = –1.031, 90% CrI [–1.293, –0.77], ER = Inf. However, evidence for no difference in identification accuracy between clear and low-blurred words was weak, b = 0.041, 90% CrI [–0.216, 0.297], ER = 1.257.

Table 2

Posterior distribution estimates for accuracy model (Experiments 1A and 1B).

EXPERIMENT	HYPOTHESIS	MEAN	SE	CrI*	ER	POSTERIOR PROB
Experiment 1A	High-blur < (Low-blur + Clear)	–1.03	0.16	[–1.293, –0.77]	Inf	1.00
Experiment 1A	Low-blur = Clear	0.04	0.13	[–0.216, 0.297]	1.26	0.56
Experiment 1B	High-blur < (Low-blur + Clear)	–1.10	0.17	[–1.376, –0.829]	Inf	1.00
Experiment 1B	Low-blur = Clear	0.03	0.15	[–0.278, 0.322]	0.90	0.47

Note. CrI: 90% for one-sided tests and 95% for two-sided tests against 0. Posterior probability indicates the proportion of the posterior distribution that falls on one side of zero (either positive or negative), representing the probability that the effect is greater than or less than zero.

RTs: Ex-Gaussian

The analysis of RTs (correct trials and words) is based on 16,980 data points, after removing fast (< .2 s) and slow (> 2.5 s) RTs (1%).

A visualization of how blurring affected processing during word recognition can be seen in the quantile and delta plots in Figure 3. A summary of the ex-Gaussian model can be found in Table 3. Beginning with the μ parameter, there was greater shifting for high-blurred words compared to clear and low-blurred words, b = 0.107, 90% CrI [0.1, 0.114], ER = Inf. Low-blurred words also showed greater shifting than clear words, b = 0.016, 90% CrI [0.012, 0.02], ER = Inf.

Table 3

Posterior distribution estimates for ex-Gaussian distribution (Experiments 1A and 1B).

EXPERIMENT	HYPOTHESIS	PARAMETER	MEAN	SE	CrI*	ER	POSTERIOR PROB
Experiment 1A	High-blur > (Low-blur + Clear)	Mu (µ)	0.11	0.00	[0.1, 0.114]	Inf	1.00
Experiment 1B	High-blur > (Low-blur + Clear)	Mu (µ)	0.12	0.01	[0.11, 0.127]	Inf	1.00
Experiment 1A	High-blur > (Low-blur + Clear)	Sigma (σ)	0.16	0.06	[0.057, 0.253]	163.95	0.99
Experiment 1B	High-blur > (Low-blur + Clear)	Sigma (σ)	0.32	0.07	[0.214, 0.43]	Inf	1.00
Experiment 1A	High-blur > (Low-blur + Clear)	Beta (β/τ)	0.43	0.04	[0.367, 0.487]	Inf	1.00
Experiment 1B	High-blur > (Low-blur + Clear)	Beta (β/τ)	0.38	0.03	[0.318, 0.43]	Inf	1.00
Experiment 1A	Low-blur = Clear	Sigma (σ)	0.03	0.05	[–0.066, 0.136]	16.22	0.94
Experiment 1B	Low-blur = Clear	Sigma (σ)	–0.09	0.06	[–0.212, 0.035]	5.92	0.85
Experiment 1A	Low-blur = Clear	Beta (β/τ)	–0.00	0.03	[–0.062, 0.061]	7.77	0.89
Experiment 1B	Low-blur = Clear	Beta (β/τ)	0.03	0.03	[–0.026, 0.084]	5.05	0.83
Experiment 1A	Low-blur > Clear	Mu (µ)	0.02	0.00	[0.012, 0.02]	Inf	1.00
Experiment 1B	Low-blur > Clear	Mu (µ)	0.01	0.00	[0.006, 0.015]	Inf	1.00

Note. CrI: 90% for one-sided tests and 95% for two-sided tests against 0. Posterior probability indicates the proportion of the posterior distribution that falls on one side of zero (either positive or negative), representing the probability that the effect is greater than or less than zero.

Analyses of the σ and β/τ parameters yielded a similar pattern. Variance was higher for high-blurred words compared to clear and low-blurred words, b = 0.157, 90% CrI [0.057, 0.253], ER = 163.948. Variance did not differ between low-blurred and clear words, b = 0.034, 90% CrI [–0.066, 0.136], ER = 16.22. There was greater skewing for high-blurred words compared to clear and low-blurred words, b = 0.427, 90% CrI [0.367, 0.487], ER = Inf. There was strong evidence for no difference between low-blurred and clear words, b = 0.00, 90% CrI [–0.062, 0.061], ER = 7.769.

Recognition Memory

Sensitivity. Figure 2 highlights d′ and c means and comparisons across all groups. Sensitivity was higher for high-blurred words than for clear and low-blurred words, β = 0.131, 90% CrI [0.07, 0.193], ER = 7999. The evidence for no difference in sensitivity between clear words and low-blurred words was weak, β = 0.005, 90% CrI [–0.061, 0.072], ER = 1.194.

Estimated posterior distributions for d-prime and criterion, and differences, with 95% CrIs.

Exploratory Analyses: Bias. Low-blurred words had a bias towards more “old” responses compared to clear words, β = 0.116, 90% CrI [0.062, 0.171], ER = 2665.57. High-blurred words showed a slightly more liberal bias compared to clear and low-blurred words, β = 0.020, 90% CrI [–0.02, 0.06], ER = 4.22.

Discussion

Experiment 1A successfully replicated the pattern of results found in Rosner et al. (2015). Specifically, we found that high-blurred words were responded to less accurately than clear and low-blurred words but were better remembered.

Distributional Modeling

Adding to these results, we modeled the RT distributions with the ex-Gaussian distribution. Descriptively, high-blurred words induced a more pronounced shift in the RT distribution (µ) and exhibited a higher degree of skew (β/τ) compared to clear and low-blurred words. Low-blurred words, in contrast, differed from clear words only on µ; the two conditions did not differ on σ or β/τ. These patterns can be clearly seen in the quantile and delta plots in Figure 3.

Quantile plots for each blur condition in Experiments 1A and 1B **(A)** and delta plots depicting the magnitude of the effect for hypotheses of interest over time in Experiments 1A **(B)** and 1B **(C)**. Each dot represents the mean RT at the .1, .3, .5, .7 and .9 quantiles.
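The logic behind quantile and delta plots can be illustrated with a short simulation. The sketch below draws RTs from an ex-Gaussian distribution (a normal component with mean µ and standard deviation σ plus an exponential component with mean τ), computes the .1, .3, .5, .7 and .9 quantiles, and takes condition differences at each quantile. All parameter values and names are hypothetical and chosen only for illustration; they are not the estimates reported in this article.

```python
import random
from statistics import mean, quantiles

def sample_ex_gaussian(mu, sigma, tau, n, rng):
    """Draw n RTs (seconds) as Normal(mu, sigma) + Exponential(mean tau)."""
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau) for _ in range(n)]

def rt_quantiles(rts):
    """Return the .1, .3, .5, .7 and .9 quantiles of a list of RTs."""
    deciles = quantiles(rts, n=10)  # nine cut points: .1, .2, ..., .9
    return deciles[0::2]            # keep .1, .3, .5, .7, .9

rng = random.Random(2024)
n = 20000
# Hypothetical parameter values in seconds (NOT the paper's estimates).
baseline = sample_ex_gaussian(mu=0.50, sigma=0.05, tau=0.15, n=n, rng=rng)
shift_only = sample_ex_gaussian(mu=0.52, sigma=0.05, tau=0.15, n=n, rng=rng)
shift_and_skew = sample_ex_gaussian(mu=0.55, sigma=0.05, tau=0.25, n=n, rng=rng)

q_base = rt_quantiles(baseline)
delta_shift = [q - b for q, b in zip(rt_quantiles(shift_only), q_base)]
delta_skew = [q - b for q, b in zip(rt_quantiles(shift_and_skew), q_base)]

for p, d1, d2 in zip((0.1, 0.3, 0.5, 0.7, 0.9), delta_shift, delta_skew):
    print(f"quantile {p:.1f}: shift-only delta = {d1:.3f} s, shift+skew delta = {d2:.3f} s")
```

A change in µ alone produces a roughly constant difference across quantiles (a flat delta plot), whereas an added increase in τ makes the difference grow toward the slower quantiles (a rising delta plot).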

This pattern argues against a purely metacognitive account (e.g., Pieger et al., 2016) and instead supports explanations that emphasize a combination of early and higher-level processing (e.g., stage-specific; Ptok et al., 2019) or compensatory processing (Mulligan, 1996). However, considerable debate remains regarding the appropriateness of the ex-Gaussian distribution for drawing inferences about cognitive processes or stages (Fitousi, 2020a; Matzke & Wagenmakers, 2009).

DDM Results

Unlike the ex-Gaussian distribution, which makes few theoretical assumptions regarding process, the drift diffusion model (DDM; see Ratcliff et al., 2016, for a comprehensive introduction) is a process model, and its parameters can be linked to latent cognitive constructs (Gomez et al., 2013). The DDM is a popular computational model commonly used in binary speeded decision tasks such as the lexical decision task (LDT). The DDM assumes that a decision is a cumulative process that begins at stimulus onset and ends once a noisy accumulation of evidence has reached a decision threshold. The DDM has led to important insights into cognition in a wide range of choice tasks, including perceptual-, memory-, and value-based decisions (Myers et al., 2022).

In the DDM, RTs are decomposed into several parameters that represent distinct cognitive processes. The most relevant to our purposes here are the drift rate (v) and non-decision time (ndt; Ter) parameters. Drift rate (v) represents the rate at which evidence is accumulated towards a decision boundary. In essence, it is a measure of how quickly information is processed to make a decision. A higher (more positive) v indicates a steeper slope, meaning that evidence is accumulated more quickly, leading to faster decisions. Conversely, a lower v indicates a shallower slope, meaning that evidence is accumulated more slowly. Drift rate is closely linked to the decision-making process itself and serves as an index of global processing demands imposed by factors such as task difficulty, memory load, or other concurrent cognitive demands—particularly when these processes compete for the same cognitive resources (Boag, Strickland, Loft, et al., 2019). Additionally, drift rates have been implicated as a key mechanism of reactive inhibitory control (Braver, 2012), where critical events (e.g., working memory updates or task switches) trigger inhibition of prepotent response drift rates (Boag, Strickland, Loft, et al., 2019; Boag, Strickland, Heathcote, et al., 2019). The Ter parameter represents the time taken for processes other than the decision-making itself. This includes early sensory processing (like visual or auditory processing of the stimulus) and late motor processes (like executing the response).
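The roles of v and Ter can be illustrated with a minimal simulation of the diffusion process. The sketch below approximates evidence accumulation with small discrete time steps between two boundaries (0 and a) and adds Ter to the decision time. It is a simplified illustration, not the model-fitting procedure used in this article: the boundary separation, step size, noise scale, parameter values, and function names are all assumptions made for the example.

```python
import random
from statistics import mean

def simulate_ddm_trial(v, a, ter, rng, z=0.5, dt=0.002, s=1.0, max_t=5.0):
    """Simulate one trial; return (rt_seconds, reached_upper_boundary)."""
    x = z * a                 # starting point, midway between boundaries by default
    t = 0.0
    sd_step = s * dt ** 0.5   # noise standard deviation per time step
    while 0.0 < x < a and t < max_t:
        x += v * dt + rng.gauss(0.0, sd_step)  # drift plus Gaussian noise
        t += dt
    return ter + t, x >= a    # RT = non-decision time + decision time

def summarize(v, ter, n_trials=1000, seed=0):
    """Mean RT and proportion of upper-boundary responses for one condition."""
    rng = random.Random(seed)  # same seed across conditions for a paired comparison
    trials = [simulate_ddm_trial(v, a=1.5, ter=ter, rng=rng) for _ in range(n_trials)]
    mean_rt = mean(rt for rt, _ in trials)
    accuracy = mean(1.0 if upper else 0.0 for _, upper in trials)
    return mean_rt, accuracy

# Hypothetical parameter values (NOT the paper's estimates).
rt_base, acc_base = summarize(v=2.0, ter=0.30)
rt_ter, acc_ter = summarize(v=2.0, ter=0.35)    # higher Ter only
rt_both, acc_both = summarize(v=1.0, ter=0.35)  # higher Ter and lower v

for label, rt, acc in [("baseline", rt_base, acc_base),
                       ("higher Ter", rt_ter, acc_ter),
                       ("higher Ter, lower v", rt_both, acc_both)]:
    print(f"{label}: mean RT = {rt:.3f} s, accuracy = {acc:.3f}")
```

In this sketch, raising Ter shifts every RT by the same amount and leaves accuracy untouched, whereas lowering v both slows responses and reduces accuracy, which is why the two parameters are taken to index different processing stages.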

The DDM has been shown to be a valuable tool for studying the effects of different experimental manipulations on cognitive processes in visual word recognition. For example, Gomez and Perea (2014) demonstrated that certain manipulations can differentially affect specific parameters of the model. Manipulating the orientation of words (rotating them by 0, 90, or 180 degrees) affected the Ter component, but not the v component. In contrast, word frequency (high-frequency words vs. low-frequency words) primarily influenced both the drift rate and non-decision time. These findings highlight the sensitivity of the DDM in identifying and differentiating the impact of various stimulus manipulations on different cognitive processes involved in decision-making.

We preregistered DDM analyses, and the model results can be found in the Appendix. Overall, we found that high-blurred words impacted both an early non-decision component and a later, more analytic component, as evinced by a higher Ter and a lower v than for clear or low-blurred words. Low-blurred words, on the other hand, only affected Ter.

Conclusion

Herein, we present evidence that different levels of disfluency can influence distinct stages of encoding, potentially contributing to the presence or absence of a mnemonic effect for perceptually blurred stimuli. Unlike most studies, which employ a single level of disfluency, our study incorporated two. The results indicate that a subtle manipulation such as low-blur primarily affects early processing stages, whereas a more pronounced perceptual manipulation (i.e., high-blur) impacts both early and late processing stages. Regarding recognition memory, high-blurred words were better recognized than low-blurred and clear words. This suggests that, for a perceptual disfluency effect to be observed, the perceptual manipulation must be sufficiently disfluent to tap later stages of encoding.

Given the important theoretical implications of these findings, Experiment 1B served as a conceptual replication. Because of the bias observed in the recognition memory test (i.e., low-blurred words were responded to more liberally), we did not present old and new items as blurred at test; instead, all words were presented in a clear, different font.