Lexical Feedback in the Time-Invariant String Kernel (TISK) Model of Spoken Word Recognition

Open Access
|Apr 2024

Figures & Tables

Figure 1

A simple word recognition network incapable of encoding temporal order or repeated phonemes (Magnuson, 2018a).

Figure 2

TRACE’s time-as-space encoding (Magnuson, 2018b). At the bottom, inputs corresponding to /k/, /æ/, and /t/ have specific alignments (in TRACE, these would be distributed representations of over-time pseudo-spectral features). Those inputs activate phoneme templates aligned with them, which in turn activate aligned words. Darkness of shading indicates degree of activation. The maximally-activated copies of CAB, CAT and TAB are those aligned with the input, though degree of activation reflects amount and temporal distribution of phonetic overlap (CAB > CAT > TAB).

Table 1

Examples of ordered open diphones.

WORD | ORDERED OPEN DIPHONES
CAT | kæ, kt, æt
TACK | tæ, tk, æk
ACT | æk, æt, kt
DAD | dæ, dd, æd
ADD | æd
SOUL | so, sl, ol
SOLO | so x 2, sl, ol, oo

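The rows for CAT, TACK, ACT, DAD, ADD, and SOUL are consistent with enumerating every ordered pair of phonemes in a word, adjacent or not, and counting repeats. The following is a minimal sketch under that reading, not the authors' implementation; the function name and the one-character-per-phoneme transcriptions are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def ordered_open_diphones(phonemes):
    """Count each ordered phoneme pair (adjacent or non-adjacent) in a word.

    `phonemes` is any sequence with one element per phoneme; here a string
    with one character per phoneme is used (an illustrative assumption).
    """
    # combinations() preserves left-to-right order, so "kæt" yields
    # (k, æ), (k, t), (æ, t) and never the reversed pairs.
    return Counter(a + b for a, b in combinations(phonemes, 2))


if __name__ == "__main__":
    for word, phones in [("CAT", "kæt"), ("TACK", "tæk"), ("ACT", "ækt"),
                         ("DAD", "dæd"), ("ADD", "æd"), ("SOUL", "sol")]:
        print(word, dict(ordered_open_diphones(phones)))
```

Because order is preserved, CAT, TACK, and ACT receive different diphone sets even though they contain the same three phonemes, which is the property Table 1 illustrates.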
Figure 3

Overall TISK architecture (Figure 3 from Hannagan et al., 2013). Inputs are presented one at a time on time-specific copies of each possible phoneme. Phonemes activate corresponding diphones and single nodes in the N-phone layer. N-phone units activate corresponding words. Lateral inhibition governs lexical competition (indicated by the knobbed recurrent link at top right). The greyed-out arrow from words to N-phones indicates that the original TISK model did not have lexical feedback (adding feedback is the only structural alteration to the model introduced in this paper). The symmetry network (not shown; see Figure 4 from Hannagan et al., 2013) allows an input like /ba/ to activate both the /ba/ and /ab/ diphones, but activates the diphone corresponding to the input order much more strongly. See Hannagan et al. (2013, pp. 5–6) for details.

Table 2

Original (without feedback) parameters for TISK, and parameters that promote high performance with feedback. Parameters in the ‘optimized without feedback’ column that differ from original parameters are in bold. Parameters in the ‘optimized with feedback’ column that differ from parameters in the ‘optimized without feedback’ and/or ‘original TISK’ columns are also in bold.

PARAMETER | ORIGINAL TISK | OPTIMIZED WITHOUT FEEDBACK | OPTIMIZED WITH FEEDBACK
Input phoneme decay | 0.010 | 0.001 | 0.001
N-phone decay | 0.001 | 0.001 | 0.100
Word decay | 0.010 | 0.050 | 0.050
Phoneme to N-phone | 1.000 | 0.100 | 0.100
Diphone to word | 0.050 | 0.050 | 0.050
Single phone to word | 0.010 | 0.010 | 0.010
Word to word inhibition | –0.005 | –0.005 | –0.010
Positive word to N-phone feedback |  |  | 0.150
Negative word to N-phone feedback |  |  | –0.050

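The caption says that values differing between columns are set in bold. As a hedged sketch, the same comparison can be recomputed from the values in Table 2. The dictionary keys and the helper `changed` are hypothetical names chosen for illustration; they are not identifiers from the TISK code.

```python
# Values transcribed from Table 2; key names are illustrative assumptions.
PARAMS = {
    "original": {
        "input_phoneme_decay": 0.010, "nphone_decay": 0.001,
        "word_decay": 0.010, "phoneme_to_nphone": 1.000,
        "diphone_to_word": 0.050, "single_phone_to_word": 0.010,
        "word_to_word_inhibition": -0.005,
    },
    "optimized_no_feedback": {
        "input_phoneme_decay": 0.001, "nphone_decay": 0.001,
        "word_decay": 0.050, "phoneme_to_nphone": 0.100,
        "diphone_to_word": 0.050, "single_phone_to_word": 0.010,
        "word_to_word_inhibition": -0.005,
    },
    "optimized_feedback": {
        "input_phoneme_decay": 0.001, "nphone_decay": 0.100,
        "word_decay": 0.050, "phoneme_to_nphone": 0.100,
        "diphone_to_word": 0.050, "single_phone_to_word": 0.010,
        "word_to_word_inhibition": -0.010,
        # The two feedback parameters exist only in this column.
        "positive_word_to_nphone_feedback": 0.150,
        "negative_word_to_nphone_feedback": -0.050,
    },
}


def changed(column, *baselines):
    """Names of parameters in `column` that differ from any baseline column."""
    return sorted(
        name for name, value in PARAMS[column].items()
        if any(PARAMS[b].get(name) != value for b in baselines)
    )


if __name__ == "__main__":
    print(changed("optimized_no_feedback", "original"))
    print(changed("optimized_feedback", "original", "optimized_no_feedback"))
```

A parameter missing from a baseline column (the two feedback weights) is treated as differing, which matches the caption's "and/or" wording.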
Figure 4

Mean time course for targets and different classes of competitors in TRACE and TISK with and without feedback (including the original model, as well as the version with parameters ‘optimized’ for graceful degradation, as detailed later). Each line represents the mean for a class of items over all 211 words in the original TRACE lexicon. Cohorts overlap in the first two phonemes. Rhymes overlap in all but the first phoneme. Unrelated is the mean activation of all words in the lexicon. Ribbons indicate standard error.

Figure 5

RT correlations for original TISK (without feedback), TISKfb (TISK with feedback), and TRACE. Left panel: TISKfb vs. TISK. Middle panel: TISKfb vs. TRACE. Right panel: original TISK vs. TRACE. Diagonal grey lines indicate the identity line; dashed lines indicate the best linear fit.

Figure 6

Item-specific RTs in TRACE, TISKfb (with feedback), TISK without feedback with parameters optimized for noise, and original TISK (without feedback), as a function of lexical dimensions for the 211-word TRACE lexicon. Dimensions: Length is the number of phonemes; Embeddings is how many words embed within the target word (e.g., CAB and IN embed in CABINET); Onset competitors are cohorts (words overlapping in the first two phonemes); ex-Embeddings are the number of words the target word embeds into (e.g., CAB embeds in CABINET, CABARET, etc.); Neighbors are the number of words differing from the target by no more than a 1-phoneme deletion, addition, or substitution (so-called DAS neighbors); and Rhymes are items that mismatch the target only at the first phoneme (by deletion, addition, or substitution; e.g., for CAT, these would include AT, SCAT, and BAT).
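The lexical dimensions defined in this caption can be sketched as simple string predicates. This is an illustration under stated assumptions, not the authors' analysis code: words are strings with one character per phoneme, the six-word lexicon is a toy stand-in for the 211-word TRACE lexicon, and all function names are hypothetical.

```python
def is_das_neighbor(a, b):
    """True if a and b differ by one phoneme deletion, addition, or substitution."""
    if a == b:
        return False
    if len(a) == len(b):  # substitution
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    short, long_ = sorted((a, b), key=len)
    # Deletion/addition: removing one phoneme from the longer word gives the shorter.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def is_rhyme(target, other):
    """True if other mismatches target only at the first phoneme."""
    if other == target:
        return False
    return (
        (len(other) == len(target) and other[1:] == target[1:])  # BAT for CAT
        or other == target[1:]                                   # AT for CAT
        or other[1:] == target                                   # SCAT for CAT
    )


def lexical_dimensions(target, lexicon):
    """Collect the competitor sets named in the caption for one target word."""
    others = [w for w in lexicon if w != target]
    return {
        "length": len(target),
        "embeddings": [w for w in others if w in target],
        "onset_competitors": [w for w in others if w[:2] == target[:2]],
        "ex_embeddings": [w for w in others if target in w],
        "neighbors": [w for w in others if is_das_neighbor(target, w)],
        "rhymes": [w for w in others if is_rhyme(target, w)],
    }


if __name__ == "__main__":
    # Toy lexicon (assumed transcriptions): CAT, AT, SCAT, BAT, CAB, TAB.
    toy_lexicon = ["kæt", "æt", "skæt", "bæt", "kæb", "tæb"]
    for name, value in lexical_dimensions("kæt", toy_lexicon).items():
        print(name, value)
```

In this toy lexicon, AT, SCAT, and BAT count as both rhymes and DAS neighbors of CAT, so the two dimensions overlap by construction.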

Figure 7

Lexical effects on phoneme activations (Ganong effects) for ten 4-phoneme words (Simulation 2). We observe robust Ganong effects (lexical restoration) at each position with lexical feedback enabled, with stronger effects in later positions. The key results are (a) the greater ambiguity apparent for continuum steps near the nonword endpoint and (b) the upward shift for the center continuum step (4). Error ribbons indicate standard error.

Figure 8

Retroactive phoneme restoration by following context (Simulation 3). In the lexicon, plug and blush are words, but *blug and *plush are not (even though plush is a word in English). Note that the delayed activations of ambiguous phonemes are due to failure to reach the activation threshold from the initial input. The discrete delay of 10 cycles is due to new TISK inputs ‘arriving’ every 10 cycles.

Figure 9

Phoneme restoration given noise vs. silence (Simulation 4). Mean results from simulations with ten 4-phoneme words. Top row: TISK without feedback. Bottom row: TISK with feedback. With feedback, moderate levels of noise (standard deviation ≥ 0.3) drive restoration, although the resulting activation is always less than that observed with the intact phoneme. Without feedback, noise level matters little, and even modest levels of noise drive expected phonemes to saturation. Note that phoneme activations remain at approximately 0 given silence replacement. Error ribbons depict standard error.

Figure 10

Effects of noise on accuracy and recognition time in TISK with feedback, and three variants of the model without feedback: the original Hannagan et al. (2013) parameters, the no-feedback parameters optimized for graceful degradation, and the parameters optimized for feedback but with feedback turned off (Simulation 5). Ribbons indicate standard error. Feedback maximizes the ability of the model to exhibit graceful degradation: feedback preserves accuracy better under higher levels of noise. In contrast to results with TRACE (Magnuson et al., 2018), the feedback benefit does not extend immediately to recognition time, though an advantage emerges at high levels of noise.

Figure 11

Effects of noise on accuracy and recognition time in TISK with feedback and without (with optimized parameters), but restricted to words that were recognized by both models. This reveals a smaller initial difference and earlier cross-over to a feedback advantage compared to Figure 10. This suggests that the apparent disadvantage for feedback is largely due to the additional words the model with feedback can recognize at higher levels of noise. Ribbons indicate standard error.

Figure 12

Effects of noise on recognition time in TISK with and without feedback for one model run. Each panel’s label indicates the noise level. Red squares plot mean RT with and without feedback.

Figure 13

Effects of noise on recognition time in TISK with and without feedback for all 15 model runs. Each panel’s label indicates the noise level. Red squares plot the mean RT values with and without feedback. Color indicates run.

Figure A1

Exploration of positive (x-axis) and negative (y-axis) feedback. In each panel, the solid line is the ‘graceful degradation’ result (see Figure 11) and the dashed line is the Ganong effect. The number in the upper right of each panel is mean accuracy over the full range of noise in the graceful degradation simulations. Panels are shaded yellow if mean accuracy in graceful degradation is > 0.5, or purple if mean accuracy is > 0.4. Panels have red outlines if there is a plausible Ganong effect (maximum difference ≥ 0.15, minimum > 0). Informally, we consider panels that are yellow or purple and highlighted in red to indicate parameter ranges that result in robust performance with feedback (approximately 16% of the combinations explored here).
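The shading and outlining rules in this caption amount to a small decision procedure. The sketch below encodes the thresholds exactly as stated; it is not the authors' plotting code. The function name is hypothetical, and treating `ganong_diffs` as a list of per-continuum-step differences is an assumption about what "maximum difference" and "minimum" range over.

```python
def classify_panel(mean_accuracy, ganong_diffs):
    """Apply the Figure A1 shading and outline criteria to one parameter pair.

    mean_accuracy: mean accuracy over the full range of noise levels.
    ganong_diffs:  Ganong-effect differences (assumed: one per continuum step).
    """
    if mean_accuracy > 0.5:
        shade = "yellow"
    elif mean_accuracy > 0.4:
        shade = "purple"
    else:
        shade = None

    # Plausible Ganong effect: maximum difference >= 0.15 and minimum > 0.
    red_outline = max(ganong_diffs) >= 0.15 and min(ganong_diffs) > 0

    return {
        "shade": shade,
        "red_outline": red_outline,
        # Robust with feedback: shaded (yellow or purple) and outlined in red.
        "robust": shade is not None and red_outline,
    }


if __name__ == "__main__":
    # Made-up inputs purely to exercise the rules; not values from the paper.
    print(classify_panel(0.55, [0.02, 0.20, 0.05]))
    print(classify_panel(0.45, [0.00, 0.30]))
```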

Figure A2

Further exploration of positive (x-axis) and negative (y-axis) feedback. In each panel, retroactive lexical influence simulations (as in Figure 8) are plotted with different feedback parameters. For simplicity, intact or ambiguous cases that are lexically consistent or inconsistent are averaged. Cases where, given ambiguous input, the lexically consistent phoneme’s activation exceeds the inconsistent phoneme’s by 0.05 and, given consistent input, the lexically inconsistent phoneme’s activation does not exceed 0.05 are shaded yellow or green. Green shading indicates cases that yield robust graceful degradation in Figure A1 (yellow or purple shading with red outline). Thus, a fairly broad range of parameters yields robust performance with feedback (green shading corresponds to ~16% of explored combinations, which includes all cases shaded in yellow or purple and outlined in red in Figure A1).

Figure A3

Parameter exploration without feedback. This figure shows graceful degradation results as a function of word-to-word inhibition (x-axis) and N-phone decay (y-axis) with other parameters already optimized. Parameters outside these ranges yield unstable results. A fairly narrow range of parameters (approximately 4% of explored combinations) leads to fairly robust graceful degradation results (purple shading indicates combinations that yield mean accuracy over noise levels > 0.4).

DOI: https://doi.org/10.5334/joc.362 | Journal eISSN: 2514-4820
Language: English
Submitted on: Dec 22, 2023
Accepted on: Apr 3, 2024
Published on: Apr 26, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 James S. Magnuson, Heejo You, Thomas Hannagan, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.