Lexical Feedback in the Time-Invariant String Kernel (TISK) Model of Spoken Word Recognition

James S. Magnuson; Heejo You; Thomas Hannagan

doi:10.5334/joc.362

Journal of Cognition

Volume 7 (2024): Issue 1

Open Access

Apr 2024

Figures & Tables

Figure 1

A simple word recognition network incapable of encoding temporal order or repeated phonemes (Magnuson, 2018a).

Figure 2

TRACE’s time-as-space encoding (Magnuson, 2018b). At the bottom, inputs corresponding to /k/, /æ/, and /t/ have specific alignments (in TRACE, these would be distributed representations of over-time pseudo-spectral features). Those inputs activate phoneme templates aligned with them, which in turn activate aligned words. Darkness of shading indicates degree of activation. The maximally-activated copies of CAB, CAT and TAB are those aligned with the input, though degree of activation reflects amount and temporal distribution of phonetic overlap (CAB > CAT > TAB).

Table 1

Examples of ordered open diphones.

WORD	ORDERED OPEN DIPHONES
CAT	kæ, kt, æt
TACK	tæ, tk, æk
ACT	æk, æt, kt
DAD	dæ, dd, æd
ADD	æd
SOUL	so, sl, ol
SOLO	so x 2, sl, ol, oo, lo
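
The rows of Table 1 can be reproduced mechanically. The sketch below assumes that a word is given as a sequence of phoneme symbols and that an ordered open diphone is any pair of phonemes taken in their original order, adjacent or not. The function name and the transcriptions are illustrative and are not taken from the TISK code.

```python
from collections import Counter
from itertools import combinations


def ordered_open_diphones(phonemes):
    """Count every ordered pair of phonemes (earlier, later), adjacent or not."""
    # combinations() preserves input order, so each pair is (earlier, later).
    return Counter(first + second for first, second in combinations(phonemes, 2))


# Illustrative transcriptions (assumed), one symbol per phoneme.
WORDS = {
    "CAT": ["k", "æ", "t"],
    "TACK": ["t", "æ", "k"],
    "ACT": ["æ", "k", "t"],
    "DAD": ["d", "æ", "d"],
    "ADD": ["æ", "d"],
    "SOUL": ["s", "o", "l"],
}

for word, phonemes in WORDS.items():
    print(word, dict(ordered_open_diphones(phonemes)))
```

CAT, TACK, and ACT contain the same three phonemes, yet each yields a different set of diphones, so order is recoverable without position-specific units. Using a counter rather than a set keeps track of diphones that occur more than once in a word.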

Figure 3

Overall TISK architecture (Figure 3 from Hannagan et al., 2013). Inputs are presented one at a time on time-specific copies of each possible phoneme. Phonemes activate corresponding diphones and single nodes in the N-phone layer. N-phone units activate corresponding words. Lateral inhibition governs lexical competition (indicated by knobbed recurrent link in top right). The greyed-out arrow from words to N-phones indicates that the original TISK model did not have lexical feedback (which is the only structural alteration in the model introduced in this paper). The symmetry network (not shown; see Figure 4 from Hannagan et al., 2013) allows an input like /ba/ to activate both the /ba/ and /ab/ diphones, but activates the diphone corresponding to the input order much more strongly. See Hannagan et al. (2013, pp. 5–6) for details.

Table 2

Original (without feedback) parameters for TISK, and parameters that promote high performance with feedback. Parameters in the ‘optimized without feedback’ column that differ from original parameters are in bold. Parameters in the ‘optimized with feedback’ column that differ from parameters in the ‘optimized without feedback’ and/or ‘original TISK’ columns are also in bold.

PARAMETER	ORIGINAL TISK	OPTIMIZED WITHOUT FEEDBACK	OPTIMIZED WITH FEEDBACK
Input phoneme decay	0.010	0.001	0.001
N-phone decay	0.001	0.001	0.100
Word decay	0.010	0.050	0.050
Phoneme to N-phone	1.000	0.100	0.100
Diphone to word	0.050	0.050	0.050
Single phone to word	0.010	0.010	0.010
Word to word inhibition	–0.005	–0.005	–0.010
Positive word to N-phone feedback			0.150
Negative word to N-phone feedback			–0.050
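
The bold highlighting described in the Table 2 caption does not survive in plain text, but it can be recomputed from the values. The sketch below transcribes the table into a dictionary and lists the parameters that differ between columns. The key names and the helper `changed` are illustrative and are not identifiers from the TISK code; `None` marks the feedback parameters that have no value in the two no-feedback columns.

```python
# Values transcribed from Table 2; key names are illustrative, not TISK identifiers.
# Columns: (original TISK, optimized without feedback, optimized with feedback).
TABLE_2 = {
    "input_phoneme_decay": (0.010, 0.001, 0.001),
    "nphone_decay": (0.001, 0.001, 0.100),
    "word_decay": (0.010, 0.050, 0.050),
    "phoneme_to_nphone": (1.000, 0.100, 0.100),
    "diphone_to_word": (0.050, 0.050, 0.050),
    "single_phone_to_word": (0.010, 0.010, 0.010),
    "word_to_word_inhibition": (-0.005, -0.005, -0.010),
    "positive_word_to_nphone_feedback": (None, None, 0.150),
    "negative_word_to_nphone_feedback": (None, None, -0.050),
}
ORIGINAL, NO_FEEDBACK, FEEDBACK = 0, 1, 2


def changed(column, *references):
    """Parameters whose value in `column` differs from any reference column."""
    return [
        name
        for name, values in TABLE_2.items()
        if any(values[column] != values[ref] for ref in references)
    ]


# Parameters that would be bold in the 'optimized without feedback' column.
print(changed(NO_FEEDBACK, ORIGINAL))
# Parameters that would be bold in the 'optimized with feedback' column.
print(changed(FEEDBACK, NO_FEEDBACK, ORIGINAL))
```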

Figure 4

Mean time course for targets and different classes of competitors in TRACE and TISK with and without feedback (including the original model, as well as the version with parameters ‘optimized’ for graceful degradation, as detailed later). Each line represents the mean for a class of items over all 211 words in the original TRACE lexicon. *Cohorts* overlap in the first two phonemes. *Rhymes* overlap in all but the first phoneme. *Unrelated* is the mean activation of all words in the lexicon. Ribbons indicate standard error.

Figure 5

RT correlations for original TISK (without feedback), TISKfb (TISK with feedback), and TRACE. Left panel: TISKfb vs. TISK. Middle panel: TISKfb vs. TRACE. Right panel: original TISK vs. TRACE. Diagonal grey lines indicate the identity line; dashed lines indicate the best linear fit.

Figure 6

Item-specific RTs in TRACE, TISKfb (with feedback), TISK without feedback with parameters optimized for noise, and original TISK (without feedback), as a function of lexical dimensions for the 211-word TRACE lexicon. Dimensions: *Length* is the number of phonemes; *Embeddings* is the number of words that embed within the target word (e.g., CAB and IN embed in CABINET); *Onset competitors* are cohorts (words overlapping in the first two phonemes); *ex-Embeddings* is the number of words the target word embeds into (e.g., CAB embeds in CABINET, CABARET, etc.); *Neighbors* is the number of words differing from the target by no more than a 1-phoneme deletion, addition, or substitution (so-called DAS neighbors); and *Rhymes* are items that mismatch the target only at the first phoneme (by deletion, addition, or substitution; e.g., for CAT, these would include AT, SCAT, and BAT).
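
The competitor definitions in this caption can be stated as small predicates over phoneme sequences. The sketch below is one possible reading of those definitions, applied to a toy lexicon. The lexicon, its rough transcriptions, and all function names are assumptions made for illustration; this is not the 211-word TRACE lexicon or code from either model.

```python
# Toy lexicon with rough, assumed transcriptions (one symbol per phoneme).
LEXICON = {
    "CAT": ("k", "æ", "t"),
    "AT": ("æ", "t"),
    "SCAT": ("s", "k", "æ", "t"),
    "BAT": ("b", "æ", "t"),
    "CAB": ("k", "æ", "b"),
    "IN": ("ɪ", "n"),
    "CABINET": ("k", "æ", "b", "ɪ", "n", "ə", "t"),
}


def embeds_in(inner, outer):
    """True if `inner` occurs as a contiguous run inside the longer `outer`."""
    n = len(inner)
    return n < len(outer) and any(
        outer[i:i + n] == inner for i in range(len(outer) - n + 1)
    )


def is_cohort(a, b):
    """Onset competitors: different words sharing their first two phonemes."""
    return a != b and a[:2] == b[:2]


def is_das_neighbor(a, b):
    """Different words one phoneme deletion, addition, or substitution apart."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def is_rhyme(target, other):
    """Mismatch only at the first phoneme (substituted, deleted, or added)."""
    substituted = (
        len(other) == len(target)
        and other[0] != target[0]
        and other[1:] == target[1:]
    )
    deleted = other == target[1:]
    added = other[1:] == target
    return substituted or deleted or added


def competitors(name, relation):
    """Names of lexicon entries for which relation(target, entry) holds."""
    target = LEXICON[name]
    return {w for w, p in LEXICON.items() if w != name and relation(target, p)}


print(competitors("CAT", is_rhyme))
print(competitors("CABINET", lambda target, other: embeds_in(other, target)))
print(competitors("CAB", embeds_in))
```

Passing the relation as an argument keeps each definition separate, so *Embeddings* and *ex-Embeddings* differ only in the argument order given to `embeds_in`.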

Figure 7

Lexical effects on phoneme activations (Ganong effects) for ten 4-phoneme words (Simulation 2). We observe robust Ganong effects (lexical restoration) at each position with lexical feedback enabled, with stronger effects in later positions. The key results are (a) the greater ambiguity apparent for continuum steps near the nonword endpoint and (b) the upward shift for the center continuum step (4). Error ribbons indicate standard error.

Figure 8

Retroactive phoneme restoration by following context (Simulation 3). In the lexicon, *plug* and *blush* are words, but *blug and *plush are not (even though *plush* is a word in English). Note that the delayed activation of ambiguous phonemes is due to failure to reach the activation threshold from the initial input. The discrete delay of 10 cycles is due to new TISK inputs ‘arriving’ every 10 cycles.

Figure 9

Phoneme restoration given noise vs. silence (Simulation 4). Mean results from simulations with ten 4-phoneme words. Top row: TISK without feedback. Bottom row: TISK with feedback. With feedback, moderate levels of noise (standard deviation ≥ 0.3) drive restoration, although the resulting activation is always less than that observed with the intact phoneme. Without feedback, noise level matters little, and even modest levels of noise drive expected phonemes to saturation. Note that phoneme activations remain at approximately 0 given silence replacement. Error ribbons depict standard error.

Figure 10

Effects of noise on accuracy and recognition time in TISK with feedback, and three variants of the model without feedback: the original Hannagan et al. (2013) parameters, the no-feedback parameters optimized for graceful degradation, and the parameters optimized for feedback but with feedback turned off (Simulation 5). Ribbons indicate standard error. Feedback maximizes the ability of the model to exhibit *graceful degradation*: feedback preserves accuracy better under higher levels of noise. In contrast to results with TRACE (Magnuson et al., 2018), the feedback benefit does not extend immediately to recognition time, though an advantage emerges at high levels of noise.

Figure 11

Effects of noise on accuracy and recognition time in TISK with feedback and without (with optimized parameters), but restricted to words that were recognized by both models. This reveals a smaller initial difference and earlier cross-over to a feedback advantage compared to Figure 10. This suggests that the apparent disadvantage for feedback is largely due to the additional words the model with feedback can recognize at higher levels of noise. Ribbons indicate standard error.
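
The restriction to jointly recognized words matters because a mean over recognized items can shift when one model recognizes additional, slower items. The sketch below illustrates only that arithmetic. The recognition times are made-up numbers chosen for the illustration, not simulation results, and the function names are hypothetical.

```python
from statistics import mean


def mean_rt(rts):
    """Mean recognition time over recognized items (None = not recognized)."""
    return mean(rt for rt in rts.values() if rt is not None)


def mean_rt_shared(rts_a, rts_b):
    """Mean RTs for both models, restricted to items recognized by both."""
    shared = [
        word
        for word in rts_a
        if rts_a[word] is not None and rts_b.get(word) is not None
    ]
    return mean(rts_a[w] for w in shared), mean(rts_b[w] for w in shared)


# Made-up recognition times in cycles, purely to exercise the functions.
with_feedback = {"w1": 40, "w2": 50, "w3": 90}
without_feedback = {"w1": 42, "w2": 52, "w3": None}

# Over all recognized items, the extra slow item raises the feedback mean.
print(mean_rt(with_feedback), mean_rt(without_feedback))
# Restricted to items both models recognize, the comparison is like for like.
print(mean_rt_shared(with_feedback, without_feedback))
```

In this toy case the model with feedback looks slower when each mean is taken over its own recognized items, and slightly faster once both means are taken over the same items.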

Figure 12

Effects of noise on recognition time in TISK with and without feedback for one model run. Each panel’s label indicates the noise level. Red squares plot mean RT with and without feedback.

Figure 13

Effects of noise on recognition time in TISK with and without feedback for all 15 model runs. Each panel’s label indicates the noise level. Red squares plot the mean RT values with and without feedback. Color indicates run.

Figure A1

Exploration of positive (x-axis) and negative (y-axis) feedback. In each panel, the solid line is the ‘graceful degradation’ result (see Figure 11) and the dashed line is the Ganong effect. The number in the upper right of each panel is mean accuracy over the full range of noise in the graceful degradation simulations. Panels are shaded yellow if mean accuracy in graceful degradation is > 0.5, or purple if mean accuracy is > 0.4. Panels have red outlines if there is a plausible Ganong effect (maximum difference ≥ 0.15, minimum > 0). Informally, we consider panels that are yellow or purple and highlighted in red to indicate parameter ranges that result in robust performance with feedback (approximately 16% of the combinations explored here).
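
The shading and outlining rules in this caption amount to a small decision procedure. The sketch below encodes them as stated, assuming that each panel is summarized by its mean accuracy and by a list of Ganong differences (one per continuum step). How those differences are computed is not specified in the caption, so that input format and the function names are assumptions.

```python
def classify_panel(mean_accuracy, ganong_differences):
    """Return (shade, red_outline) for one feedback-parameter combination."""
    if mean_accuracy > 0.5:
        shade = "yellow"
    elif mean_accuracy > 0.4:
        shade = "purple"
    else:
        shade = None  # unshaded panel
    # Plausible Ganong effect: maximum difference >= 0.15 and minimum > 0.
    red_outline = max(ganong_differences) >= 0.15 and min(ganong_differences) > 0
    return shade, red_outline


def is_robust(mean_accuracy, ganong_differences):
    """Robust with feedback: shaded (yellow or purple) and outlined in red."""
    shade, red_outline = classify_panel(mean_accuracy, ganong_differences)
    return shade is not None and red_outline


# Hypothetical panel summaries, used only to exercise the rules.
print(classify_panel(0.55, [0.02, 0.20]))
print(is_robust(0.45, [0.00, 0.20]))
```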

Figure A2

Further exploration of positive (x-axis) and negative (y-axis) feedback. In each panel, retroactive lexical influence simulations (as in Figure 8) are plotted with different feedback parameters. For simplicity, intact or ambiguous cases that are lexically consistent or inconsistent are averaged. Cases where, given ambiguous input, the lexically consistent phoneme’s activation exceeds the inconsistent phoneme’s by 0.05 and, given consistent input, the lexically inconsistent phoneme’s activation does not exceed 0.05 are shaded yellow or green. Green shading indicates cases that yield robust graceful degradation in Figure A1 (yellow or purple shading with red outline). Thus, a fairly broad range of parameters yields robust performance with feedback (green shading corresponds to ~16% of explored combinations, which includes all cases shaded in yellow or purple and outlined in red in Figure A1).

Figure A3

Parameter exploration without feedback. This figure shows graceful degradation results as a function of word-to-word inhibition (x-axis) and N-phone decay (y-axis) with other parameters already optimized. Parameters outside these ranges yield unstable results. A fairly narrow range of parameters (approximately 4% of explored combinations) leads to fairly robust graceful degradation results (purple shading indicates combinations that yield mean accuracy over noise levels > 0.4).

DOI: https://doi.org/10.5334/joc.362 | Journal eISSN: 2514-4820

Language: English

Submitted on: Dec 22, 2023

Accepted on: Apr 3, 2024

Published on: Apr 26, 2024

Published by: Ubiquity Press

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Keywords: Computational models, neural networks, spoken word recognition, interaction, feedback

© 2024 James S. Magnuson, Heejo You, Thomas Hannagan, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.
