Predicting Lexical Norms: A Comparison between a Word Association Model and Text-Based Word Co-occurrence Models

Open Access
Nov 2018

Figures & Tables

Table 1

Information about the lexico-semantic norms used in Studies 1 and 2: number of words, number of raters per word, and split-half reliabilities.

                 Study 1                            Study 2
                 Words    Raters   Reliability      Words    Raters   Reliability
Valence^a        4,299    64       .99^d            13,915   20       .91
Arousal^a        4,299    64       .97^d            13,915   20       .69
Dominance^a      4,299    64       .96^d            13,915   20       .77
AoA^b            4,299    32       .97^d            30,121   18+      .92
Concreteness^c   30,070   15       .91–.93^d,e      37,058   25+

Notes:
a Norms from Moors et al. (2013) for Study 1 and from Warriner et al. (2013) for Study 2.
b Norms from Moors et al. (2013) for Study 1 and from Kuperman et al. (2012) for Study 2.
c Norms from Brysbaert, Stevens, et al. (2014) for Study 1 and from Brysbaert, Warriner, and Kuperman (2014) for Study 2.
d Spearman-Brown corrected split-half correlations calculated on 10,000 different randomizations of the participants.
e Reliabilities of each of five lists of ca. 6,000 words were within this range.
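Note d can be illustrated with a minimal sketch of a Spearman-Brown corrected split-half reliability averaged over random splits of the raters. This is not the authors' code: the function names, the complete rater-by-word matrix, and the averaging of the corrected correlations across splits are assumptions made for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-length reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(ratings, n_splits=10_000, seed=0):
    """ratings[p][w] is the rating of rater p for word w.

    Assumption: every rater rated every word (no missing data).
    Each iteration splits the raters at random into two halves,
    correlates the per-word mean ratings of the halves, and applies
    the Spearman-Brown correction. The mean over splits is returned.
    """
    rng = random.Random(seed)
    raters = list(range(len(ratings)))
    n_words = len(ratings[0])
    half = len(raters) // 2
    total = 0.0
    for _ in range(n_splits):
        rng.shuffle(raters)
        group_a, group_b = raters[:half], raters[half:]
        mean_a = [sum(ratings[p][w] for p in group_a) / len(group_a)
                  for w in range(n_words)]
        mean_b = [sum(ratings[p][w] for p in group_b) / len(group_b)
                  for w in range(n_words)]
        total += spearman_brown(pearson(mean_a, mean_b))
    return total / n_splits
```

The correction matters because each half contains only half of the raters, so the raw correlation between halves underestimates the reliability of the full set of ratings.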

Figure 1

Correlations between predicted ratings and human ratings for valence, arousal, dominance, AoA, and concreteness, using association data or word co-occurrence data. Values of k are 1 to 50, 60, 70, 80, 90, and 100.

Table 2

The highest correlations and 95% confidence intervals for each variable per source of data (associations and text co-occurrences) using k-NN. All cross-validation correlations use the leave-one-out principle. The value of k at which each correlation was obtained is given in square brackets.

                       k-NN
               N       Associations            Word co-occurrences
Valence        2,831   .91 (.91–.92) [50]      .78 (.77–.80) [38]
Arousal        2,831   .84 (.83–.85) [19]      .73 (.71–.75) [8]
Dominance      2,831   .84 (.83–.85) [8]       .66 (.64–.68) [8]
AoA            2,831   .71 (.69–.73) [43]      .64 (.61–.66) [24]
Concreteness   2,831   .87 (.86–.88) [10]      .87 (.86–.88) [11]
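The leave-one-out k-NN procedure named in the caption can be sketched as follows. The page does not specify the similarity measure or how neighbours are combined, so cosine similarity and an unweighted mean are assumptions here, and the toy vectors and ratings are invented for illustration only. The grid of k values is the one given in the Figure 1 caption.

```python
from math import sqrt

# Grid from the figure caption: 1 to 50, then 60, 70, 80, 90, and 100.
K_VALUES = list(range(1, 51)) + [60, 70, 80, 90, 100]


def cosine(u, v):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either list has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return sxy / denom if denom else 0.0


def loo_knn_predict(vectors, ratings, k):
    """Predict each word's rating from its k most similar *other* words.

    Leave-one-out: the target word is never among its own neighbours.
    Assumption: the prediction is the unweighted mean of neighbour ratings.
    """
    preds = {}
    for target in vectors:
        sims = [(cosine(vectors[target], vectors[other]), other)
                for other in vectors if other != target]
        sims.sort(key=lambda pair: pair[0], reverse=True)
        neighbours = [word for _, word in sims[:k]]
        preds[target] = sum(ratings[w] for w in neighbours) / len(neighbours)
    return preds


def best_k(vectors, ratings, candidates=K_VALUES):
    """Return (k, r) for the candidate k with the highest correlation."""
    words = list(vectors)
    observed = [ratings[w] for w in words]
    best = None
    for k in candidates:
        if k > len(words) - 1:  # not enough other words to form k neighbours
            continue
        preds = loo_knn_predict(vectors, ratings, k)
        r = pearson([preds[w] for w in words], observed)
        if best is None or r > best[1]:
            best = (k, r)
    return best


# Hypothetical toy data: 2-D "semantic" vectors and made-up valence ratings.
toy_vectors = {
    "happy": [1.0, 0.1], "joy": [1.0, 0.2], "glad": [0.9, 0.1],
    "sad": [0.1, 1.0], "grief": [0.2, 1.0], "gloom": [0.1, 0.9],
}
toy_ratings = {"happy": 8.0, "joy": 8.5, "glad": 7.5,
               "sad": 2.0, "grief": 1.5, "gloom": 2.5}

toy_preds = loo_knn_predict(toy_vectors, toy_ratings, k=2)
toy_best = best_k(toy_vectors, toy_ratings)
```

Reporting the best k per variable, as the table does, amounts to running `best_k` once per norm and per data source.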
Figure 2

Correlations between estimated values based on the word association data and human ratings for valence, arousal, dominance, AoA, and concreteness. Values of k are 1 to 50, 60, 70, 80, 90, and 100.

Table 3

Highest correlations (r), 95% confidence intervals (95% CI), and sample sizes (N) for each variable using k-NN, with the value of k at which each was obtained (k). All cross-validation correlations use the leave-one-out principle.

               N        r     95% CI      k
Valence        8,770    .86   (.86–.87)   24
Arousal        8,770    .69   (.68–.70)   44
Dominance      8,770    .75   (.74–.76)   25
AoA            10,032   .59   (.58–.61)   26
Concreteness   10,957   .87   (.86–.87)   8
Table 4

Highest correlations (r), 95% confidence intervals (95% CI), and sample sizes (N) for each variable using k-NN, with the value of k at which each was obtained (k), for the ANEW (Bradley & Lang, 1999) norms. All cross-validation correlations use the leave-one-out principle.

            N     r     95% CI      k
Valence     946   .92   (.91–.93)   11
Arousal     946   .74   (.71–.77)   10
Dominance   946   .83   (.81–.85)   10
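The page does not say how the 95% confidence intervals were computed. One common choice is the Fisher z-transformation, sketched below as an assumption rather than as the authors' method. The function name and the default critical value are illustrative.

```python
from math import atanh, sqrt, tanh


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for a Pearson correlation.

    r is transformed to z = atanh(r), which is roughly normal with
    standard error 1 / sqrt(n - 3). The interval is built on the z
    scale and transformed back with tanh. z_crit = 1.959964 gives a
    two-sided 95% interval.
    """
    z = atanh(r)
    se = 1.0 / sqrt(n - 3)
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# Valence row of Table 4: r = .92, N = 946.
low, high = fisher_ci(0.92, 946)
```

For the valence row of Table 4 the result rounds to (.91, .93), the interval reported there. Because the tabled correlations are themselves rounded, other rows may differ in the second decimal.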
Table 5

Highest correlations (r), 95% confidence intervals (95% CI), and sample sizes (N) for each variable using k-NN, with the value of k at which each was obtained (k). k-NN is trained on the Warriner et al. (2013) norms and tested on the ANEW (Bradley & Lang, 2017) norms.

            N       r     95% CI      k
Valence     2,156   .89   (.88–.89)   13
Arousal     2,156   .71   (.68–.73)   24
Dominance   2,156   .76   (.74–.77)   23

DOI: https://doi.org/10.5334/joc.50 | Journal eISSN: 2514-4820
Language: English
Submitted on: Jul 1, 2018
Accepted on: Nov 6, 2018
Published on: Nov 27, 2018
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2018 Hendrik Vankrunkelsven, Steven Verheyen, Gert Storms, Simon De Deyne, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.