Updating the German Psycholinguistic Word Toolbox with AI-Generated Estimates of Concreteness, Valence, Arousal, Age of Acquisition, and Familiarity

Javier Conde; Gonzalo Martínez; María Grandury; Carlos Arriaga; Juan Haro; Sascha Schroeder; Florian Hintz; Pedro Reviriego; Marc Brysbaert

doi:10.5334/joc.482

References

Balota, D. A., Pilotti, M., & Cortese, M. J. (2001). Subjective frequency estimates for 2,938 monosyllabic words. Memory & Cognition, 29, 639–647. https://doi.org/10.3758/BF03200465
Baschek, I.-L., Bredenkamp, J., Oehrle, B., & Wippich, W. (1977). Determination of imagery, concreteness and meaningfulness of 800 nouns. Zeitschrift für Experimentelle und Angewandte Psychologie, 24(3), 353–396.
Bayer, M., Sommer, W., & Schacht, A. (2010). Reading emotional words within sentences: the impact of arousal and valence on event-related potentials. International Journal of Psychophysiology, 78(3), 299–307. https://doi.org/10.1016/j.ijpsycho.2010.09.004
Bestgen, Y., & Vincze, N. (2012). Checking and bootstrapping lexical norms by means of word similarity indexes. Behavior Research Methods, 44(4), 998–1006. https://doi.org/10.3758/s13428-012-0195-z
Bethke, S., Meyer, A. S., & Hintz, F. (2025). The German Auditory and Image (GAudI) vocabulary test: A new German receptive vocabulary test and its relationships to other tests measuring linguistic experience. PLoS One, 20(4), e0318115. https://doi.org/10.1371/journal.pone.0318115
Birchenough, J. M., Davies, R., & Connelly, V. (2017). Rated age-of-acquisition norms for over 3,200 German words. Behavior Research Methods, 49(2), 484–501. https://doi.org/10.3758/s13428-016-0718-0
Botarleanu, R. M., Watanabe, M., Dascalu, M., Crossley, S. A., & McNamara, D. S. (2024). Multilingual Age of Exposure 2.0. International Journal of Artificial Intelligence in Education, 34(4), 1353–1377. https://doi.org/10.1007/s40593-023-00386-7
Brysbaert, M., Buchmeier, M., Conrad, M., Jacobs, A. M., Bölte, J., & Böhl, A. (2011). The word frequency effect: A review of recent developments and implications for the choice of frequency estimates in German. Experimental Psychology, 58(5), 412–424. https://doi.org/10.1027/1618-3169/a000123
Brysbaert, M., & Ellis, A. W. (2016). Aphasia and age of acquisition: Are early-learned words more resilient? Aphasiology, 30(11), 1240–1263. https://doi.org/10.1080/02687038.2015.1106439
Brysbaert, M., Martínez, G., & Reviriego, P. (2025). Moving beyond word frequency based on tally counting: AI-generated familiarity estimates of words and phrases are an interesting additional index of language knowledge. Behavior Research Methods, 57, 28. https://doi.org/10.3758/s13428-024-02561-7
Brysbaert, M., Stevens, M., De Deyne, S., Voorspoels, W., & Storms, G. (2014). Norms of age of acquisition and concreteness for 30,000 Dutch words. Acta Psychologica, 150, 80–84. https://doi.org/10.1016/j.actpsy.2014.04.010
Charbonnier, J., & Wartena, C. (2020). Predicting the concreteness of German words. Proceedings of the 5th Swiss Text Analytics Conference (SwissText) & 16th Conference on Natural Language Processing (KONVENS), CEUR Workshop Proceedings Vol. 2624.
Chen, X., & Dong, Y. (2019). Evaluating objective and subjective frequency measures in L2 lexical processing. Lingua, 230, 102738. https://doi.org/10.1016/j.lingua.2019.102738
Conde, J., González, M., Grandury, M., Martínez, G., Reviriego, P., & Brysbaert, M. (2025a). Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans. arXiv preprint arXiv:2506.22439.
Conde, J., Grandury, M., Fu, T., Arriaga, C., Martínez, G., Clark, T., … & Brysbaert, M. (2025b). Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings. arXiv preprint arXiv:2509.14405.
Cordier, F., & Le Ny, J. F. (2005). Evidence for several components of word familiarity. Behavior Research Methods, 37(3), 528–537. https://doi.org/10.3758/BF03192724
Gernsbacher, M. A. (1984). Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness, and polysemy. Journal of Experimental Psychology: General, 113(2), 256–281. https://doi.org/10.1037/0096-3445.113.2.256
Gimenes, M., & New, B. (2016). Worldlex: Twitter and blog word frequencies for 66 languages. Behavior Research Methods, 48, 963–972. https://doi.org/10.3758/s13428-015-0621-0
Grandy, T. H., Lindenberger, U., & Schmiedek, F. (2020). Vampires and nurses are rated differently by younger and older adults—Age-comparative norms of imageability and emotionality for about 2500 German nouns. Behavior Research Methods, 52(3), 980–989. https://doi.org/10.3758/s13428-019-01294-2
Grosse, G., Streubel, B., Gunzenhauser, C., & Saalbach, H. (2021). Let’s talk about emotions: the development of children’s emotion vocabulary from 4 to 11 years of age. Affective Science, 2(2), 150–162. https://doi.org/10.1007/s42761-021-00040-2
Günther, F., Marelli, M., & Bölte, J. (2020). Semantic transparency effects in German compounds: A large dataset and multiple-task investigation. Behavior Research Methods, 52(3), 1208–1224. https://doi.org/10.3758/s13428-019-01311-4
Heyman, T., & Heyman, G. (2024). The impact of ChatGPT on human data collection: A case study involving typicality norming data. Behavior Research Methods, 56(5), 4974–4981. https://doi.org/10.3758/s13428-023-02235-w
Hollis, G., Westbury, C., & Lefsrud, L. (2017). Extrapolating human judgments from skip-gram vector representations of word meaning. Quarterly Journal of Experimental Psychology, 70(8), 1603–1619. https://doi.org/10.1080/17470218.2016.1195417
Hussain, Z., Binz, M., Mata, R., & Wulff, D. U. (2024). A tutorial on open-source large language models for behavioral science. Behavior Research Methods, 56(8), 8214–8237. https://doi.org/10.3758/s13428-024-02455-8
Jared, D., Jouravlev, O., & Joanisse, M. F. (2017). The effect of semantic transparency on the processing of morphologically derived words: Evidence from decision latencies and event-related potentials. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43(3), 422–450. https://doi.org/10.1037/xlm0000316
Kanske, P., & Kotz, S. A. (2010). Leipzig affective norms for German: A reliability study. Behavior Research Methods, 42(4), 987–991. https://doi.org/10.3758/BRM.42.4.987
Keuleers, E., Stevens, M., Mandera, P., & Brysbaert, M. (2015). Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment. Quarterly Journal of Experimental Psychology, 68(8), 1665–1692. https://doi.org/10.1080/17470218.2015.1022560
Kliegl, R., Wei, P., Dambacher, M., Yan, M., & Zhou, X. (2011). Experimental effects and individual differences in linear mixed models: Estimating the relationship between spatial, object, and attraction effects in visual attention. Frontiers in Psychology, 1, 238. https://doi.org/10.3389/fpsyg.2010.00238
Köper, M., & Schulte im Walde, S. (2016). Automatically generated affective norms of abstractness, arousal, imageability and valence for 350 000 German lemmas. Proceedings of the 10th International Conference on Language Resources and Evaluation (Portorož), pp. 2595–2598.
Krautz, A. E., & Keuleers, E. (2022). LinguaPix database: A megastudy of picture-naming norms. Behavior Research Methods, 54(2), 941–954. https://doi.org/10.3758/s13428-021-01651-0
Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods, 44(4), 978–990. https://doi.org/10.3758/s13428-012-0210-4
Lahl, O., Göritz, A. S., Pietrowsky, R., & Rosenberg, J. (2009). Using the World-Wide Web to obtain large-scale word norms: 190,212 ratings on a set of 2,654 German nouns. Behavior Research Methods, 41, 13–19. https://doi.org/10.3758/BRM.41.1.13
Lenhard, W., & Lenhard, A. (2021). Bedeutung und Diagnostik des Wortschatzes am Beispiel des Peabody Picture Vocabulary Test (PPVT-IV). Bulletin suisse de linguistique appliquée, 114.
Lin, Z. (2024). AnalysisLin: Exploratory Data Analysis. https://doi.org/10.32614/CRAN.package.AnalysisLin
Lüdtke, J., & Hugentobler, K. G. (2022). Using emotional word ratings to extrapolate norms for valence, arousal, imageability, and concreteness: The German list of extrapolated affective norms (GLEAN). Poster presented at KogWis2022. Available at https://osf.io/a6w53/files/osfstorage
Lynott, D., Connell, L., Brysbaert, M., Brand, J., & Carney, J. (2020). The Lancaster Sensorimotor Norms: multidimensional measures of perceptual and action strength for 40,000 English words. Behavior Research Methods, 52, 1271–1291. https://doi.org/10.3758/s13428-019-01316-z
Mandera, P., Keuleers, E., & Brysbaert, M. (2015). How useful are corpus-based methods for extrapolating psycholinguistic variables? Quarterly Journal of Experimental Psychology, 68(8), 1623–1642. https://doi.org/10.1080/17470218.2014.988735
Martínez, G., Conde, J., Merino-Gómez, E., Bermúdez-Margaretto, B., Hernández, J. A., Reviriego, P., & Brysbaert, M. (2024). Establishing vocabulary tests as a benchmark for evaluating large language models. PLoS One, 19(12), e0308259. https://doi.org/10.1371/journal.pone.0308259
Martínez, G., Conde, J., Reviriego, P., & Brysbaert, M. (2025a). AI-generated estimates of familiarity, concreteness, valence and arousal for over 100,000 Spanish words. Quarterly Journal of Experimental Psychology. Advance online publication. https://doi.org/10.1177/17470218241306694
Martínez, G., Molero, J. D., González, S., Conde, J., Brysbaert, M., & Reviriego, P. (2025b). Using large language models to estimate features of multi-word expressions: Concreteness, valence, arousal. Behavior Research Methods, 57, 5. https://doi.org/10.3758/s13428-024-02515-z
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., … & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
Pauligk, S., Kotz, S. A., & Kanske, P. (2019). Differential impact of emotion on semantic processing of abstract and concrete words: ERP and fMRI evidence. Scientific Reports, 9(1), 14439. https://doi.org/10.1038/s41598-019-50755-3
Schepens, J., Dijkstra, T., & Grootjen, F. (2012). Distributions of cognates in Europe as based on Levenshtein distance. Bilingualism: Language and Cognition, 15(1), 157–166. https://doi.org/10.1017/S1366728910000623
Schmidtke, D., & Conrad, M. (2018). Effects of affective phonological iconicity in online language processing: Evidence from a letter search task. Journal of Experimental Psychology: General, 147(10), 1544. https://doi.org/10.1037/xge0000499
Schmidtke, D. S., Schröder, T., Jacobs, A. M., & Conrad, M. (2014). ANGST: Affective norms for German sentiment terms, derived from the affective norms for English words. Behavior Research Methods, 46, 1108–1118. https://doi.org/10.3758/s13428-013-0426-y
Schröder, A., Gemballa, T., Ruppin, S., & Wartenburger, I. (2012). German norms for semantic typicality, age of acquisition, and concept familiarity. Behavior Research Methods, 44, 380–394. https://doi.org/10.3758/s13428-011-0164-y
Schroeder, S., Würzner, K. M., Heister, J., Geyken, A., & Kliegl, R. (2015). childLex: A lexical database of German read by children. Behavior Research Methods, 47, 1085–1094. https://doi.org/10.3758/s13428-014-0528-1
Schroeders, U., & Achaa-Amankwaa, P. (2025). Developing NOVA: Next-Generation Open Vocabulary Assessment. Unpublished manuscript. Available at https://osf.io/vhakw_v1/download
Schröter, P., & Schroeder, S. (2017). The Developmental Lexicon Project: A behavioral database to investigate visual word recognition across the lifespan. Behavior Research Methods, 49, 2183–2203. https://doi.org/10.3758/s13428-016-0851-9
Scott, G. G., Keitel, A., Becirspahic, M., Yao, B., & Sereno, S. C. (2019). The Glasgow Norms: Ratings of 5,500 words on nine scales. Behavior Research Methods, 51, 1258–1270. https://doi.org/10.3758/s13428-018-1099-3
Sendín, E., Conde, J., Reviriego, P., Haro, J., Ferré, P., Hinojosa, J. A., & Brysbaert, M. (2025, June). Combining the power of large language models with fine-tuning based on strategically collected human ratings: A case study about age-of-acquisition estimates of Spanish words. ResearchGate. https://doi.org/10.13140/RG.2.2.27255.12967
Smith, K. E., Woodard, K., & Pollak, S. D. (2025). Arousal may not be anything to get excited about. Emotion Review, 17(1), 3–15. https://doi.org/10.1177/17540739241303499
Thompson, B., & Lupyan, G. (2018). Automatic estimation of lexical concreteness in 77 languages. Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 40). https://escholarship.org/uc/item/7dz7k3k1
Trott, S. (2024). Can large language models help augment English psycholinguistic datasets? Behavior Research Methods, 56, 6082–6100. https://doi.org/10.3758/s13428-024-02337-z
van Paridon, J., & Thompson, B. (2021). subs2vec: Word embeddings from subtitles in 55 languages. Behavior Research Methods, 53(2), 629–655. https://doi.org/10.3758/s13428-020-01406-3
Võ, M. L., Conrad, M., Kuchinke, L., Urton, K., Hofmann, M. J., & Jacobs, A. M. (2009). The Berlin affective word list reloaded (BAWL-R). Behavior Research Methods, 41(2), 534–538. https://doi.org/10.3758/BRM.41.2.534
Võ, M. L. H., Jacobs, A. M., & Conrad, M. (2006). Cross-validating the Berlin affective word list (BAWL). Behavior Research Methods, 38, 606–609. https://doi.org/10.3758/BF03193892
Westbury, C. (2014). You can’t drink a word: Lexical and individual emotionality affect subjective familiarity judgments. Journal of Psycholinguistic Research, 43(5), 631–649. https://doi.org/10.1007/s10936-013-9266-2
Wippich, W., & Bredenkamp, J. (1979). Bildhaftigkeit und Lernen. Dr. Dietrich Steinkopff Verlag. https://doi.org/10.1007/978-3-642-85759-1
Wood, S. N. (2001). mgcv: GAMs and generalized ridge regression for R. R News, 1(2), 20–25.
Xu, Z., Liu, J., & Fan, L. (2025). Affective Norms for German as a Second Language (ANGL2). Behavior Research Methods, 57, 6. https://doi.org/10.3758/s13428-024-02539-5
Yap, M. J., Pexman, P. M., Wellsby, M., Hargreaves, I. S., & Huff, M. J. (2012). An abundance of riches: Cross-task comparisons of semantic richness effects in visual word recognition. Frontiers in Human Neuroscience, 6, 72. https://doi.org/10.3389/fnhum.2012.00072
Zhao, H., Chen, H., Yang, F., Liu, N., Deng, H., Cai, H., … & Du, M. (2024). Explainability for large language models: A survey. ACM Transactions on Intelligent Systems and Technology, 15(2), 1–38. https://doi.org/10.1145/3639372