References
- Balota, D. A., Pilotti, M., & Cortese, M. J. (2001). Subjective frequency estimates for 2,938 monosyllabic words. Memory & Cognition, 29, 639–647. 10.3758/BF03200465
- Baschek, I.-L., Bredenkamp, J., Oehrle, B., & Wippich, W. (1977). Determination of imagery, concreteness and meaningfulness of 800 nouns. Zeitschrift für Experimentelle und Angewandte Psychologie, 24(3), 353–396.
- Bayer, M., Sommer, W., & Schacht, A. (2010). Reading emotional words within sentences: the impact of arousal and valence on event-related potentials. International Journal of Psychophysiology, 78(3), 299–307. 10.1016/j.ijpsycho.2010.09.004
- Bestgen, Y., & Vincze, N. (2012). Checking and bootstrapping lexical norms by means of word similarity indexes. Behavior Research Methods, 44(4), 998–1006. 10.3758/s13428-012-0195-z
- Bethke, S., Meyer, A. S., & Hintz, F. (2025). The German Auditory and Image (GAudI) vocabulary test: A new German receptive vocabulary test and its relationships to other tests measuring linguistic experience. PLoS One, 20(4), e0318115. 10.1371/journal.pone.0318115
- Birchenough, J. M., Davies, R., & Connelly, V. (2017). Rated age-of-acquisition norms for over 3,200 German words. Behavior Research Methods, 49(2), 484–501. 10.3758/s13428-016-0718-0
- Botarleanu, R. M., Watanabe, M., Dascalu, M., Crossley, S. A., & McNamara, D. S. (2024). Multilingual Age of Exposure 2.0. International Journal of Artificial Intelligence in Education, 34(4), 1353–1377. 10.1007/s40593-023-00386-7
- Brysbaert, M., Buchmeier, M., Conrad, M., Jacobs, A. M., Bölte, J., & Böhl, A. (2011). The word frequency effect: A review of recent developments and implications for the choice of frequency estimates in German. Experimental Psychology, 58(5), 412–424. 10.1027/1618-3169/a000123
- Brysbaert, M., & Ellis, A. W. (2016). Aphasia and age of acquisition: Are early-learned words more resilient? Aphasiology, 30(11), 1240–1263. 10.1080/02687038.2015.1106439
- Brysbaert, M., Martínez, G., & Reviriego, P. (2025). Moving beyond word frequency based on tally counting: AI-generated familiarity estimates of words and phrases are an interesting additional index of language knowledge. Behavior Research Methods, 57, 28. 10.3758/s13428-024-02561-7
- Brysbaert, M., Stevens, M., De Deyne, S., Voorspoels, W., & Storms, G. (2014). Norms of age of acquisition and concreteness for 30,000 Dutch words. Acta Psychologica, 150, 80–84. 10.1016/j.actpsy.2014.04.010
- Charbonnier, J., & Wartena, C. (2020). Predicting the concreteness of German words. Proceedings of the 5th Swiss Text Analytics Conference (SwissText) & 16th Conference on Natural Language Processing (KONVENS), CEUR Workshop Proceedings Vol. 2624.
- Chen, X., & Dong, Y. (2019). Evaluating objective and subjective frequency measures in L2 lexical processing. Lingua, 230, 102738. 10.1016/j.lingua.2019.102738
- Conde, J., González, M., Grandury, M., Martínez, G., Reviriego, P., & Brysbaert, M. (2025a). Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans. arXiv preprint arXiv:2506.22439.
- Conde, J., Grandury, M., Fu, T., Arriaga, C., Martínez, G., Clark, T., … & Brysbaert, M. (2025b). Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings. arXiv preprint arXiv:2509.14405.
- Cordier, F., & Le Ny, J. F. (2005). Evidence for several components of word familiarity. Behavior Research Methods, 37(3), 528–537. 10.3758/BF03192724
- Gernsbacher, M. A. (1984). Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness, and polysemy. Journal of Experimental Psychology: General, 113(2), 256–281. 10.1037/0096-3445.113.2.256
- Gimenes, M., & New, B. (2016). Worldlex: Twitter and blog word frequencies for 66 languages. Behavior Research Methods, 48, 963–972. 10.3758/s13428-015-0621-0
- Grandy, T. H., Lindenberger, U., & Schmiedek, F. (2020). Vampires and nurses are rated differently by younger and older adults—Age-comparative norms of imageability and emotionality for about 2500 German nouns. Behavior Research Methods, 52(3), 980–989. 10.3758/s13428-019-01294-2
- Grosse, G., Streubel, B., Gunzenhauser, C., & Saalbach, H. (2021). Let’s talk about emotions: the development of children’s emotion vocabulary from 4 to 11 years of age. Affective Science, 2(2), 150–162. 10.1007/s42761-021-00040-2
- Günther, F., Marelli, M., & Bölte, J. (2020). Semantic transparency effects in German compounds: A large dataset and multiple-task investigation. Behavior Research Methods, 52(3), 1208–1224. 10.3758/s13428-019-01311-4
- Heyman, T., & Heyman, G. (2024). The impact of ChatGPT on human data collection: A case study involving typicality norming data. Behavior Research Methods, 56(5), 4974–4981. 10.3758/s13428-023-02235-w
- Hollis, G., Westbury, C., & Lefsrud, L. (2017). Extrapolating human judgments from skip-gram vector representations of word meaning. Quarterly Journal of Experimental Psychology, 70(8), 1603–1619. 10.1080/17470218.2016.1195417
- Hussain, Z., Binz, M., Mata, R., & Wulff, D. U. (2024). A tutorial on open-source large language models for behavioral science. Behavior Research Methods, 56(8), 8214–8237. 10.3758/s13428-024-02455-8
- Jared, D., Jouravlev, O., & Joanisse, M. F. (2017). The effect of semantic transparency on the processing of morphologically derived words: Evidence from decision latencies and event-related potentials. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43(3), 422–450. 10.1037/xlm0000316
- Kanske, P., & Kotz, S. A. (2010). Leipzig affective norms for German: A reliability study. Behavior Research Methods, 42(4), 987–991. 10.3758/BRM.42.4.987
- Keuleers, E., Stevens, M., Mandera, P., & Brysbaert, M. (2015). Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment. Quarterly Journal of Experimental Psychology, 68(8), 1665–1692. 10.1080/17470218.2015.1022560
- Kliegl, R., Wei, P., Dambacher, M., Yan, M., & Zhou, X. (2011). Experimental effects and individual differences in linear mixed models: Estimating the relationship between spatial, object, and attraction effects in visual attention. Frontiers in Psychology, 1, 238. 10.3389/fpsyg.2010.00238
- Köper, M., & Schulte im Walde, S. (2016). Automatically generated affective norms of abstractness, arousal, imageability and valence for 350 000 German lemmas. Proceedings of the 10th International Conference on Language Resources and Evaluation (Portoroz), pp. 2595–2598.
- Krautz, A. E., & Keuleers, E. (2022). LinguaPix database: A megastudy of picture-naming norms. Behavior Research Methods, 54(2), 941–954. 10.3758/s13428-021-01651-0
- Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods, 44(4), 978–990. 10.3758/s13428-012-0210-4
- Lahl, O., Göritz, A. S., Pietrowsky, R., & Rosenberg, J. (2009). Using the World-Wide Web to obtain large-scale word norms: 190,212 ratings on a set of 2,654 German nouns. Behavior Research Methods, 41, 13–19. 10.3758/BRM.41.1.13
- Lenhard, W., & Lenhard, A. (2021). Bedeutung und Diagnostik des Wortschatzes am Beispiel des Peabody Picture Vocabulary Test (PPVT-IV). Bulletin suisse de linguistique appliquée, 114.
- Lin, Z. (2024). AnalysisLin: Exploratory Data Analysis. 10.32614/CRAN.package.AnalysisLin
- Lüdtke, J., & Hugentobler, K. G. (2022). Using emotional word ratings to extrapolate norms for valence, arousal, imageability, and concreteness: The German list of extrapolated affective norms (GLEAN). Poster presented at KogWis2022. Available at https://osf.io/a6w53/files/osfstorage
- Lynott, D., Connell, L., Brysbaert, M., Brand, J., & Carney, J. (2020). The Lancaster Sensorimotor Norms: multidimensional measures of perceptual and action strength for 40,000 English words. Behavior Research Methods, 52, 1271–1291. 10.3758/s13428-019-01316-z
- Mandera, P., Keuleers, E., & Brysbaert, M. (2015). How useful are corpus-based methods for extrapolating psycholinguistic variables? Quarterly Journal of Experimental Psychology, 68(8), 1623–1642. 10.1080/17470218.2014.988735
- Martínez, G., Conde, J., Merino-Gómez, E., Bermúdez-Margaretto, B., Hernández, J. A., Reviriego, P., & Brysbaert, M. (2024). Establishing vocabulary tests as a benchmark for evaluating large language models. PLoS One, 19(12), e0308259. 10.1371/journal.pone.0308259
- Martínez, G., Conde, J., Reviriego, P., & Brysbaert, M. (2025a). AI-generated estimates of familiarity, concreteness, valence and arousal for over 100,000 Spanish words. Quarterly Journal of Experimental Psychology. Advance publication at 10.1177/17470218241306694
- Martínez, G., Molero, J. D., González, S., Conde, J., Brysbaert, M., & Reviriego, P. (2025b). Using large language models to estimate features of multi-word expressions: Concreteness, valence, arousal. Behavior Research Methods, 57, 5. 10.3758/s13428-024-02515-z
- Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., … & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
- Pauligk, S., Kotz, S. A., & Kanske, P. (2019). Differential impact of emotion on semantic processing of abstract and concrete words: ERP and fMRI evidence. Scientific Reports, 9(1), 14439. 10.1038/s41598-019-50755-3
- Schepens, J., Dijkstra, T., & Grootjen, F. (2012). Distributions of cognates in Europe as based on Levenshtein distance. Bilingualism: Language and Cognition, 15(1), 157–166. 10.1017/S1366728910000623
- Schmidtke, D., & Conrad, M. (2018). Effects of affective phonological iconicity in online language processing: Evidence from a letter search task. Journal of Experimental Psychology: General, 147(10), 1544. 10.1037/xge0000499
- Schmidtke, D. S., Schröder, T., Jacobs, A. M., & Conrad, M. (2014). ANGST: Affective norms for German sentiment terms, derived from the affective norms for English words. Behavior Research Methods, 46, 1108–1118. 10.3758/s13428-013-0426-y
- Schröder, A., Gemballa, T., Ruppin, S., & Wartenburger, I. (2012). German norms for semantic typicality, age of acquisition, and concept familiarity. Behavior Research Methods, 44, 380–394. 10.3758/s13428-011-0164-y
- Schroeder, S., Würzner, K. M., Heister, J., Geyken, A., & Kliegl, R. (2015). childLex: A lexical database of German read by children. Behavior Research Methods, 47, 1085–1094. 10.3758/s13428-014-0528-1
- Schroeders, U., & Achaa-Amankwaa, P. (2025). Developing NOVA: Next-Generation Open Vocabulary Assessment. Unpublished manuscript. Available at https://osf.io/vhakw_v1/download
- Schröter, P., & Schroeder, S. (2017). The Developmental Lexicon Project: A behavioral database to investigate visual word recognition across the lifespan. Behavior Research Methods, 49, 2183–2203. 10.3758/s13428-016-0851-9
- Scott, G. G., Keitel, A., Becirspahic, M., Yao, B., & Sereno, S. C. (2019). The Glasgow Norms: Ratings of 5,500 words on nine scales. Behavior Research Methods, 51, 1258–1270. 10.3758/s13428-018-1099-3
- Sendín, E., Conde, J., Reviriego, P., Haro, J., Ferré, P., Hinojosa, J. A., & Brysbaert, M. (2025, June). Combining the power of large language models with fine-tuning based on strategically collected human ratings: A case study about age-of-acquisition estimates of Spanish words. ResearchGate. 10.13140/RG.2.2.27255.12967
- Smith, K. E., Woodard, K., & Pollak, S. D. (2025). Arousal may not be anything to get excited about. Emotion Review, 17(1), 3–15. 10.1177/17540739241303499
- Thompson, B., & Lupyan, G. (2018). Automatic estimation of lexical concreteness in 77 languages. Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 40). https://escholarship.org/uc/item/7dz7k3k1
- Trott, S. (2024). Can large language models help augment English psycholinguistic datasets? Behavior Research Methods, 56, 6082–6100. 10.3758/s13428-024-02337-z
- van Paridon, J., & Thompson, B. (2021). subs2vec: Word embeddings from subtitles in 55 languages. Behavior Research Methods, 53(2), 629–655. 10.3758/s13428-020-01406-3
- Võ, M. L., Conrad, M., Kuchinke, L., Urton, K., Hofmann, M. J., & Jacobs, A. M. (2009). The Berlin affective word list reloaded (BAWL-R). Behavior Research Methods, 41(2), 534–538. 10.3758/BRM.41.2.534
- Võ, M. L. H., Jacobs, A. M., & Conrad, M. (2006). Cross-validating the Berlin affective word list (BAWL). Behavior Research Methods, 38, 606–609. 10.3758/BF03193892
- Westbury, C. (2014). You can’t drink a word: Lexical and individual emotionality affect subjective familiarity judgments. Journal of Psycholinguistic Research, 43(5), 631–649. 10.1007/s10936-013-9266-2
- Wippich, W., & Bredenkamp, J. (1979). Bildhaftigkeit und Lernen. Dr. Dietrich Steinkopff Verlag. 10.1007/978-3-642-85759-1
- Wood, S. N. (2001). mgcv: GAMs and generalized ridge regression for R. R News, 1(2), 20–25.
- Xu, Z., Liu, J., & Fan, L. (2025). Affective Norms for German as a Second Language (ANGL2). Behavior Research Methods, 57, 6. 10.3758/s13428-024-02539-5
- Yap, M. J., Pexman, P. M., Wellsby, M., Hargreaves, I. S., & Huff, M. J. (2012). An abundance of riches: Cross-task comparisons of semantic richness effects in visual word recognition. Frontiers in Human Neuroscience, 6, 72. 10.3389/fnhum.2012.00072
- Zhao, H., Chen, H., Yang, F., Liu, N., Deng, H., Cai, H., … & Du, M. (2024). Explainability for large language models: A survey. ACM Transactions on Intelligent Systems and Technology, 15(2), 1–38. 10.1145/3639372
