Updating the German Psycholinguistic Word Toolbox with AI-Generated Estimates of Concreteness, Valence, Arousal, Age of Acquisition, and Familiarity

Open Access | Jan 2026

Abstract

This article presents AI-generated estimates for five characteristics of German words: concreteness, valence, arousal, age of acquisition (AoA), and word familiarity. The estimates were generated using GPT-4o-mini, which was selected because it performed well in previous studies. Validation studies compared the AI-generated estimates with both human ratings and previously generated AI data to ensure their usefulness for research applications. The main results are as follows. The GPT estimates of word concreteness, valence, and arousal correlate strongly with human ratings but are not better than the best available AI-generated estimates based on semantic vectors. The GPT estimates of AoA are good approximations of human ratings and outperform the other available alternatives to human ratings, especially after the model was fine-tuned on 2,000 human ratings. Fine-tuned AI-generated estimates of word familiarity predict word recognition in lexical decision tasks and vocabulary tests better than word frequency does. Estimates for concreteness, valence, arousal, and AoA are available for 167,000 words, which are likely to be known to more than 90% of participants in typical adult studies. Word familiarity estimates are presented for 928,000 word forms. All data and code, including newly collected human familiarity ratings for 11,000 words, are publicly available at https://osf.io/ghjd2/. The data may be freely used for research purposes, but not for commercial purposes.
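
The validation idea summarized above, comparing AI-generated estimates with human ratings for the same words, can be sketched in a few lines. This is a minimal illustration and not the authors' analysis code. It assumes a Pearson correlation (the abstract does not name the statistic), and the function names and the toy concreteness values are hypothetical, invented only to make the example runnable.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def validate(ai_estimates, human_ratings):
    """Correlate AI estimates with human ratings over the words both cover.

    Both arguments map a word to a numeric rating. Returns the correlation
    and the number of shared words it was computed on.
    """
    shared = sorted(set(ai_estimates) & set(human_ratings))
    r = pearson_r(
        [ai_estimates[w] for w in shared],
        [human_ratings[w] for w in shared],
    )
    return r, len(shared)


# Hypothetical toy values (not taken from the published data set).
ai = {"Hund": 4.8, "Tisch": 4.9, "Idee": 1.6, "Freiheit": 1.9, "Wasser": 4.7}
human = {"Hund": 4.9, "Tisch": 4.8, "Idee": 1.4, "Freiheit": 1.7, "Haus": 4.9}

r, n = validate(ai, human)
print(f"r = {r:.3f} over {n} shared words")
```

Only words present in both sources enter the comparison, so the reported correlation always comes with the number of words it rests on.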

DOI: https://doi.org/10.5334/joc.482 | Journal eISSN: 2514-4820
Language: English
Submitted on: Dec 9, 2025 | Accepted on: Dec 23, 2025 | Published on: Jan 8, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Javier Conde, Gonzalo Martínez, María Grandury, Carlos Arriaga, Juan Haro, Sascha Schroeder, Florian Hintz, Pedro Reviriego, Marc Brysbaert, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.