de-Corp: A Corpus of German-language Fiction and Non-Fiction (1780–1930)

Open Access | Oct 2025

Abstract

de-Corp is a corpus of approximately 5,000 German-language texts: fiction published between 1780 and 1930 and non-fiction published between 1780 and 1940. It was compiled from the German and U.S. Project Gutenberg libraries. The corpus contains over 300 million tokens by more than 1,400 unique authors and includes detailed metadata on genre, publication year, and author gender. The dataset supports large-scale historical and literary analysis and is especially valuable for research in Computational Literary Studies and Computational Linguistics.

DOI: https://doi.org/10.5334/johd.350 | Journal eISSN: 2059-481X
Language: English
Submitted on: Jun 22, 2025 | Accepted on: Sep 2, 2025 | Published on: Oct 10, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Katrin Rohrbacher, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.