de-Corp: A Corpus of German-language Fiction and Non-Fiction (1780–1930)

Open Access | Oct 2025

Figures & Tables

Figure 1

Fiction Dataset. Distribution of books, sentences, and tokens across decades (1780–1930). The y-axis shows counts: the number of books (blue) is given in absolute values, while sentence counts (orange) are divided by 10,000 and token counts (green) by 100,000 for visualization purposes. This scaling applies to all subsequent figures.
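The scaling described in the caption can be sketched in a few lines. This is only an illustration, not the author's plotting code: the function names are hypothetical, and the example values are invented rather than taken from the corpus. Only the two divisors (10,000 for sentences, 100,000 for tokens) come from the caption.

```python
# Divisors as stated in the Figure 1 caption.
SENTENCE_DIVISOR = 10_000
TOKEN_DIVISOR = 100_000


def scale_for_plot(books, sentences, tokens):
    """Return the values as plotted on the shared y-axis.

    Book counts stay absolute; sentence and token counts are divided
    by their respective divisors. (Hypothetical helper.)
    """
    return books, sentences / SENTENCE_DIVISOR, tokens / TOKEN_DIVISOR


def unscale_from_plot(books, sentences_scaled, tokens_scaled):
    """Invert the scaling to recover raw counts from values read off a figure."""
    return (
        books,
        sentences_scaled * SENTENCE_DIVISOR,
        tokens_scaled * TOKEN_DIVISOR,
    )


# Invented example decade: 12 books, 250,000 sentences, 4,000,000 tokens.
scaled = scale_for_plot(12, 250_000, 4_000_000)
```

When reading the figures, a bar height for sentences or tokens therefore has to be multiplied by the matching divisor to recover the raw count.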

Table 1

Descriptive statistics for token counts per text in the fiction corpus. Values indicate minimum, maximum, median, mean, and standard deviation of token counts across texts.[4]

TOKENS PER TEXT
Min: 658
Max: 374,856
Median: 48,980
Mean: 58,995
Std. Dev.: 45,769
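As a minimal sketch of how the five statistics in the table could be computed from a list of per-text token counts: the `describe` helper and the toy counts are assumptions made for illustration, not the corpus data or the author's code. The source does not say whether the sample or the population standard deviation was reported; the sketch uses the sample version (`statistics.stdev`), and `statistics.pstdev` would give the population version.

```python
import statistics


def describe(token_counts):
    """Summarise per-text token counts (hypothetical helper)."""
    return {
        "min": min(token_counts),
        "max": max(token_counts),
        "median": statistics.median(token_counts),
        "mean": statistics.mean(token_counts),
        # Sample standard deviation; an assumption, see the note above.
        "std_dev": statistics.stdev(token_counts),
    }


# Invented toy data: four short texts and one much longer one.
toy_counts = [1_000, 2_000, 3_000, 4_000, 10_000]
summary = describe(toy_counts)
```

In the toy data the single long text pulls the mean (4,000) above the median (3,000). The table shows the same pattern, with a mean of 58,995 against a median of 48,980, which suggests that a small number of very long texts skews the distribution to the right.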
Figure 2

Fiction Dataset. Overview of genres.

Figure 3

Fiction Dataset. Literary sub-genres.[6]

Figure 4

Non-Fiction Dataset. Distribution of books, sentences, and tokens across decades (1780–1940).

Table 2

Descriptive statistics for token counts per text in the non-fiction corpus.

TOKENS PER TEXT
Min: 2,583
Max: 978,656
Median: 64,298
Mean: 80,670
Std. Dev.: 75,761
Figure 5

Non-Fiction Dataset. Literary sub-genres.[8]

DOI: https://doi.org/10.5334/johd.350 | Journal eISSN: 2059-481X
Language: English
Submitted on: Jun 22, 2025 | Accepted on: Sep 2, 2025 | Published on: Oct 10, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Katrin Rohrbacher, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.