(1) Overview
Repository location
The dataset is available on Zenodo: https://zenodo.org/doi/10.5281/zenodo.15720778.
Context
The collection Der Neue Pitaval holds a central position in the development of German-language crime literature. Edited by the German authors and solicitors Julius Eduard Hitzig and Willibald Alexis (Wilhelm Häring), it represents an interdisciplinary project aimed at compiling “the most interesting criminal cases from all countries, from earlier and more recent times” [1]. Der Neue Pitaval comprises a total of 570 criminal case studies, published in sixty volumes between 1842 and 1890.
Its title alludes to its influential French predecessor, François Gayot de Pitaval’s Causes célèbres et intéressantes, avec les jugemens qui les ont décidées (1734/41), a collection that established Causes célèbres as a genre in France. Pitaval’s criminal case studies (and adaptations like François Richer’s) entered the German-speaking world through translations by Carl Wilhelm Franz and Friedrich Schiller, as well as adaptations such as Merkwürdige Rechtsfälle by Anselm von Feuerbach (1808–1811). Der Neue Pitaval built on these traditions and significantly broadened the genre’s reach in the mid-19th century (Behrens & Zelle, 2020b; De Doncker, 2017a).
On the one hand, Der Neue Pitaval continues a literary tradition: it explicitly refers to the aforementioned collections as historical predecessors and partially draws on them as sources (Behrens & Zelle, 2020a; Foik & Löhr, 2023). On the other hand, and more importantly, Der Neue Pitaval innovated the genre by combining literary and legal modes of narration and by addressing a broader readership: beyond experts in law and psychology, it also targeted the educated public. Der Neue Pitaval thus balanced educational purpose with narrative appeal, combining the factual with an intent to entertain. This combination contributed considerably to the collection’s popularity and established the ‘Pitaval-Geschichten’ as a new genre in German literature.
From a literary-historical perspective, the criminal case studies of Der Neue Pitaval are of interest for two main reasons. First, they are regarded as early forms or prototypes of German crime literature and contributed to the development of the genre’s narrative conventions (Weitin, 2009, pp. 257–291; Beck, 2014; Kirchmeier, 2015; Linder & Schönert, 1983). Second, the narration of factual crime and jurisprudence served educational purposes within the field of law, thereby reflecting legal discourse and developments (Speth, 2024, 2023; Richeux & Zein, 2018; Zelle, 2015; Linder, 2013). Topics such as the significance of evidence and the associated emergence of forensic science are addressed, alongside questions of testimony and the increasing subjectification of law, where individual perspectives and the experiences of those affected by legal norms gain greater prominence (Speth, 2021; Saupe, 2020; De Doncker, 2017b).
The collection has therefore been studied across literary and legal historical contexts, though existing research tends to focus on individual case studies—particularly from the 1840s and 1850s (Linder, 2013; Schönert, 1983). The primary aim of our project is thus to analyze the collection in its entirety, in order to identify what characterizes Der Neue Pitaval as a collection (Weitin & Herget, forthcoming).
The corpus comprises all 570 published texts (‘Pitaval-Geschichten’), excluding paratextual elements such as introductions and dedications, and totals 6,260,079 tokens. Figure 1 illustrates the variation in text length, sorted by volume.

Figure 1
Tokens per volume: whiskers show minimum and maximum values; boxes represent the interquartile range (25th–75th percentiles), with the horizontal line marking the median.
The case studies have a median length of 7,934 tokens, while the mean length is 10,982.59 tokens. Among the shortest texts in the corpus are 12 of the 15 so-called Criminalistische Miscellen, more accurately described as historical notes than criminal case studies.
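The reported mean follows directly from the corpus totals: 6,260,079 tokens across 570 texts gives roughly 10,982.59 tokens per text. The following Python sketch shows how such length statistics could be recomputed from the .txt files. The whitespace tokenization and all function names are assumptions made for illustration; the tokenizer behind the reported counts is not specified here, so counts obtained this way may differ from the published figures.

```python
from __future__ import annotations

from pathlib import Path
from statistics import mean, median


def count_tokens(text: str) -> int:
    """Count whitespace-separated tokens (an assumed, simplistic tokenization)."""
    return len(text.split())


def length_summary(token_counts: list[int]) -> dict[str, float]:
    """Return the total, median, and mean of per-text token counts."""
    return {
        "total": sum(token_counts),
        "median": median(token_counts),
        "mean": round(mean(token_counts), 2),
    }


def corpus_token_counts(corpus_dir: str) -> list[int]:
    """Read every UTF-8 .txt file in corpus_dir and count its tokens."""
    return [
        count_tokens(path.read_text(encoding="utf-8"))
        for path in sorted(Path(corpus_dir).glob("*.txt"))
    ]


if __name__ == "__main__":
    # Sanity check of the reported mean: total tokens / number of texts.
    print(round(6_260_079 / 570, 2))  # prints 10982.59
```

With the unpacked archive in a local folder (the path is hypothetical), `length_summary(corpus_token_counts("DerNeuePitaval_v1.2"))` would return the three summary values for comparison with the figures above.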
(2) Method
Steps
The LitLab in Darmstadt employed an established workflow for automatic text recognition [2], using a Microbox book scanner and three workstations running ABBYY Recognition Server software, specifically optimized for Fraktur Optical Character Recognition (OCR). This setup enables high-quality recognition of historical printed and digitized materials (Jain et al., 2021).
For Der Neue Pitaval, digital copies of all volumes were available from the Munich Digitization Centre of the Bavarian State Library and Harvard Library’s digital collections [3, 4].
After corpus compilation, automatic OCR was applied, followed by manual correction using ABBYY’s verification stations, which flag characters recognized with low confidence as well as potential spelling errors [5]. To improve recognition accuracy, the integrated dictionaries were significantly expanded and adapted to the legal and linguistic conventions of the 19th century.
Quality control
Depending on scan quality, the number of errors per volume varied. On average, ABBYY flagged 30.73 spelling errors and low-confidence character recognitions per page, which were then manually reviewed and corrected by trained student assistants.
(3) Dataset Description
Repository name
Zenodo
Object name
DerNeuePitaval_v1.2.zip (15.9 MB), Pitaval_Gesamtkorpus.zip (8.3 GB).
Format names and versions
‘DerNeuePitaval_v1.2.zip’ is an updated version of the original dataset ‘Pitaval.zip’ and contains 570 UTF-8 encoded .txt files, along with an additional metadata table (see Figure 2). Metadata is also embedded in the filenames in a machine-readable format: volume_yearpublishing_positionvolume_title_yearcase.txt. For example, Bd04_1843_10_DerZiegelbrenneralsMoerder_1724–1730.txt indicates that Der Ziegelbrenner als Mörder is the tenth case study in volume 4 (published in 1843), describing a case from 1724–1730.
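The filename convention described above can be read programmatically. The sketch below is a minimal illustration in Python, not part of the dataset: the class and function names are hypothetical, the `Bd` volume prefix is inferred from the single example filename, and the case years are kept as a plain string because they may be a range. Titles are assumed to sit between the third and the last underscore-separated field.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class CaseMetadata:
    volume: int           # volume number, e.g. 4 for "Bd04"
    year_published: int   # publication year of the volume
    position: int         # position of the case study within the volume
    title: str            # title as written in the filename
    case_years: str       # year or year range of the case, kept as text


def parse_filename(filename: str) -> CaseMetadata:
    """Parse volume_yearpublishing_positionvolume_title_yearcase.txt."""
    stem = Path(filename).stem  # drop the .txt extension
    parts = stem.split("_")
    if len(parts) < 5:
        raise ValueError(f"Unexpected filename structure: {filename!r}")
    volume, year, position = parts[0], parts[1], parts[2]
    if not volume.startswith("Bd"):
        # Assumption: every volume field looks like "Bd04".
        raise ValueError(f"Unexpected volume field: {volume!r}")
    # Rejoin the middle fields in case a title itself contains underscores.
    title = "_".join(parts[3:-1])
    return CaseMetadata(
        volume=int(volume[2:]),
        year_published=int(year),
        position=int(position),
        title=title,
        case_years=parts[-1],
    )


if __name__ == "__main__":
    example = "Bd04_1843_10_DerZiegelbrenneralsMoerder_1724–1730.txt"
    print(parse_filename(example))
```

Parsed this way, the records can be grouped by volume or publication year, or checked against the accompanying metadata table.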

Figure 2
Overview of the metadata table.
In addition to the raw text files, scanned images of the original volumes of Der Neue Pitaval are available in the archive Pitaval_Gesamtkorpus.zip (version 1, https://doi.org/10.5281/zenodo.6682897). These scans serve as (1) the basis for the OCR process, (2) a source of bibliographic metadata, and (3) the reference for editorial paratexts such as dedications and introductions.
Creation dates
2019–2022
Dataset creators
Thomas Weitin, Katharina Herget, Constanze Hahn (student assistant), Katja Jungherz (student assistant), Ronja Gramlich (student assistant), Zsofia Pilz (student assistant), Julian Lemmerich (student assistant), Ikira Schielke (student assistant), Vanessa Möschner (student assistant), Helene Hackethal (student assistant), Andrea Hönig (student assistant) (all contributors are or have been affiliated with Technical University of Darmstadt).
Language
German
License
Creative Commons Attribution 4.0 International.
Publication date
2022-06-22 (last updated on 2025-06-23).
(4) Reuse Potential
The corpus can be reused in literary studies, legal history, and discourse analysis, particularly for diachronic research on genre conventions, legal narratives, and the cultural framing of crime and justice in the 19th century. For example, literary scholars may trace narrative motifs across time, while legal historians might examine changes in witness testimony. In the field of Natural Language Processing (NLP), the corpus can also serve as training data for large language models (LLMs) to improve their handling of historical German, including orthographic variation and the linguistic features of criminal case studies.
Notes
[1] As stated in the subtitle: “Der Neue Pitaval. Eine Sammlung der interessantesten Kriminalgeschichten aller Länder aus älterer und neuerer Zeit.” From volume 29 onwards, the editorial duties were taken over by Anton Vollert.
[5] The underlying workflow was originally developed and documented in the context of a corpus of 19th-century novellas (see: Herget, 2025, pp. 22–27).
Competing Interests
The authors have no competing interests to declare.
Author Contributions
Katharina Herget: conceptualization, data curation, formal analysis, investigation, methodology, supervision, writing – original draft, writing – review & editing.
Thomas Weitin: conceptualization, funding acquisition, project administration, supervision, writing – review & editing.
