1 Overview
The Romankorpus Frühneuhochdeutsch (Roko.UP) is a collection of nine Early New High German narrative prose texts, written between 1450 and 1550, as presented in Table 1. The texts vary in size and plot complexity. Except for the Fortunatus, which was originally composed as Early New High German prose, the narratives are adaptations of texts that originate from different narrative traditions.
Table 1
The nine texts of the Romankorpus Frühneuhochdeutsch.
| TEXT NAME | YEAR | ORIGINAL NARRATIVE TRADITION | NO. OF TOKENS |
|---|---|---|---|
| Pontus und Sidonia | 2nd half 15th c. | French prose | 78145 |
| Melusine | 1474 | French poetry | 40218 |
| Wilhelm von Österreich | 1481 | Middle High German poetry | 40033 |
| Tristrant und Isalde | 1484 | Middle High German poetry | 53175 |
| Huge Schapler | 1500 | French poetry | 39502 |
| Fortunatus | 1509 | Early New High German prose | 55328 |
| Wigalois vom Rade | 1519 | Middle High German poetry | 24717 |
| Die schöne Magelone | 1535 | French prose | 23750 |
| Der Goldene Esel | 1538 | Latin prose | 67557 |
The number of tokens represents the count of elements that are separated by white spaces; punctuation and page numbering are not counted as tokens.
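The counting rule stated above can be illustrated with a short sketch. It is not the procedure that produced the figures in Table 1. The function names, the regular expression for page markers (based on the `#...#` notation described under Quality control), and the decision to skip only elements that consist entirely of punctuation are assumptions made for illustration.

```python
import re
import unicodedata

# Assumption: page numbering appears between two hash signs, e.g. #100# or #C iiiir#.
PAGE_MARKER = re.compile(r"#[^#\n]*#")


def is_punctuation(element: str) -> bool:
    """True if every character of the element is a Unicode punctuation character."""
    return all(unicodedata.category(ch).startswith("P") for ch in element)


def count_tokens(text: str) -> int:
    """Count whitespace-separated elements, ignoring page markers and punctuation."""
    # Remove page markers first, because a marker may itself contain a space.
    without_pages = PAGE_MARKER.sub(" ", text)
    # Assumption: punctuation that is attached to a word does not add a token.
    return sum(1 for element in without_pages.split() if not is_punctuation(element))
```

Counts obtained this way may differ from Table 1 if the files treat punctuation or page numbering differently from what is assumed here.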
Repository location
Context
The Roko.UP was collected as a part of the project ‘Wortstellung und Diskursstruktur in der Frühen Neuzeit’. The objective of this project is to evaluate the degree and manner in which discourse and narrative structure motivate word order phenomena in Early New High German.
The earliest text dates from the second half of the 15th century, when fundamental changes in extra-linguistic conditions, such as changes in the history of media as well as changes in terms of the audience, led to a new literary beginning (Bertelsmeier-Kierst, 2014; Buschinger, 2010). The importance of narrative prose at the turn from the Middle Ages to the Early Modern Period is reflected in a rapidly growing number of prose novels in the form of manuscripts and prints. At the same time, a stronger authorial consciousness in conjunction with the newly emerging consciousness of fictionality (Müller, 1997) is found, which is often reflected in the narrator comments to the textual originals (Kellermann, 2010).
The nine prose novels from the 15th and 16th centuries were selected for the diversity of the narrative traditions in which they were originally compiled (Bertelsmeier-Kierst, 2014, 2019). For this reason, the corpus contains prose versions based on Middle High German narrative poetry, adaptations of French and Latin poetry and prose, as well as one text that is not modeled on another, that is, one that was originally composed as an Early New High German prose text.
Parts of the corpus and earlier compilations of the texts have been used in research papers (Bloom, forthcoming-a and Bloom, forthcoming-b), a master’s thesis (Purwin, 2023), and a dissertation in preparation (Reetz, ms).
2 Method
Steps
The steps involved in the compilation of the corpus are visualized in Figure 1. Whenever needed, we returned to earlier steps in the procedure, as indicated by the arrows.

Figure 1
Procedure.
The building of the corpus started with the digitization of the texts, wherever necessary, and their conversion into .txt files. The editions were then verified. In the case of Huge Schapler, Der Goldene Esel, Pontus und Sidonia, Wilhelm von Österreich and Wigalois, the text or parts of it had to be redone, either by using a different edition or by completing the text. For the final version of the corpus, the digitizations of Tristrant und Isalde and Wigalois vom Rade that had previously been made available in a digital .doc format (INST 336/90-1) were included. For Melusine, Die schöne Magelone, Fortunatus and Huge Schapler, digitized versions from Müller (1990) were used. Pontus und Sidonia (Fassung B) (Anonymous, 2nd half 15th c.) and Der Goldene Esel were newly transcribed.
The texts were subsequently clause-separated by marking the clause boundaries in Notepad by means of a line break, so that every finite clause is presented on its own row. Center-embedded finite clauses are separated from their matrix clause, as illustrated in (1).
| (1) | Der graf gab die kleynet |
| | die der iungkfrawen zůgehorten |
| | wider |
| | ‘The count gave the trinket |
| | that belonged to the lady |
| | back.’ (Wigalois, #C iiiir#) |
The purpose of the clause-separation is to be able to straightforwardly extract and annotate particular word orders in finite clauses and clause types without losing the wider discourse-context and overview of the narrative structure. The files were subsequently systematized and proofread. Whenever necessary, this process was repeated.
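The following sketch shows how the one-clause-per-row layout keeps the discourse context within reach during extraction. The function name `clauses_with_context` and the `window` parameter are hypothetical. The sketch also assumes that blank rows, which set off headers in the files, can be dropped for this purpose.

```python
from typing import Iterator, List, Tuple


def clauses_with_context(
    lines: List[str], window: int = 1
) -> Iterator[Tuple[List[str], str, List[str]]]:
    """Yield each non-empty row together with its neighbouring rows."""
    # Assumption: blank rows only set off headers and are skipped here.
    rows = [line.strip() for line in lines if line.strip()]
    for i, row in enumerate(rows):
        before = rows[max(0, i - window):i]   # up to `window` preceding rows
        after = rows[i + 1:i + 1 + window]    # up to `window` following rows
        yield before, row, after
```

Applied to the three rows of example (1), the embedded clause `die der iungkfrawen zůgehorten` is returned together with the two parts of its matrix clause.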
Quality control
For the systematization of the files, we decided to stay relatively close to the sources with regard to orthography, and no missing words were inserted. The transcription follows the editions, including capitalization, but paragraph signs (e.g., before the header in Figure 2) and virgules (at the end of the fourth line from below in Figure 2) have been removed. Word breaks and extra spaces have been resolved.

Figure 2
Header in Tristrant und Isalde, #2r#.
Abbreviations, such as the one in the last word of the second line in Figure 2, were written out consistently with the available transcriptions of the other texts. Double majuscules at the beginning of episodes (shown below the illustration in Figure 2) are represented by two capital letters (ES). When the initial letter has become unreadable, the letter has been added. Letters with a superscript o (e.g., <ů>) are transcribed as such, and a superscript e is represented by a colon followed by an e, e.g., <u:e>.
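Users who prefer a Unicode rendering of the superscript e can convert the colon notation when they load a file. The sketch below is not part of the corpus. It assumes that the sequence appears literally in the files as a vowel followed by `:e`, that only a, o and u carry the superscript, and that no other vowel-colon-e sequences occur. The function name is hypothetical.

```python
import re

# Assumption: only a, o and u (in either case) take a superscript e.
COLON_E = re.compile(r"([aouAOU]):e")

# U+0364 is COMBINING LATIN SMALL LETTER E.
COMBINING_E = "\u0364"


def to_combining_e(text: str) -> str:
    """Replace a vowel followed by ':e' with the vowel plus a combining e."""
    return COLON_E.sub("\\1" + COMBINING_E, text)
```

With this conversion, `u:e` becomes `u` followed by U+0364, while letters that are already precomposed, such as `ů`, are left unchanged.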
Any deviation from the original manuscript is surrounded by angle brackets (<>) and listed in the Readme file, where further information and text-specific notes can be found as well.
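Because deviations are marked in a uniform way, they can be listed or removed automatically. The following sketch assumes that angle brackets occur in the files only for this purpose and that a marked deviation never spans a line break. Both function names are hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: a deviation is any non-empty text between '<' and '>' on one row.
DEVIATION = re.compile(r"<([^<>\n]+)>")


def list_deviations(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (row number, deviating text) pairs, with rows counted from 1."""
    found = []
    for row_number, line in enumerate(lines, start=1):
        for match in DEVIATION.finditer(line):
            found.append((row_number, match.group(1)))
    return found


def strip_deviation_marks(line: str) -> str:
    """Remove the angle brackets but keep the text they enclose."""
    return DEVIATION.sub(r"\1", line)
```

The resulting list can be compared against the deviations documented in the Readme file.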
Titles and picture headings (Simmler, 2013), as illustrated in Figure 2, are set off by blank rows surrounding the header. Page numbers are enclosed between two hash signs in each file (e.g., #100#). When a page break falls within a word, the page number is placed directly before or after that word to avoid breaking it. The page numbers follow the numbering of the sources used. As a result, texts that have been transcribed directly from the original manuscript may have complex page numbers that consist of a capitalized section letter; a numeral, Roman or Arabic, that represents the page number; an ‘r’ or ‘v’, marking the front (recto) or back (verso) of the folio; and an ‘a’ or ‘b’ to mark the column (Clemens & Graham, 2007).
To ensure systematicity in the clause-separation, three annotators first annotated the same 100 lines. Based on this pilot annotation, instructions were written out, which were used for further annotation and updated upon discussion. Any disagreements and difficult cases were resolved through discussion. In particular, afinite constructions (Breitbarth, 2005; Demske, 2022) and sequences in which preverbal adverbs were followed by adverbial clauses (Bloom, forthcoming-a) repeatedly caused difficulties and had to be discussed and revisited. In the end, each text was clause-separated by at least one data curator, and systematized and proofread by another.
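The page notation described above can be read mechanically, as the following sketch illustrates. The regular expressions and function names are assumptions. A label is treated as a folio reference only if it ends in ‘r’ or ‘v’ (optionally followed by a column letter), and any other label, such as `100`, is kept as a plain page number.

```python
import re
from typing import Dict, List, Optional, Tuple

PAGE_MARKER = re.compile(r"#([^#\n]+)#")

# Optional section letter, Arabic or Roman numeral, side, optional column.
# Requiring the side resolves the ambiguity of 'v' (Roman five vs. verso).
FOLIO = re.compile(
    r"(?:(?P<section>[A-Z])\s*)?"
    r"(?P<number>\d+|[ivxlcdm]+)"
    r"(?P<side>[rv])"
    r"(?P<column>[ab])?"
)


def parse_page_label(label: str) -> Dict[str, Optional[str]]:
    """Split a label such as 'C iiiir' into section, number, side and column."""
    label = label.strip()
    match = FOLIO.fullmatch(label)
    if match:
        return match.groupdict()
    # Fallback: keep the whole label as a plain page number.
    return {"section": None, "number": label, "side": None, "column": None}


def rows_with_pages(lines: List[str]) -> List[Tuple[Optional[str], str]]:
    """Pair each non-empty row with the most recent page label."""
    current: Optional[str] = None
    rows = []
    for line in lines:
        # Simplification: a row is labelled with the last marker seen up to
        # the end of that row, even if the marker stands in mid-row.
        for match in PAGE_MARKER.finditer(line):
            current = match.group(1).strip()
        text = " ".join(PAGE_MARKER.sub(" ", line).split())
        if text:
            rows.append((current, text))
    return rows
```

Under these assumptions, the label in `#C iiiir#` is split into section `C`, number `iiii` and side `r`. A Roman numeral that ends in ‘v’ and has no side letter would be misread as a verso page, so the pattern should be checked against the text-specific notes in the Readme file before use.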
3 Dataset Description
Object name
Romankorpus Frühneuhochdeutsch (Roko.UP)
Format names and versions
Text files (.txt) with UTF-8 encoding.
Creation dates
2021-08-01 – 2023-11-25.
Dataset creators
Ulrike Demske: Conceptualization, Funding acquisition, Investigation, Methodology, Supervision, Resources, Validation, Manuscript transcription, Universität Potsdam
Barthe Bloom: Data curation, Investigation, Methodology, Supervision, Validation, Universität Potsdam
Malika Reetz: Data curation, Investigation, Methodology, Supervision, Writing, Universität Potsdam
Luisa Boeckmann, Johann Heidrich, Alona Prozorova & Jette Purwin: Data curation, Universität Potsdam
Peter Neumann: Transcription of Pontus und Sidonia
Sanne-Helen Taubner: Transcription of Der Goldene Esel
Language
Early New High German
License
CC BY 4.0
Repository name
Publication date
2023-11-28
4 Reuse Potential
Our corpus is likely to be of interest to researchers in historical linguistics and narratology. Future projects might use the corpus to address quite different research questions related to the language system in Early Modern German. In this regard, the corpus is a welcome addition to the few existing corpora of this historical period of German, such as the Bonn Early New High German Corpus or the Berlin Ridges Herbology Corpus. The focus of our corpus lies on the genre of narrative prose, which is especially interesting given that prose is a new form of narrative fiction in Early New High German (Bertelsmeier-Kierst, 2014). Hence, the Roko.UP provides data for different research subjects in linguistic and literary studies.
Our clause separation allows an easy assessment of the position of elements within a clause, as well as a clear presentation of linked propositions. This makes it a suitable basis for further annotation and processing of the texts.
Concretely, the corpus can help to shed light on issues concerning narrative representation (Haferland, 2023) and on the linguistic realization of phenomena that are restricted to narration, such as the simultaneity of referenced time frames, including the grammatical means used to express them and the implications that this holds for linguistic change (Philipowski & Zeman, 2022; Zeman, 2019). Moreover, the corpus may facilitate quantitative analyses of syntactic questions, e.g., regarding clause linkage (Freywald, 2016). Studies on the correlation between discourse representation and lexical and grammatical expressions at clause level (Speyer, 2010) or on the influence of information-structural categories on syntactic phenomena (see e.g., Sapp, 2007, 2014) could also be further expanded and corroborated. As such, the corpus can be used for research on the fixation of German clause structure in the context of discourse-pragmatics, which has thus far mostly concentrated on the periods preceding Early New High German (see articles in Jäger, Ferraresi, & Weiss, 2018).
Funding Information
This research was funded by the Deutsche Forschungsgemeinschaft (Project 456973946, ‘Wortstellung und Diskursstruktur in der Frühen Neuzeit’). In addition, part of the data was provided by a project funded by the Deutsche Forschungsgemeinschaft (INST 336/90-1).
Competing interests
The authors have no competing interests to declare.
