(1) Overview
The corpus of this dataset comprises diplomatic transcriptions of the textual witnesses of three Middle Dutch texts:
Dietsche Catoen, a 13th-century translation of the influential Latin school text Distichs of Cato, including approbation, prologue and epilogue where present (16 witnesses).
Rhymed Bible, also known as Scolastica, written by the Flemish author Jacob van Maerlant in 1271 (samples from 15 witnesses).
Karel ende Elegast, a chivalric romance in the tradition of Charlemagne epics, originating in Flanders, possibly in the 13th century (14 witnesses[1]).
The witnesses originate from different regions and periods, as presented in Table 1 (column 3), and include intact manuscripts and printed witnesses as well as fragmentary witnesses, amounting to around 28,600 (parallel) verses in total (Table 2).
Table 1
Overview of the siglum, medium/state, date/region and repository/signature, in chronological order (and in alphabetical order when witnesses share a date or date range). The last column lists publicly available images. The dates and regions for Dietsche Catoen are based on Van Buuren (1998) (‘extra’ means the witnesses were not known to Van Buuren) and Sleiderink and Houtsma (2015), for Scolastica on Meuwese (2001), Van Dalen-Oskam (2012; 2014) and Hutter (2023), and for Karel ende Elegast on Duinhoven and Van Thienen (1990), Klein (1989; 1995), Caers (2011) and Langeslag (2015). We use ‘Low Countries’ as the region when specification within the language area was not possible. The Southern Low Countries include Flanders, Brabant and Limburg; the Northern Low Countries include Holland, Zeeland, Utrecht, Gelre and Overijssel.
| DIETSCHE CATOEN | ||||
|---|---|---|---|---|
| SIGLUM | MEDIUM/STATE | DATE/REGION | REPOSITORY/SIGNATURE | IMAGES |
| 1. A (The Oudenaarde verse miscellany) | Manuscript, fragmentary | Around 1290, Flanders | Oudenaarde, Stadsarchief Handschriften en Zeldzame Drukken, nr. 32 | Digital |
| 2. Me | Manuscript, fragmentary | 1350–1400, Brabant (?) | Mechelen, Stadsarchief AA Comptes Communaux, I, 3 | / |
| 3. L | Manuscript, fragmentary | After 1350 or 1400–1450, South Holland (Kienhorst, 2005, II 181) | Leiden, University Library, LTK 221 | IIIF |
| 4. M | Manuscript, complete | 1375–1400, East Flanders | Munich, Bayerische Staatsbibliothek, Cod. Germ. 102. | IIIF |
| 5. C (Comburg manuscript) | Manuscript, complete | 1380–1425 (hand B: end of 14th century; Brinkman & Schenkel, 1997, 45), Flanders | Stuttgart, Württemberg State Library, Cod. poet. et phil. 2° 22 | Digital |
| 6. b | Manuscript, fragmentary | Around 1380 or 1400–1450, South-West Brabant (Kienhorst, 2005, II 25) | Berlin, Staatsbibliothek Preussischer Kulturbesitz, Ms. germ. fol. 751.8 | IIIF |
| 7. Br | Manuscript, fragmentary | 1430–1440, Brabant (?) | Brussels, AR Rekenkamer, 41.234 | / |
| 8. H | Manuscript, complete | 1475–1525, Low Countries | Middelburg, Zeeuwse Bibliotheek, Hs. 6353 | Digital |
| 9. B[2] (Jan Phillipsz.) | Manuscript, complete | Around 1478, Leiden (Holland) (Brinkman 1995, 16) | Berlin, Staatsbibliothek Preussischer Kulturbesitz, Ms. germ. quart. 557 | / |
| 10. P | Manuscript, interpolated stanzas | Around 1493, Bruges (?) (Flanders) | Paris, Bibliothèque Nationale Fonds Néerlandais, 106 | IIIF |
| 11. R (Jan Seversz.) | Printed book, fragmentary | 1502–1524, Leiden (Holland) | Rijssel (Lille), University Library, signature: unknown | / |
| 12. D (Henrick Eckert van Homberch) | Printed book, complete | Around 1500, Antwerp (Brabant) | The Hague, National Library of the Netherlands, 229 G 16 | / |
| 13. G (Jan van Ghelen) | Printed book, complete | Around 1540, Antwerp (Brabant) | Mettingen, Draiflessen Collection, Liberna DC(L)M, W 754 (olim: Arenberg nr. 979 and private collection Liberna in Hilversum)[3] | / |
| 14. d1 (Hieronymus Verdussen) | Printed book, complete | 1605, Antwerp (Brabant) | Mettingen, Draiflessen Collection, Liberna DC(L)M, W 148 (olim: Arenberg nr. 1261 and private collection Liberna) | / |
| | | | Extra: Antwerp, University of Antwerp, Ruusbroecgenootschap, RG 3114 B 13 | / |
| | | | Extra: Antwerp, Letterenhuis, 18.105 | / |
| | | | Extra: Antwerp, Hendrik Conscience Heritage Library, C 102837 [S0-268 g] | Digital |
| 15. d2 (Pauwels Stroobant) | Printed book, complete | 1596–1617, Antwerp (Brabant) | Mettingen, Draiflessen Collection, Liberna DC(L)M, W 147 (olim: Arenberg, nr. 1290 and private collection Liberna) | / |
| d3 (Stroobant – edition 1) | Printed book, complete | | Brussels, Royal Library of Belgium, Cl 6513 RP | / |
| d4 (Stroobant – edition 1) | Printed book, complete | | Antwerp, Hendrik Conscience Heritage Library, F. 41486 | Digital |
| d5 (Stroobant – edition 1) | Printed book, complete | | Amsterdam, Allard Pierson Depot OTM: OK 63–5636 (olim: University Library, 971 G 10) | / |
| | | | Extra: Antwerp, Hendrik Conscience Heritage Library, 805634:05 [C2-559 i] | Digital |
| | | | Extra: Nijmegen, Radboud University, Central Library – Special Collections, OD TBI 201 c 159 nr.9 | / |
| | | | Extra: Amsterdam, Allard Pierson Depot, OTM: O 06–8283 | / |
| 16. d6 (Stroobant – edition 2) | Printed book, complete | 1596–1617, Antwerp (Brabant) | Mettingen, Draiflessen Collection, Liberna DC(L)M, W 149 (olim: Arenberg, nr. 1257 and private collection Liberna) | / |
| SCOLASTICA | ||||
| SIGLUM | MEDIUM/STATE | DATE/REGION | REPOSITORY/SIGNATURE | IMAGES |
| 1. C | Manuscript, complete | Around 1285, Brabant or Flanders | Brussels, Royal Library of Belgium, 15001 | Digital |
| 2. B | Manuscript, complete | 1300–1325, Flanders | Brussels, Royal Library of Belgium, 19545 | Digital |
| 3. D | Manuscript, complete | 1300–1350, Flanders | The Hague, National Library of the Netherlands, 76 E 16 | Partial |
| 4. E | Manuscript, complete | 1300–1350, Flanders | The Hague, National Library of the Netherlands, 129 A 11 | / |
| 5. A | Manuscript, complete | 1321, Waterduinen (Flanders) | Berlin, Staatsbibliothek – Preußischer Kulturbesitz, Germ. Fol. 662 | / |
| 6. M | Manuscript, complete | Around 1330, Utrecht | The Hague, Museum Meermanno-Westreenianum, 10 B 21 | Digital |
| 7. G (Gronings-Zutphens Maerlant Manuscript) | Manuscript, complete | 1335–1339, Brabant or Gelre | Groningen, University Library, HS 405 | Digital |
| 8. K | Manuscript, complete | 1370–1385, Northern Low Countries | London, British Library, Add. 10044 | / [4] |
| 9. L | Manuscript, complete | 1393, Northern Low Countries | London, British Library, Add. 10045 | / |
| 10. F | Manuscript, complete | Around 1400, Utrecht | The Hague, National Library of the Netherlands, KA XVIII | / |
| 11. I (bu) | Manuscript, complete | Around 1434, Southern Low Countries | Brussels, Royal Library of Belgium, 720–722 | Digital |
| 12. J | Manuscript, complete | 1450–1475, Low Countries | Leiden, University Library, Ltk 168 | IIIF |
| 13. O (hk) | Manuscript, complete | 1450–1475, Low Countries | The Hague, National Library of the Netherlands, 75 E 20 | / |
| 14. N | Manuscript, complete | 1453, Southern Low Countries | The Hague, Museum Meermanno-Westreenianum, 10 C 19 | IIIF |
| 15. H | Manuscript, complete | 1460–1470, Gelre or Overijssel | Leiden, University Library, BPL 14 C | IIIF |
| KAREL ENDE ELEGAST | ||||
| SIGLUM | MEDIUM/STATE | DATE/REGION | REPOSITORY/SIGNATURE | IMAGES |
| 1. V | Manuscript, fragmentary | Around 1350, Brabant (?) | Vercelli, Archivio di Stato, Prefettura, Giudiziario, Fondo Antico, mazzo 41 (fasc. 2) | / |
| 2. Ge | Manuscript, fragmentary | 1350–1400, Southwest Brabant | Ghent, University Library, HS. 896-A | IIIF |
| 3. M | Manuscript, fragmentary | 1350–1400, Brabant | Arras, Bibliothèque de la Ville, Ms. 227–383 | / |
| 4. Br | Manuscript, fragmentary | 1350–1450, Southeast Limburg | Brussels, Archives of the City of Brussels, hs. 1645 | / |
| 5. G | Manuscript, fragmentary | 1350–1450, Southeast Limburg | Munich, Bayerische Staatsbibliothek, Cod. Ger. 5249, Nr. 69 | IIIF |
| 6. H | Manuscript, fragmentary | 1375–1400, Southeast Limburg | The Hague, National Library of the Netherlands, 131 D 5 | / |
| 7. N | Manuscript, fragmentary | 1375–1400, Flanders | Namur, Fonds de la Ville, en dépôt à la Société archéologique de Namur, inv. Ms 196 B 19 | / |
| 8. F | Printed book, fragmentary | 1484–1488, [’s-Hertogenbosch (Brabant)] | Cambridge, University Library, Inc. 6 E 12.1 | / |
| 9. A | Printed book, complete | 1486–1488, [Delft (Holland)], [Jacob Jacobsz. van der Meer or Christiaen Snellaert] | The Hague, National Library of the Netherlands, 169 G 63 | / |
| 10. B | Printed book, complete | 1493–after 1500, [Antwerp (Brabant)], Govaert Bac | Berlin, Staatsbibliothek Preussischer Kulturbesitz, 8° Inc 4812 | IIIF |
| 11. C | Printed book, complete | 1496–1499, Antwerp (Brabant) | Washington, Library of Congress, Collection Lessing J. Rosenwald, Inc. X.K. 33 | IIIF |
| 12. L | Printed book, complete | After 3 July 1496, [Antwerp (Brabant)] | St. Petersburg, National Library of Russia, 8.13.7.9 | / |
| 13. D | Printed book, complete | c. 1530, [Antwerp (Brabant)], [Adriaen van Berghen, Jan van Doesborch or Jan Berntsz.] | Brussels, Royal Library of Belgium, II 54948 A L.P. | Digital |
| 14. E | Printed book, complete | c. 1550–1596, Antwerp (Brabant), Jan van Ghelen | Brussels, Royal Library of Belgium, II 47686 A L.P. | / |
Table 2
Overview of the transmission of the witnesses in terms of total number of verses, damaged verses and complete verses.
| DIETSCHE CATOEN | |||
|---|---|---|---|
| SIGLUM | TOTAL NUMBER OF VERSES | VERSES WITH DAMAGE | VERSES WITHOUT DAMAGE |
| A | 393 | 37 | 356 |
| Me | 80 | 40 | 40 |
| L | 72 | 1 | 71 |
| M | 233 | 0 | 233 |
| C | 298 | 0 | 298 |
| b | 108 | 19 | 89 |
| Br | 34 | 0 | 34 |
| H | 261 | 0 | 261 |
| B | 263 | 0 | 263 |
| P | 188 | 0 | 188 |
| R | 72 | 0 | 72 |
| D | 287 | 0 | 287 |
| G | 206 | 0 | 206 |
| d1–d6 | 206 (x6) | 0 | 206 (x6) |
| TOTAL | 3731 | 97 | 3634 |
| SCOLASTICA | |||
| SIGLUM | TOTAL NUMBER OF VERSES | VERSES WITH DAMAGE | VERSES WITHOUT DAMAGE |
| C | 1111 | 0 | 1111 |
| B | 1107 | 0 | 1107 |
| D | 1106 | 0 | 1106 |
| E | 1108 | 0 | 1108 |
| A | 1115 | 0 | 1115 |
| M | 1107 | 0 | 1107 |
| G | 1112 | 0 | 1112 |
| K | 1115 | 1 | 1114 |
| L | 1111 | 0 | 1111 |
| F | 1132 | 0 | 1132 |
| I | 709 | 0 | 709 |
| J | 1107 | 0 | 1107 |
| O | 401 | 0 | 401 |
| N | 1114 | 0 | 1114 |
| H | 897 | 0 | 897 |
| TOTAL | 15352 | 1 | 15351 |
| KAREL ENDE ELEGAST | |||
| SIGLUM | TOTAL NUMBER OF VERSES | VERSES WITH DAMAGE | VERSES WITHOUT DAMAGE |
| V | 21 | 0 | 21 |
| Ge | 618 | 617 | 1 |
| M | 241 | 144 | 97 |
| Br | 130 | 100 | 30 |
| G | 173 | 6 | 167 |
| H | 319 | 102 | 217 |
| N | 129 | 6 | 123 |
| F | 36 | 0 | 36 |
| A | 1381 | 17 | 1364 |
| B | 1371 | 0 | 1371 |
| C | 1371 | 0 | 1371 |
| L | 1187 | 0 | 1187 |
| D | 1363 | 0 | 1363 |
| E | 1218 | 72 | 1146 |
| TOTAL | 9558 | 1064 | 8494 |
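The figure of around 28,600 verses quoted in the Overview can be recovered from the TOTAL rows of Table 2. The following is only a small consistency check; the dictionary layout and variable names are our own and do not come from the dataset's scripts.

```python
# Per-text figures copied from the TOTAL rows of Table 2.
totals = {
    "Dietsche Catoen": {"total": 3731, "damaged": 97, "undamaged": 3634},
    "Scolastica": {"total": 15352, "damaged": 1, "undamaged": 15351},
    "Karel ende Elegast": {"total": 9558, "damaged": 1064, "undamaged": 8494},
}

# Each TOTAL row should split exactly into damaged + undamaged verses.
for text, row in totals.items():
    assert row["damaged"] + row["undamaged"] == row["total"], text

corpus_total = sum(row["total"] for row in totals.values())
corpus_damaged = sum(row["damaged"] for row in totals.values())

print(corpus_total)    # 28641, i.e. "around 28,600" verses
print(corpus_damaged)  # 1162
```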
Repository location
Context
The individual projects of the authors have led to the creation of transcriptions of the texts featured in this paper. Constrained (2022–2026), the doctoral research project of Sofie Moors, seeks to combine these data for the purpose of studying the relation between scribal variation and formal elements. The resulting dataset is presented in this paper and is a continuation of a previously published dataset containing diplomatic editions of 17 witnesses of the Martijn Trilogy by the 13th-century poet Jacob van Maerlant (Moors, Kestemont & Sleiderink, 2024; see Haverals & Kestemont, 2023 for an example of further use of this dataset).
(2) Method
Steps
1) Data Collection
To study the manuscript and printed tradition of the texts in our corpora, we collected diplomatic, digital transcriptions. For most of the witnesses, existing editions could be digitised using Optical Character Recognition via ABBYY FineReader.[5]
For Dietsche Catoen, we used the edition of Van Buuren (1998), which is fully available online. Sleiderink and Houtsma (2015) supplemented it, in the same format, with editions of two later discovered text witnesses (Me, Br), so that all extant textual witnesses are included in the final dataset.
For the Scolastica, the research data were available thanks to Van Dalen-Oskam (2012). For her research, she took samples from the fifteen complete manuscripts of Scolastica. The surviving fragments were not included in the corpus (see Hutter, 2023 for an inventory).
For Karel ende Elegast, all extant Middle Dutch textual witnesses are included. Most transcriptions were collected from the diplomatic editions by Duinhoven (1969: M, H, N, G, Br, A, B, C, D, E, F), Klein (1989: Ge) and Langeslag (2015: V). The text of L has not previously been published, but was transcribed by Duinhoven (the transcription can be consulted in the National Library in The Hague; see also Duinhoven and Van Thienen, 1990).
2) Annotation & Processing
All transcriptions were encoded using the TEI-MVN framework developed by Boot and Brinkman (2020), working in Oxygen XML Editor.[6] MVN (Middeleeuwse Verzamelhandschriften uit de Nederlanden) is an edition series specifically for Middle Dutch texts transmitted in multi-text codices. This series provides guidelines for paper as well as digital diplomatic editions, in addition to the standard TEI-XML. Even though the multi-text focus is not foregrounded in the current dataset, the MVN framework offers the most extensive guidelines for digitally editing Middle Dutch and is therefore the most suitable framework for the corpus. As implementing this framework is complex and labour-intensive, a custom script was created to generate the XML files automatically (Moors, Kestemont & Sleiderink, 2024). The output of the OCR (step 1) was saved as raw text files and enriched with simpler descriptive markup for verse, column and folio structure, capital letters, abbreviations, damage gaps, missing verses, deletions and additions, and uncertain readings. Spelling variation was handled according to the standards in Middle Dutch studies: the difference between i/j and u/v was encoded and not adapted to modern spelling, but the different allographs of s (normal or long), r and y were not represented. The enriched text files were then automatically converted to XML files. Because this conversion is fully automated, it sometimes requires manual intervention and postprocessing to ensure that the XML output is fully correct and diplomatic. For example, the script cannot properly handle more than one XML tag per token, so when a token has multiple tagged properties (e.g. abbreviated and struck through), intervention is required to ensure that both properties are visualised.
Additionally, short manuscript descriptions in TEI-XML were added to the XML header. They contain information on the text and its position in the manuscript (<msItem>); a description of the physical object (<physDesc>), including the (reconstructed) form of the textual witnesses, the material, the number of leaves, collation, leaf size and layout, the font or type, and the decoration; and the <history>, including information on the dating and localisation of the witnesses and, where known, contributors to their production. In the case of fragments, a description is given for the reconstructed codex (where possible), and in the case of convolute volumes the description is limited to the unit containing the text in question.
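The conversion from enriched text to XML described in this step can be illustrated with a minimal sketch. It is not the project's txt2xml.ipynb script: the two inline markers used here (`{abbr|expan}` for an abbreviation with its expansion and `[?word]` for an uncertain reading), the function name and the sample verse are all assumptions made for the example, and the output uses generic TEI elements (`<l>`, `<choice>`, `<abbr>`, `<expan>`, `<unclear>`) without claiming to match the MVN framework in detail.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical inline markup (NOT the dataset's actual rich_txt syntax):
#   {abbr|expan}  abbreviation with its expansion
#   [?word]       uncertain reading
TOKEN = re.compile(
    r"\{(?P<abbr>[^|{}]+)\|(?P<expan>[^|{}]+)\}|\[\?(?P<unclear>[^\]]+)\]"
)
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def verse_to_tei(verse_id: str, text: str) -> ET.Element:
    """Turn one enriched verse into a TEI-style <l> element."""
    line = ET.Element("l", {XML_ID: verse_id})
    line.text = ""
    last = None  # most recently appended child element
    pos = 0
    for m in TOKEN.finditer(text):
        plain = text[pos:m.start()]
        # Plain text goes before the first child or after the latest one.
        if last is None:
            line.text += plain
        else:
            last.tail = (last.tail or "") + plain
        if m.group("abbr") is not None:
            last = ET.SubElement(line, "choice")
            ET.SubElement(last, "abbr").text = m.group("abbr")
            ET.SubElement(last, "expan").text = m.group("expan")
        else:
            last = ET.SubElement(line, "unclear")
            last.text = m.group("unclear")
        pos = m.end()
    rest = text[pos:]
    if last is None:
        line.text += rest
    else:
        last.tail = (last.tail or "") + rest
    return line


# Invented sample verse, for illustration only.
element = verse_to_tei("v1", "ende {dt|dat} [?hi] sprac")
print(ET.tostring(element, encoding="unicode"))
```

Like the script described above, this sketch handles one marker per token: the regular expression does not allow markers to nest, so a token that is both abbreviated and uncertain would still need manual correction in the XML.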
Sampling strategy
All available text of the witnesses was used, with the exception of the Scolastica witnesses, for which a selection of shorter samples of roughly two hundred verses was used. The sample selection was the result of the choices made for previous research (Van Dalen-Oskam, 2012). In line with the requirements of that study, the chosen passages concern female Bible protagonists. As transcriptions of these samples are available for all witnesses, the same passages were used here for pragmatic reasons. As this sample size offers a number of verses that is comparable to that of the Martijn Trilogy (Moors, Kestemont & Sleiderink, 2024), the decision was made not to transcribe the remainder of the text (35,000 verses).
Quality control
These prior editions served as a useful basis, but the transcription practices were not uniform. Therefore, a re-collation with photographic facsimiles of the original sources was necessary. The transcriptions were collated manually and normalised to obtain a uniform, maximally faithful diplomatic transcription of each witness. After collating with photographic copies/microfilms, some uncertain passages were further clarified through on-site examination in the holding institutions.
For a more detailed description of the workflow, see Moors, Kestemont and Sleiderink (2024).
(3) Dataset Description
Repository name
Zenodo. The associated scripts, written in Python as Jupyter Notebooks, can be accessed through the GitHub page: www.github.com/SofieMoors/controlcorpus.
Object name
Three main folders: dietschecatoen, scolastica and karelendeelegast. Each folder contains the same file set:
data:
– html: html files with abbreviations expanded, html files with abbreviations unexpanded and extra folders with mvnhtml.css (the css view is optimised for macOS and iOS browsers)
– mvn: framework used (https://github.com/HuygensING/mvn-xml/tree/main/framework)
– plain_txt: text files generated from xml with markup applied (scripts > xml2txt.ipynb)
– rich_txt: raw text files, manually enriched with semantic markup
– viz: timeline and pixelplot
– xlsx: synoptic presentation of all witnesses (abbreviations expanded)
– xml: xml files generated from the rich_txt (scripts > txt2xml.ipynb)
scripts:
– txt2xml.ipynb: code to convert rich_txt to xml
– viz.ipynb: code to create viz
– xml2txt.ipynb: code to convert xml to plain_txt
– xml2xlsx.ipynb: code to convert xml to xlsx
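To give an idea of how the XML files might be consumed, the sketch below extracts plain verse text with abbreviations either expanded or left unexpanded. It is not the xml2txt.ipynb notebook: it assumes that verses are `<l>` elements carrying an `xml:id` and that abbreviations are encoded as `<choice><abbr/><expan/></choice>` in the TEI namespace, and the embedded sample document and function names are invented for the example.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented two-verse document standing in for one of the XML files.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<l xml:id="v1">ende <choice><abbr>dt</abbr><expan>dat</expan></choice> hi sprac</l>
<l xml:id="v2">al <unclear>waer</unclear></l>
</body></text></TEI>"""


def flatten(elem, expand):
    """Concatenate the text of elem, keeping only one side of each <choice>."""
    skip = TEI + ("abbr" if expand else "expan")
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != skip:
            parts.append(flatten(child, expand))
        # Text following a child belongs to the parent, so always keep it.
        parts.append(child.tail or "")
    return "".join(parts)


def verses(xml_string, expand=True):
    """Map each verse identifier to its plain text."""
    root = ET.fromstring(xml_string)
    return {
        line.get(XML_ID): flatten(line, expand).strip()
        for line in root.iter(TEI + "l")
    }


print(verses(SAMPLE, expand=True))
print(verses(SAMPLE, expand=False))
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`; the element names would need to be checked against the actual MVN encoding before use.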
Format names and versions
data: TXT, XML, HTML, XLSX; scripts: .ipynb
Creation dates
dietschecatoen: 2022–2023, scolastica: 2009–2024, karelendeelegast: 2023–2024
Dataset creators
Sofie Moors (University of Antwerp)
– Dietsche Catoen (data collection, annotation & processing)
– Scolastica (data annotation & processing)
– Karel ende Elegast (data processing)
Nicky Voorneveld (University of Antwerp)
– Karel ende Elegast (data collection & annotation)
Karina van Dalen-Oskam (Huygens Institute & University of Amsterdam)
– Scolastica (data collection)
Kamiel Temmermans
– Scolastica (data annotation)
Language
English
License
CC-BY-SA
Publication date
2025-03-21
(4) Reuse Potential
Since the transcriptions in this dataset are diplomatic, they lend themselves well to research on scribal attributions and scribal variation (e.g., Van Dalen-Oskam, 2012, 2014; Vandyck & Kestemont, 2024), stemmatology (e.g., Camps, Fernandez Riva & Gabay, 2021), phylogenetics (e.g., McCollum & Turnbull, 2024), and more. The monolingual nature of this dataset means that, as a standalone corpus, it is limited to use within a single linguistic area. However, it can be used for interlingual comparison with other existing datasets of medieval vernaculars, such as the Middle High German corpus of Flos unde Blankeflos (de Bruijn & Bastert, 2025), La Base de Français Médiéval (Guillot-Barbance, Heiden, & Lavrentiev, 2017) and the Middle English corpus of the Canterbury Tales (North, Bordalejo, Jones, & Robinson, 2020). Furthermore, it enriches an under-researched field with empirical evidence of scribal variation and in doing so fills an important gap in the available (digital) data on multiple textual witnesses of Middle Dutch works, which is now quite limited (e.g., Burgers, 2004; Hendriks & Kuiper, 2018; Moors, Kestemont & Sleiderink, 2024).
We do not offer an edition for scholarly reading. Instead, the corpus has been optimised for computational analysis and collation by opting for consistency: a unique identifier was added to each verse and the line division was kept the same across witnesses (Guéville & Wrisley, 2022). Accessibility is further enhanced by the availability of multiple file formats, including the richly encoded XML files as well as HTML and XLSX files, which are more easily used by researchers with a less digital background.
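Because every verse carries an identifier shared across witnesses, a synoptic view can be built by a simple lookup on that identifier. The sketch below illustrates the idea under stated assumptions: the sigla, verse identifiers and readings are invented placeholders, and the function names are ours, not those of the xml2xlsx.ipynb notebook.

```python
# Invented toy readings keyed by hypothetical verse identifiers.
witnesses = {
    "A": {"v1": "reading one", "v2": "reading two"},
    "B": {"v1": "reading one", "v3": "reading three"},
    "C": {"v1": "readinge one", "v2": "reading two", "v3": "reading three"},
}


def synopsis(witnesses):
    """Return {verse_id: {siglum: reading or None}} over all verse ids."""
    ids = sorted({vid for verses in witnesses.values() for vid in verses})
    return {
        vid: {siglum: verses.get(vid) for siglum, verses in witnesses.items()}
        for vid in ids
    }


def variant_verses(table):
    """Verse ids whose attested readings are not all identical."""
    return [
        vid
        for vid, row in table.items()
        if len({r for r in row.values() if r is not None}) > 1
    ]


table = synopsis(witnesses)
print(table["v2"])            # witness B lacks this verse, so its cell is None
print(variant_verses(table))  # ['v1']
```

A missing verse is kept as `None` and not dropped, so that gaps in fragmentary witnesses stay visible in the aligned table. Real identifiers may need a natural sort order instead of the plain string sort used here.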
Notes
[1] Karel ende Elegast is also extant in a Ripuarian manuscript (siglum K, Darmstadt, Universitäts- und Landesbibliothek, Hs. 2290), and a Middle High German manuscript (Zeitz, Stifts- und Domherrenbibliothek, 60). Due to their linguistic distance from the Middle Dutch witnesses, these were not included in the corpus.
[2] In the data folders you will find this text witness as ‘B2’ and not as ‘B’. This is because file names are not case sensitive and so otherwise there would be overlap with text witness ‘b’.
[3] Printed books G, d1 and d6 originally came from the collection of Constant Philippe Serrure, who later sold most of his Middle Dutch editions to the Duke of Arenberg (nos. 979, 1261 and 1257) (Cockx-Indestege & Delsaerdt, 2022, 47, 60, 625, 727–728). Van Buuren (1998, 186) writes that the four printed books from the Arenberg Collection (nos. 979, 1261, 1290 and 1257) are in a ‘private collection’ in the Netherlands. He does not mention this collection by name, but from Cockx-Indestege and Delsaerdt (2022) we can conclude that he referred to Bernard Brenninkmeijer’s famous ‘Liberna Collection’, formerly in Hilversum (the Netherlands) and part of the Draiflessen Collection in Mettingen (Germany) since 2013. The books are not digitised, but we received pictures through the curator Guido Scholten (P. Delsaerdt & G. Scholten, personal communication, September 9–10, 2024).
[4] Due to a cyberattack on the website of the British Library, it was not possible to verify online whether these manuscripts had been digitised. We therefore checked by email with the Manuscripts Reference Services, who let us know that the manuscripts are not digitised and that there are currently no plans to digitise them (personal communication, September 3, 2024).
[5] Retrieved from https://pdf.abbyy.com/ (last accessed: 21 December 2023).
[6] Retrieved from https://www.oxygenxml.com/ (last accessed: 21 December 2023).
Acknowledgements
We are grateful to Mike Kestemont for his help in writing the scripts and to Remco Sleiderink for his feedback and suggestions. We also want to thank Kamiel Temmerman for his transcription work. Many thanks to the libraries and library staff who kindly provided access to the witnesses in their collections.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
– Sofie Moors: conceptualisation, data curation, formal analysis, investigation, methodology, resources, software, supervision, visualisation, writing (original draft/review & editing)
– Nicky Voorneveld: investigation, resources, writing (original draft/review & editing)
– Karina van Dalen-Oskam: resources, writing (review & editing)
