Witnessing Middle Dutch Textual Traditions. Diplomatic Transcriptions of Dietsche Catoen, Scolastica, and Karel ende Elegast

Open Access
|Aug 2025

(1) Overview

The corpus of this dataset comprises diplomatic transcriptions of the textual witnesses of three Middle Dutch texts:

  1. Dietsche Catoen, a 13th-century translation of the influential Latin school text Distichs of Cato, including approbation, prologue and epilogue where present (16 witnesses).

  2. Rhymed Bible, also known as Scolastica, written by the Flemish author Jacob van Maerlant in 1271 (samples from 15 witnesses).

  3. Karel ende Elegast, a chivalric romance in the tradition of Charlemagne epics, originating in Flanders, possibly in the 13th century (14 witnesses).[1]

The witnesses originate from different regions and periods, as presented in Table 1 (column 3), and include intact manuscripts and printed witnesses as well as fragmentary witnesses, amounting to around 28,600 (parallel) verses in total (Table 2).

Table 1

Overview of the siglum, medium/state, date/region and repository/signature, in chronological order (and in alphabetical order when witnesses share a date or date range). The last column lists publicly available images. The dates and regions for Dietsche Catoen are based on Van Buuren (1998) (‘Extra’ marks witnesses that were not known to Van Buuren) and Sleiderink and Houtsma (2015), for Scolastica on Meuwese (2001), Van Dalen-Oskam (2012; 2014) and Hutter (2023), and for Karel ende Elegast on Duinhoven and Van Thienen (1990), Klein (1989, 1995), Caers (2011) and Langeslag (2015). We use ‘Low Countries’ as the region when specification within the language area was not possible. The Southern Low Countries include Flanders, Brabant and Limburg; the Northern Low Countries include Holland, Zeeland, Utrecht, Gelre and Overijssel.

DIETSCHE CATOEN
SIGLUM | MEDIUM/STATE | DATE/REGION | REPOSITORY/SIGNATURE | IMAGES
1. A (The Oudenaarde verse miscellany) | Manuscript, fragmentary | Around 1290, Flanders | Oudenaarde, Stadsarchief Handschriften en Zeldzame Drukken, nr. 32 | Digital
2. Me | Manuscript, fragmentary | 1350–1400, Brabant (?) | Mechelen, Stadsarchief AA Comptes Communaux, I, 3 | /
3. L | Manuscript, fragmentary | After 1350 or 1400–1450, South Holland (Kienhorst, 2005, II 181) | Leiden, University Library, LTK 221 | IIIF
4. M | Manuscript, complete | 1375–1400, East Flanders | Munich, Bayerische Staatsbibliothek, Cod. Germ. 102. | IIIF
5. C (Comburg manuscript) | Manuscript, complete | 1380–1425 (hand B: end of 14th Century; Brinkman & Schenkel, 1997, 45), Flanders | Stuttgart, Württemberg State Library, Cod. Poet. Et phil. 2° 22 | Digital
6. b | Manuscript, fragmentary | Around 1380 or 1400–1450, South–West Brabant (Kienhorst, 2005, II 25) | Berlin, Staatsbibliothek Preussischer Kulturbesitz, Ms. germ. fol. 751.8 | IIIF
7. Br | Manuscript, fragmentary | 1430–1440, Brabant (?) | Brussels, AR Rekenkamer, 41.234 | /
8. H | Manuscript, complete | 1475–1525, Low Countries | Middelburg, Zeeuwse Bibliotheek, Hs. 6353 | Digital
9. B [2] (Jan Phillipsz.) | Manuscript, complete | Around 1478, Leiden (Holland) (Brinkman 1995, 16) | Berlin, Staatsbibliothek Preussischer Kulturbesitz, Ms. germ. quart. 557 | /
10. P | Manuscript, interpolated stanzas | Around 1493, Bruges (?) (Flanders) | Paris, Bibliothèque Nationale Fonds Néerlandais, 106 | IIIF
11. R (Jan Seversz.) | Printed book, fragmentary | 1502–1524, Leiden (Holland) | Rijssel (Lille), University Library, signature: unknown | /
12. D (Henrick Eckert van Homberch) | Printed book, complete | Around 1500, Antwerp (Brabant) | The Hague, National Library of the Netherlands, 229 G 16 | /
13. G (Jan van Ghelen) | Printed book, complete | Around 1540, Antwerp (Brabant) | Mettingen, Draiflessen Collection, Liberna DC(L)M, W 754 (olim: Arenberg nr. 979 and private collection Liberna in Hilversum) [3] | /
14. d1 (Hieronymus Verdussen) | Printed book, complete | 1605, Antwerp (Brabant) | Mettingen, Draiflessen Collection, Liberna DC(L)M, W 148 (olim: Arenberg nr. 1261 and private collection Liberna) | /
 | | | Extra: Antwerp, University of Antwerp, Ruusbroecgenootschap, RG 3114 B 13 | /
 | | | Extra: Antwerp, Letterenhuis, 18.105 | /
 | | | Extra: Antwerp, Hendrik Conscience Heritage Library, C 102837 [S0-268 g] | Digital
15. d2 (Pauwels Stroobant) | Printed book, complete | 1596–1617, Antwerp (Brabant) | Mettingen, Draiflessen Collection, Liberna DC(L)M, W 147 (olim: Arenberg, nr. 1290 and private collection Liberna) | /
d3 (Stroobant – edition 1) | Printed book, complete | | Brussels, Royal Library of Belgium, Cl 6513 RP | /
d4 (Stroobant – edition 1) | Printed book, complete | | Antwerp, Hendrik Conscience Heritage Library, F. 41486 | Digital
d5 (Stroobant – edition 1) | Printed book, complete | | Amsterdam, Allard Pierson Depot OTM: OK 63–5636 (olim: University Library, 971 G 10) | /
 | | | Extra: Antwerp, Hendrik Conscience Heritage Library, 805634:05 [C2-559 i] | Digital
 | | | Extra: Nijmegen, Radboud University, Central Library – Special Collections, OD TBI 201 c 159 nr.9 | /
 | | | Extra: Amsterdam, Allard Pierson Depot, OTM: O 06–8283 | /
16. d6 (Stroobant – edition 2) | Printed book, complete | 1596–1617, Antwerp (Brabant) | Mettingen, Draiflessen Collection, Liberna DC(L)M, W 149 (olim: Arenberg, nr. 1257 and private collection Liberna) | /

SCOLASTICA
SIGLUM | MEDIUM/STATE | DATE/REGION | REPOSITORY/SIGNATURE | IMAGES
1. C | Manuscript, complete | Around 1285, Brabant or Flanders | Brussels, Royal Library of Belgium, 15001 | Digital
2. B | Manuscript, complete | 1300–1325, Flanders | Brussels, Royal Library of Belgium, 19545 | Digital
3. D | Manuscript, complete | 1300–1350, Flanders | The Hague, National Library of the Netherlands, 76 E 16 | Partial
4. E | Manuscript, complete | 1300–1350, Flanders | The Hague, National Library of the Netherlands, 129 A 11 | /
5. A | Manuscript, complete | 1321, Waterduinen (Flanders) | Berlin, Staatsbibliothek – Preußischer Kulturbesitz, Germ. Fol. 662 | /
6. M | Manuscript, complete | Around 1330, Utrecht | The Hague, Museum Meermanno-Westreenianum, 10 B 21 | Digital
7. G (Gronings-Zutphens Maerlant Manuscript) | Manuscript, complete | 1335–1339, Brabant or Gelre | Groningen, University Library, HS 405 | Digital
8. K | Manuscript, complete | 1370–1385, Northern Low Countries | London, British Library, Add. 10044 | / [4]
9. L | Manuscript, complete | 1393, Northern Low Countries | London, British Library, Add. 10045 | /
10. F | Manuscript, complete | Around 1400, Utrecht | The Hague, National Library of the Netherlands, KA XVIII | /
11. I (bu) | Manuscript, complete | Around 1434, Southern Low Countries | Brussels, Royal Library of Belgium, 720–722 | Digital
12. J | Manuscript, complete | 1450–1475, Low Countries | Leiden, University Library, Ltk 168 | IIIF
13. O (hk) | Manuscript, complete | 1450–1475, Low Countries | The Hague, National Library of the Netherlands, 75 E 20 | /
14. N | Manuscript, complete | 1453, Southern Low Countries | The Hague, Museum Meermanno-Westreenianum, 10 C 19 | IIIF
15. H | Manuscript, complete | 1460–1470, Gelre or Overijssel | Leiden, University Library, BPL 14 C | IIIF

KAREL ENDE ELEGAST
SIGLUM | MEDIUM/STATE | DATE/REGION | REPOSITORY/SIGNATURE | IMAGES
1. V | Manuscript, fragmentary | Around 1350, Brabant (?) | Vercelli, Archivio do Stato, Prefettura, Giudiziarro, Fondo Antico, mazzo 41 (fasc. 2) | /
2. Ge | Manuscript, fragmentary | 1350–1400, Southwest Brabant | Ghent, University Library, HS. 896-A | IIIF
3. M | Manuscript, fragmentary | 1350–1400, Brabant | Arras, Bibliothèque de la Ville, Ms. 227–383 | /
4. Br | Manuscript, fragmentary | 1350–1450, Southeast Limburg | Brussels, Archives of the City of Brussels, hs. 1645 | /
5. G | Manuscript, fragmentary | 1350–1450, Southeast Limburg | Munich, Bayerische Staatsbibliothek, Cod. Ger. 5249, Nr. 69 | IIIF
6. H | Manuscript, fragmentary | 1375–1400, Southeast Limburg | The Hague, National Library of the Netherlands, 131 D 5 | /
7. N | Manuscript, fragmentary | 1375–1400, Flanders | Namur, Fonds de la Ville, en dépôt à la Société archéologique de Namur, inv. Ms 196 B 19 | /
8. F | Printed book, fragmentary | 1484–1488, [‘s-Hertogenbosch (Brabant)] | Cambridge, University Library, Inc. 6 E 12.1 | /
9. A | Printed book, complete | 1486–1488, [Delft (Holland)], [Jacob Jacobsz. van der Meer or Christiaen Snellaert] | The Hague, National Library of the Netherlands, 169 G 63 | /
10. B | Printed book, complete | 1493–after 1500, [Antwerp (Brabant)], Govaert Bac | Berlin, Staatsbibliothek Preussischer Kulturbesitz, 8° Inc 4812 | IIIF
11. C | Printed book, complete | 1496–1499, Antwerp (Brabant) | Washington, Library of Congress, Collection Lessing J. Rosenwald, Inc. X.K. 33 | IIIF
12. L | Printed book, complete | After 3 July 1496, [Antwerp (Brabant)] | St. Petersburg, National Library of Russia, 8.13.7.9 | /
13. D | Printed book, complete | c. 1530, [Antwerp (Brabant)], [Adriaen van Berghen, Jan van Doesborch or Jan Berntsz.] | Brussels, Royal Library of Belgium, II 54948 A L.P. | Digital
14. E | Printed book, complete | c. 1550–1596, Antwerp (Brabant), Jan van Ghelen | Brussels, Royal Library of Belgium, II 47686 A L.P. | /

Table 2

Overview of the transmission of the witnesses in terms of total number of verses, damaged verses and complete verses.

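DIETSCHE CATOEN
SIGLUM | TOTAL NUMBER OF VERSES | VERSES WITH DAMAGE | VERSES WITHOUT DAMAGE
A | 393 | 37 | 356
Me | 80 | 40 | 40
L | 72 | 1 | 71
M | 233 | 0 | 233
C | 298 | 0 | 298
b | 108 | 19 | 89
Br | 34 | 0 | 34
H | 261 | 0 | 261
B | 263 | 0 | 263
P | 188 | 0 | 188
R | 72 | 0 | 72
D | 287 | 0 | 287
G | 206 | 0 | 206
d1–d6 | 206 (x6) | 0 | 206 (x6)
TOTAL | 3731 | 97 | 3634
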
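SCOLASTICA
SIGLUM | TOTAL NUMBER OF VERSES | VERSES WITH DAMAGE | VERSES WITHOUT DAMAGE
C | 1111 | 0 | 1111
B | 1107 | 0 | 1107
D | 1106 | 0 | 1106
E | 1108 | 0 | 1108
A | 1115 | 0 | 1115
M | 1107 | 0 | 1107
G | 1112 | 0 | 1112
K | 1115 | 1 | 1114
L | 1111 | 0 | 1111
F | 1132 | 0 | 1132
I | 709 | 0 | 709
J | 1107 | 0 | 1107
O | 401 | 0 | 401
N | 1114 | 0 | 1114
H | 897 | 0 | 897
TOTAL | 15352 | 1 | 15351
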
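KAREL ENDE ELEGAST
SIGLUM | TOTAL NUMBER OF VERSES | VERSES WITH DAMAGE | VERSES WITHOUT DAMAGE
V | 21 | 0 | 21
Ge | 618 | 617 | 1
M | 241 | 144 | 97
Br | 130 | 100 | 30
G | 173 | 6 | 167
H | 319 | 102 | 217
N | 129 | 6 | 123
F | 36 | 0 | 36
A | 1381 | 17 | 1364
B | 1371 | 0 | 1371
L | 1187 | 0 | 1187
D | 1363 | 0 | 1363
E | 1218 | 72 | 1146
TOTAL | 9558 | 1064 | 8494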
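
The three numeric columns of Table 2 are tied together by simple arithmetic: for every witness, the verses without damage equal the total number of verses minus the verses with damage. The following minimal Python sketch checks this for the Dietsche Catoen rows and recomputes the TOTAL row. The dictionary layout and the function names `check_rows` and `column_totals` are illustrative assumptions and are not part of the published scripts.

```python
# Dietsche Catoen rows of Table 2: siglum -> (total, damaged, undamaged).
CATOEN = {
    "A": (393, 37, 356), "Me": (80, 40, 40), "L": (72, 1, 71),
    "M": (233, 0, 233), "C": (298, 0, 298), "b": (108, 19, 89),
    "Br": (34, 0, 34), "H": (261, 0, 261), "B": (263, 0, 263),
    "P": (188, 0, 188), "R": (72, 0, 72), "D": (287, 0, 287),
    "G": (206, 0, 206),
}
# The table lists the six prints d1-d6 in one row: 206 verses each, no damage.
CATOEN.update({f"d{i}": (206, 0, 206) for i in range(1, 7)})


def check_rows(rows):
    """Return the sigla for which total - damaged != undamaged."""
    return [siglum for siglum, (total, damaged, undamaged) in rows.items()
            if total - damaged != undamaged]


def column_totals(rows):
    """Sum each of the three columns over all witnesses."""
    return tuple(sum(column) for column in zip(*rows.values()))


print(check_rows(CATOEN))     # []
print(column_totals(CATOEN))  # (3731, 97, 3634)
```

The same check can be run on the Scolastica and Karel ende Elegast rows by swapping in their values.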

Repository location

doi: 10.5281/zenodo.15064631

Context

The transcriptions of the texts featured in this paper were created in the course of the authors' individual projects. Constrained (2022–2026), the doctoral research project of Sofie Moors, seeks to combine this data for the purpose of studying the relation between scribal variation and formal elements. The resulting dataset is presented in this paper and is a continuation of a previously published dataset containing diplomatic editions of 17 witnesses of the Martijn Trilogy by the 13th-century poet Jacob van Maerlant (Moors, Kestemont & Sleiderink, 2024; see Haverals & Kestemont, 2023 for an example of further use of this dataset).

(2) Method

Steps

1) Data Collection

To study the manuscript and printed tradition of the texts in our corpora, we collected diplomatic, digital transcriptions. For most of the witnesses, existing editions could be digitised using Optical Character Recognition via ABBYY FineReader.[5]

  1. For Dietsche Catoen, we used the edition of Van Buuren (1998), which is fully available online. Sleiderink and Houtsma (2015) supplemented it, in the same format, with editions of two later discovered text witnesses (Me, Br), so that all extant textual witnesses are included in the final dataset.

  2. For the Scolastica, the research data were available thanks to Van Dalen-Oskam (2012). For her research, she took samples from the fifteen complete manuscripts of Scolastica. The surviving fragments were not included in the corpus (see Hutter, 2023 for an inventory).

  3. For Karel ende Elegast, all extant Middle Dutch textual witnesses are included. Most transcriptions were collected from the diplomatic editions by Duinhoven (1969, M, H, N, G, Br, A, B, C, D, E, F), Klein (1989, Ge) and Langeslag (2015, V). The text of L has not previously been published, but was transcribed by Duinhoven (transcription can be consulted in the National Library in The Hague, see also Duinhoven and Van Thienen, 1990).

2) Annotation & Processing

All the transcriptions were encoded using the TEI-MVN framework (via Oxygen XML Editor[6]) developed by Boot and Brinkman (2020). MVN (Middeleeuwse Verzamelhandschriften uit de Nederlanden) is an edition series specifically for Middle Dutch texts transmitted in multi-text codices. This series provides guidelines for paper as well as digital diplomatic editions, in addition to the standard TEI-XML. Even though the multi-text focus is not foregrounded in the current dataset, the MVN framework offers the most extensive guidelines for digitally editing Middle Dutch and is therefore the most suitable framework for the corpus. As the implementation of this framework is quite complex and labour-intensive, a custom programming script to automatically generate XML files was created (Moors, Kestemont & Sleiderink, 2024). The output of the OCR (step 1) was saved as raw text files and enriched with simpler descriptive markup for verse, column and folio structure, capital letters, abbreviations, damage gaps, missing verses, deletions and additions, and uncertain readings. Spelling variation was dealt with according to the standards in Middle Dutch studies. Thus, the difference between i/j and u/v was encoded, and not adapted to modern spelling, but the different allographs of s (normal or long), r and y were not represented. The enriched text files were then automatically converted to XML files. Because this process is fully automated, it sometimes requires intervention and postprocessing to ensure that the XML output is fully correct and diplomatic. For example, the script cannot properly handle more than one XML tag per token, so when a token has multiple tagged properties (e.g. abbreviated and struck through), intervention is required to ensure that both properties are visualised.
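
To make the conversion step concrete, the following is a minimal Python sketch of how lightly marked-up text could be turned into XML. It is not the authors' script: the markup conventions (`{abbr|expansion}` for an abbreviation, `-word-` for a deletion), the function names and the example verse are all invented for illustration, and the element names `l`, `choice`, `abbr`, `expan` and `del` are generic TEI elements that may differ from what the MVN framework prescribes. Like the script described above, the sketch handles only one property per token.

```python
import xml.etree.ElementTree as ET


def token_to_node(token):
    """Map one token to an XML element, or return it unchanged as plain text.

    Hypothetical markup: {abbr|expansion} marks an abbreviation,
    -word- marks a deletion. Only one property per token is supported.
    """
    if token.startswith("{") and token.endswith("}") and "|" in token:
        abbr, expan = token[1:-1].split("|", 1)
        choice = ET.Element("choice")
        ET.SubElement(choice, "abbr").text = abbr
        ET.SubElement(choice, "expan").text = expan
        return choice
    if len(token) > 2 and token.startswith("-") and token.endswith("-"):
        deleted = ET.Element("del")
        deleted.text = token[1:-1]
        return deleted
    return token


def verse_to_xml(verse_id, line):
    """Build an <l> element for one verse from a marked-up line of text."""
    verse = ET.Element("l", {"n": verse_id})
    last_child = None

    def add_text(text):
        # Text goes into .text before the first child, else into the tail
        # of the most recently appended child element.
        if last_child is None:
            verse.text = (verse.text or "") + text
        else:
            last_child.tail = (last_child.tail or "") + text

    for index, token in enumerate(line.split()):
        if index:
            add_text(" ")
        node = token_to_node(token)
        if isinstance(node, str):
            add_text(node)
        else:
            verse.append(node)
            last_child = node
    return verse


# Invented example verse, not taken from any witness in the corpus.
example = verse_to_xml("1", "Die {co|coninc} -sprac- seide")
print(ET.tostring(example, encoding="unicode"))
```

A token that is both abbreviated and struck through would need nested elements, which is exactly the case that requires manual postprocessing in the workflow described above.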

Additionally, short manuscript descriptions in TEI-XML were added in the XML-header, containing information on the text and its position in the manuscript <msItem>, a description of the physical object <physDesc>, including the (reconstructed) form of the textual witnesses, the material, the number of leaves, collation, leaf size and layout, the font or type and the decoration, and the <history>, including information on the dating and localisation of the witnesses, and, where known, contributors to their production. In the case of fragments, a description is given for the reconstructed codex (where possible) and in the case of convolute volumes the description is limited to the unit containing the text in question.
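
As an illustration of such a header description, the sketch below assembles a small manuscript description with Python's standard library. It assumes the generic TEI structure (`msDesc` containing `msIdentifier`, `msContents`/`msItem`, `physDesc` and `history`); the function name `build_ms_desc`, the dictionary keys and the `form` value are hypothetical, and the actual headers in the dataset are considerably richer. The example values are taken from the Scolastica row for witness C in Table 1.

```python
import xml.etree.ElementTree as ET


def build_ms_desc(info):
    """Build a small TEI-style <msDesc> element from a dictionary of fields."""
    ms_desc = ET.Element("msDesc")

    identifier = ET.SubElement(ms_desc, "msIdentifier")
    ET.SubElement(identifier, "settlement").text = info["settlement"]
    ET.SubElement(identifier, "repository").text = info["repository"]
    ET.SubElement(identifier, "idno").text = info["signature"]

    # The text and its position in the manuscript.
    contents = ET.SubElement(ms_desc, "msContents")
    item = ET.SubElement(contents, "msItem")
    ET.SubElement(item, "title").text = info["title"]

    # The physical object (only the form is recorded in this sketch).
    phys_desc = ET.SubElement(ms_desc, "physDesc")
    ET.SubElement(phys_desc, "objectDesc", {"form": info["form"]})

    # Dating and localisation of the witness.
    history = ET.SubElement(ms_desc, "history")
    origin = ET.SubElement(history, "origin")
    ET.SubElement(origin, "origDate").text = info["date"]
    ET.SubElement(origin, "origPlace").text = info["place"]
    return ms_desc


scolastica_c = {
    "settlement": "Brussels",
    "repository": "Royal Library of Belgium",
    "signature": "15001",
    "title": "Scolastica",
    "form": "manuscript",  # assumed attribute value, for illustration only
    "date": "Around 1285",
    "place": "Brabant or Flanders",
}
print(ET.tostring(build_ms_desc(scolastica_c), encoding="unicode"))
```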

Sampling strategy

All available text of the witnesses was used, with the exception of those from the Scolastica, for which a selection of shorter samples of roughly two hundred verses was used. The sample selection was the result of the choices made for previous research (Van Dalen-Oskam, 2012). In line with the requirements of this study, the chosen passages concern female Bible protagonists. As transcriptions of these samples are available for all witnesses, the same passages were used here for pragmatic reasons. As this sample size offers a number of verses that is comparable to that of the Martijn Trilogy (Moors, Kestemont & Sleiderink, 2024), the decision was made not to transcribe the remainder of the text (35,000 verses).

Quality control

These prior editions served as a useful basis, but the transcription practices were not uniform. Therefore, a re-collation with photographic facsimiles of the original sources was necessary. The transcriptions were collated manually and normalised to obtain a uniform, maximally faithful diplomatic transcription of each witness. After collating with photographic copies/microfilms, some uncertain passages were further clarified through on-site examination in the holding institutions.

For a more detailed description of the workflow, see Moors, Kestemont and Sleiderink (2024).

(3) Dataset Description

Repository name

Zenodo. The associated scripts in Jupyter Notebooks (Python language) can be accessed through the GitHub page: www.github.com/SofieMoors/controlcorpus.

Object name

Three main folders: dietschecatoen, scolastica and karelendeelegast. Each folder contains the same file set:

data:

  • html: HTML files with abbreviations expanded, HTML files with abbreviations unexpanded, and extra folders with mvnhtml.css (the CSS view is optimised for macOS and iOS browsers)

  • mvn: framework used (https://github.com/HuygensING/mvn-xml/tree/main/framework)

  • plain_txt: text files generated from xml with markup applied (scripts > xml2txt.ipynb)

  • rich_txt: raw text files, manually enriched with semantic markup

  • viz: timeline and pixelplot

  • xlsx: synoptic presentation of all witnesses (abbreviations expanded)

  • xml: xml files generated from the rich_txt (scripts > txt2xml.ipynb)

scripts:

  • txt2xml.ipynb: code to convert rich_txt to xml

  • viz.ipynb: code to create viz

  • xml2txt.ipynb: code to convert xml to plain_txt

  • xml2xlsx.ipynb: code to convert xml to xlsx
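
The reverse direction, from XML to plain text with abbreviations either expanded or left unexpanded, can be sketched as follows. This is a hedged illustration and not the contents of xml2txt.ipynb: it assumes that abbreviations are encoded with the generic TEI pattern `<choice><abbr>…</abbr><expan>…</expan></choice>`, it ignores XML namespaces (which real TEI files normally declare), and the function name `flatten` and the example verse are invented.

```python
import xml.etree.ElementTree as ET


def flatten(element, expand=True):
    """Return the text of an element, resolving <choice> to one reading.

    With expand=True the <expan> reading is used, otherwise the <abbr> one.
    """
    parts = [element.text or ""]
    for child in element:
        if child.tag == "choice":
            chosen = child.find("expan" if expand else "abbr")
            if chosen is not None:
                parts.append(flatten(chosen, expand))
        else:
            parts.append(flatten(child, expand))
        # Text that follows the child element belongs to the parent's content.
        parts.append(child.tail or "")
    return "".join(parts)


# Invented example verse, not taken from any witness in the corpus.
verse = ET.fromstring(
    '<l n="1">Die <choice><abbr>co</abbr><expan>coninc</expan></choice> seide</l>'
)
print(flatten(verse, expand=True))   # Die coninc seide
print(flatten(verse, expand=False))  # Die co seide
```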

Format names and versions

data: TXT, XML, HTML, XLSX; scripts: .ipynb

Creation dates

dietschecatoen: 2022–2023, scolastica: 2009–2024, karelendeelegast: 2023–2024

Dataset creators

Sofie Moors (University of Antwerp)

  • Dietsche Catoen (data collection, annotation & processing)

  • Scolastica (data annotation & processing)

  • Karel ende Elegast (data processing)

Nicky Voorneveld (University of Antwerp)

  • Karel ende Elegast (data collection & annotation)

Karina van Dalen-Oskam (Huygens Institute & University of Amsterdam)

  • Scolastica (data collection)

Kamiel Temmermans

  • Scolastica (data annotation)

Language

English

License

CC-BY-SA

Publication date

2025-03-21

(4) Reuse Potential

Since the transcriptions in this dataset are diplomatic, they lend themselves well to research on scribal attributions and scribal variation (e.g., Van Dalen-Oskam, 2012, 2014; Vandyck & Kestemont, 2024), stemmatology (e.g., Camps, Fernandez Riva & Gabay, 2021), phylogenetics (e.g., McCollum & Turnbull, 2024), and more. The monolingual nature of this dataset means that, as a standalone corpus, it is limited to use within a single linguistic area. However, it may be used for interlingual comparison with other existing datasets of medieval vernaculars, such as the Middle High German corpus of Flos unde Blankeflos (de Bruijn & Bastert, 2025), La Base de Français Médiéval (Guillot-Barbance, Heiden, & Lavrentiev, 2017) and the Middle English corpus of the Canterbury Tales (North, Bordalejo, Jones, & Robinson, 2020). Furthermore, it enriches an under-researched field with empirical evidence of scribal variation and in doing so fills an important gap in the available (digital) data on multiple textual witnesses of Middle Dutch works, which is currently quite limited (e.g., Burgers, 2004; Hendriks & Kuiper, 2018; Moors, Kestemont & Sleiderink, 2024).

We do not offer an edition for scholarly reading. Instead, the corpus has been optimised for computational analysis and collation by opting for consistency: a unique identifier was added to each verse and the line division was kept the same across witnesses (Guéville & Wrisley, 2022). Accessibility is further enhanced by the availability of multiple file formats, including the richly encoded XML files as well as HTML and XLSX files, which are more easily used by researchers with a less digital background.
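
Because every verse carries a unique identifier and the line division is shared across witnesses, parallel verses can be aligned by a simple lookup. The sketch below shows the idea in Python under stated assumptions: the identifier format (`v0001`), the placeholder sigla `X` and `Y`, the placeholder readings and the function name `synoptic_rows` are all hypothetical and do not reproduce the dataset's own identifiers or the xml2xlsx.ipynb script.

```python
def synoptic_rows(witnesses):
    """Align witnesses on shared verse identifiers.

    `witnesses` maps a siglum to a dict of {verse_id: text}. The result has
    a header row followed by one row per verse identifier, with an empty
    string where a witness lacks that verse.
    """
    sigla = sorted(witnesses)
    verse_ids = sorted({vid for verses in witnesses.values() for vid in verses})
    header = ["verse"] + sigla
    rows = [[vid] + [witnesses[s].get(vid, "") for s in sigla]
            for vid in verse_ids]
    return [header] + rows


# Placeholder data: two hypothetical witnesses, one of them missing a verse.
witnesses = {
    "X": {"v0001": "reading x1", "v0002": "reading x2"},
    "Y": {"v0001": "reading y1", "v0003": "reading y3"},
}
for row in synoptic_rows(witnesses):
    print("\t".join(row))
```

Sorting the identifiers as strings works here only because the hypothetical identifiers are zero-padded; other identifier schemes would need their own sort key.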

Notes

[1] Karel ende Elegast is also extant in a Ripuarian manuscript (siglum K, Darmstadt, Universitäts- und Landesbibliothek, Hs. 2290), and a Middle High German manuscript (Zeitz, Stifts- und Domherrenbibliothek, 60). Due to their linguistic distance from the Middle Dutch witnesses, these were not included in the corpus.

[2] In the data folders you will find this text witness as ‘B2’ and not as ‘B’. This is because file names are not case sensitive and so otherwise there would be overlap with text witness ‘b’.

[3] Printed books G, d1 and d6 originally came from the collection of Constant Philippe Serrure, who later sold most of his Middle Dutch editions to the Duke of Arenberg (nos. 979, 1261 and 1257) (Cockx-Indestege & Delsaerdt, 2022, 47, 60, 625, 727–728). Van Buuren (1998, 186) writes that the four printed books from the Arenberg Collection (nos. 979, 1261, 1290 and 1257) are in a ‘private collection’ in the Netherlands. He does not mention this collection by name, but from Cockx-Indestege and Delsaerdt (2022) we can conclude that he was referring to Bernard Brenninkmeijer’s famous ‘Liberna Collection’, formerly in Hilversum (the Netherlands), which has been part of the Draiflessen Collection in Mettingen (Germany) since 2013. The books are not digitised, but we received pictures through the curator Guido Scholten (P. Delsaerdt & G. Scholten, personal communication, September 9–10, 2024).

[4] Due to a cyberattack on the website of the British Library, it was not possible to verify online whether these manuscripts had been digitised. We therefore checked by email with the Manuscripts Reference Services, who let us know that the manuscripts are not digitised and that there are no indications of plans to digitise them in the future (personal communication, September 3, 2024).

[5] Retrieved from https://pdf.abbyy.com/ (last accessed: 21 December 2023).

[6] Retrieved from https://www.oxygenxml.com/ (last accessed: 21 December 2023).

Acknowledgements

We are grateful to Mike Kestemont for his help in writing the scripts and to Remco Sleiderink for his feedback and suggestions. We also want to thank Kamiel Temmerman for his transcription work. Many thanks to the libraries and library staff who kindly provided access to the witnesses in their collections.

Competing Interests

The authors have no competing interests to declare.

Author Contributions

  • Sofie Moors: conceptualisation, data curation, formal analysis, investigation, methodology, resources, software, supervision, visualisation, writing (original draft/review & editing)

  • Nicky Voorneveld: investigation, resources, writing (original draft/review & editing)

  • Karina van Dalen-Oskam: resources, writing (review & editing)

DOI: https://doi.org/10.5334/johd.328 | Journal eISSN: 2059-481X
Language: English
Submitted on: Mar 21, 2025
Accepted on: May 19, 2025
Published on: Aug 4, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Sofie Moors, Nicky Voorneveld, Karina van Dalen-Oskam, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.