Witnessing Middle Dutch Textual Traditions. Diplomatic Transcriptions of Dietsche Catoen, Scolastica, and Karel ende Elegast

Sofie Moors; Nicky Voorneveld; Karina van Dalen-Oskam

doi:10.5334/johd.328

(1) Overview

The corpus of this dataset comprises diplomatic transcriptions of the textual witnesses of three Middle Dutch texts:

Dietsche Catoen, a 13th-century translation of the influential Latin school text Distichs of Cato, including approbation, prologue and epilogue where present (16 witnesses).
Rhymed Bible, also known as Scolastica, written by the Flemish author Jacob van Maerlant in 1271 (samples from 15 witnesses).
Karel ende Elegast, a chivalric romance in the tradition of Charlemagne epics, originating in Flanders, possibly in the 13th century (14 witnesses¹).

The witnesses originate from different regions and periods, as presented in Table 1 (column 3), and include intact manuscripts and printed witnesses as well as fragmentary witnesses, amounting to around 28,600 (parallel) verses in total (Table 2).

Table 1

Overview of the siglum, medium/state, date/region and repository/signature, in chronological order (and alphabetical order when witnesses share a date or date range). The last column lists publicly available images. The dates and regions for Dietsche Catoen are based on Van Buuren (1998) (‘extra’ means the witnesses were not known to Van Buuren) and Sleiderink and Houtsma (2015), for Scolastica on Meuwese (2001), Van Dalen-Oskam (2012; 2014) and Hutter (2023), and for Karel ende Elegast on Duinhoven and Van Thienen (1990), Klein (1989, 1995), Caers (2011) and Langeslag (2015). We use ‘Low Countries’ as the region when specification within the language area was not possible. The Southern Low Countries include Flanders, Brabant and Limburg; the Northern Low Countries include Holland, Zeeland, Utrecht, Gelre and Overijssel.

DIETSCHE CATOEN
SIGLUM	MEDIUM/STATE	DATE/REGION	REPOSITORY/SIGNATURE	IMAGES
1. A (The Oudenaarde verse miscellany)	Manuscript, fragmentary	Around 1290, Flanders	Oudenaarde, Stadsarchief Handschriften en Zeldzame Drukken, nr. 32	Digital
2. Me	Manuscript, fragmentary	1350–1400, Brabant (?)	Mechelen, Stadsarchief AA Comptes Communaux, I, 3	/
3. L	Manuscript, fragmentary	After 1350 or 1400–1450, South Holland (Kienhorst, 2005, II 181)	Leiden, University Library, LTK 221	IIIF
4. M	Manuscript, complete	1375–1400, East Flanders	Munich, Bayerische Staatsbibliothek, Cod. Germ. 102.	IIIF
5. C (Comburg manuscript)	Manuscript, complete	1380–1425 (hand B: end of 14th century; Brinkman & Schenkel, 1997, 45), Flanders	Stuttgart, Württemberg State Library, Cod. Poet. Et phil. 2° 22	Digital
6. b	Manuscript, fragmentary	Around 1380 or 1400–1450, South-West Brabant (Kienhorst, 2005, II 25)	Berlin, Staatsbibliothek Preussischer Kulturbesitz, Ms. germ. fol. 751.8	IIIF
7. Br	Manuscript, fragmentary	1430–1440, Brabant (?)	Brussels, AR Rekenkamer, 41.234	/
8. H	Manuscript, complete	1475–1525, Low Countries	Middelburg, Zeeuwse Bibliotheek, Hs. 6353	Digital
9. B² (Jan Phillipsz.)	Manuscript, complete	Around 1478, Leiden (Holland) (Brinkman, 1995, 16)	Berlin, Staatsbibliothek Preussischer Kulturbesitz, Ms. germ. quart. 557	/
10. P	Manuscript, interpolated stanzas	Around 1493, Bruges (?) (Flanders)	Paris, Bibliothèque Nationale Fonds Néerlandais, 106	IIIF
11. R (Jan Seversz.)	Printed book, fragmentary	1502–1524, Leiden (Holland)	Rijssel (Lille), University Library, signature: unknown	/
12. D (Henrick Eckert van Homberch)	Printed book, complete	Around 1500, Antwerp (Brabant)	The Hague, National Library of the Netherlands, 229 G 16	/
13. G (Jan van Ghelen)	Printed book, complete	Around 1540, Antwerp (Brabant)	Mettingen, Draiflessen Collection, Liberna DC(L)M, W 754 (olim: Arenberg nr. 979 and private collection Liberna in Hilversum)³	/
14. d1 (Hieronymus Verdussen)	Printed book, complete	1605, Antwerp (Brabant)	Mettingen, Draiflessen Collection, Liberna DC(L)M, W 148 (olim: Arenberg nr. 1261 and private collection Liberna)	/
			Extra: Antwerp, University of Antwerp, Ruusbroecgenootschap, RG 3114 B 13	/
			Extra: Antwerp, Letterenhuis, 18.105	/
			Extra: Antwerp, Hendrik Conscience Heritage Library, C 102837 [S0-268 g]	Digital
15. d2 (Pauwels Stroobant)	Printed book, complete	1596–1617, Antwerp (Brabant)	Mettingen, Draiflessen Collection, Liberna DC(L)M, W 147 (olim: Arenberg, nr. 1290 and private collection Liberna)	/
d3 (Stroobant – edition 1)	Printed book, complete		Brussels, Royal Library of Belgium, Cl 6513 RP	/
d4 (Stroobant – edition 1)	Printed book, complete		Antwerp, Hendrik Conscience Heritage Library, F. 41486	Digital
d5 (Stroobant – edition 1)	Printed book, complete		Amsterdam, Allard Pierson Depot OTM: OK 63–5636 (olim: University Library, 971 G 10)	/
			Extra: Antwerp, Hendrik Conscience Heritage Library, 805634:05 [C2-559 i]	Digital
			Extra: Nijmegen, Radboud University, Central Library – Special Collections, OD TBI 201 c 159 nr.9	/
			Extra: Amsterdam, Allard Pierson Depot, OTM: O 06–8283	/
16. d6 (Stroobant – edition 2)	Printed book, complete	1596–1617, Antwerp (Brabant)	Mettingen, Draiflessen Collection, Liberna DC(L)M, W 149 (olim: Arenberg, nr. 1257 and private collection Liberna)	/
SCOLASTICA
SIGLUM	MEDIUM/STATE	DATE/REGION	REPOSITORY/SIGNATURE	IMAGES
1. C	Manuscript, complete	Around 1285, Brabant or Flanders	Brussels, Royal Library of Belgium, 15001	Digital
2. B	Manuscript, complete	1300–1325, Flanders	Brussels, Royal Library of Belgium, 19545	Digital
3. D	Manuscript, complete	1300–1350, Flanders	The Hague, National Library of the Netherlands, 76 E 16	Partial
4. E	Manuscript, complete	1300–1350, Flanders	The Hague, National Library of the Netherlands, 129 A 11	/
5. A	Manuscript, complete	1321, Waterduinen (Flanders)	Berlin, Staatsbibliothek – Preußischer Kulturbesitz, Germ. Fol. 662	/
6. M	Manuscript, complete	Around 1330, Utrecht	The Hague, Museum Meermanno-Westreenianum, 10 B 21	Digital
7. G (Gronings-Zutphens Maerlant Manuscript)	Manuscript, complete	1335–1339, Brabant or Gelre	Groningen, University Library, HS 405	Digital
8. K	Manuscript, complete	1370–1385, Northern Low Countries	London, British Library, Add. 10044	/⁴
9. L	Manuscript, complete	1393, Northern Low Countries	London, British Library, Add. 10045	/
10. F	Manuscript, complete	Around 1400, Utrecht	The Hague, National Library of the Netherlands, KA XVIII	/
11. I (bu)	Manuscript, complete	Around 1434, Southern Low Countries	Brussels, Royal Library of Belgium, 720–722	Digital
12. J	Manuscript, complete	1450–1475, Low Countries	Leiden, University Library, Ltk 168	IIIF
13. O (hk)	Manuscript, complete	1450–1475, Low Countries	The Hague, National Library of the Netherlands, 75 E 20	/
14. N	Manuscript, complete	1453, Southern Low Countries	The Hague, Museum Meermanno-Westreenianum, 10 C 19	IIIF
15. H	Manuscript, complete	1460–1470, Gelre or Overijssel	Leiden, University Library, BPL 14 C	IIIF
KAREL ENDE ELEGAST
SIGLUM	MEDIUM/STATE	DATE/REGION	REPOSITORY/SIGNATURE	IMAGES
1. V	Manuscript, fragmentary	Around 1350, Brabant (?)	Vercelli, Archivio di Stato, Prefettura, Giudiziarro, Fondo Antico, mazzo 41 (fasc. 2)	/
2. Ge	Manuscript, fragmentary	1350–1400, Southwest Brabant	Ghent, University Library, HS. 896-A	IIIF
3. M	Manuscript, fragmentary	1350–1400, Brabant	Arras, Bibliothèque de la Ville, Ms. 227–383	/
4. Br	Manuscript, fragmentary	1350–1450, Southeast Limburg	Brussels, Archives of the City of Brussels, hs. 1645	/
5. G	Manuscript, fragmentary	1350–1450, Southeast Limburg	Munich, Bayerische Staatsbibliothek, Cod. Ger. 5249, Nr. 69	IIIF
6. H	Manuscript, fragmentary	1375–1400, Southeast Limburg	The Hague, National Library of the Netherlands, 131 D 5	/
7. N	Manuscript, fragmentary	1375–1400, Flanders	Namur, Fonds de la Ville, en dépôt à la Société archéologique de Namur, inv. Ms 196 B 19	/
8. F	Printed book, fragmentary	1484–1488, [‘s-Hertogenbosch (Brabant)]	Cambridge, University Library, Inc. 6 E 12.1	/
9. A	Printed book, complete	1486–1488, [Delft (Holland)], [Jacob Jacobsz. van der Meer or Christiaen Snellaert]	The Hague, National Library of the Netherlands, 169 G 63	/
10. B	Printed book, complete	1493-after 1500, [Antwerp (Brabant)], Govaert Bac	Berlin, Staatsbibliothek Preussischer Kulturbesitz, 8° Inc 4812	IIIF
11. C	Printed book, complete	1496–1499, Antwerp (Brabant)	Washington, Library of Congress, Collection Lessing J. Rosenwald, Inc. X.K. 33	IIIF
12. L	Printed book, complete	After 3 July 1496, [Antwerp (Brabant)]	St. Petersburg, National Library of Russia, 8.13.7.9	/
13. D	Printed book, complete	c. 1530, [Antwerp (Brabant)], [Adriaen van Berghen, Jan van Doesborch or Jan Berntsz.]	Brussels, Royal Library of Belgium, II 54948 A L.P.	Digital
14. E	Printed book, complete	c. 1550–1596, Antwerp (Brabant), Jan van Ghelen	Brussels, Royal Library of Belgium, II 47686 A L.P.	/

Table 2

Overview of the transmission of the witnesses in terms of total number of verses, damaged verses and complete verses.

DIETSCHE CATOEN
SIGLUM	TOTAL NUMBER OF VERSES	VERSES WITH DAMAGE	VERSES WITHOUT DAMAGE
A	393	37	356
Me	80	40	40
L	72	1	71
M	233	0	233
C	298	0	298
b	108	19	89
Br	34	0	34
H	261	0	261
B	263	0	263
P	188	0	188
R	72	0	72
D	287	0	287
G	206	0	206
d1–d6	206 (x6)	0	206 (x6)
TOTAL	3731	97	3634
SCOLASTICA
SIGLUM	TOTAL NUMBER OF VERSES	VERSES WITH DAMAGE	VERSES WITHOUT DAMAGE
C	1111	0	1111
B	1107	0	1107
D	1106	0	1106
E	1108	0	1108
A	1115	0	1115
M	1107	0	1107
G	1112	0	1112
K	1115	1	1114
L	1111	0	1111
F	1132	0	1132
I	709	0	709
J	1107	0	1107
O	401	0	401
N	1114	0	1114
H	897	0	897
TOTAL	15352	1	15351
KAREL ENDE ELEGAST
SIGLUM	TOTAL NUMBER OF VERSES	VERSES WITH DAMAGE	VERSES WITHOUT DAMAGE
V	21	0	21
Ge	618	617	1
M	241	144	97
Br	130	100	30
G	173	6	167
H	319	102	217
N	129	6	123
F	36	0	36
A	1381	17	1364
B	1371	0	1371
C	1371	0	1371
L	1187	0	1187
D	1363	0	1363
E	1218	72	1146
TOTAL	9558	1064	8494
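
The totals in Table 2 can be re-derived from the per-witness counts. The short Python sketch below does this for Dietsche Catoen only, using the figures printed in the table; the dictionary layout and variable names are our own illustration and are not part of the dataset's scripts.

```python
# Per-witness counts for Dietsche Catoen, copied from Table 2:
# siglum -> (total number of verses, verses with damage)
catoen = {
    "A": (393, 37), "Me": (80, 40), "L": (72, 1), "M": (233, 0),
    "C": (298, 0), "b": (108, 19), "Br": (34, 0), "H": (261, 0),
    "B": (263, 0), "P": (188, 0), "R": (72, 0), "D": (287, 0),
    "G": (206, 0),
}
# Table 2 lists d1-d6 as six witnesses of 206 undamaged verses each.
for i in range(1, 7):
    catoen[f"d{i}"] = (206, 0)

total = sum(t for t, _ in catoen.values())
damaged = sum(d for _, d in catoen.values())
intact = total - damaged

# Adding the table's stated totals for the other two texts gives the
# corpus-wide figure quoted in the overview ("around 28,600").
corpus_total = total + 15352 + 9558

print(total, damaged, intact, corpus_total)
```

The same check can be repeated for the other two texts by swapping in their rows.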

Repository location

doi: 10.5281/zenodo.15064631

Context

The individual projects of the authors have led to the creation of transcriptions of the texts featured in this paper. Constrained (2022–2026), the doctoral research project of Sofie Moors, seeks to combine this data for the purpose of studying the relation between scribal variation and formal elements. The resulting dataset is presented in this paper and is a continuation of a previously published dataset containing diplomatic editions of 17 witnesses of the Martijn Trilogy by 13th-century poet Jacob van Maerlant (Moors, Kestemont & Sleiderink, 2024; see Haverals & Kestemont, 2023 for an example of further use of this dataset).

(2) Method

Steps

1) Data Collection

To study the manuscript and printed tradition of the texts in our corpora, we collected diplomatic, digital transcriptions. For most of the witnesses, existing editions could be digitised using Optical Character Recognition via ABBYY FineReader.⁵

For Dietsche Catoen, we used the edition of Van Buuren (1998), which is fully available online. Sleiderink and Houtsma (2015) supplemented it in the same format with editions of two later-discovered text witnesses (Me, Br), so that all extant textual witnesses are included in the final dataset.
For the Scolastica, the research data were available thanks to Van Dalen-Oskam (2012). For her research, she took samples from the fifteen complete manuscripts of Scolastica. The surviving fragments were not included in the corpus (see Hutter, 2023 for an inventory).
For Karel ende Elegast, all extant Middle Dutch textual witnesses are included. Most transcriptions were collected from the diplomatic editions by Duinhoven (1969, M, H, N, G, Br, A, B, C, D, E, F), Klein (1989, Ge) and Langeslag (2015, V). The text of L has not previously been published, but was transcribed by Duinhoven (transcription can be consulted in the National Library in The Hague, see also Duinhoven and Van Thienen, 1990).

2) Annotation & Processing

All the transcriptions were encoded using the TEI-MVN framework (via Oxygen XML Editor⁶) developed by Boot and Brinkman (2020). MVN (Middeleeuwse Verzamelhandschriften uit de Nederlanden) is an edition series specifically for Middle Dutch texts transmitted in multi-text codices. This series provides guidelines for paper as well as digital diplomatic editions, in addition to the standard TEI-XML. Even though the multi-text focus is not foregrounded in the current dataset, the MVN framework offers the most extensive guidelines for digitally editing Middle Dutch and is therefore the most suitable framework for the corpus. As the implementation of this framework is quite complex and labour-intensive, a custom script was created to generate the XML files automatically (Moors, Kestemont & Sleiderink, 2024). The output of the OCR (step 1) was saved as raw text files and enriched with simpler descriptive markup for verse, column and folio structure, capital letters, abbreviations, damage gaps, missing verses, deletions and additions, and uncertain readings. Spelling variation was dealt with according to the standards in Middle Dutch studies. Thus, the difference between i/j and u/v was encoded, and not adapted to modern spelling, but the different allographs of s (normal or long), r and y were not represented. The enriched text files were then automatically converted to XML files. Because this process is fully automated, it sometimes requires intervention and postprocessing to ensure that the XML output is fully correct and diplomatic. For example, the script cannot properly handle more than one XML tag per token, so when a token has multiple tagged properties (e.g. abbreviated and struck through), intervention is required to ensure both properties are visualised.
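
The conversion from lightly marked-up text to XML can be illustrated with a minimal Python sketch. It is not the authors' txt2xml script: the curly-brace convention for abbreviation expansions, the function names and the sample strings are all assumptions made for this example, and the output is a bare TEI-like fragment rather than a full TEI-MVN file.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lightweight markup (NOT the dataset's actual rich_txt syntax):
# {..} encloses letters supplied when an abbreviation is expanded.
EXPANSION = re.compile(r"\{([^}]*)\}")

def verse_to_element(text, verse_id):
    """Turn one marked-up verse into an <l> element with <ex> children."""
    line = ET.Element("l", {"xml:id": verse_id})
    last = None  # most recently created <ex> child, if any
    pos = 0
    for match in EXPANSION.finditer(text):
        plain = text[pos:match.start()]
        if last is None:
            line.text = (line.text or "") + plain
        else:
            last.tail = (last.tail or "") + plain
        last = ET.SubElement(line, "ex")
        last.text = match.group(1)
        pos = match.end()
    rest = text[pos:]
    if last is None:
        line.text = (line.text or "") + rest
    else:
        last.tail = (last.tail or "") + rest
    return line

def convert(lines, witness):
    """Wrap all verses of one witness in a line group, numbering them from 1."""
    group = ET.Element("lg")
    for number, text in enumerate(lines, start=1):
        group.append(verse_to_element(text, f"{witness}.{number}"))
    return group

# Illustrative strings only; they are not readings taken from the dataset.
sample = ["een vers met ee{n} afkorting", "een vers zonder afkorting"]
xml_output = ET.tostring(convert(sample, "X"), encoding="unicode")
print(xml_output)
```

A flat pattern like this one handles a single property per stretch of text, which mirrors the limitation described above: tokens that carry several properties at once would need nested handling or manual postprocessing.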

Additionally, short manuscript descriptions in TEI-XML were added to the XML header. They contain information on the text and its position in the manuscript (<msItem>); a description of the physical object (<physDesc>), including the (reconstructed) form of the textual witness, the material, the number of leaves, collation, leaf size and layout, the font or type, and the decoration; and the history of the witness (<history>), including information on its dating and localisation and, where known, the contributors to its production. In the case of fragments, a description is given for the reconstructed codex (where possible), and in the case of convolute volumes the description is limited to the unit containing the text in question.
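
As a rough illustration of how such header information might be read back, the Python sketch below parses an invented miniature header with the standard library. The element names are standard TEI, but the exact nesting and content of the dataset's headers may differ, and the sample is deliberately incomplete (for instance, it omits the identifier that a valid manuscript description requires).

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented miniature header, for illustration only (not schema-complete).
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><sourceDesc><msDesc>
    <msContents><msItem><title>Example text</title></msItem></msContents>
    <physDesc><objectDesc form="codex"/></physDesc>
    <history><origin>
      <origDate>1350-1400</origDate><origPlace>Brabant</origPlace>
    </origin></history>
  </msDesc></sourceDesc></fileDesc></teiHeader>
</TEI>"""

root = ET.fromstring(sample)
ms = root.find(".//tei:msDesc", TEI)

# Collect one value from each of the three described components.
summary = {
    "title": ms.findtext("tei:msContents/tei:msItem/tei:title", namespaces=TEI),
    "form": ms.find("tei:physDesc/tei:objectDesc", TEI).get("form"),
    "date": ms.findtext("tei:history/tei:origin/tei:origDate", namespaces=TEI),
    "place": ms.findtext("tei:history/tei:origin/tei:origPlace", namespaces=TEI),
}
print(summary)
```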

Sampling strategy

All available text of the witnesses was used, with the exception of those from the Scolastica, for which a selection of shorter samples of roughly two hundred verses was used. The sample selection was the result of the choices made for previous research (Van Dalen-Oskam, 2012). In line with the requirements of that earlier study, the chosen passages concern female Bible protagonists. As transcriptions of these samples are available for all witnesses, the same passages were used here for pragmatic reasons. As this sample size offers a number of verses that is comparable to that of the Martijn Trilogy (Moors, Kestemont & Sleiderink, 2024), the decision was made not to transcribe the remainder of the text (35,000 verses).

Quality control

These prior editions served as a useful basis, but the transcription practices were not uniform. Therefore, a re-collation with photographic facsimiles of the original sources was necessary. The transcriptions were collated manually and normalised to obtain a uniform, maximally faithful diplomatic transcription of each witness. After collating with photographic copies/microfilms, some uncertain passages were further clarified through on-site examination in the holding institutions.

For a more detailed description of the workflow, see Moors, Kestemont and Sleiderink (2024).

(3) Dataset Description

Repository name

Zenodo. The associated scripts, written in Python as Jupyter Notebooks, can be accessed through the GitHub page: www.github.com/SofieMoors/controlcorpus.

Object name

Three main folders: dietschecatoen, scolastica and karelendeelegast. Each folder contains the same file set:

data:

– html: html files with abbreviations expanded, html files with abbreviations unexpanded and extra folders with mvnhtml.css (the css view is optimised for macOS and iOS browsers)
– mvn: framework used (https://github.com/HuygensING/mvn-xml/tree/main/framework)
– plain_txt: text files generated from xml with markup applied (scripts > xml2txt.ipynb)
– rich_txt: raw text files, manually enriched with semantic markup
– viz: timeline and pixelplot
– xlsx: synoptic presentation of all witnesses (abbreviations expanded)
– xml: xml files generated from the rich_txt (scripts > txt2xml.ipynb)

scripts:

– txt2xml.ipynb: code to convert rich_txt to xml
– viz.ipynb: code to create viz
– xml2txt.ipynb: code to convert xml to plain_txt
– xml2xlsx.ipynb: code to convert xml to xlsx
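
The idea of deriving both an expanded and an unexpanded text view from the same XML can be sketched as follows. This is a simplified illustration and not the xml2txt notebook itself: it assumes, hypothetically, that abbreviations are encoded as <choice> elements holding an <abbr> and an <expan> reading, and it ignores namespaces.

```python
import xml.etree.ElementTree as ET

def verse_text(line, expand):
    """Flatten one verse, keeping either the expanded or the abbreviated reading."""
    dropped = "abbr" if expand else "expan"
    parts = []

    def walk(node):
        if node.tag == dropped:
            return  # skip the reading that was not asked for
        parts.append(node.text or "")
        for child in node:
            walk(child)
            parts.append(child.tail or "")  # text following the child element

    walk(line)
    return "".join(parts)

# Invented sample verse; the encoding shown here is an assumption.
verse = ET.fromstring(
    "<l>een <choice><abbr>vs</abbr><expan>vers</expan></choice> als voorbeeld</l>"
)
expanded = verse_text(verse, expand=True)
unexpanded = verse_text(verse, expand=False)
print(expanded)
print(unexpanded)
```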

Format names and versions

data: TXT, XML, HTML, XLSX; scripts: .ipynb

Creation dates

dietschecatoen: 2022–2023, scolastica: 2009–2024, karelendeelegast: 2023–2024

Dataset creators

Sofie Moors (University of Antwerp)

– Dietsche Catoen (data collection, annotation & processing)
– Scolastica (data annotation & processing)
– Karel ende Elegast (data processing)

Nicky Voorneveld (University of Antwerp)

– Karel ende Elegast (data collection & annotation)

Karina van Dalen-Oskam (Huygens Institute & University of Amsterdam)

– Scolastica (data collection)

Kamiel Temmermans

– Scolastica (data annotation)

Language

English

License

CC-BY-SA

Publication date

2025-03-21

(4) Reuse Potential

Since the transcriptions in this dataset are diplomatic, they lend themselves well to research on scribal attributions and scribal variation (e.g., Van Dalen-Oskam, 2012, 2014; Vandyck & Kestemont, 2024), stemmatology (e.g., Camps, Fernandez Riva & Gabay, 2021), phylogenetics (e.g., McCollum & Turnbull, 2024), and more. As a standalone corpus, this monolingual dataset is limited to use within a single linguistic area. It can, however, be used for interlingual comparison with existing datasets of other medieval vernaculars, such as the Middle High German corpus of Flos unde Blankeflos (de Bruijn & Bastert, 2025), La Base de Français Médiéval (Guillot-Barbance, Heiden, & Lavrentiev, 2017) and the Middle English corpus of the Canterbury Tales (North, Bordalejo, Jones, & Robinson, 2020). Furthermore, it enriches an under-researched field with empirical evidence of scribal variation and in doing so fills an important gap in the available (digital) data on multiple textual witnesses of Middle Dutch works, which is currently quite limited (e.g., Burgers, 2004; Hendriks & Kuiper, 2018; Moors, Kestemont & Sleiderink, 2024).

We do not offer an edition for scholarly reading. Instead, the corpus has been optimised for computational analysis and collation by opting for consistency: a unique identifier was added to each verse and the line division was kept the same across witnesses (Guéville & Wrisley, 2022). Accessibility is further enhanced by the availability of multiple file formats, including the richly encoded XML files as well as HTML and XLSX files, which are more easily used by researchers with a less digital background.
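
To show why shared verse identifiers simplify collation, the Python sketch below aligns a few invented readings by identifier and lists the verses whose attested readings differ. The identifiers, sigla and readings are toy data, and the function names are our own; the dataset's actual identifier scheme should be taken from the XML files.

```python
# Invented toy readings: siglum -> {verse identifier: transcription}.
witnesses = {
    "W1": {"v0001": "alpha beta", "v0002": "gamma delta"},
    "W2": {"v0001": "alpha beta", "v0002": "gamma delte", "v0003": "epsilon"},
    "W3": {"v0002": "gamma delta", "v0003": "epsilon"},
}

def synoptic(witnesses):
    """Align readings by verse identifier; None marks a verse a witness lacks."""
    sigla = sorted(witnesses)
    ids = sorted({vid for verses in witnesses.values() for vid in verses})
    rows = {vid: [witnesses[s].get(vid) for s in sigla] for vid in ids}
    return sigla, rows

def variant_verses(rows):
    """Identifiers of verses whose attested readings are not all identical."""
    return [
        vid for vid, readings in rows.items()
        if len({r for r in readings if r is not None}) > 1
    ]

sigla, rows = synoptic(witnesses)
variants = variant_verses(rows)
print(sigla)
print(variants)
```

Because alignment is a dictionary lookup on the identifier, no sequence-alignment step is needed before comparing readings, which is the practical benefit of keeping line division constant across witnesses.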

Notes

[1] Karel ende Elegast is also extant in a Ripuarian manuscript (siglum K, Darmstadt, Universitäts- und Landesbibliothek, Hs. 2290), and a Middle High German manuscript (Zeitz, Stifts- und Domherrenbibliothek, 60). Due to their linguistic distance from the Middle Dutch witnesses, these were not included in the corpus.

[2] In the data folders you will find this text witness as ‘B2’ and not as ‘B’. This is because file names are not case sensitive and so otherwise there would be overlap with text witness ‘b’.

[3] Printed books G, d1 and d6 originally came from the collection of Constant Philippe Serrure, who later sold most of his Middle Dutch editions to the Duke of Arenberg (nos. 979, 1261 and 1257) (Cockx-Indestege & Delsaerdt, 2022, 47, 60, 625, 727–728). Van Buuren (1998, 186) writes that the four printed books from the Arenberg Collection (nos. 979, 1261, 1290 and 1257) are in a ‘private collection’ in the Netherlands. He does not mention this collection by name, but from Cockx-Indestege and Delsaerdt (2022) we can conclude that he was referring to Bernard Brenninkmeijer’s famous ‘Liberna Collection’, formerly in Hilversum (the Netherlands), but part of the Draiflessen Collection in Mettingen (Germany) since 2013. The books are not digitised, but we received pictures through the curator Guido Scholten (P. Delsaerdt & G. Scholten, personal communication, September 9–10, 2024).

[4] Due to a cyberattack on the website of the British Library, it was not possible to verify online whether these manuscripts had been digitised. We therefore checked by email with the Manuscripts Reference Services, who let us know that the manuscripts are not digitised and that there are no indications of plans to digitise them in the future (personal communication, September 3, 2024).

[5] Retrieved from https://pdf.abbyy.com/ (last accessed: 21 December 2023).

[6] Retrieved from https://www.oxygenxml.com/ (last accessed: 21 December 2023).

Acknowledgements

We are grateful to Mike Kestemont for his help in writing the scripts and to Remco Sleiderink for his feedback and suggestions. We also want to thank Kamiel Temmerman for his transcription work. Many thanks to the libraries and library staff who kindly provided access to the witnesses in their collections.

Competing Interests

The authors have no competing interests to declare.

Author Contributions

– Sofie Moors: conceptualisation, data curation, formal analysis, investigation, methodology, resources, software, supervision, visualisation, writing (original draft/review & editing)
– Nicky Voorneveld: investigation, resources, writing (original draft/review & editing)
– Karina van Dalen-Oskam: resources, writing (review & editing)