
ARletta. Open-Source Handwritten Text Recognition Models for Historic Dutch

Open Access | Jul 2024

Figures & Tables

Figure 1

Example of an incident (FelixArchief, 731#1604, p. 211).

Table 1

Transcription of Figure 1 and translation to English (FelixArchief, 731#1604, p. 211).

Transcription:
1 Mei 1885
Mr Sergoynne
à verbaliser
D
Om 9 1/2 uren ‘s avonds heb ik gezien dat de meid uit het huis n° 57 der Leopoldst matten uitklopte voor hare woning tegen den muer van het gasthuis toen ik er naartoe ging liep zy binnen en is niet meer buiten gekomen (zij veroorzaakte eenen over vloed van stof)
Broes, Dymphna, 16 jaar, geb. te Minderhout, meid
Wonende Leopold str. 57.
Comme cette fille est ci jeune et qu’elle savait pas mieux nous n’y avons pas donné suite [onleesbare handtekening]
English translation:
1 May 1885
Mr Sergoynne
draw up an official report
D
At 9 1/2 hours in the evening I saw that the maid from the house N° 57 in the Leopold street was beating rugs against the wall of the guesthouse when I came over she ran away inside the house and didn’t come outside any more (she caused a flood of dust).
Broes, Dymphna, 16 years, born in Minderhout, maid
residing in Leopoldstr 57.
Because she was so young and she didn’t know better, we didn’t follow up on this. [unreadable signature-mark]
Figure 2

Visualization of the training process.

Figure 3

An example of a page with annotated text regions (green) and baselines (pink) (FelixArchief, MA#17612, p. 85).
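Annotations such as those in Figure 3 pair region polygons with line baselines. The article does not state the storage format on this page; purely as an illustrative assumption, the sketch below reads regions and baselines from a PAGE-XML layout file (the export format used by tools such as Transkribus or eScriptorium). The file name page_0085.xml is hypothetical.

    import xml.etree.ElementTree as ET

    # PAGE-XML namespace (assumed; the 2013-07-15 schema commonly used by HTR tools)
    NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

    def read_layout(path):
        """Yield (region polygon, list of line baselines) for each annotated text region."""
        root = ET.parse(path).getroot()
        for region in root.iterfind(".//pc:TextRegion", NS):
            coords = region.find("pc:Coords", NS).get("points")         # region outline (green in Figure 3)
            baselines = [line.find("pc:Baseline", NS).get("points")     # line baselines (pink in Figure 3)
                         for line in region.iterfind("pc:TextLine", NS)
                         if line.find("pc:Baseline", NS) is not None]
            yield coords, baselines

    # Hypothetical usage:
    for polygon, baselines in read_layout("page_0085.xml"):
        print(f"region with {len(baselines)} baselines")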

Table 2

Train-validation splits for each dataset.

DATASET | SUBSET | PAGES | REGIONS | LINES | LINE LENGTH* | CHARACTERS | VOCABULARY**
VOC | train | 4261 | 6079 | 132611 | 36.37 (±17.84) | 4823321 | 120
VOC | valid | 474 | 655 | 15154 | 35.79 (±17.57) | 542321 | 101
Notarial | train | 1453 | 3624 | 92003 | 35.03 (±20.03) | 3222690 | 107
Notarial | valid | 162 | 377 | 9554 | 35.78 (±18.67) | 341841 | 96
Antw-expert | train | 243 | 1828 | 10766 | 30.63 (±17.10) | 329806 | 89
Antw-expert | valid | 28 | 209 | 1260 | 29.59 (±16.77) | 37288 | 83
Antw-students | train | 3099 | 27496 | 145387 | 29.00 (±17.76) | 4216129 | 118
Antw-students | valid | 345 | 3089 | 16196 | 28.92 (±17.85) | 468359 | 106
Antw-test | test | 101 | 715 | 4628 | 30.18 (±17.09) | 139658 | 90

* Line length expressed in characters (including whitespace); mean and standard deviation are reported.

** Number of unique characters in the character vocabulary.
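The quantities in Table 2 follow directly from the ground-truth transcriptions. As a minimal sketch (not the authors' code), the per-subset line, character and vocabulary statistics can be computed as follows, assuming each subset is available as a list of line transcriptions:

    from statistics import mean, stdev

    def subset_statistics(lines):
        """Line count, line-length mean/std (in characters, incl. whitespace), character total and vocabulary size."""
        lengths = [len(line) for line in lines]
        vocabulary = set("".join(lines))          # unique characters in the subset
        return {
            "lines": len(lines),
            "line_length_mean": round(mean(lengths), 2),
            "line_length_std": round(stdev(lengths), 2),
            "characters": sum(lengths),
            "vocabulary": len(vocabulary),
        }

    # Toy example with two lines from Table 1:
    print(subset_statistics(["1 Mei 1885", "Mr Sergoynne"]))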

Table 3

HTR training results (character accuracy rate, CAR).

MODEL | VALIDATION* | ANTW-TEST | ANTW-TEST (RELAXED)**
Manu (base model) | NA | 70.51% | 72.24%
Manu: VOC | 94.36% | 76.57% | 78.95%
Manu: VOC → Notarial | 95.68% | 83.04% | 85.22%
Manu: VOC → Notarial → Antwexpert | 91.47% | 90.01% | 91.57%
Manu: VOC → Notarial → Antwexpert + Antwstudent | 89.97% | 92.58% | 93.97%
Antwexpert + Antwstudent (scratch) | 89.05% | 91.54% | 92.88%
Manu: VOC + Notarial + Antwexpert + Antwstudent (super model) | 92.78% | 92.31% | 93.69%

* The validation scores cannot be compared directly across model rows, since each row is validated on its own in-domain split, but they still give the reader a sense of in-domain model fit.

** Relaxed scores exclude whitespace, punctuation and capital letters from the comparison.
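As an illustration of the scores in Table 3, the sketch below computes a character accuracy rate as 1 minus the character error rate (Levenshtein distance divided by the ground-truth length), with a relaxed variant that removes whitespace and punctuation and ignores capitalization, mirroring the footnote above. This is an assumption about how the metric is defined, not the authors' evaluation code.

    import string

    def levenshtein(a: str, b: str) -> int:
        """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    def car(prediction: str, reference: str, relaxed: bool = False) -> float:
        """Character accuracy rate; 'relaxed' strips whitespace/punctuation and lowercases both strings."""
        if relaxed:
            table = str.maketrans("", "", string.whitespace + string.punctuation)
            prediction = prediction.translate(table).lower()
            reference = reference.translate(table).lower()
        return 1.0 - levenshtein(prediction, reference) / len(reference)

    print(f"{car('1 Mei 1885.', '1 Mei 1885', relaxed=True):.2%}")   # prints 100.00%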

DOI: https://doi.org/10.5334/johd.225 | Journal eISSN: 2059-481X
Language: English
Submitted on: May 21, 2024
Accepted on: Jun 19, 2024
Published on: Jul 11, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Lith Lefranc, Ilja Van Damme, Thibault Clérice, Mike Kestemont, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.