A Multi-Dimensional Evaluation Framework for Assessing LLM Performance in TEI Encoding

By: Sabrina Strutz  
Open Access | Mar 2026

Figures & Tables

Table 1

Task Taxonomy for TEI Encoding.

| Dimension | Task Category | Encoding Tasks |
|---|---|---|
| 0 | Format Conversion | Transforming plain text into valid XML |
| 1 | Source Preservation | Preserving evidence of the source’s textual characteristics |
| 2 | Schema Application | Selecting and applying appropriate TEI elements and attributes according to TEI P5 Guidelines and project-specific constraints |
| 3 | Structural Markup | Constructing document scaffolding: segmenting texts into structural units (e.g., <div>, <opener>, <closer>, paragraph boundaries), and ensuring correct hierarchy and ordering |
| 4 | Semantic Markup | Annotating meaning-bearing spans and editorial phenomena, including named entities, temporal expressions, discourse markers, etc. |
| 5 | Contextual Enrichment | Linking entities to authority records, resolving references, and normalisations |
| 6 | Metadata Management | Extracting and normalising descriptive or administrative metadata from sources, and enriching records with external information |
| 7 | Collection Management | Maintaining consistent encoding depth and conventions across documents, monitoring quality drift, and checking interoperability standards |
Figure 1

Evaluation framework and tiered scoring. D0 acts as a validity gate. D1-D4 contribute to the Final Score: Core Quality weights D1, D3, and D4 equally (33.33% each), and D2 applies a multiplicative adjustment factor. D1 also provides context-aware weighting for D4.
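
The tiered scoring described in this caption can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the names `DimensionScores` and `final_score` are hypothetical, all scores and the D2 factor are assumed to lie in [0, 1], a failed D0 gate is assumed to yield a score of 0, and the context-aware weighting that D1 provides for D4 is not modelled because the caption does not specify it.

```python
from dataclasses import dataclass


@dataclass
class DimensionScores:
    """Hypothetical container for per-dimension results (names are assumptions)."""
    d0_valid: bool    # D0: did the output pass the validity gate?
    d1: float         # D1 Source Preservation score, assumed in [0, 1]
    d2_factor: float  # D2 Schema Application adjustment factor, assumed in [0, 1]
    d3: float         # D3 Structural Markup score, assumed in [0, 1]
    d4: float         # D4 Semantic Markup score, assumed in [0, 1]


def final_score(s: DimensionScores) -> float:
    """Combine dimension scores following the tiers named in the caption."""
    # Tier 1: D0 is a gate. Assumption: a failed gate means a final score of 0.
    if not s.d0_valid:
        return 0.0
    # Tier 2: Core Quality is the equally weighted mean of D1, D3 and D4.
    core_quality = (s.d1 + s.d3 + s.d4) / 3
    # Tier 3: D2 scales Core Quality multiplicatively.
    return core_quality * s.d2_factor


if __name__ == "__main__":
    example = DimensionScores(d0_valid=True, d1=0.9, d2_factor=0.5, d3=0.6, d4=0.3)
    print(f"{final_score(example):.2f}")
```

A multiplicative D2 factor means that schema errors lower the score in proportion to Core Quality rather than being averaged in as a fourth equal component.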

Figure 2

OLMo2 results across five prompt configurations (10-letter sample).

Figure 3

Qwen3 results (no thinking) across five prompt configurations (10-letter sample).

Figure 4

GPT-5-mini results across five prompt configurations (10-letter sample).

Figure 5

Claude Sonnet 4.5 results across five prompt configurations (10-letter sample).

Figure 6

Cross-model comparison on the 100-letter dataset.

DOI: https://doi.org/10.5334/johd.484 | Journal eISSN: 2059-481X
Language: English
Submitted on: Nov 18, 2025 | Accepted on: Jan 19, 2026 | Published on: Mar 2, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Sabrina Strutz, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.