Table 1
Task Taxonomy for TEI Encoding.
| DIMENSION | TASK CATEGORY | ENCODING TASKS |
|---|---|---|
| 0 | Format Conversion | Transforming plain text into valid XML |
| 1 | Source Preservation | Preserving evidence of the source’s textual characteristics |
| 2 | Schema Application | Selecting and applying appropriate TEI elements and attributes according to TEI P5 Guidelines and project-specific constraints |
| 3 | Structural Markup | Constructing document scaffolding: segmenting texts into structural units (e.g., `<div>`, `<opener>`, `<closer>`, paragraph boundaries), and ensuring correct hierarchy and ordering |
| 4 | Semantic Markup | Annotating meaning-bearing spans and editorial phenomena, including named entities, temporal expressions, discourse markers, etc. |
| 5 | Contextual Enrichment | Linking entities to authority records, resolving references, and applying normalisations |
| 6 | Metadata Management | Extracting and normalising descriptive or administrative metadata from sources, and enriching records with external information |
| 7 | Collection Management | Maintaining consistent encoding depth and conventions across documents, monitoring quality drift, and checking interoperability standards |
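
As a rough illustration of dimensions 0 and 3 in Table 1, the sketch below turns a plain-text letter into a well-formed TEI-style fragment. The helper `letter_to_tei` and its segmentation heuristic (first block becomes `<opener>`, last block becomes `<closer>`, everything else becomes `<p>`) are hypothetical and are not taken from the study. Re-parsing the output checks XML well-formedness only, not validity against a TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise with TEI as the default namespace


def letter_to_tei(text: str) -> ET.Element:
    """Hypothetical helper: wrap blank-line-separated blocks in TEI elements.

    Assumed heuristic (illustration only): the first block is the opener,
    the last block is the closer, and all other blocks are paragraphs.
    """
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    div = ET.Element(f"{{{TEI_NS}}}div", {"type": "letter"})
    for i, block in enumerate(blocks):
        if i == 0 and len(blocks) > 1:
            tag = "opener"
        elif i == len(blocks) - 1 and len(blocks) > 2:
            tag = "closer"
        else:
            tag = "p"
        child = ET.SubElement(div, f"{{{TEI_NS}}}{tag}")
        child.text = " ".join(block.split())  # collapse internal whitespace
    return div


sample = "Dear friend,\n\nI arrived safely & write in haste.\n\nYours ever, A."
div = letter_to_tei(sample)
xml_string = ET.tostring(div, encoding="unicode")  # escapes '&' as '&amp;'

# Re-parsing confirms well-formedness (dimension 0), not schema validity.
reparsed = ET.fromstring(xml_string)
child_tags = [el.tag.split("}")[1] for el in reparsed]
```

Building the tree through an XML library rather than string concatenation means reserved characters such as `&` are escaped automatically, which is one common way to avoid well-formedness failures at the format-conversion stage.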

Figure 1
Evaluation framework and tiered scoring. D0 acts as a validity gate; D1-D4 contribute to the Final Score, with Core Quality weighting D1, D3, and D4 equally (33.33% each) and D2 applying a multiplicative adjustment factor. D1 also provides context-aware weighting for D4.
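
One possible reading of the scoring scheme in Figure 1 is sketched below. It rests on several assumptions that the caption does not state: all dimension scores lie in [0, 1], a failed D0 gate yields a Final Score of 0, and the D2 adjustment is a single multiplier applied to Core Quality. The function name `final_score` is hypothetical, and the context-aware weighting that D1 provides for D4 is not modelled here.

```python
def final_score(d0_valid: bool, d1: float, d2_factor: float,
                d3: float, d4: float) -> float:
    """Sketch of a tiered score; an assumed reading, not the paper's formula."""
    # D0 is a validity gate: invalid output is assumed to score zero.
    if not d0_valid:
        return 0.0
    # Core Quality: D1, D3 and D4 weighted equally (one third each).
    core_quality = (d1 + d3 + d4) / 3
    # D2 is assumed to scale Core Quality multiplicatively.
    return core_quality * d2_factor


gated = final_score(False, 1.0, 1.0, 1.0, 1.0)
unadjusted = final_score(True, 0.9, 1.0, 0.6, 0.3)
adjusted = final_score(True, 0.9, 0.5, 0.6, 0.3)
```

Under these assumptions, a multiplicative D2 factor can only scale Core Quality and never adds to it, so strong schema compliance cannot compensate for weak scores on D1, D3, or D4.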

Figure 2
OLMo2 results with five different prompt configurations (10-letter sample).

Figure 3
Qwen3 results (no thinking) with five different prompt configurations (10-letter sample).

Figure 4
GPT-5-mini results with five different prompt configurations (10-letter sample).

Figure 5
Claude Sonnet 4.5 results with five different prompt configurations (10-letter sample).

Figure 6
Cross-model comparison on the 100-letter dataset.
