A Multi-Dimensional Evaluation Framework for Assessing LLM Performance in TEI Encoding

By: Sabrina Strutz
Open Access | Mar 2026

References

  1. Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … Amodei, D. (2020). Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems (Vol. 33, pp. 1877–1901). Retrieved from https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
  2. Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., … Xie, X. (2023). A Survey on Evaluation of Large Language Models. Retrieved from http://arxiv.org/abs/2307.03109 (last accessed 10 August 2024).
  3. Cummings, J. (2019). A world of difference: Myths and misconceptions about the TEI. Digital Scholarship in the Humanities, 34, i58–i79. 10.1093/llc/fqy071
  4. De Cristofaro, M., & Zilio, D. (2025). Automating XML-TEI Encoding of Unpublished Correspondence: A Comparative Analysis of two LLM Approaches. Quaderni di Umanistica Digitale. 10.6092/UNIBO/AMSACTA/8380
  5. DeRose, S. J. (2024). Can LLMs help with XML? Retrieved from https://www.balisage.net/Proceedings/vol29/print/DeRose01/BalisageVol29-DeRose01.html (last accessed 2 August 2025).
  6. Ding, B., Qin, C., Liu, L., Chia, Y. K., Joty, S., Li, B., & Bing, L. (2023). Is GPT-3 a Good Data Annotator? In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (pp. 11173–11195). Retrieved from https://aclanthology.org/2023.acl-long.625
  7. Dobson, J. (2020). Interpretable Outputs: Criteria for Machine Learning in the Humanities. Digital Humanities Quarterly, 15(2).
  8. Forney, C., Haaf, S., & Kirsten, L. (2020). Letter Openers and Closers. Retrieved from https://encoding-correspondence.bbaw.de/v1/openers-closers.html#c-3-2 (last accessed 9 November 2025).
  9. Franken, L., Koch, G., & Zinsmeister, H. (2020). Observations on Annotations. In J. Nantke & F. Schlupkothen (Eds.), Annotations in Scholarly Editions and Research (pp. 299–324). De Gruyter. 10.1515/9783110689112-014
  10. Gilardi, F., Alizadeh, M., & Kubli, M. (2023). ChatGPT outperforms crowd workers for text-annotation tasks. Proceedings of the National Academy of Sciences, 120(30). 10.1073/pnas.2305016120
  11. Guo, Z., Jin, R., Liu, C., Huang, Y., Shi, D., Supryadi, … Xiong, D. (2023). Evaluating Large Language Models: A Comprehensive Survey. Retrieved from http://arxiv.org/abs/2310.19736 (last accessed 3 March 2024).
  12. Henny, U. (2018). Reviewing von digitalen Editionen im Kontext der Evaluation digitaler Forschungsergebnisse. Sonderband der Zeitschrift für digitale Geisteswissenschaften, 2. 10.17175/sb002_006
  13. Höflechner, W. (2021). Joseph von Hammer-Purgstall 1774–1856. Ein altösterreichisches Gelehrtenleben. Eine Annäherung. ADEVA.
  14. McGillivray, B., Poibeau, T., & Fabo, P. R. (2020). Digital Humanities and Natural Language Processing: “Je t’aime… moi non plus”. Digital Humanities Quarterly, 14(2).
  15. Pagel, A., Pichler, A., & Reiter, N. (2024). Über Prompt Brittleness, Prompt Generalisierbarkeit und Prompt Optimierung. Erste Erkenntnisse aus Fallstudien in den Computational Literary Studies. Retrieved from https://web.archive.org/web/20241121155350/https://agki-dh.github.io/pages/webinar/page-10.html (last accessed 23 September 2025).
  16. Pollin, C., Czmiel, A., Dumont, S., Fischer, F., Sahle, P., Schaßan, T., … Henny-Krahmer, U. (2024). Generative KI, LLMs und GPT bei digitalen Editionen. In DHd 2024: Quo Vadis DH. Passau, Germany. 10.5281/zenodo.10698210
  17. Pollin, C., Fischer, F., Sahle, P., Scholger, M., & Vogeler, G. (2025). When it was 2024 – Generative AI in the Field of Digital Scholarly Editions. Zeitschrift für digitale Geisteswissenschaften, 10. 10.17175/2025_008
  18. Pollin, C., Steiner, C., & Zach, B. (2023). New Ways of Creating Research Data: Conversion of Unstructured Text to TEI XML using GPT on the Correspondence of Hugo Schuchardt with a Web Prototype for Prompt Engineering. Retrieved from https://chpollin.github.io/GM-DH/ (last accessed 27 October 2025).
  19. Rastinger, N. (2024). Named Entity Recognition mit LLMs. Retrieved from https://web.archive.org/web/20241121155350/https://agki-dh.github.io/slides/06_1_nlp_llm.pdf (last accessed 23 September 2025).
  20. Sahle, P. (2014). Kriterien für die Besprechung digitaler Editionen. Retrieved from https://www.i-d-e.de/publikationen/weitereschriften/kriterien-version-1-1/ (in collaboration with Georg Vogeler and the members of the IDE, v1.1, last accessed 28 October 2025).
  21. Santini, C. (2024). Combining language models for knowledge extraction from Italian TEI editions. Frontiers in Computer Science, 6. 10.3389/fcomp.2024.1472512
  22. Scholger, M., Strutz, S., & Pollin, C. (2024). Empowering Text Encoding with Large Language Models: Benefits and Challenges. Retrieved from https://zenodo.org/records/13969082 (last accessed 26 May 2025).
  23. Somala, V., & Emberson, L. (2025). Frontier AI performance becomes accessible on consumer hardware within a year. Retrieved from https://epoch.ai/data-insights/consumer-gpu-model-gap (last accessed 31 October 2025).
  24. Strutz, S. (2025). Towards an Evaluation Framework for Assessing Large Language Models in Text Encoding. In ADHO Digital Humanities Conference 2025 (DH 2025). NOVA FCSH, Lisbon: Zenodo. 10.5281/zenodo.16364518
  25. Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., … Fedus, W. (2022). Emergent Abilities of Large Language Models. Transactions on Machine Learning Research. Retrieved from https://openreview.net/forum?id=yzkSU5zdwD (last accessed 10 August 2024).
  26. Yang, J., Jiang, D., He, L., Siu, S., Zhang, Y., Liao, D., … Chen, W. (2025). StructEval: Benchmarking LLMs’ Capabilities to Generate Structural Outputs. Retrieved from http://arxiv.org/abs/2505.20139 (arXiv:2505.20139).
DOI: https://doi.org/10.5334/johd.484 | Journal eISSN: 2059-481X
Language: English
Submitted on: Nov 18, 2025 | Accepted on: Jan 19, 2026 | Published on: Mar 2, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Sabrina Strutz, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.