
OpenITI MAKHZAN: An Open Annotated Dataset of Arabic, Persian, Ottoman Turkish, and Urdu Print and Manuscript Data

Open Access | May 2026

References

  1. Alghamdi, M., & Teahan, W. (2017). Experimental Evaluation of Arabic OCR Systems. PSU Research Review, 1(3), 229–241. 10.1108/PRR-05-2017-0026
  2. Allen, J. P. (Forthcoming). Navigating the Ink-Dark Sea of Arabic Script Typography: OpenITI’s Typeface Evaluation and Data Production. In Al-ʿUṣūr al-Wusṭā: The Journal of Middle East Medievalists.
  3. Clausner, C., Antonacopoulos, A., McGregor, N., & Wilson-Nunn, D. (2018). ICFHR 2018 Competition on Recognition of Historical Arabic Scientific Manuscripts–RASM2018. In 16th International Conference on Frontiers in Handwriting Recognition (ICFHR) (pp. 471–476). 10.1109/ICFHR-2018.2018.00088
  4. Keinan-Schoonbaert, A. (September 2019). Results of the RASM2019 Competition on Recognition of Historical Arabic Scientific Manuscripts. British Library Digital Scholarship Blog. https://blogs.bl.uk/digital-scholarship/2019/09/rasm2019-results.html
  5. Keinan-Schoonbaert, A. (January 2020). Using Transkribus for Arabic Handwritten Text Recognition. British Library Digital Scholarship Blog. https://blogs.bl.uk/digital-scholarship/2020/01/using-transkribus-for-arabic-handwritten-text-recognition.html
  6. Kiessling, B. (2019). Kraken—An Universal Text Recognizer for the Humanities. In Proceedings of the Digital Humanities Conference.
  7. Kiessling, B., Kurin, G., Miller, M. T., & Smail, K. (2021). Advances and Limitations in Open Source Arabic-Script OCR: A Case Study. Digital Studies / Le champ numérique, 11(1). 10.16995/dscn.8094
  8. Kiessling, B., Miller, M. T., Romanov, M. G., & Savant, S. B. (2017). Important New Developments in Arabographic Optical Character Recognition (OCR). Al-ʿUṣūr al-Wusṭā: The Journal of Middle East Medievalists, 25(1), 1–13. 10.7916/alusur.v25i1.6996
  9. Milo, T. (2013). Bodoni’s Arabic, Some Observations. In O. Riccardo & P. Jonathan (Eds.), Compulsive Bodoni and the Parmigiano Typographic System (pp. 95–103). UvA Special Collections.
  10. Panagiotidou, G., Lamqaddam, H., Poblome, J., Brosens, K., Verbert, K., & Moere, A. V. (2022). Communicating Uncertainty in Digital Humanities Visualization Research. IEEE Transactions on Visualization and Computer Graphics (pp. 1–11). 10.1109/TVCG.2022.3209436
  11. Smith, D. A., Murel, J., Allen, J. P., & Miller, M. T. (2023). Automatic Collation for Diversifying Corpora: Commonly Copied Texts as Distant Supervision for Handwritten Text Recognition. In A. Šeļa, F. Jannidis, & I. Romanowska (Eds.), Proceedings of the Computational Humanities Research Conference 2023 (CHR 2023) (pp. 206–221). CEUR Workshop Proceedings, Vol. 3558. https://ceur-ws.org/Vol-3558/paper1708.pdf
  12. Vogler, N., Allen, J. P., Miller, M. T., & Berg-Kirkpatrick, T. (2022). Lacuna Reconstruction: Self-Supervised Pre-Training for Low-Resource Historical Document Transcription. In C. Marine, M.-C. de Marneffe, & I. V. Meza Ruiz (Eds.), Findings of the Association for Computational Linguistics: NAACL 2022 (pp. 206–216). Association for Computational Linguistics. 10.18653/v1/2022.findings-naacl.15
DOI: https://doi.org/10.5334/johd.465 | Journal eISSN: 2059-481X
Language: English
Page range: 69 - 69
Submitted on: Nov 9, 2025
Accepted on: May 5, 2026
Published on: May 26, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Jonathan Parkes Allen, John Mullan, Lorenz Nigst, Mathew Barber, Taimoor Shahid-Khan, Masoumeh Seydi, Danlu Chen, Yufei Weng, Nikolai Vogler, Jacob Murel, Osama Eshera, Taylor Berg-Kirkpatrick, David Smith, Sarah Bowen Savant, Matthew Thomas Miller, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.