An HPC-Ready, Wikidata-Based Workflow for Exploratory Geocoding of Unstructured Textual Corpora

By: Annie K. Lamar  
Open Access | Dec 2025

References

  1. Athens. (n.d.). Retrieved September 23, 2025, from https://www.wikidata.org/wiki/Q1524
  2. Bagnall, R. (Ed.). (2016). Pleiades: A Gazetteer of Past Places. Retrieved September 30, 2025. pleiades.stoa.org
  3. Bai, X., Jiao, X., Sakai, T., & Xu, H. (2024). Mapping the past with historical geographic information systems: Layered characteristics of the historic urban landscape of Nanjing, China, since the Ming Dynasty (1368–2024). Heritage Science, 12(1), 283. 10.1186/s40494-024-01400-4
  4. Bamman, D., & Smith, N. A. (2014). Unsupervised Discovery of Biographical Structure from Text. Transactions of the Association for Computational Linguistics, 2, 363–376. 10.1162/tacl_a_00189
  5. Bodenhamer, D. J., Corrigan, J., & Harris, T. M. (Eds.). (2010). The spatial humanities: GIS and the future of humanities scholarship. Indiana University Press. 10.2979/5864.0s
  6. Bushell, S. (2020). Reading and mapping fiction: Spatialising the literary text. Cambridge: Cambridge University Press. 10.1017/9781108766876
  7. Bushell, S., & Hutcheon, R. L. (2025). New approaches for digital literary mapping: Chronotopic cartography. Cambridge University Press. 10.1017/9781009353632
  8. Devinney, H., Eklund, A., Ryazanov, I., & Cai, J. (2023). Developing a Multilingual Corpus of Wikipedia Biographies. In R. Mitkov & G. Angelova (Eds.), Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing (pp. 285–294). INCOMA Ltd., Shoumen, Bulgaria. https://aclanthology.org/2023.ranlp-1.32/
  9. ESRI. (2024). ArcGIS World Geocoding [Computer software]. Retrieved December 8, 2025, from https://www.arcgis.com/home/item.html?id=305f2e55e67f4389bef269669fc2e284
  10. Fischer, F., Börner, I., Göbel, M., Hechtl, A., Kittel, C., Milling, C., & Trilcke, P. (2019). Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama. Proceedings of DH2019. Utrecht University. 10.5281/ZENODO.4284001
  11. Getty Research Institute. (2017). Getty Thesaurus of Geographic Names Online (TGN) [Dataset]. Retrieved September 30, 2025, from https://www.getty.edu/research/tools/vocabularies/tgn
  12. Google. (n.d.). Google Maps [Computer software]. Retrieved September 30, 2025, from https://maps.google.com
  13. Gregory, I. N., & Geddes, A. (Eds.). (2014). Toward spatial humanities: Historical GIS and spatial history. Indiana University Press. 10.2979/6100.0
  14. Hyvönen, E., & Rantala, H. (2021). Knowledge-based relational search in cultural heritage linked data. Digital Scholarship in the Humanities, 36(Supplement_2), ii155–ii164. 10.1093/llc/fqab042
  15. Khatib, R. E., & Schaeben, M. (2020). Why Map Literature? Geospatial Prototyping for Literary Studies and Digital Humanities. Digital Studies/Le Champ Numérique, 10(1). 10.16995/dscn.381
  16. Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., & McClosky, D. (2014). The Stanford CoreNLP Natural Language Processing Toolkit. In K. Bontcheva & J. Zhu (Eds.), Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 55–60). Vienna: Association for Computational Linguistics. 10.3115/v1/P14-5010
  17. Murrieta-Flores, P., & Martins, B. (2019). The geospatial humanities: Past, present and future. International Journal of Geographical Information Science, 33(12), 2424–2429. 10.1080/13658816.2019.1645336
  18. Page, B., & Ross, E. (2015). Envisioning the Urban Past: GIS Reconstruction of a Lost Denver District. Frontiers in Digital Humanities, 2. 10.3389/fdigh.2015.00003
  19. Pattuelli, M. C., Weller, C., & Szablya, G. (2011, September). Linked Jazz: An Exploratory Pilot. International Conference on Dublin Core and Metadata Applications 2011 (pp. 158–164). The 2011 International Conference on Dublin Core and Metadata Applications, The Hague, The Netherlands.
  20. Pywikibot (Version 1.31). (2003). [Computer software]. Retrieved September 30, 2025. https://github.com/wikimedia/pywikibot
  21. Ratinov, L., Roth, D., Downey, D., & Anderson, M. (2011). Local and Global Algorithms for Disambiguation to Wikipedia. In D. Lin, Y. Matsumoto, & R. Mihalcea (Eds.), Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 1375–1384). Vienna: Association for Computational Linguistics. https://aclanthology.org/P11-1138/
  22. Sil, A., & Florian, R. (2016). One for All: Towards Language Independent Named Entity Linking. In K. Erk & N. A. Smith (Eds.), Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 2255–2264). Vienna: Association for Computational Linguistics. 10.18653/v1/P16-1213
  23. Uhl, J. H., Leyk, S., Chiang, Y.-Y., & Knoblock, C. A. (2022). Towards the automated large-scale reconstruction of past road networks from historical maps. Computers, Environment and Urban Systems, 94, 101794. 10.1016/j.compenvurbsys.2022.101794
  24. Wick, M. (2005). GeoNames [Dataset]. https://www.geonames.org/
  25. World Historical Gazetteer. (2017). [Computer software]. Retrieved September 30, 2025, from https://whgazetteer.org/
DOI: https://doi.org/10.5334/johd.401 | Journal eISSN: 2059-481X
Language: English
Submitted on: Oct 3, 2025 | Accepted on: Nov 22, 2025 | Published on: Dec 23, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Annie K. Lamar, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.