“aSimMatrix” Dimensions: A Scalable Framework for Benchmarking Intertextual Similarity

By: Shellie Audsley  
Open Access | Feb 2026

References

  1. Bakhtin, M. M. (1981). The dialogic imagination: Four essays. (M. Holquist, Ed.; C. Emerson & M. Holquist, Trans.). University of Texas Press.
  2. Barré, J. (2024). Latent Structures of intertextuality in French fiction: How literary recognition and subgenres are framing textuality. arXiv:2410.17759. 10.48550/arXiv.2410.17759
  3. Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., & Specia, L. (2017). SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), 1–14. 10.18653/v1/S17-2001
  4. Chaudhary, P., & Dexter, J. (2023). Intertextuality: Computational tools for identifying related passages in large corpora. Quantitative Criticism Lab. https://www.qcrit.org/research/intertextuality (last accessed: 10 November 2025).
  5. Cochran, P. (2009). Byron’s Works. Peter Cochran’s Website – Film Reviews, Poems, Byron… Web. https://petercochran.wordpress.com/byron-2/byrons-works (last accessed: 08 January 2026).
  6. Coffee, N., Koenig, J. P., Poornima, S., Forstall, C., Ossewaarde, R., & Jacobson, S. (2012). The tesserae project: Intertextual analysis of Latin poetry. Literary and Linguistic Computing, 28, 221–228. 10.1093/llc/fqs033
  7. Cooney, C., Horton, R., Olsen, M., Roe, G., & Voyer, R. (2008). Hidden Roads and Twisted Paths: Intertextual discovery using clusters, classifications, and similarities. Digital Humanities 2008 Book of Abstracts, 93–94. https://openresearch-repository.anu.edu.au/bitstreams/19494939-20e2-43bf-be64-a264d770889a/download (last accessed: 06 January 2026).
  8. Duan, S. (2025). Quantitative intertextuality from the digital humanities perspective: A survey. arXiv:2510.27045. 10.48550/arXiv.2510.27045
  9. Fodor, J., De Deyne, S., & Suzuki, S. (2025). Compositionality and sentence meaning: Comparing semantic parsing and transformers on a challenging sentence similarity dataset. Computational Linguistics, 51(1), 139–190. 10.1162/coli_a_00536
  10. Forstall, C. W., & Scheirer, W. J. (2019). Quantitative intertextuality: Analyzing the markers of information reuse. Cham: Springer International Publishing AG. 10.1007/978-3-030-23415-7
  11. Genette, G. (with Prince, G.). (1997). Palimpsests: Literature in the second degree (C. Newman & C. Doubinsky, Trans.). University of Nebraska Press. (Original work published 1982)
  12. Goel, A. (2025). LangExtract (Version 1.1.1) [Computer software]. 10.5281/zenodo.17015089
  13. Goodman, P. (1954). The Structure of Literature. University of Chicago Press.
  14. Guerra, R. (2023). From physics to data science: The beauty and power of cosine similarity. Medium. https://medium.com/@rgalvg/from-physics-to-data-science-the-beauty-and-power-of-cosine-similarity-f23e276afe29 (last accessed: 11 January 2026).
  15. Hinds, S. (1998). Allusion and Intertext: Dynamics of Appropriation in Roman Poetry. Cambridge University Press.
  16. Horton, R., Olsen, M., & Roe, G. (2010). Something borrowed: Sequence alignment and the identification of similar passages in large text collections. Digital Studies/Le champ numérique, 2(1). http://hdl.handle.net/1885/12104
  17. Hume, D. (1739). A Treatise of Human Nature. Hume Texts Online. https://davidhume.org/texts/t/1/1/4 (last accessed: 3 February 2026).
  18. Johnson, N., Bertsch, A., Deal, M-E., & Strubell, E. (2025). FicSim: A Dataset for Multi-Faceted Semantic Similarity in Long-Form Fiction. Findings of the Association for Computational Linguistics: EMNLP 2025, 25228–25246. 10.18653/v1/2025.findings-emnlp.1375
  19. Joshi, B., Shah, N., Barbieri, F., & Neves, L. (2020). The Devil is in the Details: Evaluating Limitations of Transformer-based Methods for Granular Tasks. Proceedings of the 28th International Conference on Computational Linguistics, 3652–3659. 10.18653/v1/2020.coling-main.326
  20. Khandelwal, U., He, H., Qi, P., & Jurafsky, D. (2018). Sharp nearby, fuzzy far away: How neural language models use context. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 284–294. 10.18653/v1/P18-1027
  21. Kristeva, J. (1981). Desire in language: A semiotic approach to literature and art (T. Gora, A. Jardine, & L. S. Roudiez, Trans., L. S. Roudiez, Ed.). Basil Blackwell. (Original work published 1977)
  22. Kuznetsov, I., Buchmann, J., Eichler, M., & Gurevych, I. (2022). Revise and Resubmit: An intertextual model of text-based collaboration in peer review. Computational Linguistics, 48(4), 949–986. 10.1162/coli_a_00455
  23. Lau, P. K., & McManus, S. M. (2024). Mining asymmetric intertextuality. arXiv:2410.15145. 10.48550/arXiv.2410.15145
  24. Losses. SBERT.net. https://sbert.net/docs/package_reference/sentence_transformer/losses.html (last accessed: 11 January 2026).
  25. Luo, M., Kumbhar, S., Shen, M., Parmar, M., Varshney, N., Banerjee, P., Aditya, S., & Baral, C. (2024). Towards LogiGLUE: A brief survey and a benchmark for analyzing logical reasoning capabilities of language models. arXiv:2310.00836v3. 10.48550/arXiv.2310.00836
  26. Mahadevan, A., Mathioudakis, M., Mäkelä, E., & Tolonen, M. (2025). Text reuse in large historical corpora: Insights from the optimization of a data science system. International Journal of Data Science and Analytics, 20(5), 4631–4643. 10.1007/s41060-025-00742-x
  27. May, P. (2021). Machine translated multilingual STS benchmark dataset. https://github.com/PhilipMay/stsb-multi-mt (last accessed: 09 January 2026).
  28. Miller, H., Kuflik, T., & Lavee, M. (2025). Text Alignment in the Service of Text Reuse Detection. Applied Sciences, 15(6), 3395. 10.3390/app15063395
  29. Peng, B., Narayanan, S., & Papadimitriou, C. (2024). On limitations of the transformer architecture. arXiv:2402.08164v2. 10.48550/arXiv.2402.08164
  30. Press, O., Zhang, M., Min, S., Schmidt, L., Smith, N., & Lewis, M. (2023). Measuring and narrowing the compositionality gap in language models. Findings of the Association for Computational Linguistics: EMNLP 2023, 5687–5711. 10.18653/v1/2023.findings-emnlp.378
  31. Ramsay, S. (2011). Reading machines: Toward an Algorithmic Criticism. University of Illinois Press. 10.16995/dscn.245
  32. Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 3982–3992. 10.18653/v1/D19-1410
  33. Roe, G., Gladstone, C., & Morrissey, R. (2016). Discourses and disciplines in the Enlightenment: Topic modeling the French Encyclopédie. Frontiers in Digital Humanities, 2. 10.3389/fdigh.2015.00008
  34. Romanello, M. (2016). Exploring Citation Networks to Study Intertextuality in Classics. Digital Humanities Quarterly, 10(2).
  35. Semantic Search. SBERT.net. https://www.sbert.net/examples/sentence_transformer/applications/semantic-search/README.html (last accessed: 11 January 2026).
  36. Scheirer, W., Forstall, C., & Coffee, N. (2016). The sense of a connection: Automatic tracing of intertextuality by meaning. Digital Scholarship in the Humanities, 31(1), 204–217. 10.1093/llc/fqu058
  37. Schubert, C. (2020). Intertextuality and Digital Humanities. it – Information technology, 62(2), 53–59. 10.1515/itit-2019-0036
  38. Smiley, D. M. (2025). Intertextual parallel detection in Biblical Hebrew: A transformer-based benchmark. arXiv:2506.24117. 10.48550/arXiv.2506.24117
  39. Stabler, J., & Hopps, G. (2024). The poems of Lord Byron – Don Juan (Vol. 4 & 5). Routledge. 10.4324/9781003571087
  40. Steyer, K. (2015). Irgendwie hängt alles mit allem zusammen – grenzen und möglichkeiten einer linguistischen kategorie ‘intertextualität’. Textbeziehungen. Linguistische und literaturwissenschaftliche Beiträge zur Intertextualität, 83–106.
  41. Sui, P., Rodriguez, J. D., Laban, P., Murphy, J. D., Dexter, J. P., So, R. J., Baker, S., & Chaudhuri, P. (2025). KRISTEVA: Close Reading as a novel task for benchmarking interpretive reasoning. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 32829–32849. 10.18653/v1/2025.acl-long.1577
  42. Takahashi, H., Lu, X., Ishijima, S., Seo, D., Kim, T., Park, S., Song, M., Marante, K., Iso, K., Tokura, H., & Ohman, E. (2024). OZemi at SemEval-2024 Task 1: A Simplistic Approach to Textual Relatedness Evaluation Using Transformers and Machine Translation. Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), 7–12. 10.18653/v1/2024.semeval-1.2
  43. Trillini, R. H., & Quassdorf, S. (2010). A ‘key to all quotations’? A corpus-based parameter model of intertextuality. Literary and Linguistic Computing, 25(3), 269–286. 10.1093/llc/fqq003
  44. Underwood, T. (2019). Distant horizons: Digital evidence and literary change. Chicago: The University of Chicago Press. 10.7208/chicago/9780226612973.001.0001
  45. Xing, Y. (2025). Modelling intertextuality with n-gram embeddings. arXiv:2509.06637. 10.48550/arXiv.2509.06637
DOI: https://doi.org/10.5334/johd.486 | Journal eISSN: 2059-481X
Language: English
Submitted on: Nov 19, 2025
Accepted on: Jan 23, 2026
Published on: Feb 18, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Shellie Audsley, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.