A corpus-based study of paradigmatic variability of multiword expressions in scientific English from the 17th to the 20th century

By: Diego Alves and Elke Teich
Open Access
May 2026

References

  1. Alves, Diego. 2025. Diachronic analysis of phrasal verbs in English scientific writing. In Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), 8–16. Tallinn, Estonia.
  2. Alves, Diego, Stefania Degaetano-Ortlieb, Elena Schmidt & Elke Teich. 2024a. Diachronic analysis of multiword expression functional categories in scientific English. In Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, 81–87. Torino, Italy.
  3. Alves, Diego, Stefan Fischer, Stefania Degaetano-Ortlieb & Elke Teich. 2024b. Multiword expressions in English scientific writing. In Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), 67–76. St Julian’s, Malta.
  4. Avgustinova, Tania & Leonid Iomdin. 2019. Towards a typology of microsyntactic constructions. In International Conference on Computational and Corpus-Based Phraseology, 15–30. Malaga, Spain.
  5. Bauer, Laurie, Rochelle Lieber & Ingo Plag. 2015. The Oxford reference guide to English morphology. Oxford: Oxford University Press.
  6. Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 2000. Longman grammar of spoken and written English. Harlow: Longman.
  7. Bizzoni, Yuri, Stefania Degaetano-Ortlieb, Peter Fankhauser & Elke Teich. 2020. Linguistic variation and change in 250 years of English scientific writing: A data-driven approach. Frontiers in Artificial Intelligence 3. 73.
  8. Bybee, Joan. 2010. Language, usage and cognition. Cambridge: Cambridge University Press.
  9. Claridge, Claudia. 2000. Multi-word verbs in Early Modern English: A corpus-based study. Amsterdam: Rodopi.
  10. Conklin, Kathy & Norbert Schmitt. 2012. The processing of formulaic language. Annual Review of Applied Linguistics 32. 45–61.
  11. Degaetano-Ortlieb, Stefania, Hannah Kermes, Ashraf Khamis & Elke Teich. 2018. An information-theoretic approach to modeling diachronic change in scientific English. In Carla Suhr, Terttu Nevalainen & Irma Taavitsainen (eds.), From data to evidence in English language research, 258–281. Leiden: Brill.
  12. Degaetano-Ortlieb, Stefania & Elke Teich. 2022. Toward an optimal code for communication: The case of scientific English. Corpus Linguistics and Linguistic Theory 18 (1). 175–207.
  13. De Marneffe, Marie-Catherine, Christopher D. Manning, Joakim Nivre & Daniel Zeman. 2021. Universal dependencies. Computational Linguistics 47 (2). 255–308.
  14. De Smet, Hendrik. 2016. How gradual change progresses: The interaction between convention and innovation. Language Variation and Change 28 (1). 83–102.
  15. Dubossarsky, Haim, Daphna Weinshall & Eitan Grossman. 2017. Outta control: Laws of semantic change and inherent biases in word representation models. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1136–1145. Copenhagen, Denmark.
  16. Fischer, Stefan, Jörg Knappen, Katrin Menzel & Elke Teich. 2020. The Royal Society Corpus 6.0: Providing 300+ years of scientific writing for humanistic study. In Proceedings of the Twelfth Language Resources and Evaluation Conference (LREC), 794–802. Marseille, France.
  17. Halliday, Michael Alexander Kirkwood & James R. Martin. 2003. Writing science: Literacy and discursive power. London: Routledge.
  18. Hamilton, William L., Jure Leskovec & Dan Jurafsky. 2016. Cultural shift or linguistic drift? Comparing two computational measures of semantic change. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 595–605. Austin, Texas, USA.
  19. Harris, Zellig S. 2002. The structure of science information. Journal of Biomedical Informatics 35 (4). 215–221.
  20. Ling, Wang, Chris Dyer, Alan W. Black & Isabel Trancoso. 2015. Two/too simple adaptations of word2vec for syntax problems. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1299–1304. Denver, Colorado, USA.
  21. Plag, Ingo. 2018. Word-formation in English. Cambridge: Cambridge University Press.
  22. Qi, Peng, Yuhao Zhang, Yuhui Zhang, Jason Bolton & Christopher D. Manning. 2020. Stanza: A Python natural language processing toolkit for many human languages. arXiv preprint arXiv:2003.07082.
  23. Santorini, Beatrice. 1990. Part-of-speech tagging guidelines for the Penn Treebank Project (3rd revision). University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-90-47.
  24. Savary, Agata, Carlos Ramisch, Silvio Cordeiro, Francesca Sangati, Veronika Vincze, Behrang Qasemi Zadeh & Alexandre Doucet. 2017. The PARSEME shared task on automatic identification of verbal multiword expressions. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), 31–47. Valencia, Spain.
  25. Schmid, Hans-Jörg. 2015. A blueprint of the Entrenchment-and-Conventionalization Model. Yearbook of the German Cognitive Linguistics Association 3 (1). 3–26.
  26. Schmid, Helmut. 2013. Probabilistic part-of-speech tagging using decision trees. In D.B. Jones & H. Somers (eds.), New methods in language processing, 154–164. London: Routledge.
  27. Schneider, Gerold, Mennatallah El-Assady & Hans Martin Lehmann. 2017. Tools and methods for processing and visualizing large corpora. Studies in Variation, Contacts and Change in English 19. Helsinki: VARIENG, University of Helsinki. http://www.helsinki.fi/varieng/series/volumes/19/schneider_el-assady_lehmann/
  28. Shannon, Claude E. 1948. A mathematical theory of communication. The Bell System Technical Journal 27 (3). 379–423.
  29. Simpson-Vlach, Rita & Nick C. Ellis. 2010. An academic formulas list: New methods in phraseology research. Applied Linguistics 31 (4). 487–512.
  30. Teich, Elke, Peter Fankhauser, Stefania Degaetano-Ortlieb & Yuri Bizzoni. 2021. Less is more/more diverse: On the communicative utility of linguistic conventionalization. Frontiers in Communication 5. 620275.
  31. Tourtouri, Elli N., Francesca Delogu, Les Sikos & Matthew W. Crocker. 2019. Rational over-specification in visually-situated comprehension and production. Journal of Cultural Cognitive Science 3 (2). 175–202.
  32. Universal Dependencies. 2024a. compound: compound. (https://universaldependencies.org/en/dep/compound.html) (Accessed 2026-02-12.)
  33. Universal Dependencies. 2024b. compound:prt: Phrasal verb particle. (https://universaldependencies.org/en/dep/compound-prt.html) (Accessed 2026-02-12.)
  34. Universal Dependencies. 2024c. fixed: fixed multiword expression. (https://universaldependencies.org/en/dep/fixed.html) (Accessed 2026-02-12.)
  35. Universal Dependencies. 2024d. flat: flat expression. (https://universaldependencies.org/u/dep/flat) (Accessed 2026-02-12.)
  36. Venhuizen, Noortje J., Matthew W. Crocker & Harm Brouwer. 2019. Semantic entropy in language comprehension. Entropy 21 (12). 1159.
DOI: https://doi.org/10.2478/icame-2026-0001 | Journal eISSN: 1502-5462 | Journal ISSN: 0801-5775
Language: English
Page range: 1 - 17
Submitted on: Dec 1, 2025
Accepted on: Feb 24, 2026
Published on: May 27, 2026
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Diego Alves, Elke Teich, published by The International Computer Archive of Modern and Medieval English
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.