A corpus-based study of paradigmatic variability of multiword expressions in scientific English from the 17th to the 20th century
Abstract
This study extends a paradigm-based framework for lexical conventionalization to multiword expressions (MWEs) in diachronic scientific English, using the Royal Society Corpus (1665–1996). Building on prior work that quantified paradigmatic variability (the range of lexical alternatives available in a given context) via distributional entropy, we model MWEs as single lexical units in distributional vector spaces and examine their role in long-term language standardization. Our aim is to test whether MWEs undergo the same conventionalization process previously observed for most word classes. Our results confirm that the overall decline in paradigmatic variability, which indicates increasing conventionalization, continues through the 20th century. Different MWE classes follow distinct trajectories: compounds and phrasal verbs show substantial conventionalization in the 18th–19th and 19th–20th centuries, converging with or dropping below the variability of single nouns and verbs. In contrast, fixed expressions and academic formulas become conventionalized early and their variability remains stably low thereafter. Flat expressions remain more variable, patterning with general nouns. These findings demonstrate that MWEs play an active role in the register-specific development of scientific English, with different MWE types contributing in different ways to the reduction of paradigmatic choice.
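The abstract does not specify how distributional entropy is computed, so the following is only a minimal sketch under assumptions not taken from this paper. It assumes that (1) MWEs are merged into underscore-joined tokens before vectors are trained, so that each MWE is treated as a single lexical unit, and (2) a target's paradigm is approximated by the non-negative similarity scores of its nearest neighbours, normalised into a probability distribution whose Shannon entropy (in bits) serves as the variability score. The function names `merge_mwes` and `paradigmatic_entropy` are hypothetical. Lower entropy means that fewer alternatives compete for the slot, which is how a decline in variability can be read as conventionalization.

```python
import math


def merge_mwes(tokens: list[str], mwes: set[tuple[str, ...]]) -> list[str]:
    """Join known multiword expressions into single underscore-linked tokens.

    Hypothetical preprocessing step: greedy longest match against a given MWE list,
    so that e.g. ("took", "place") becomes the single unit "took_place".
    """
    max_len = max((len(m) for m in mwes), default=1)
    out: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in mwes:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No MWE starts here: keep the single token.
            out.append(tokens[i])
            i += 1
    return out


def paradigmatic_entropy(similarities: list[float]) -> float:
    """Shannon entropy (bits) of a paradigm given similarity scores of its alternatives.

    Assumption: scores are non-negative similarities of a target's nearest
    neighbours, normalised to sum to 1 and treated as a probability distribution.
    """
    total = sum(similarities)
    if total <= 0:
        raise ValueError("similarities must sum to a positive value")
    probs = [s / total for s in similarities if s > 0]
    return -sum(p * math.log2(p) for p in probs)


tokens = ["the", "experiment", "took", "place", "in", "the", "air", "pump"]
print(merge_mwes(tokens, {("took", "place"), ("air", "pump")}))
print(paradigmatic_entropy([1, 1, 1, 1]))       # four equally likely alternatives
print(paradigmatic_entropy([0.9, 0.05, 0.05]))  # one dominant alternative
```

Four equally similar alternatives yield the maximum entropy for a four-member paradigm (2 bits), whereas a paradigm dominated by a single alternative yields a much lower value. In a diachronic setting, one would compute such scores per time period over vectors trained on that period's texts.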
© 2026 Diego Alves, Elke Teich, published by The International Computer Archive of Modern and Medieval English
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.