Phrasemes and Collocations in the Corpus – How to Find Unknown Variants

Open Access | Nov 2025

Abstract

This paper addresses the identification and annotation of multiword expressions (MWEs) in Czech corpora, focusing on enhancing the search procedure through transformations of existing lexicon entries and the addition of new entries based on syntactic patterns. We discuss the limitations of current annotation systems and introduce a new, efficient annotation system that leverages a comprehensive MWE dictionary. Our methodology includes the use of syntactic patterns to identify new collocations, automatic transformations of known MWEs, and manual searches for creatively varied expressions. The results demonstrate significant improvements in the success rate of corpus annotation, with newly identified collocations and transformed MWEs contributing to a richer and more accurate linguistic resource.

DOI: https://doi.org/10.2478/jazcas-2025-0019 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597
Language: English
Page range: 212–222
Published on: Nov 27, 2025
Published by: Slovak Academy of Sciences, Mathematical Institute
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2025 Hana Skoumalová, Přemysl Vítovec, Milena Hnátková, published by Slovak Academy of Sciences, Mathematical Institute
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.