Have a personal or library account? Click to login

Microsyntactic Annotation of Corpora and its Use in Computational Linguistics Tasks

Open Access
|Jan 2018

Abstract

Microsyntax is a linguistic discipline dealing with idiomatic elements whose important properties are strongly related to syntax. In a way, these elements may be viewed as transitional entities between the lexicon and the grammar, which explains why they are often underrepresented in both of these resource types: the lexicographer fails to see such elements as full-fledged lexical units, while the grammarian finds them too specific to justify the creation of individual well-developed rules. As a result, such elements are poorly covered by linguistic models used in advanced modern computational linguistic tasks like high-quality machine translation or deep semantic analysis. A possible way to mend the situation and improve the coverage and adequate treatment of microsyntactic units in linguistic resources is to develop corpora with microsyntactic annotation, closely linked to specially designed lexicons. The paper shows how this task is solved in the deeply annotated corpus of Russian, SynTagRus.

DOI: https://doi.org/10.1515/jazcas-2017-0027 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597
Language: English
Page range: 169 - 178
Published on: Jan 24, 2018
Published by: Slovak Academy of Sciences, Mathematical Institute
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2018 Leonid Iomdin, published by Slovak Academy of Sciences, Mathematical Institute
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.