Microsyntactic Annotation of Corpora and its Use in Computational Linguistics Tasks

By: Leonid Iomdin  
Open Access | Jan 2018

Abstract

Microsyntax is a linguistic discipline dealing with idiomatic elements whose important properties are strongly related to syntax. In a way, these elements may be viewed as transitional entities between the lexicon and the grammar, which explains why they are often underrepresented in both of these resource types: the lexicographer fails to see such elements as full-fledged lexical units, while the grammarian finds them too specific to justify the creation of individual well-developed rules. As a result, such elements are poorly covered by linguistic models used in advanced modern computational linguistic tasks like high-quality machine translation or deep semantic analysis. A possible way to mend the situation and improve the coverage and adequate treatment of microsyntactic units in linguistic resources is to develop corpora with microsyntactic annotation, closely linked to specially designed lexicons. The paper shows how this task is solved in the deeply annotated corpus of Russian, SynTagRus.

DOI: https://doi.org/10.1515/jazcas-2017-0027 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597
Language: English
Page range: 169–178
Published on: Jan 24, 2018
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2018 Leonid Iomdin, published by Slovak Academy of Sciences, Ľudovít Štúr Institute of Linguistics
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.