The InterCorp Parallel Corpus with a Uniform Annotation for All Languages

By: Alexandr Rosen  
Open Access | Dec 2023

Abstract

Recently, the language-specific morphosyntactic annotation of InterCorp, a large multilingual parallel corpus, has been replaced by a language-uniform morphosyntactic and syntactic annotation following the guidelines of the Universal Dependencies project. Because the corpus is used predominantly by human users via a token-based concordancer, the CoNLL-U format produced by the UDPipe parser has been extended with attributes such as the lemma of the token’s syntactic head or the morphosyntactic categories of the content verb’s auxiliary. We conclude that, despite some theoretical and practical issues, the new annotation is a promising solution to the problem of mutually incompatible tagsets within a single corpus.
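
To illustrate what such a token-level extension might look like, the following minimal Python sketch reads one sentence in the standard ten-column CoNLL-U layout and attaches the lemma of each token's syntactic head to that token. It is not the procedure used for InterCorp: the function name `add_head_lemma`, the attribute name `head_lemma`, and the sample sentence are all hypothetical and serve only to show the idea.

```python
# Hypothetical sketch: enrich CoNLL-U tokens with the lemma of their head.
# Column order follows the CoNLL-U specification:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC

def add_head_lemma(conllu_sentence: str) -> list[dict]:
    """Return one dict per token, with an added 'head_lemma' attribute."""
    rows = []
    for line in conllu_sentence.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges ("1-2") and empty nodes ("1.1")
        rows.append({
            "id": int(cols[0]),
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            # treat a missing or non-numeric HEAD like the root (0)
            "head": int(cols[6]) if cols[6].isdigit() else 0,
            "deprel": cols[7],
        })

    lemma_by_id = {row["id"]: row["lemma"] for row in rows}
    for row in rows:
        # the root token has HEAD 0, so it receives the placeholder "_"
        row["head_lemma"] = lemma_by_id.get(row["head"], "_")
    return rows


# Invented example sentence: "The dog barked"
SAMPLE = "\n".join([
    "# text = The dog barked",
    "\t".join(["1", "The", "the", "DET", "_", "_", "2", "det", "_", "_"]),
    "\t".join(["2", "dog", "dog", "NOUN", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "barked", "bark", "VERB", "_", "_", "0", "root", "_", "_"]),
])

for token in add_head_lemma(SAMPLE):
    print(token["form"], token["lemma"], token["deprel"], token["head_lemma"])
```

Storing the head's lemma directly on the dependent token means a token-based concordancer can filter on it like any other positional attribute, without traversing the dependency tree at query time.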

DOI: https://doi.org/10.2478/jazcas-2023-0043 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597
Language: English
Page range: 254 - 265
Published on: Dec 25, 2023
Published by: Slovak Academy of Sciences, Ľudovít Štúr Institute of Linguistics
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2023 Alexandr Rosen, published by Slovak Academy of Sciences, Ľudovít Štúr Institute of Linguistics
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.