Levels of Annotation in the Slovene Training Corpus ssj500k 2.2

By: Mija Bon and Polona Gantar
Open Access | Dec 2019

Abstract

This paper presents the Slovene Training Corpus ssj500k 2.2, which is annotated at the levels of tokenization, sentence segmentation, part-of-speech tagging, lemmatization, syntactic dependencies, named entities, verbal multi-word expressions, and semantic role labeling. It describes the individual annotation layers and shows how the training corpus can be used to produce various lexicons, such as the lexicon of multi-word units and the valency lexicon of modern Slovene. It concludes by outlining our future work, namely the annotation of multi-word expressions based on the Slovene Lexical Database.

DOI: https://doi.org/10.2478/jazcas-2019-0068 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597
Language: English
Page range: 390–399
Published on: Dec 21, 2019
Published by: Slovak Academy of Sciences, Ľudovít Štúr Institute of Linguistics
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2019 Mija Bon, Polona Gantar, published by Slovak Academy of Sciences, Ľudovít Štúr Institute of Linguistics
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.