Corpus of Slovak Legislative Documents
Open Access | Mar 2023

Abstract

The article describes the construction of a corpus of Slovak legislative documents. By analyzing several statistical properties of the source metadata and documents, we efficiently improve the quality of the corpus. We describe the methods used to clean up small variations in the metadata and to filter documents by length, and we examine the effectiveness of several deduplication strategies. The corpus is part of a comparable corpus of legislative documents in seven languages, created in the Multilingual Resources for CEF.AT in the Legal Domain (MARCELL) project.

DOI: https://doi.org/10.2478/jazcas-2023-0004 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597
Language: English
Page range: 175–189
Published on: Mar 27, 2023
Published by: Slovak Academy of Sciences, Mathematical Institute
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2023 Radovan Garabík, published by Slovak Academy of Sciences, Mathematical Institute
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.