
Old Catalan Morphosyntax: Developing an Annotated Corpus

Open Access | Dec 2021

Figures & Tables

Table 1

Incremental build-up and time investment.

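| Training round (tokens) | Hours invested | Words/hour | Words/hour increase (relative to first round) |
|---|---|---|---|
| 0–4,500 | 32 | 140.6 | 0% |
| 4,500–10,000 | 32 | 312.5 | 122.26% |
| 10,000–20,000 | 18 | 555.6 | 295.16% |
| 20,000–40,000 | 23.5 | 851.6 | 505.59% |
| 40,000–60,000 | 12 | 1,666.7 | 1,085.42% |
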
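
Reading the increase column against the words/hour column suggests that each percentage is measured against the first round's rate (140.6 words/hour) rather than against the preceding round. A minimal Python sketch, assuming that reading, recomputes the column from the published rates; the names `words_per_hour` and `increase_over_first` are illustrative only and do not come from the paper.

```python
# Words/hour figures as listed in Table 1, one per training round.
rounds = ["0–4,500", "4,500–10,000", "10,000–20,000", "20,000–40,000", "40,000–60,000"]
words_per_hour = [140.6, 312.5, 555.6, 851.6, 1666.7]


def increase_over_first(rates):
    """Percentage increase of each rate over the first round's rate (assumed baseline)."""
    base = rates[0]
    return [round((r / base - 1) * 100, 2) for r in rates]


for name, pct in zip(rounds, increase_over_first(words_per_hour)):
    print(f"{name}: {pct:.2f}%")
```

Under this assumption the sketch reproduces 122.26%, 295.16% and 1,085.42% to two decimals; the fourth round comes out at 505.69% rather than the printed 505.59%, presumably a rounding or transcription difference in the source figures.
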
Table 2

TARGER results.

| Tokens | Global accuracy (%) |
|---|---|
| 10,000 | 85.5 |
| 60,000 | 91.4 |

Table 3

MBT results per training round.

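| Tokens | Global accuracy (%) | Known words (%) | Unknown words (%) |
|---|---|---|---|
| 4,500 | 86.1 | 94.6 | 46.8 |
| 10,000 | 91.3 | 96.4 | 56.4 |
| 20,000 | 93.8 | 97.6 | 58.4 |
| 40,000 | 94.7 | 97.2 | 64.3 |
| 60,000 | 95.8 | 97.6 | 68.1 |
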
Table 4

MBT results for highest and lowest frequency unseen tokens.

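| POS tag | Precision | Recall | F-score | N |
|---|---|---|---|---|
| N | 0.74 | 0.75 | 0.74 | 856 |
| NPR | 0.9 | 0.94 | 0.92 | 585 |
| VNI | 0.66 | 0.67 | 0.67 | 219 |
| VBDI^3^PL | 0.73 | 0.75 | 0.74 | 188 |
| VBDI^3^SG | 0.65 | 0.68 | 0.66 | 173 |
| PRO^3^PL | 0 | 0 | 0 | 1 |
| PRO^A^3^SG | 0 | 0 | 0 | 1 |
| PRO^D^2^PL | 0 | 0 | 0 | 1 |
| PRO^D^3^SG | 0 | 0 | 0 | 0 |
| OLB | 0 | 0 | 0 | 0 |
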
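
If the F-score column is the balanced F-score (F1, the harmonic mean of precision and recall), which is an assumption since the tables do not state the weighting, it can be checked against the precision and recall columns. The sketch below uses the most frequent unseen tags from Table 4 plus one zero row; the helper `f1` and the `rows` mapping are hypothetical names for illustration.

```python
def f1(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # common convention when a tag is never tagged correctly
    return 2 * precision * recall / (precision + recall)


# tag: (precision, recall, reported F-score), copied from Table 4
rows = {
    "N": (0.74, 0.75, 0.74),
    "NPR": (0.9, 0.94, 0.92),
    "VNI": (0.66, 0.67, 0.67),
    "VBDI^3^PL": (0.73, 0.75, 0.74),
    "VBDI^3^SG": (0.65, 0.68, 0.66),
    "PRO^3^PL": (0.0, 0.0, 0.0),
}

for tag, (p, r, reported) in rows.items():
    print(f"{tag}: computed {f1(p, r):.3f}, reported {reported}")
```

Because precision and recall are themselves rounded to two decimals, agreement is only expected to within about 0.01 (for VNI, 0.66 and 0.67 give roughly 0.665 against the printed 0.67). The zero branch covers rows such as PRO^3^PL, where precision and recall are both 0.
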
Table 5

MBT results for highest and lowest frequency tokens.

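| POS tag | Precision | Recall | F-score | N |
|---|---|---|---|---|
| P | 1 | 1 | 1 | 6191 |
| N | 0.98 | 0.98 | 0.98 | 6185 |
| CONJ | 1 | 1 | 1 | 4919 |
| C | 1 | 1 | 1 | 4139 |
| COMMA | 1 | 1 | 1 | 3910 |
| PRO^RFL^2^SG | 1 | 1 | 1 | 2 |
| ADJ^POS | 0 | 0 | 0 | 2 |
| VBI^2^SG | 0 | 0 | 0 | 2 |
| VBDI^2^PL | 0 | 0 | 0 | 1 |
|  | 0 | 0 | 0 | 0 |
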
DOI: https://doi.org/10.5334/johd.54 | Journal eISSN: 2059-481X
Language: English
Published on: Dec 21, 2021
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Marieke Meelen, Afra Pujol i Campeny, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.