Table 1
Sample text from the dataset.
| Field | Value |
|---|---|
| Text ID | B1111259224 |
| Title | Study of the trematode fauna of invasive mollusc species in Belarus |
| Body | As a result of the conducted studies and an analysis of literature data, at least 14 representatives of the class Trematoda have been identified in invasive mollusc species in the water bodies of Belarus: in D. polymorpha - Phyllodistomum folium… |
| Keywords | Republic of Belarus\species diversity\diagnostics\freshwater molluscs\taxonomy\trematodes\fauna\hosts |
| Codes of thematic departments | e3\e4 |
| Codes of abstract journals | 04AND9\07Д |
| SRSTI | 341.33.23.17.11.09\391.19.25.31 |

Figure 1
Dependence of the number of topics in the three rubricators on the minimum number of texts.
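
The dependence in Figure 1 can be illustrated by counting, for each threshold, the topics that are assigned to at least that many texts. This is a minimal sketch, assuming each text carries a list of codes from one rubricator; the function name `topics_above_threshold` and the toy data are hypothetical and not taken from the study.

```python
from collections import Counter


def topics_above_threshold(text_codes, thresholds):
    """For each threshold, count the topics assigned to at least that many texts."""
    # set(codes) ensures a topic is counted once per text, even if listed twice.
    counts = Counter(code for codes in text_codes for code in set(codes))
    return {t: sum(1 for n in counts.values() if n >= t) for t in thresholds}


# Hypothetical toy data: codes of a single rubricator for four texts.
sample = [["e3", "e4"], ["e3"], ["e3", "e5"], ["e4"]]
result = topics_above_threshold(sample, [1, 2, 3])
print(result)
```

Raising the threshold can only keep or reduce the number of surviving topics, so the resulting curve is non-increasing.
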
Table 2
Results of selecting the feature-extraction method for 50-element vectors.
| Classification model | Averaging | Average: Accuracy | Average: Precision | Average: Recall | Average: F-score | Maximum: Accuracy | Maximum: Precision | Maximum: Recall | Maximum: F-score | Sum: Accuracy | Sum: Precision | Sum: Recall | Sum: F-score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | Micro | 0.94 | 0.75 | 0.54 | 0.63 | 0.92 | 0.60 | 0.40 | 0.48 | 0.94 | 0.77 | 0.53 | 0.62 |
| LR | Macro | 0.94 | 0.67 | 0.51 | 0.55 | 0.92 | 0.54 | 0.38 | 0.43 | 0.94 | 0.72 | 0.51 | 0.58 |
| RF | Micro | 0.94 | 0.75 | 0.53 | 0.62 | 0.92 | 0.60 | 0.39 | 0.48 | 0.93 | 0.72 | 0.50 | 0.59 |
| RF | Macro | 0.94 | 0.75 | 0.43 | 0.51 | 0.92 | 0.60 | 0.27 | 0.29 | 0.93 | 0.73 | 0.39 | 0.45 |
| ANN1 | Micro | 0.94 | 0.79 | 0.53 | 0.63 | 0.92 | 0.63 | 0.44 | 0.52 | 0.94 | 0.80 | 0.57 | 0.67 |
| ANN1 | Macro | 0.94 | 0.79 | 0.44 | 0.52 | 0.92 | 0.57 | 0.35 | 0.41 | 0.94 | 0.78 | 0.51 | 0.60 |
| ANN2 | Micro | 0.94 | 0.78 | 0.54 | 0.64 | 0.92 | 0.62 | 0.43 | 0.51 | 0.94 | 0.80 | 0.56 | 0.66 |
| ANN2 | Macro | 0.94 | 0.77 | 0.46 | 0.54 | 0.92 | 0.59 | 0.33 | 0.38 | 0.94 | 0.78 | 0.51 | 0.60 |
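
The three column groups compare ways of turning a variable-length text into one fixed-length feature vector. The sketch below assumes (this is an interpretation, not stated in the tables) that "Average", "Maximum", and "Sum" denote element-wise pooling over per-word embedding vectors; the function name `pool_word_vectors` and the toy vectors are hypothetical.

```python
def pool_word_vectors(word_vectors, method):
    """Collapse a list of equal-length word vectors into a single text vector."""
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    # Transpose so that each tuple holds one dimension across all words.
    columns = list(zip(*word_vectors))
    if method == "average":
        return [sum(c) / len(c) for c in columns]
    if method == "maximum":
        return [max(c) for c in columns]
    if method == "sum":
        return [sum(c) for c in columns]
    raise ValueError(f"unknown pooling method: {method}")


# Hypothetical two-word text with 2-element embeddings.
words = [[1.0, -2.0], [3.0, 4.0]]
pooled = {m: pool_word_vectors(words, m) for m in ("average", "maximum", "sum")}
print(pooled)
```

All three variants return a vector with the same number of elements as the word embeddings (50, 100, or 500 in the tables), so the downstream classifier is unchanged and only the pooling step differs.
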
Table 3
Results of selecting the feature-extraction method for 100-element vectors.
| Classification model | Averaging | Average: Accuracy | Average: Precision | Average: Recall | Average: F-score | Maximum: Accuracy | Maximum: Precision | Maximum: Recall | Maximum: F-score | Sum: Accuracy | Sum: Precision | Sum: Recall | Sum: F-score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | Micro | 0.94 | 0.53 | 0.57 | 0.55 | 0.93 | 0.50 | 0.50 | 0.50 | 0.94 | 0.62 | 0.62 | 0.62 |
| LR | Macro | 0.94 | 0.53 | 0.57 | 0.55 | 0.93 | 0.50 | 0.50 | 0.50 | 0.94 | 0.62 | 0.62 | 0.62 |
| RF | Micro | 0.94 | 0.47 | 0.54 | 0.49 | 0.93 | 0.44 | 0.47 | 0.44 | 0.94 | 0.56 | 0.59 | 0.57 |
| RF | Macro | 0.94 | 0.53 | 0.49 | 0.51 | 0.92 | 0.47 | 0.47 | 0.47 | 0.94 | 0.56 | 0.56 | 0.56 |
| ANN1 | Micro | 0.94 | 0.47 | 0.39 | 0.41 | 0.92 | 0.42 | 0.35 | 0.36 | 0.94 | 0.52 | 0.46 | 0.47 |
| ANN1 | Macro | 0.95 | 0.57 | 0.55 | 0.56 | 0.93 | 0.53 | 0.53 | 0.53 | 0.95 | 0.64 | 0.64 | 0.64 |
| ANN2 | Micro | 0.95 | 0.52 | 0.42 | 0.42 | 0.93 | 0.49 | 0.42 | 0.43 | 0.95 | 0.60 | 0.56 | 0.57 |
| ANN2 | Macro | 0.95 | 0.57 | 0.55 | 0.56 | 0.93 | 0.53 | 0.53 | 0.53 | 0.95 | 0.64 | 0.64 | 0.64 |
Table 4
Results of selecting the feature-extraction method for 500-element vectors.
| Classification model | Averaging | Average: Accuracy | Average: Precision | Average: Recall | Average: F-score | Maximum: Accuracy | Maximum: Precision | Maximum: Recall | Maximum: F-score | Sum: Accuracy | Sum: Precision | Sum: Recall | Sum: F-score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | Micro | 0.95 | 0.64 | 0.64 | 0.64 | 0.94 | 0.58 | 0.58 | 0.58 | 0.95 | 0.64 | 0.64 | 0.64 |
| LR | Macro | 0.95 | 0.58 | 0.61 | 0.59 | 0.94 | 0.52 | 0.55 | 0.52 | 0.95 | 0.59 | 0.62 | 0.60 |
| RF | Micro | 0.94 | 0.57 | 0.57 | 0.57 | 0.93 | 0.49 | 0.49 | 0.49 | 0.94 | 0.56 | 0.56 | 0.56 |
| RF | Macro | 0.94 | 0.53 | 0.47 | 0.48 | 0.93 | 0.44 | 0.38 | 0.38 | 0.94 | 0.53 | 0.46 | 0.48 |
| ANN1 | Micro | 0.95 | 0.62 | 0.62 | 0.62 | 0.94 | 0.57 | 0.57 | 0.57 | 0.95 | 0.64 | 0.64 | 0.64 |
| ANN1 | Macro | 0.95 | 0.59 | 0.52 | 0.53 | 0.94 | 0.57 | 0.45 | 0.46 | 0.95 | 0.61 | 0.56 | 0.57 |
| ANN2 | Micro | 0.95 | 0.62 | 0.62 | 0.62 | 0.94 | 0.58 | 0.58 | 0.58 | 0.95 | 0.64 | 0.64 | 0.64 |
| ANN2 | Macro | 0.95 | 0.59 | 0.52 | 0.53 | 0.94 | 0.55 | 0.47 | 0.47 | 0.95 | 0.61 | 0.56 | 0.57 |
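
Each model is reported with both micro and macro averaging, and the two can diverge when topics differ in size. The following sketch shows the standard definitions under the assumption that per-class true-positive, false-positive, and false-negative counts are available; macro F-score is taken here as the mean of per-class F-scores, which may differ from the convention used in the study. The function names and counts are hypothetical.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-score from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(per_class_counts):
    """per_class_counts: list of (tp, fp, fn) tuples, one per class."""
    # Micro: pool the counts over all classes, then compute the metrics once.
    totals = [sum(c[i] for c in per_class_counts) for i in range(3)]
    micro = precision_recall_f(*totals)
    # Macro: compute the metrics per class, then take the unweighted mean.
    per_class = [precision_recall_f(*c) for c in per_class_counts]
    macro = tuple(sum(m[i] for m in per_class) / len(per_class) for i in range(3))
    return micro, macro


# Hypothetical counts: one frequent, well-predicted class and one rare, poorly predicted class.
counts = [(90, 10, 10), (1, 1, 9)]
micro, macro = micro_macro(counts)
print(micro, macro)
```

In this toy case the rare class pulls macro recall well below micro recall, the same direction of gap seen in several Macro rows of the tables.
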
Table 5
Results of testing classifiers for codes of thematic departments.
| Classifier | Averaging | Accuracy, 1 response | Accuracy, 2 responses | Accuracy, 3 responses | Precision, 1 response | Precision, 2 responses | Precision, 3 responses | Recall, 1 response | Recall, 2 responses | Recall, 3 responses | F-score, 1 response | F-score, 2 responses | F-score, 3 responses |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | Micro | 0.94 | 0.93 | 0.90 | 0.77 | 0.59 | 0.49 | 0.53 | 0.71 | 0.80 | 0.62 | 0.64 | 0.60 |
| LR | Macro | 0.94 | 0.93 | 0.90 | 0.72 | 0.55 | 0.45 | 0.51 | 0.70 | 0.79 | 0.58 | 0.60 | 0.56 |
| RF | Micro | 0.93 | 0.92 | 0.88 | 0.72 | 0.55 | 0.43 | 0.50 | 0.67 | 0.78 | 0.59 | 0.60 | 0.55 |
| RF | Macro | 0.93 | 0.92 | 0.88 | 0.73 | 0.55 | 0.42 | 0.39 | 0.57 | 0.70 | 0.50 | 0.53 | 0.51 |
| ANN1 | Micro | 0.94 | 0.93 | 0.90 | 0.80 | 0.60 | 0.47 | 0.57 | 0.76 | 0.85 | 0.67 | 0.67 | 0.60 |
| ANN1 | Macro | 0.94 | 0.93 | 0.90 | 0.78 | 0.58 | 0.45 | 0.51 | 0.71 | 0.82 | 0.60 | 0.63 | 0.57 |
| ANN2 | Micro | 0.94 | 0.93 | 0.90 | 0.80 | 0.61 | 0.48 | 0.56 | 0.75 | 0.85 | 0.66 | 0.68 | 0.61 |
| ANN2 | Macro | 0.94 | 0.93 | 0.90 | 0.78 | 0.59 | 0.46 | 0.51 | 0.71 | 0.82 | 0.60 | 0.64 | 0.58 |
| SVM | Micro | 0.95 | 0.94 | 0.91 | 0.82 | 0.64 | 0.50 | 0.59 | 0.77 | 0.87 | 0.69 | 0.70 | 0.64 |
| SVM | Macro | 0.95 | 0.94 | 0.91 | 0.80 | 0.61 | 0.48 | 0.55 | 0.74 | 0.85 | 0.65 | 0.67 | 0.61 |
| LSTM | Micro | 0.95 | 0.91 | 0.86 | 0.80 | 0.52 | 0.39 | 0.60 | 0.78 | 0.87 | 0.68 | 0.63 | 0.54 |
| LSTM | Macro | 0.95 | 0.91 | 0.86 | 0.77 | 0.49 | 0.37 | 0.54 | 0.75 | 0.85 | 0.63 | 0.59 | 0.50 |
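
The "1 response", "2 responses", and "3 responses" columns can be read as evaluating the classifier when it returns its one, two, or three highest-scoring codes per text. That reading is an assumption, as are the function names and toy scores in the sketch below, which computes micro-averaged precision and recall for each number of responses.

```python
def top_k_labels(scores, k):
    """Return the k labels with the highest scores as a set."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def micro_precision_recall(score_dicts, true_label_sets, k):
    """Micro-averaged precision and recall when each text gets k responses."""
    tp = fp = fn = 0
    for scores, truth in zip(score_dicts, true_label_sets):
        predicted = top_k_labels(scores, k)
        tp += len(predicted & truth)   # returned and correct
        fp += len(predicted - truth)   # returned but wrong
        fn += len(truth - predicted)   # correct but not returned
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical classifier scores and true codes for two texts.
scores = [{"e3": 0.7, "e4": 0.2, "e5": 0.1}, {"e3": 0.1, "e4": 0.5, "e5": 0.4}]
truth = [{"e3", "e4"}, {"e4"}]
by_k = {k: micro_precision_recall(scores, truth, k) for k in (1, 2, 3)}
print(by_k)
```

Returning more responses per text can only add correct codes, so recall never decreases, while the extra responses usually add false positives and lower precision. The tables show the same trade-off: recall rises and precision falls from one to three responses.
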
Table 6
Results of testing classifiers for codes of abstract journals.
| Classifier | Averaging | Accuracy, 1 response | Accuracy, 2 responses | Accuracy, 3 responses | Precision, 1 response | Precision, 2 responses | Precision, 3 responses | Recall, 1 response | Recall, 2 responses | Recall, 3 responses | F-score, 1 response | F-score, 2 responses | F-score, 3 responses |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | Micro | 0.99 | 0.99 | 0.98 | 0.49 | 0.36 | 0.29 | 0.33 | 0.49 | 0.59 | 0.39 | 0.42 | 0.39 |
| LR | Macro | 0.99 | 0.99 | 0.98 | 0.46 | 0.36 | 0.29 | 0.35 | 0.51 | 0.60 | 0.37 | 0.40 | 0.37 |
| RF | Micro | 0.99 | 0.99 | 0.98 | 0.45 | 0.35 | 0.29 | 0.23 | 0.39 | 0.49 | 0.23 | 0.36 | 0.37 |
| RF | Macro | 0.99 | 0.99 | 0.98 | 0.36 | 0.31 | 0.26 | 0.20 | 0.33 | 0.42 | 0.31 | 0.30 | 0.30 |
| ANN1 | Micro | 0.99 | 0.99 | 0.98 | 0.47 | 0.37 | 0.30 | 0.24 | 0.40 | 0.51 | 0.32 | 0.39 | 0.38 |
| ANN1 | Macro | 0.99 | 0.99 | 0.98 | 0.41 | 0.34 | 0.28 | 0.20 | 0.34 | 0.43 | 0.23 | 0.31 | 0.32 |
| ANN2 | Micro | 0.99 | 0.99 | 0.98 | 0.46 | 0.36 | 0.30 | 0.25 | 0.41 | 0.52 | 0.32 | 0.39 | 0.38 |
| ANN2 | Macro | 0.99 | 0.99 | 0.98 | 0.40 | 0.33 | 0.28 | 0.22 | 0.35 | 0.45 | 0.25 | 0.32 | 0.32 |
| SVM | Micro | 0.99 | 0.99 | 0.99 | 0.61 | 0.48 | 0.38 | 0.36 | 0.54 | 0.65 | 0.45 | 0.51 | 0.48 |
| SVM | Macro | 0.99 | 0.99 | 0.99 | 0.54 | 0.44 | 0.36 | 0.33 | 0.50 | 0.60 | 0.40 | 0.46 | 0.44 |
| LSTM | Micro | 0.99 | 0.98 | 0.98 | 0.49 | 0.36 | 0.28 | 0.33 | 0.50 | 0.59 | 0.39 | 0.42 | 0.38 |
| LSTM | Macro | 0.99 | 0.98 | 0.98 | 0.47 | 0.36 | 0.28 | 0.34 | 0.49 | 0.59 | 0.37 | 0.40 | 0.37 |
Table 7
Results of testing classifiers for SRSTI codes.
| Classifier | Averaging | Accuracy, 1 response | Accuracy, 2 responses | Accuracy, 3 responses | Precision, 1 response | Precision, 2 responses | Precision, 3 responses | Recall, 1 response | Recall, 2 responses | Recall, 3 responses | F-score, 1 response | F-score, 2 responses | F-score, 3 responses |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | Micro | 0.99 | 0.99 | 0.99 | 0.41 | 0.31 | 0.25 | 0.28 | 0.43 | 0.53 | 0.33 | 0.36 | 0.34 |
| LR | Macro | 0.99 | 0.99 | 0.99 | 0.29 | 0.23 | 0.19 | 0.29 | 0.42 | 0.50 | 0.24 | 0.25 | 0.24 |
| RF | Micro | 0.99 | 0.99 | 0.99 | 0.43 | 0.32 | 0.26 | 0.19 | 0.34 | 0.44 | 0.27 | 0.33 | 0.32 |
| RF | Macro | 0.99 | 0.99 | 0.99 | 0.15 | 0.16 | 0.14 | 0.06 | 0.12 | 0.17 | 0.06 | 0.11 | 0.13 |
| ANN1 | Micro | 0.99 | 0.99 | 0.99 | 0.46 | 0.35 | 0.28 | 0.25 | 0.40 | 0.49 | 0.32 | 0.38 | 0.36 |
| ANN1 | Macro | 0.99 | 0.99 | 0.99 | 0.16 | 0.14 | 0.13 | 0.08 | 0.14 | 0.18 | 0.09 | 0.12 | 0.13 |
| ANN2 | Micro | 0.99 | 0.99 | 0.99 | 0.46 | 0.36 | 0.29 | 0.27 | 0.42 | 0.52 | 0.34 | 0.38 | 0.37 |
| ANN2 | Macro | 0.99 | 0.99 | 0.99 | 0.19 | 0.18 | 0.16 | 0.09 | 0.16 | 0.22 | 0.11 | 0.15 | 0.16 |
| SVM | Micro | 0.99 | 0.99 | 0.99 | 0.62 | 0.46 | 0.36 | 0.37 | 0.55 | 0.65 | 0.46 | 0.51 | 0.47 |
| SVM | Macro | 0.99 | 0.99 | 0.99 | 0.41 | 0.35 | 0.28 | 0.20 | 0.33 | 0.42 | 0.25 | 0.32 | 0.32 |
| LSTM | Micro | 0.99 | 0.99 | 0.99 | 0.45 | 0.33 | 0.25 | 0.31 | 0.46 | 0.54 | 0.37 | 0.38 | 0.35 |
| LSTM | Macro | 0.99 | 0.99 | 0.99 | 0.15 | 0.11 | 0.08 | 0.11 | 0.16 | 0.20 | 0.11 | 0.13 | 0.11 |
