Application of Natural Language Processing Algorithms to the Task of Automatic Classification of Russian Scientific Texts

Open Access | Aug 2019

Figures & Tables

Table 1

Sample text from the dataset.

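Field                          Value
Text ID                        B1111259224
Title                          Изучение трематодофауны инвазивных видов моллюсков на территории Беларуси
                               [English: Study of the trematode fauna of invasive mollusc species in Belarus]
Body                           В результате проведенных исследований и анализа литературных данных в водоемах Беларуси у инвазивных видов моллюсков выявлено как минимум 14 представителей класса Trematoda: у D. polymorpha - Phyllodistomum folium…
                               [English: As a result of the research conducted and an analysis of the literature data, at least 14 representatives of the class Trematoda have been identified in invasive mollusc species in the water bodies of Belarus: in D. polymorpha - Phyllodistomum folium…]
Keywords                       Республика Беларусь\видовое разнообразие\диагностика\пресноводные моллюски\таксономия\трематоды\фауна\хозяева
                               [English: Republic of Belarus\species diversity\diagnostics\freshwater molluscs\taxonomy\trematodes\fauna\hosts]
Codes of thematic departments  e3\e4
Codes of abstract journals     04AND9\07Д
SRSTI                          341.33.23.17.11.09\391.19.25.31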
Figure 1

Dependence of number of topics of three rubricators on minimum number of texts.
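One plausible reading of the Figure 1 caption is that a topic is kept only if at least a minimum number of texts carry it, so the number of usable topics shrinks as that minimum grows. The sketch below illustrates that counting on a toy corpus. The function name `topics_above_threshold`, the sample codes, and the thresholds are all hypothetical; only the backslash-separated code format is taken from Table 1.

```python
from collections import Counter

def topics_above_threshold(label_sets, min_texts):
    """Count topics that are assigned to at least `min_texts` texts."""
    # set(labels) guards against a code being listed twice for one text.
    counts = Counter(code for labels in label_sets for code in set(labels))
    return sum(1 for n in counts.values() if n >= min_texts)

# Hypothetical toy corpus: one backslash-separated code field per text,
# in the style of the "Codes of thematic departments" field in Table 1.
raw = ["e3\\e4", "e3", "e4\\e7", "e3\\e7", "e1"]
label_sets = [field.split("\\") for field in raw]

# Number of topics that survive each (assumed) minimum-text threshold.
curve = {m: topics_above_threshold(label_sets, m) for m in (1, 2, 3, 4)}
print(curve)  # {1: 4, 2: 3, 3: 1, 4: 0}
```

Running the same count separately for each rubricator would give one curve per rubricator.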

Table 2

Results of selection of method for extraction of features for vectors with 50 elements.

Classification model  Averaging | Average                                    | Maximum                                    | Sum
                                | Accuracy   Precision  Recall     F-score   | Accuracy   Precision  Recall     F-score   | Accuracy   Precision  Recall     F-score
LR                    Micro     | 0.94       0.75       0.54       0.63      | 0.92       0.60       0.40       0.48      | 0.94       0.77       0.53       0.62
                      Macro     | 0.94       0.67       0.51       0.55      | 0.92       0.54       0.38       0.43      | 0.94       0.72       0.51       0.58
RF                    Micro     | 0.94       0.75       0.53       0.62      | 0.92       0.60       0.39       0.48      | 0.93       0.72       0.50       0.59
                      Macro     | 0.94       0.75       0.43       0.51      | 0.92       0.60       0.27       0.29      | 0.93       0.73       0.39       0.45
ANN1                  Micro     | 0.94       0.79       0.53       0.63      | 0.92       0.63       0.44       0.52      | 0.94       0.80       0.57       0.67
                      Macro     | 0.94       0.79       0.44       0.52      | 0.92       0.57       0.35       0.41      | 0.94       0.78       0.51       0.60
ANN2                  Micro     | 0.94       0.78       0.54       0.64      | 0.92       0.62       0.43       0.51      | 0.94       0.80       0.56       0.66
                      Macro     | 0.94       0.77       0.46       0.54      | 0.92       0.59       0.33       0.38      | 0.94       0.78       0.51       0.60
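The column groups "Average", "Maximum" and "Sum" in Tables 2 to 4 name three ways of turning a variable number of per-word vectors into one fixed-length feature vector for a text. The sketch below shows the three reductions applied element-wise. It assumes the inputs are word embeddings of equal length; the function name `pool` and the three-dimensional toy vectors are hypothetical and are not taken from the article.

```python
def pool(word_vectors, method):
    """Combine per-word vectors into a single fixed-length text vector."""
    # Transpose so that each tuple holds one dimension across all words.
    columns = list(zip(*word_vectors))
    if method == "average":
        return [sum(c) / len(c) for c in columns]
    if method == "maximum":
        return [max(c) for c in columns]
    if method == "sum":
        return [sum(c) for c in columns]
    raise ValueError(f"unknown pooling method: {method}")

# Hypothetical embeddings for a three-word text (3 dimensions for brevity;
# the tables use vectors with 50, 100 and 500 elements).
words = [[1.0, -2.0, 0.5], [3.0, 0.0, 0.5], [2.0, 2.0, 2.0]]

print(pool(words, "average"))  # [2.0, 0.0, 1.0]
print(pool(words, "maximum"))  # [3.0, 2.0, 2.0]
print(pool(words, "sum"))      # [6.0, 0.0, 3.0]
```

Whatever the text length, the output has the same number of elements as a single word vector, which is what lets a fixed-input classifier consume it.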
Table 3

Results of selection of method for extraction of features for vectors with 100 elements.

Classification model  Averaging | Average                                    | Maximum                                    | Sum
                                | Accuracy   Precision  Recall     F-score   | Accuracy   Precision  Recall     F-score   | Accuracy   Precision  Recall     F-score
LR                    Micro     | 0.94       0.53       0.57       0.55      | 0.93       0.50       0.50       0.50      | 0.94       0.62       0.62       0.62
                      Macro     | 0.94       0.53       0.57       0.55      | 0.93       0.50       0.50       0.50      | 0.94       0.62       0.62       0.62
RF                    Micro     | 0.94       0.47       0.54       0.49      | 0.93       0.44       0.47       0.44      | 0.94       0.56       0.59       0.57
                      Macro     | 0.94       0.53       0.49       0.51      | 0.92       0.47       0.47       0.47      | 0.94       0.56       0.56       0.56
ANN1                  Micro     | 0.94       0.47       0.39       0.41      | 0.92       0.42       0.35       0.36      | 0.94       0.52       0.46       0.47
                      Macro     | 0.95       0.57       0.55       0.56      | 0.93       0.53       0.53       0.53      | 0.95       0.64       0.64       0.64
ANN2                  Micro     | 0.95       0.52       0.42       0.42      | 0.93       0.49       0.42       0.43      | 0.95       0.60       0.56       0.57
                      Macro     | 0.95       0.57       0.55       0.56      | 0.93       0.53       0.53       0.53      | 0.95       0.64       0.64       0.64
Table 4

Results of selection of method for extraction of features for vectors with 500 elements.

Classification model  Averaging | Average                                    | Maximum                                    | Sum
                                | Accuracy   Precision  Recall     F-score   | Accuracy   Precision  Recall     F-score   | Accuracy   Precision  Recall     F-score
LR                    Micro     | 0.95       0.64       0.64       0.64      | 0.94       0.58       0.58       0.58      | 0.95       0.64       0.64       0.64
                      Macro     | 0.95       0.58       0.61       0.59      | 0.94       0.52       0.55       0.52      | 0.95       0.59       0.62       0.60
RF                    Micro     | 0.94       0.57       0.57       0.57      | 0.93       0.49       0.49       0.49      | 0.94       0.56       0.56       0.56
                      Macro     | 0.94       0.53       0.47       0.48      | 0.93       0.44       0.38       0.38      | 0.94       0.53       0.46       0.48
ANN1                  Micro     | 0.95       0.62       0.62       0.62      | 0.94       0.57       0.57       0.57      | 0.95       0.64       0.64       0.64
                      Macro     | 0.95       0.59       0.52       0.53      | 0.94       0.57       0.45       0.46      | 0.95       0.61       0.56       0.57
ANN2                  Micro     | 0.95       0.62       0.62       0.62      | 0.94       0.58       0.58       0.58      | 0.95       0.64       0.64       0.64
                      Macro     | 0.95       0.59       0.52       0.53      | 0.94       0.55       0.47       0.47      | 0.95       0.61       0.56       0.57
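Every classifier in the tables is reported under two averaging schemes, "Micro" and "Macro". The sketch below shows the usual difference between them. Micro-averaging pools the true-positive, false-positive and false-negative counts of all topics before computing the metrics; macro-averaging computes the metrics per topic and takes the unweighted mean. The counts and topic names are hypothetical, and taking the macro F-score as the mean of per-topic F-scores is one common convention, not necessarily the one the authors used.

```python
def prf(tp, fp, fn):
    """Precision, recall and F-score from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score

def micro_average(counts):
    """Pool the counts of all topics first, then compute the metrics once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)

def macro_average(counts):
    """Compute the metrics per topic, then take the unweighted mean."""
    per_topic = [prf(*c) for c in counts.values()]
    return tuple(sum(m[i] for m in per_topic) / len(per_topic) for i in range(3))

# Hypothetical (tp, fp, fn) counts: one well-populated topic, one rare topic.
counts = {"frequent_topic": (90, 10, 10), "rare_topic": (1, 1, 3)}

micro = micro_average(counts)
macro = macro_average(counts)
print("micro:", [round(x, 3) for x in micro])
print("macro:", [round(x, 3) for x in macro])
```

In this toy case the macro scores come out lower than the micro scores, because the poorly classified rare topic weighs as much as the frequent one.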
Table 5

Results of testing classifiers for codes of thematic departments.

Classifier  Averaging | Accuracy                              | Precision                             | Recall                                | F-score
                      | 1 response   2 responses  3 responses | 1 response   2 responses  3 responses | 1 response   2 responses  3 responses | 1 response   2 responses  3 responses
LR          Micro     | 0.94         0.93         0.90        | 0.77         0.59         0.49        | 0.53         0.71         0.80        | 0.62         0.64         0.60
            Macro     | 0.94         0.93         0.90        | 0.72         0.55         0.45        | 0.51         0.70         0.79        | 0.58         0.60         0.56
RF          Micro     | 0.93         0.92         0.88        | 0.72         0.55         0.43        | 0.50         0.67         0.78        | 0.59         0.60         0.55
            Macro     | 0.93         0.92         0.88        | 0.73         0.55         0.42        | 0.39         0.57         0.70        | 0.50         0.53         0.51
ANN1        Micro     | 0.94         0.93         0.90        | 0.80         0.60         0.47        | 0.57         0.76         0.85        | 0.67         0.67         0.60
            Macro     | 0.94         0.93         0.90        | 0.78         0.58         0.45        | 0.51         0.71         0.82        | 0.60         0.63         0.57
ANN2        Micro     | 0.94         0.93         0.90        | 0.80         0.61         0.48        | 0.56         0.75         0.85        | 0.66         0.68         0.61
            Macro     | 0.94         0.93         0.90        | 0.78         0.59         0.46        | 0.51         0.71         0.82        | 0.60         0.64         0.58
SVM         Micro     | 0.95         0.94         0.91        | 0.82         0.64         0.50        | 0.59         0.77         0.87        | 0.69         0.70         0.64
            Macro     | 0.95         0.94         0.91        | 0.80         0.61         0.48        | 0.55         0.74         0.85        | 0.65         0.67         0.61
LSTM        Micro     | 0.95         0.91         0.86        | 0.80         0.52         0.39        | 0.60         0.78         0.87        | 0.68         0.63         0.54
            Macro     | 0.95         0.91         0.86        | 0.77         0.49         0.37        | 0.54         0.75         0.85        | 0.63         0.59         0.50
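Tables 5 to 7 report each metric for "1 response", "2 responses" and "3 responses". Assuming a response is one of the classifier's top-ranked topic codes for a text, the sketch below shows why returning more responses tends to raise recall and lower precision, the pattern visible in the rows above. The scores, the true label set, and the helper names `top_k` and `precision_recall_at_k` are hypothetical.

```python
def top_k(scores, k):
    """Return the k topic codes with the highest classifier scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]

def precision_recall_at_k(scores, true_labels, k):
    """Precision and recall for one text when k responses are returned."""
    predicted = top_k(scores, k)
    hits = len(set(predicted) & set(true_labels))
    return hits / len(predicted), hits / len(true_labels)

# Hypothetical classifier scores for one text whose true codes are e3 and e4.
scores = {"e3": 0.70, "e4": 0.20, "e1": 0.06, "e7": 0.04}
true_labels = {"e3", "e4"}

results = {k: precision_recall_at_k(scores, true_labels, k) for k in (1, 2, 3)}
for k, (precision, recall) in results.items():
    print(k, round(precision, 2), round(recall, 2))
```

With one response the text gets full precision but only half its labels; the third response adds no new correct label, so precision drops to 2/3 while recall stays at 1.0.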
Table 6

Results of testing classifiers for codes of abstract journals.

Classifier  Averaging | Accuracy                              | Precision                             | Recall                                | F-score
                      | 1 response   2 responses  3 responses | 1 response   2 responses  3 responses | 1 response   2 responses  3 responses | 1 response   2 responses  3 responses
LR          Micro     | 0.99         0.99         0.98        | 0.49         0.36         0.29        | 0.33         0.49         0.59        | 0.39         0.42         0.39
            Macro     | 0.99         0.99         0.98        | 0.46         0.36         0.29        | 0.35         0.51         0.60        | 0.37         0.40         0.37
RF          Micro     | 0.99         0.99         0.98        | 0.45         0.35         0.29        | 0.23         0.39         0.49        | 0.23         0.36         0.37
            Macro     | 0.99         0.99         0.98        | 0.36         0.31         0.26        | 0.20         0.33         0.42        | 0.31         0.30         0.30
ANN1        Micro     | 0.99         0.99         0.98        | 0.47         0.37         0.30        | 0.24         0.40         0.51        | 0.32         0.39         0.38
            Macro     | 0.99         0.99         0.98        | 0.41         0.34         0.28        | 0.20         0.34         0.43        | 0.23         0.31         0.32
ANN2        Micro     | 0.99         0.99         0.98        | 0.46         0.36         0.30        | 0.25         0.41         0.52        | 0.32         0.39         0.38
            Macro     | 0.99         0.99         0.98        | 0.40         0.33         0.28        | 0.22         0.35         0.45        | 0.25         0.32         0.32
SVM         Micro     | 0.99         0.99         0.99        | 0.61         0.48         0.38        | 0.36         0.54         0.65        | 0.45         0.51         0.48
            Macro     | 0.99         0.99         0.99        | 0.54         0.44         0.36        | 0.33         0.50         0.60        | 0.40         0.46         0.44
LSTM        Micro     | 0.99         0.98         0.98        | 0.49         0.36         0.28        | 0.33         0.50         0.59        | 0.39         0.42         0.38
            Macro     | 0.99         0.98         0.98        | 0.47         0.36         0.28        | 0.34         0.49         0.59        | 0.37         0.40         0.37
Table 7

Results of testing classifiers for SRSTI codes.

Classifier  Averaging | Accuracy                              | Precision                             | Recall                                | F-score
                      | 1 response   2 responses  3 responses | 1 response   2 responses  3 responses | 1 response   2 responses  3 responses | 1 response   2 responses  3 responses
LR          Micro     | 0.99         0.99         0.99        | 0.41         0.31         0.25        | 0.28         0.43         0.53        | 0.33         0.36         0.34
            Macro     | 0.99         0.99         0.99        | 0.29         0.23         0.19        | 0.29         0.42         0.50        | 0.24         0.25         0.24
RF          Micro     | 0.99         0.99         0.99        | 0.43         0.32         0.26        | 0.19         0.34         0.44        | 0.27         0.33         0.32
            Macro     | 0.99         0.99         0.99        | 0.15         0.16         0.14        | 0.06         0.12         0.17        | 0.06         0.11         0.13
ANN1        Micro     | 0.99         0.99         0.99        | 0.46         0.35         0.28        | 0.25         0.40         0.49        | 0.32         0.38         0.36
            Macro     | 0.99         0.99         0.99        | 0.16         0.14         0.13        | 0.08         0.14         0.18        | 0.09         0.12         0.13
ANN2        Micro     | 0.99         0.99         0.99        | 0.46         0.36         0.29        | 0.27         0.42         0.52        | 0.34         0.38         0.37
            Macro     | 0.99         0.99         0.99        | 0.19         0.18         0.16        | 0.09         0.16         0.22        | 0.11         0.15         0.16
SVM         Micro     | 0.99         0.99         0.99        | 0.62         0.46         0.36        | 0.37         0.55         0.65        | 0.46         0.51         0.47
            Macro     | 0.99         0.99         0.99        | 0.41         0.35         0.28        | 0.20         0.33         0.42        | 0.25         0.32         0.32
LSTM        Micro     | 0.99         0.99         0.99        | 0.45         0.33         0.25        | 0.31         0.46         0.54        | 0.37         0.38         0.35
            Macro     | 0.99         0.99         0.99        | 0.15         0.11         0.08        | 0.11         0.16         0.20        | 0.11         0.13         0.11
Language: English
Submitted on: Jan 14, 2019 | Accepted on: Jul 23, 2019 | Published on: Aug 12, 2019
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2019 Aleksandr Romanov, Konstantin Lomotin, Ekaterina Kozlova, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.