Figure 1

Figure 2
Matthews correlation coefficient (MCC) for the categorisation of high grade squamous intraepithelial lesion (HSIL), with YES and NO predictions combined, for different equalisation methods (no correction of the minority class, under-sampling, over-sampling and synthetic minority over-sampling technique [SMOTE]) in both the RAW and Class settings. The multi-layer perceptron (MLP) performed best on the dataset with data organised in classes and over-sampling of the minority class (MCC = 0.64), and worst on the original dataset without correction for the minority class (MCC = 0.086).
Figure 3
True positive rate and false positive rate for the combined YES and NO predictions, for different equalisation methods (no correction of the minority class, under-sampling, over-sampling and synthetic minority over-sampling technique [SMOTE]) in both the RAW and Class settings. The best-performing model from Figure 2 has a true positive rate of 0.842 and a false positive rate of 0.182. The worst-performing model from Figure 2 has a true positive rate of 0.814, almost as high as that of the best model, but also a high false positive rate of 0.735.
Raw = original settings; Class = class setting; FPR = false positive rate; HSIL = high grade squamous intraepithelial lesion; overs = oversampling; TPR = true positive rate; unders = undersampling; SMOTE = synthetic minority over-sampling technique
Figure 4

Figure 5

Number and percentage of patients according to human papilloma virus (HPV) 16 and 18 status in the high grade squamous intraepithelial lesion (HSIL) and NO-HSIL groups
| HPV status | HPV 16, HSIL group: Frequency | HPV 16, HSIL group: % | HPV 16, NO-HSIL group: Frequency | HPV 16, NO-HSIL group: % | HPV 18, HSIL group: Frequency | HPV 18, HSIL group: % | HPV 18, NO-HSIL group: Frequency | HPV 18, NO-HSIL group: % |
|---|---|---|---|---|---|---|---|---|
| not performed | 177 | 14 | 29 | 16 | 172 | 13 | 27 | 15 |
| negative | 693 | 54 | 106 | 57 | 775 | 60 | 120 | 65 |
| positive | 419 | 32 | 51 | 27 | 342 | 27 | 39 | 20 |
| Total | 1289 | 100 | 186 | 100 | 1289 | 100 | 186 | 100 |
Results of multi-layer perceptron (MLP) classification for different settings, with the baseline prediction (ZeroR), the percentage of correct classifications and the Kappa statistic for all analyses. Results are given for the prediction of high grade squamous intraepithelial lesion (HSIL)-Yes (Y), the prediction of NO-HSIL (N) and the weighted average for the whole model (YES and NO combined; AVG). Results where the MLP prediction is better than the ZeroR baseline prediction are shown in bold type.
| Setting | TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class | % Correct | Kappa | ZeroR % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Class_orig–Y | 0.751 | 0.634 | 0.739 | 0.751 | 0.745 | 0.118 | 0.567 | 0.735 | Yes | 82.10 | 0.0965 | 87.39 |
| Class_orig–N | 0.366 | 0.249 | 0.308 | 0.366 | 0.373 | 0.118 | 0.567 | 0.377 | No | |||
| Class_orig–AVG | 0.637 | 0.521 | 0.633 | 0.637 | 0.635 | 0.118 | 0.567 | 0.629 | Weighted Avg | |||
| Class_overs–Y | 0.860 | 0.201 | 0.908 | 0.860 | 0.884 | 0.640 | 0.870 | 0.920 | Yes | 84.19 | 0.6376 | 69.79 |
| Class_overs–N | 0.799 | 0.140 | 0.712 | 0.799 | 0.753 | 0.640 | 0.870 | 0.703 | No | |||
| Class_overs–AVG | 0.842 | 0.182 | 0.849 | 0.842 | 0.844 | 0.640 | 0.870 | 0.855 | Weighted Avg | |||
| Class_SMOTE–Y | 0.797 | 0.274 | 0.834 | 0.797 | 0.815 | 0.515 | 0.802 | 0.850 | Yes | 77.08 | 0.5141 | 63.40 |
| Class_SMOTE–N | 0.726 | 0.203 | 0.673 | 0.726 | 0.699 | 0.515 | 0.802 | 0.669 | No | |||
| Class_SMOTE–AVG | 0.771 | 0.248 | 0.775 | 0.771 | 0.772 | 0.515 | 0.802 | 0.784 | Weighted Avg | |||
| Class_unders–Y | 0.669 | 0.559 | 0.636 | 0.669 | 0.652 | 0.112 | 0.542 | 0.608 | Yes | 57.64 | 0.1113 | 59.39 |
| Class_unders–N | 0.441 | 0.331 | 0.477 | 0.441 | 0.458 | 0.112 | 0.542 | 0.448 | No | |||
| Class_unders–AVG | 0.576 | 0.466 | 0.572 | 0.576 | 0.573 | 0.112 | 0.542 | 0.543 | Weighted Avg | |||
| RAW_orig–Y | 0.907 | 0.828 | 0.884 | 0.907 | 0.895 | 0.086 | 0.594 | 0.905 | Yes | 81.42 | 0.0856 | 87.39 |
| RAW_orig–N | 0.172 | 0.093 | 0.211 | 0.172 | 0.189 | 0.086 | 0.594 | 0.174 | No | |||
| RAW_orig–AVG | 0.814 | 0.735 | 0.799 | 0.814 | 0.806 | 0.086 | 0.594 | 0.813 | Weighted Avg | |||
| RAW_overs–Y | 0.825 | 0.285 | 0.870 | 0.825 | 0.847 | 0.525 | 0.837 | 0.905 | Yes | 79.21 | 0.523 | 69.79 |
| RAW_overs–N | 0.715 | 0.175 | 0.639 | 0.715 | 0.675 | 0.525 | 0.837 | 0.661 | No | |||
| RAW_overs–AVG | 0.792 | 0.252 | 0.800 | 0.792 | 0.795 | 0.525 | 0.837 | 0.831 | Weighted Avg | |||
| RAW_SMOTE–Y | 0.800 | 0.258 | 0.843 | 0.800 | 0.821 | 0.533 | 0.814 | 0.867 | Yes | 77.87 | 0.5318 | 63.40 |
| RAW_SMOTE–N | 0.742 | 0.200 | 0.681 | 0.742 | 0.710 | 0.533 | 0.814 | 0.691 | No | |||
| RAW_SMOTE–AVG | 0.779 | 0.237 | 0.784 | 0.779 | 0.780 | 0.533 | 0.814 | 0.802 | Weighted Avg | |||
| RAW_unders–Y | 0.688 | 0.575 | 0.636 | 0.688 | 0.661 | 0.115 | 0.551 | 0.614 | Yes | 58.08 | 0.1144 | 59.39 |
| RAW_unders–N | 0.425 | 0.313 | 0.482 | 0.425 | 0.451 | 0.115 | 0.551 | 0.466 | No | |||
| RAW_unders–AVG | 0.581 | 0.469 | 0.573 | 0.581 | 0.576 | 0.115 | 0.551 | 0.554 | Weighted Avg |
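The ZeroR column and the weighted-average rows can be sanity-checked with a few lines of Python. This is a minimal sketch, assuming that ZeroR always predicts the most frequent class and that the original dataset uses the group sizes from the HPV status table (1289 HSIL, 186 NO-HSIL). The function names are illustrative and do not come from the source. Class counts after under-sampling, over-sampling or SMOTE are not listed here, so only the original split is checked.

```python
def zero_r_accuracy(class_counts):
    """Accuracy (in %) of always predicting the most frequent class."""
    total = sum(class_counts.values())
    return 100.0 * max(class_counts.values()) / total


def weighted_average(per_class_values, class_counts):
    """Average of per-class values, weighted by the number of cases per class."""
    total = sum(class_counts.values())
    return sum(per_class_values[c] * class_counts[c] for c in class_counts) / total


# Assumption: group sizes of the original (uncorrected) dataset.
original_counts = {"Yes": 1289, "No": 186}

baseline = zero_r_accuracy(original_counts)

# Per-class TP rates copied from the RAW_orig rows of the table.
raw_orig_tp_rate = {"Yes": 0.907, "No": 0.172}
avg_tp_rate = weighted_average(raw_orig_tp_rate, original_counts)

print(f"ZeroR baseline: {baseline:.2f}%")
print(f"Weighted TP rate (RAW_orig): {avg_tp_rate:.3f}")
```

Under these assumptions the baseline comes out at 87.39%, which agrees with the ZeroR value reported for the orig rows, and the weighted TP rate rounds to the 0.814 shown for RAW_orig–AVG.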
Confusion matrix for classification with all possible outcomes
| | Predicted pos (PP) | Predicted neg (PN) |
|---|---|---|
| Actual pos (P) | True positives (TP) | False negatives (FN) |
| Actual neg (N) | False positives (FP) | True negatives (TN) |
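The metric columns of the results table can be derived from the four cells of this confusion matrix. The sketch below uses the standard textbook definitions. The function name and the example counts are assumptions: the counts are back-calculated from the RAW_orig rates and the group sizes (1289 HSIL, 186 NO-HSIL) and are not reported in the source.

```python
from math import sqrt


def binary_metrics(tp, fn, fp, tn):
    """Standard metrics from confusion-matrix counts (no zero-division guards)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Agreement expected by chance, used for Cohen's kappa.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    return {
        "tp_rate": tp / (tp + fn),          # recall / sensitivity
        "fp_rate": fp / (fp + tn),
        "precision": tp / (tp + fp),
        "f_measure": 2 * tp / (2 * tp + fp + fn),
        "accuracy": accuracy,
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "kappa": (accuracy - expected) / (1 - expected),
    }


# Hypothetical counts, chosen to be consistent with the RAW_orig rows
# (HSIL-Yes treated as the positive class); not taken from the source.
metrics = binary_metrics(tp=1169, fn=120, fp=154, tn=32)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts the MCC rounds to 0.086 and Kappa to 0.0856, in line with the RAW_orig rows. This also shows how a model can reach about 81% accuracy while barely improving on chance for the minority class.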
Final histology of the cone in patients without human papilloma virus (HPV) testing
| Histology | Frequency | Percent |
|---|---|---|
| NO dysplasia | 9 | 1.8 |
| CIN 1 | 26 | 5.3 |
| CIN 1–2 | 27 | 5.4 |
| CIN 2 | 90 | 18.1 |
| CIN 2–3 | 55 | 11.1 |
| CIN 3 | 223 | 45.0 |
| CIS | 55 | 11.1 |
| invasive carcinoma | 11 | 2.2 |
| Total | 496 | 100.0 |