Figure 1

Figure 2
![Matthews correlation coefficient (MCC) for categorisation of high grade squamous intraepithelial lesion (HSIL), YES and NO predictions combined, for different equalisation methods (no correction of minority class, under-sampling, over-sampling and synthetic minority over-sampling technique [SMOTE]) in both RAW and Class settings. The best multi-layer perceptron (MLP) performance is on the dataset organised in classes with over-sampling of the minority class (MCC = 0.64). The lowest performance is on the original dataset without correction for the minority class (MCC = 0.086).](https://sciendo-parsed.s3.eu-central-1.amazonaws.com/647356504e662f30ba53ab32/j_raon-2022-0023_fig_002.jpg)
Figure 3
![True positive and false positive rates, YES and NO predictions combined, for different equalisation methods (no correction of minority class, under-sampling, over-sampling and synthetic minority over-sampling technique [SMOTE]) in both RAW and Class settings. The best-performing model from Figure 2 has a true positive rate of 0.842 and a false positive rate of 0.182. The lowest-performing model from Figure 2 has a true positive rate of 0.814, almost as high as that of the best model, but also a high false positive rate of 0.735.
Raw = original settings; Class = class setting; FPR = false positive rate; HSIL = high grade squamous intraepithelial lesion; overs = oversampling; TPR = true positive rate; unders = undersampling; SMOTE = synthetic minority over-sampling technique](https://sciendo-parsed.s3.eu-central-1.amazonaws.com/647356504e662f30ba53ab32/j_raon-2022-0023_fig_003.jpg)
Figure 4

Figure 5

Number and percentage of patients by human papilloma virus (HPV) 16 and 18 status in the high grade squamous intraepithelial lesion (HSIL) and NO-HSIL groups
| HPV status | HPV 16, HSIL group (n) | HPV 16, HSIL group (%) | HPV 16, NO-HSIL group (n) | HPV 16, NO-HSIL group (%) | HPV 18, HSIL group (n) | HPV 18, HSIL group (%) | HPV 18, NO-HSIL group (n) | HPV 18, NO-HSIL group (%) |
|---|---|---|---|---|---|---|---|---|
| not performed | 177 | 14 | 29 | 16 | 172 | 13 | 27 | 15 |
| negative | 693 | 54 | 106 | 57 | 775 | 60 | 120 | 65 |
| positive | 419 | 32 | 51 | 27 | 342 | 27 | 39 | 20 |
| Total | 1289 | 100 | 186 | 100 | 1289 | 100 | 186 | 100 |
Results of multi-layer perceptron (MLP) classifications for different settings, with baseline prediction (ZeroR), percentage of correct classification and Kappa statistic for all analyses. Results are given for prediction of high grade squamous intraepithelial lesion (HSIL)-Yes (Y), prediction of NO-HSIL (N) and the weighted average for the whole model (YES and NO combined; AVG). Results in bold type are those where prediction by MLP is better than the baseline prediction ZeroR.
| Setting | TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class | % Correct | Kappa | ZeroR % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Class_orig–Y | 0.751 | 0.634 | 0.739 | 0.751 | 0.745 | 0.118 | 0.567 | 0.735 | Yes | 82.10 | 0.0965 | 87.39 |
| Class_orig–N | 0.366 | 0.249 | 0.308 | 0.366 | 0.373 | 0.118 | 0.567 | 0.377 | No | |||
| Class_orig–AVG | 0.637 | 0.521 | 0.633 | 0.637 | 0.635 | 0.118 | 0.567 | 0.629 | Weighted Avg | |||
| Class_overs–Y | 0.860 | 0.201 | 0.908 | 0.860 | 0.884 | 0.640 | 0.870 | 0.920 | Yes | 84.19 | 0.6376 | 69.79 |
| Class_overs–N | 0.799 | 0.140 | 0.712 | 0.799 | 0.753 | 0.640 | 0.870 | 0.703 | No | |||
| Class_overs–AVG | 0.842 | 0.182 | 0.849 | 0.842 | 0.844 | 0.640 | 0.870 | 0.855 | Weighted Avg | |||
| Class_SMOTE–Y | 0.797 | 0.274 | 0.834 | 0.797 | 0.815 | 0.515 | 0.802 | 0.850 | Yes | 77.08 | 0.5141 | 63.40 |
| Class_SMOTE–N | 0.726 | 0.203 | 0.673 | 0.726 | 0.699 | 0.515 | 0.802 | 0.669 | No | |||
| Class_SMOTE–AVG | 0.771 | 0.248 | 0.775 | 0.771 | 0.772 | 0.515 | 0.802 | 0.784 | Weighted Avg | |||
| Class_unders–Y | 0.669 | 0.559 | 0.636 | 0.669 | 0.652 | 0.112 | 0.542 | 0.608 | Yes | 57.64 | 0.1113 | 59.39 |
| Class_unders–N | 0.441 | 0.331 | 0.477 | 0.441 | 0.458 | 0.112 | 0.542 | 0.448 | No | |||
| Class_unders–AVG | 0.576 | 0.466 | 0.572 | 0.576 | 0.573 | 0.112 | 0.542 | 0.543 | Weighted Avg | |||
| RAW_orig–Y | 0.907 | 0.828 | 0.884 | 0.907 | 0.895 | 0.086 | 0.594 | 0.905 | Yes | 81.42 | 0.0856 | 87.39 |
| RAW_orig–N | 0.172 | 0.093 | 0.211 | 0.172 | 0.189 | 0.086 | 0.594 | 0.174 | No | |||
| RAW_orig–AVG | 0.814 | 0.735 | 0.799 | 0.814 | 0.806 | 0.086 | 0.594 | 0.813 | Weighted Avg | |||
| RAW_overs–Y | 0.825 | 0.285 | 0.870 | 0.825 | 0.847 | 0.525 | 0.837 | 0.905 | Yes | 79.21 | 0.523 | 69.79 |
| RAW_overs–N | 0.715 | 0.175 | 0.639 | 0.715 | 0.675 | 0.525 | 0.837 | 0.661 | No | |||
| RAW_overs–AVG | 0.792 | 0.252 | 0.800 | 0.792 | 0.795 | 0.525 | 0.837 | 0.831 | Weighted Avg | |||
| RAW_SMOTE–Y | 0.800 | 0.258 | 0.843 | 0.800 | 0.821 | 0.533 | 0.814 | 0.867 | Yes | 77.87 | 0.5318 | 63.4 |
| RAW_SMOTE–N | 0.742 | 0.200 | 0.681 | 0.742 | 0.710 | 0.533 | 0.814 | 0.691 | No | |||
| RAW_SMOTE–AVG | 0.779 | 0.237 | 0.784 | 0.779 | 0.780 | 0.533 | 0.814 | 0.802 | Weighted Avg | |||
| RAW_unders–Y | 0.688 | 0.575 | 0.636 | 0.688 | 0.661 | 0.115 | 0.551 | 0.614 | Yes | 58.08 | 0.1144 | 59.39 |
| RAW_unders–N | 0.425 | 0.313 | 0.482 | 0.425 | 0.451 | 0.115 | 0.551 | 0.466 | No | |||
| RAW_unders–AVG | 0.581 | 0.469 | 0.573 | 0.581 | 0.576 | 0.115 | 0.551 | 0.554 | Weighted Avg | |||
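
The "Weighted Avg" rows can be read as class-frequency-weighted means of the per-class values. The sketch below checks this for the Class_overs rows only. It assumes that the ZeroR percentage equals the share of the majority (Yes) class, since ZeroR always predicts the majority class. The function name `weighted_average` is hypothetical, and the check need not hold for every setting in the table.

```python
def weighted_average(metric_yes: float, metric_no: float, share_yes: float) -> float:
    """Class-frequency-weighted mean of a per-class metric."""
    return share_yes * metric_yes + (1.0 - share_yes) * metric_no


# Assumption: ZeroR % (69.79 for the over-sampled data) is the Yes-class share.
share_yes = 69.79 / 100

# Per-class values copied from the Class_overs-Y and Class_overs-N rows.
avg_tpr = weighted_average(0.860, 0.799, share_yes)
avg_precision = weighted_average(0.908, 0.712, share_yes)

print(round(avg_tpr, 3))        # 0.842, matches the Class_overs AVG TP Rate
print(round(avg_precision, 3))  # 0.849, matches the Class_overs AVG Precision
```
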
Confusion matrix for classification with all possible outcomes
| | Predicted pos (PP) | Predicted neg (PN) |
|---|---|---|
| Actual pos (P) | True positives (TP) | False negatives (FN) |
| Actual neg (N) | False positives (FP) | True negatives (TN) |
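
The four cells of the confusion matrix above are enough to derive the rates reported in the results table. The following is a minimal Python sketch using the standard definitions of TP rate, FP rate, precision, F-measure and MCC. The function name `confusion_metrics` and the counts in the usage line are hypothetical and are not taken from the study.

```python
import math


def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Derive common classification metrics from the confusion-matrix cells."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # TP rate (recall)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0        # FP rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f_measure = (
        2 * precision * tpr / (precision + tpr) if (precision + tpr) else 0.0
    )
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "tpr": tpr,
        "fpr": fpr,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
m = confusion_metrics(tp=80, fn=20, fp=10, tn=90)
print({name: round(value, 3) for name, value in m.items()})
```

Unlike the TP rate alone, MCC uses all four cells, so a model with a high TP rate and a high FP rate still receives a low MCC.
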
Final histology of the cone in patients without human papilloma virus (HPV) testing
| Final histology | Frequency | Percent |
|---|---|---|
| NO dysplasia | 9 | 1.8 |
| CIN 1 | 26 | 5.3 |
| CIN 1–2 | 27 | 5.4 |
| CIN 2 | 90 | 18.1 |
| CIN 2–3 | 55 | 11.1 |
| CIN 3 | 223 | 45.0 |
| CIS | 55 | 11.1 |
| invasive ca | 11 | 2.2 |
| Total | 496 | 100.0 |