Figure 1.

Patient demographic, laboratory, and radiologic parameters used as inputs to the models

| Parameter | Pearson's correlation coefficient |
|---|---|
| Age* | 0.047 |
| Sex* | 0.025 |
| HBV viral load | 0.007 |
| HBeAg | 0.018 |
| Anti-HBe | 0.004 |
| Anti-HCV | 0.024 |
| HCV-RNA (Positive/Negative) | 0.030 |
| Anti-HIV | 0.004 |
| AFP (for AFA and SFA) | 0.180 |
| TB* | 0.095 |
| DB | 0.096 |
| AST | 0.052 |
| ALT | 0.011 |
| ALP* | 0.120 |
| Albumin | 0.029 |
| Globulin | 0.071 |
| INR* | 0.130 |
| Hemoglobin | 0.037 |
| Total white blood cell count | 0.024 |
| Absolute neutrophil count | 0.058 |
| Absolute lymphocyte count | 0.017 |
| Platelet count | 0.006 |
| BUN | 0.060 |
| Cr | 0.033 |
| FPG | 0.017 |
| Hemoglobin A1C | 0.020 |
| Liver cirrhosis (present/absent)* | 0.075 |
| Liver steatosis (present/absent)* | 0.040 |

Baseline characteristics of patients in the derivation dataset and external validation dataset

| Characteristic | Derivation dataset | Validation dataset |
|---|---|---|
| Total patients, n | 2,382 | 162 |
| Total follow-up visits, n | 15,187 | 564 |
| Median follow-up time (IQR), months | 18.0 (52.0) | 11.2 (8.7) |
| Patients developing HCC, n (%) | 117 (4.9%) | 57 (35.2%) |
| BCLC stage*, n (%) | | |
| stage 0 | 26 (28.6%) | 13 (22.8%) |
| stage A | 44 (48.4%) | 32 (56.1%) |
| stage B | 10 (11.0%) | 10 (17.5%) |
| stage C | 11 (12.1%) | 2 (3.5%) |
| Age, mean (SD) (years) | 51.0 (14.7) | 58.0 (13.6) |
| Male, n (%) | 1,331 (55.1%) | 104 (64.2%) |
| Cirrhosis, n (%) | 609 (25.6%) | 57 (35.2%) |
| HCV co-infection, n (%) | 65 (2.7%) | 0 (0.0%) |

Performance of machine learning models in the external validation dataset when some features were missing

| Maximum number of missing features | With AFP (SFA): sensitivity (95% CI) | With AFP (SFA): specificity (95% CI) | With AFP (SFA): AUROC (95% CI) | Without AFP (SFN): sensitivity (95% CI) | Without AFP (SFN): specificity (95% CI) | Without AFP (SFN): AUROC (95% CI) |
|---|---|---|---|---|---|---|
| 0 | 0.833 (0.535–1.000) | 0.476 (0.263–0.690) | 0.655 (0.458–0.851) | 0.889 (0.684–1.000) | 0.522 (0.318–0.726) | 0.705 (0.554–0.856) |
| 1 | 0.720 (0.544–0.896) | 0.670 (0.580–0.759) | 0.695 (0.594–0.795) | 0.727 (0.575–0.879) | 0.644 (0.564–0.725) | 0.683 (0.595–0.772) |
| 2 | 0.698 (0.560–0.835) | 0.631 (0.559–0.702) | 0.664 (0.586–0.742) | 0.708 (0.580–0.837) | 0.592 (0.524–0.660) | 0.639 (0.564–0.713) |
| 3 | 0.647 (0.533–0.761) | 0.602 (0.551–0.652) | 0.624 (0.562–0.687) | 0.690 (0.583–0.798) | 0.651 (0.609–0.693) | 0.663 (0.605–0.721) |
| 4 | 0.634 (0.522–0.746) | 0.657 (0.615–0.699) | 0.646 (0.585–0.706) | NA | NA | NA |
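
The sensitivity and specificity intervals in the table above look consistent with a normal-approximation (Wald) 95% confidence interval clipped to the range 0 to 1. The source does not state which interval method was used, so that is an assumption. The following is a minimal sketch of the calculation. The function name `wald_ci` and the counts (5 of 6 and 10 of 21) are hypothetical, chosen only because they are consistent with the first row of the with-AFP columns.

```python
import math


def wald_ci(successes: int, total: int, z: float = 1.96):
    """Return (proportion, lower, upper) for a Wald interval clipped to [0, 1]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts, not reported in the source:
# 5 of 6 HCC cases flagged, 10 of 21 non-HCC cases correctly ruled out.
sens = wald_ci(5, 6)
spec = wald_ci(10, 21)

print("sensitivity: %.3f (%.3f-%.3f)" % sens)
print("specificity: %.3f (%.3f-%.3f)" % spec)
```

Under these assumed counts the results round to 0.833 (0.535 to 1.000) and 0.476 (0.263 to 0.690). The wide intervals show how few complete-case visits a row with zero missing features may contain.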

Performance of machine learning models for HCC prediction in the derivation dataset and the external validation dataset

| Model | Derivation: sensitivity (95% CI) | Derivation: specificity (95% CI) | Derivation: AUROC*, mean ± SD | External validation: sensitivity (95% CI) | External validation: specificity (95% CI) | External validation: AUROC (95% CI) |
|---|---|---|---|---|---|---|
| Models with all features (AF) | | | | | | |
| With AFP (AFA) | 0.634 (0.559–0.708) | 0.836 (0.830–0.842) | 0.786 ± 0.113 | NA | NA | NA |
| Without AFP (AFN) | 0.553 (0.476–0.630) | 0.786 (0.779–0.792) | 0.731 ± 0.089 | NA | NA | NA |
| Models with selected features (SF) | | | | | | |
| With AFP (SFA) | 0.683 (0.611–0.755) | 0.756 (0.749–0.763) | 0.727 ± 0.097 | 0.634 (0.522–0.746) | 0.657 (0.615–0.699) | 0.646 (0.585–0.706) |
| Without AFP (SFN) | 0.658 (0.585–0.732) | 0.744 (0.737–0.751) | 0.707 ± 0.088 | 0.690 (0.583–0.798) | 0.651 (0.609–0.693) | 0.663 (0.605–0.721) |