Application of machine learning algorithm for predicting gestational diabetes mellitus in early pregnancy†

Open Access | Sep 2021

Abstract

Objective

To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy.

Methods

This study identified indicators related to GDM through a literature review and expert discussion. Pregnant women who attended medical institutions for antenatal examinations from November 2017 to August 2018 were selected, and the collected indicators were analyzed retrospectively. The indicators were classified and modeled in Python using a random forest regression algorithm, and the performance of the resulting prediction model was evaluated.
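As a rough illustration of this modeling step, the sketch below fits a random forest regressor to binary GDM labels and thresholds its output. It assumes scikit-learn is available. The synthetic data, the 0.5 cutoff, the train/test split, and every hyperparameter are illustrative assumptions; none of them is taken from the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the study data (assumption): 38 numeric indicators
# and roughly 10% positive cases. None of these values come from the paper.
X, y = make_classification(
    n_samples=1000, n_features=38, n_informative=8,
    weights=[0.9, 0.1], random_state=0,
)

# Stratify so the training and test sets keep similar class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Regression forest trained on 0/1 labels: each prediction is an average of
# leaf means, so it lies in [0, 1] and can be read as a risk score.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

scores = model.predict(X_test)
predicted = (scores >= 0.5).astype(int)  # 0.5 cutoff is an assumption
print(scores[:5], predicted[:5])
```

Using a regressor rather than a classifier mirrors the wording "random forest regression algorithm"; whether the authors thresholded the output in this way is not stated here.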

Results

We obtained 4806 analyzable records from 1625 pregnant women. Of these, the 3265 samples containing all 67 indicators formed data set F1, and all 4806 samples, restricted to the 38 indicators they had in common, formed data set F2. The random forest algorithm was trained separately on F1 and F2. For the F1 model, the overall predictive accuracy was 93.10%, the area under the receiver operating characteristic curve (AUC) was 0.66, and the predictive accuracy for GDM-positive cases was 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%. Although its overall accuracy was lower, the F2 model had a higher AUC and a much higher accuracy for GDM-positive cases, so it performed better than the F1 model. To explore the impact of the indicators sacrificed in F2 on GDM prediction, data set F3 was established from the 3265 samples of F1 restricted to the 38 indicators of F2. After training, the overall predictive accuracy of the F3 model was 91.60%, the AUC was 0.58, and the predictive accuracy for positive cases was 15.85%.
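One plausible reading of how F1, F2, and F3 relate to each other can be sketched with toy records. The helper `complete_cases`, the indicator names, and the values are all hypothetical; three indicators stand in for the study's 67, and two for the 38 shared ones.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]

def complete_cases(records: List[Record], indicators: List[str]) -> List[Record]:
    """Keep records with a value for every listed indicator, projected onto them."""
    kept = []
    for rec in records:
        if all(rec.get(name) is not None for name in indicators):
            kept.append({name: rec[name] for name in indicators})
    return kept

# Hypothetical indicator sets (the real ones are not listed in the abstract).
all_indicators = ["age", "bmi", "fasting_glucose"]  # stands in for 67
shared_indicators = ["age", "bmi"]                  # stands in for 38

records: List[Record] = [
    {"age": 29, "bmi": 22.1, "fasting_glucose": 4.6},
    {"age": 34, "bmi": 27.5, "fasting_glucose": None},  # value missing
    {"age": 31, "bmi": 24.0, "fasting_glucose": 5.2},
    {"age": 38, "bmi": 30.2},                           # indicator never recorded
]

f1 = complete_cases(records, all_indicators)     # fewer samples, all indicators
f2 = complete_cases(records, shared_indicators)  # more samples, fewer indicators
# F3: the F1 samples restricted to the F2 indicators.
f3 = [{name: rec[name] for name in shared_indicators} for rec in f1]

print(len(f1), len(f2), len(f3))
```

Under this reading, comparing F1 with F3 isolates the effect of dropping indicators, while comparing F3 with F2 isolates the effect of the additional samples.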

Conclusions

In this study, a model for predicting GDM from multiple input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model performed well and is valuable as a reference for predicting GDM in women at an early stage of pregnancy. In addition, when the random forest algorithm is applied to the early prediction of GDM, the sample data sets must meet certain requirements regarding the proportions of negative and positive cases.
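The remark about class proportions can be illustrated with the three reported metrics computed on a small imbalanced sample. This is a minimal sketch: the labels and scores are invented, the 0.5 threshold is an assumption, and reading "predictive accuracy of GDM-positive cases" as sensitivity (the share of true positives that are predicted positive) is also an assumption.

```python
from typing import Sequence

def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all cases that are predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def positive_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of truly positive cases that are predicted positive (sensitivity)."""
    preds_for_pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(preds_for_pos) / len(preds_for_pos)

def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical imbalanced sample: 18 negative cases followed by 2 positive cases.
y_true = [0] * 18 + [1] * 2
scores = [0.1] * 16 + [0.6, 0.2] + [0.7, 0.3]
y_pred = [int(s >= 0.5) for s in scores]

# Baseline that never predicts GDM, for comparison.
always_negative = [0] * len(y_true)

print(accuracy(y_true, y_pred), positive_accuracy(y_true, y_pred), auc(y_true, scores))
print(accuracy(y_true, always_negative), positive_accuracy(y_true, always_negative))
```

In this toy sample the thresholded model and the always-negative baseline reach the same overall accuracy, yet only the former detects any positive case. When positives are rare, overall accuracy alone says little, which is consistent with F1 and F3 showing high overall accuracy alongside low accuracy for positive cases.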

DOI: https://doi.org/10.2478/fon-2021-0022 | Journal eISSN: 2544-8994 | Journal ISSN: 2097-5368
Language: English
Page range: 209 - 221
Submitted on: Nov 24, 2020
Accepted on: Jan 11, 2021
Published on: Sep 21, 2021
Published by: Shanxi Medical Periodical Press
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2021 Li-Li Wei, Yue-Shuai Pan, Yan Zhang, Kai Chen, Hao-Yu Wang, Jing-Yuan Wang, published by Shanxi Medical Periodical Press
This work is licensed under the Creative Commons Attribution 4.0 License.