Predicting preeclampsia and related risk factors using data mining approaches: A cross-sectional study

Background: Preeclampsia is a type of pregnancy hypertension disorder that has adverse effects on both the mother and the fetus. Despite recent advances in the etiology of preeclampsia, no adequate clinical screening tests have been identified to diagnose the disorder.
Objective: We aimed to provide a model based on data mining approaches that can be used as a screening tool to identify patients with this syndrome and also to identify the risk factors associated with it.
Materials and Methods: The data used to perform this cross-sectional study were extracted from the clinical records of 726 mothers with preeclampsia and 726 mothers without preeclampsia who were referred to Fatemieh Hospital in Hamadan City during April 2005–March 2015. In this study, six data mining methods were adopted, including logistic regression, k-nearest neighborhood, C5.0 decision tree, discriminant analysis, random forest, and support vector machine, and their performance was compared using the criteria of accuracy, sensitivity, and specificity.
Results: Underlying condition, age, pregnancy season and the number of pregnancies were the most important risk factors for diagnosing preeclampsia. The accuracy of the models were as follows: logistic regression (0.713), k-nearest neighborhood (0.742), C5.0 decision tree (0.788), discriminant analysis (0.687), random forest (0.758) and support vector machine (0.791).
Conclusion: Among the data mining methods employed in this study, support vector machine was the most accurate in predicting preeclampsia. Therefore, this model can be considered as a screening tool to diagnose this disorder.

Keywords
Preeclampsia; Random forest; C5.0 decision tree; Support vector machine; Logistic regression.