Predicting preeclampsia and related risk factors using data mining approaches: A cross-sectional study
Manoochehri, Zohreh; Manoochehri, Sara; Soltani, Farzaneh; Tapak, Leili & Sadeghifar, Majid
Abstract
Background: Preeclampsia is a hypertensive disorder of pregnancy that has
adverse effects on both the mother and the fetus. Despite recent advances in understanding the
etiology of preeclampsia, no adequate clinical screening tests have been identified
to diagnose the disorder.
Objective: We aimed to provide a model based on data mining approaches that can
be used as a screening tool to identify patients with this syndrome and
the risk factors associated with it.
Materials and Methods: The data used to perform this cross-sectional study were
extracted from the clinical records of 726 mothers with preeclampsia and 726 mothers
without preeclampsia who were referred to Fatemieh Hospital in Hamadan City during
April 2005–March 2015. Six data mining methods were applied:
logistic regression, k-nearest neighbors, C5.0 decision tree, discriminant analysis,
random forest, and support vector machine. Their performance was compared
using the criteria of accuracy, sensitivity, and specificity.
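The three comparison criteria can be illustrated with a minimal sketch. It assumes binary labels coded 1 for preeclampsia and 0 for no preeclampsia. The function name `screening_metrics` and the toy labels are hypothetical and are not taken from the study data.

```python
from typing import Dict, Sequence


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, sensitivity, and specificity for binary labels.

    Assumed coding (not from the study): 1 = preeclampsia, 0 = no preeclampsia.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    nan = float("nan")
    return {
        # share of all cases classified correctly
        "accuracy": (tp + tn) / len(pairs),
        # share of true preeclampsia cases that the model detects
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # share of non-preeclampsia cases that the model correctly rules out
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Toy example with made-up labels (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
metrics = screening_metrics(y_true, y_pred)
print(metrics)
```

For a screening tool, sensitivity is usually read alongside accuracy, since a missed case (false negative) and a false alarm (false positive) carry different clinical costs.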
Results: Underlying condition, age, pregnancy season, and number of pregnancies
were the most important risk factors for preeclampsia. The accuracies of the
models were as follows: logistic regression (0.713), k-nearest neighbors (0.742),
C5.0 decision tree (0.788), discriminant analysis (0.687), random forest (0.758), and
support vector machine (0.791).
Conclusion: Among the data mining methods employed in this study, the support vector
machine was the most accurate in predicting preeclampsia. Therefore, this model can
be considered as a screening tool for this disorder.
Keywords
Preeclampsia; Random forest; C5.0 decision tree; Support vector machine; Logistic regression.