Publication Title
PLoS ONE
Document Type
Article
Abstract/Description
Background: Cardiovascular diseases (CVD) are among the leading causes of death worldwide, making accurate early prediction essential. This study aimed to develop transparent machine learning (ML) models using National Health and Nutrition Examination Survey (NHANES) data from 2017–2023 to predict CVD risk from dietary and health factors.
Methods: We analyzed data from 12,382 adults (aged 18 and older) from NHANES 2017–2023, covering 41 dietary, anthropometric, clinical, and demographic variables. Recursive Feature Elimination (RFE) was used to select an optimal subset of 30 predictors. To address the substantial class imbalance in the outcome, we applied the Random Over-Sampling Examples (ROSE) technique to the training data. Five machine learning models (Logistic Regression, Random Forest, Support Vector Machines, XGBoost, and LightGBM) were trained and evaluated. Model interpretability was assessed using Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP).
Results: Participants with CVD differed significantly from those without CVD in age, waist circumference, systolic blood pressure, C-reactive protein (CRP), and multiple dietary nutrients, with consistently lower nutrient intake in the CVD group. Among the ML models evaluated, XGBoost achieved the highest accuracy (0.8216) and recall (0.8645), while Random Forest achieved the highest area under the receiver operating characteristic curve (AUROC, 0.8139). Interpretability analyses identified age as the strongest predictor, followed by vitamin B12, total cholesterol, CRP, and waist circumference.
Conclusion: Interpretable ML models effectively identified key dietary and clinical factors for CVD risk. Nutrients such as vitamin B12 and niacin, alongside established clinical indicators, emerged as significant predictors, underscoring their potential role in nutritional interventions and public health strategies for CVD prevention.
Department
Mathematics
DOI
10.1371/journal.pone.0335915
Volume
20
Issue
11
ISSN
1932-6203
Date
November 6, 2025
Citation Information
Ahiduzzaman, Md and Hasan, Md Nahid, "Interpretable Machine Learning for Cardiovascular Risk Prediction: Insights from NHANES Dietary and Health Data" (2025). Faculty Publications. 242.
https://lair.etamu.edu/cose-faculty-publications/242
