Advanced Prediction of Chronic Kidney Failure Using Biostatistical and Machine Learning Models: An Age-Stratified Analytical Study Based on Simulated Iraqi Patient Data
Abstract
Chronic kidney disease has become a major public health issue in Iraq, driven by rising rates of diabetes and high blood pressure and by an ageing population. This study investigated how age influences the development of predictive models for the stages of kidney disease. Key attributes were identified, and a synthetic dataset representing 1,000 Iraqi patients was generated. Four predictive models were fitted to these data: Logistic Regression, Random Forest, Support Vector Machines (SVM), and Extreme Gradient Boosting (XGBoost). All models underwent 10-fold cross-validation to assess their stability and generalizability. Performance was measured using accuracy, sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve (AUC), and was further examined with calibration tests, decision curve analysis, interaction tests, and survival analysis. Random Forest and XGBoost showed the best discriminative ability, with AUCs of 0.88 and 0.89, respectively. Individuals aged 60 years and older had a significantly higher likelihood of having kidney disease. The most significant predictors were older age, higher serum creatinine levels, and the presence of diabetes and hypertension. The results underscore the clinical value of predictive models for early risk stratification and for data-driven management of chronic kidney disease.
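The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes a binary outcome (disease present or absent) and a model that outputs a risk score between 0 and 1. The function names (`confusion_rates`, `roc_auc`, `k_fold_indices`), the 0.5 decision threshold, and the eight toy records are all hypothetical and are not taken from the study's data or code. The study itself most likely relied on library implementations, so this sketch only shows what the quantities mean.

```python
from typing import List, Sequence, Tuple


def confusion_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # share of true cases that were detected
    specificity = tn / (tn + fp)  # share of non-cases correctly ruled out
    return accuracy, sensitivity, specificity


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random non-case.

    Ties count as half a win. This pairwise form is O(n^2), which is fine
    for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Hypothetical toy records: (age in years, true label, predicted risk score).
records = [
    (45, 0, 0.10), (52, 0, 0.30), (58, 1, 0.55), (38, 0, 0.20),
    (63, 1, 0.80), (71, 1, 0.65), (67, 0, 0.70), (75, 1, 0.90),
]
labels = [label for _, label, _ in records]
scores = [score for _, _, score in records]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

overall = confusion_rates(labels, predictions)
overall_auc = roc_auc(labels, scores)

# Age-stratified AUC, using the 60-year cut-off mentioned in the abstract.
older = [(label, score) for age, label, score in records if age >= 60]
younger = [(label, score) for age, label, score in records if age < 60]
auc_older = roc_auc([l for l, _ in older], [s for _, s in older])
auc_younger = roc_auc([l for l, _ in younger], [s for _, s in younger])

# Ten folds over 1,000 records, matching the dataset size in the abstract.
folds = k_fold_indices(1000, 10)
```

In a 10-fold scheme each fold serves once as the held-out test set while the other nine are used for fitting, and the per-fold metrics are then averaged. With an imbalanced outcome, a stratified split that preserves the case ratio in every fold is usually preferable to the plain contiguous split shown here.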
