Optimizing bagged trees in an ensemble classifier for improved prediction of diabetes prevalence in women


Citation

Candia Jr., Jose and Adonis, Airish Mae and Perlas, Jesica (2024) Optimizing bagged trees in an ensemble classifier for improved prediction of diabetes prevalence in women. Pertanika Journal of Science & Technology (Malaysia), 32 (4). pp. 1753-1764. ISSN 2231-8526

Abstract

This study aims to optimize the performance of bagged trees in an ensemble classifier for predicting diabetes prevalence in women. The dataset comprised 1,888 women and six features: age, BMI, glucose level, insulin level, blood pressure, and pregnancy status. It was split into training and testing sets at a 70:30 ratio, and the bagged tree ensemble classifier was trained with five-fold cross-validation. Using all features during training yielded a 92.3% training accuracy and a 99.5% testing accuracy. Three optimization techniques (feature selection, parameter tuning, and limiting the maximum number of splits) improved model performance further. Feature selection improved accuracy by 0.2%, and parameter tuning improved the test accuracy by 0.2%. Decreasing the maximum number of splits from 1322 to 800 or 600 produced an optimized model with 0.1% higher validation accuracy. Finally, the optimized bagged tree models were evaluated on accuracy, precision, recall, and F1 score. Model 1, with a maximum of 800 splits and 50 learners, outperformed Model 2 in recall and F1 score, while Model 2, with a maximum of 600 splits and 50 learners, had higher precision. The study concludes that optimization techniques can significantly improve the performance of the bagged tree in predicting diabetes prevalence in women.
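The abstract compares two models on accuracy, precision, recall, and F1 score, and notes that one can lead on recall and F1 while the other leads on precision. The following is a minimal sketch of how those four metrics follow from binary confusion counts. The class name `Confusion`, the function `metrics`, and all counts are hypothetical and chosen only to illustrate the pattern; they are not taken from the paper and do not reproduce its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts; 'positive' means predicted/actual diabetic."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def metrics(c: Confusion) -> dict[str, float]:
    """Return accuracy, precision, recall, and F1 for one confusion matrix.

    Assumes at least one predicted positive and one actual positive,
    so none of the denominators is zero.
    """
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for two classifiers on the same 200 test cases.
# These are illustrative only, NOT the paper's Model 1 and Model 2.
model_a = Confusion(tp=90, fp=10, fn=5, tn=95)
model_b = Confusion(tp=85, fp=5, fn=10, tn=100)

for name, counts in (("A", model_a), ("B", model_b)):
    scores = metrics(counts)
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

With these illustrative counts both classifiers have the same accuracy, yet A has the higher recall and F1 while B has the higher precision. This is why a single accuracy figure may not be enough to choose between two tuned models.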


Additional Metadata

Item Type: Article
AGROVOC Term: diabetes
AGROVOC Term: women
AGROVOC Term: body mass index
AGROVOC Term: blood pressure
AGROVOC Term: insulin
AGROVOC Term: tidal prediction
AGROVOC Term: optimization methods
AGROVOC Term: training
AGROVOC Term: machine learning
AGROVOC Term: accuracy
Geographical Term: Philippines
Uncontrolled Keywords: Bagged trees, diabetes prevalence, ensemble classifier, feature selection, model optimization, parameter tuning
Depositing User: Ms. Azariah Hashim
Date Deposited: 23 Apr 2026 01:34
Last Modified: 23 Apr 2026 01:34
URI: http://webagris.upm.edu.my/id/eprint/3020
