Optimized KNN with Domain-Informed Features and LIME Explainability for Improved Breast Cancer Classification
Mendeley Data, indexed 2026-04-09
Download link:
https://data.mendeley.com/datasets/x837bngczn/1
Resource description:
Breast cancer remains one of the most common causes of mortality among women, with more than 2.3 million new cases and 670,000 deaths reported globally in 2022. Early detection and accurate diagnosis significantly improve survival rates, yet conventional diagnostic methods remain time-consuming and subjective. Machine learning offers a promising alternative, but many studies either lack systematic hyperparameter tuning or do little to address model generalization. This study used the Breast Cancer Wisconsin (Diagnostic) Dataset, consisting of 569 samples with 32 features, to develop an optimized K-Nearest Neighbour (KNN) framework. The methodology integrated rigorous preprocessing, biologically informed feature engineering, hybrid feature selection, and hyperparameter tuning via GridSearchCV. In addition, an ensemble KNN model using soft voting was introduced to improve accuracy. The optimized KNN and ensemble models both achieved an accuracy of 98.21%, outperforming the baseline model's accuracy of 96.49%. These results support the clinical potential of KNN in diagnostic tasks. However, the limited dataset size restricts broader applicability, and future work will explore hybrid ensembles and validation on larger clinical datasets.
Provided by:
King Faisal University; King Faisal University College of Medicine
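
The tuning and soft-voting steps named in the description can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn's bundled copy of the Wisconsin (Diagnostic) data, and the parameter grid, the 80/20 split, the random seed, and the choice of k values for the ensemble are all hypothetical. It omits the feature engineering and hybrid feature selection described above, so it should not be expected to reproduce the reported accuracies.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# scikit-learn's copy of the dataset: 569 samples, 30 numeric features plus the label.
X, y = load_breast_cancer(return_X_y=True)

# Assumed split and seed; the description does not state the ones used in the study.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling sits inside the pipeline so every CV fold is scaled from its own training part.
pipeline = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# Hypothetical grid: neighbour count, vote weighting, and Minkowski power (1 or 2).
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9, 11],
    "knn__weights": ["uniform", "distance"],
    "knn__p": [1, 2],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
best = search.best_params_


def make_knn(k):
    """Build a scaled KNN pipeline with k neighbours and the tuned weighting/metric."""
    knn = KNeighborsClassifier(
        n_neighbors=k, weights=best["knn__weights"], p=best["knn__p"]
    )
    return Pipeline([("scale", StandardScaler()), ("knn", knn)])


# Soft voting averages the predicted class probabilities of the member models.
ensemble = VotingClassifier(
    estimators=[(f"knn_{k}", make_knn(k)) for k in (3, 5, 7)],
    voting="soft",
)
ensemble.fit(X_train, y_train)

tuned_accuracy = search.score(X_test, y_test)
ensemble_accuracy = ensemble.score(X_test, y_test)
print("best parameters:", best)
print("tuned KNN test accuracy:", round(tuned_accuracy, 4))
print("soft-voting ensemble test accuracy:", round(ensemble_accuracy, 4))
```

Keeping the scaler inside each pipeline matters for KNN, since distances are sensitive to feature scale and fitting the scaler on the full data would leak information from the validation folds into tuning.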



