Breast Cancer Classification Using Logistic Regression, by Jeffrey C. Ulatan
Source: DataONE | Updated: 2025-12-07 | Indexed: 2025-12-13
Download link:
https://search.dataone.org/view/sha256:ac3b74484d62729ef77b51f50bf22b2a97e825f438ef1603e528f71885a2297b
Resource summary:
Overview

In this project, I build a **classification model** that predicts whether a breast tumor is **benign or malignant** from 30 numeric features describing cell nuclei. I use the **Breast Cancer Wisconsin (Diagnostic) dataset** that ships with `scikit-learn`.

The machine learning workflow includes:

- Loading and exploring the dataset
- Splitting the data into training and test sets
- Scaling the features
- Training a Logistic Regression classifier
- Evaluating the model with several metrics (accuracy, precision, recall, F1-score, confusion matrix, and ROC curve)
- Discussing the implications and limitations of the model

Implications and Limitations

**Implications**

- This model could serve as a **decision-support tool** that helps doctors quickly flag potentially malignant tumors.
- High recall for the malignant class is particularly important, because a missed cancer diagnosis (a false negative) can have severe consequences.
- The model performs very well on this dataset, which suggests that the nuclear features are informative for predicting malignancy.

**Limitations**

- The model is trained and evaluated on a single dataset. In clinical practice, performance may differ on new patient populations or on data from different imaging equipment.
- I used only one algorithm (Logistic Regression) without extensive **hyperparameter tuning**. Other models (e.g., Random Forests, Support Vector Machines, Gradient Boosting) might perform better.
- The dataset comes pre-cleaned in `scikit-learn`, so the project does not cover issues such as missing data, outliers, or feature engineering in depth.

**Possible Improvements**

- Compare multiple models (e.g., Logistic Regression vs. Random Forest).
- Use cross-validation for more robust performance estimates.
- Apply feature importance analysis or model interpretation techniques to identify which features are most influential.
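The workflow listed above can be sketched in a few lines of `scikit-learn`. This is a minimal illustration, not the author's notebook. The 80/20 stratified split, `random_state=42`, and `max_iter=1000` are assumptions. The labels are recoded so that malignant is the positive class, because in `load_breast_cancer` target 0 means malignant and 1 means benign. Keeping `StandardScaler` and `LogisticRegression` in one pipeline ensures the scaler is fit on the training split only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the 569-sample, 30-feature dataset bundled with scikit-learn.
data = load_breast_cancer()
X = data.data
# In this dataset target 0 = malignant and 1 = benign; recode so malignant is
# the positive class (1), so precision/recall/F1 refer to malignancy.
y = (data.target == 0).astype(int)

# Hold out 20% as a test set, stratified to preserve the class ratio.
# test_size and random_state are illustrative assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The pipeline fits the scaler on the training data only, avoiding leakage.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard labels at threshold 0.5
y_score = model.predict_proba(X_test)[:, 1]  # estimated P(malignant)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}
# Rows = true class, columns = predicted class; index 0 = benign, 1 = malignant.
cm = confusion_matrix(y_test, y_pred)
# False/true positive rates at each threshold, ready for an ROC plot.
fpr, tpr, thresholds = roc_curve(y_test, y_score)

for name, value in metrics.items():
    print(f"{name:>9}: {value:.3f}")
print("confusion matrix (rows = true, cols = predicted; 0 = benign, 1 = malignant):")
print(cm)
```

The Implications section stresses malignant-class recall. A natural follow-up is to predict malignant whenever `y_score` exceeds a threshold lower than the default 0.5, for example `(y_score >= 0.3).astype(int)`. Lowering the threshold cannot reduce recall and usually raises it, typically at some cost in precision. The 0.3 value is purely illustrative.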
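The possible improvements can also be sketched. The snippet below is an illustrative assumption rather than part of the original project. It compares Logistic Regression with a Random Forest using stratified 5-fold cross-validation. It then prints the largest logistic-regression coefficients on standardized features as a rough feature-importance measure. The fold count, `n_estimators=200`, `random_state=42`, and the chosen scoring metrics are arbitrary choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = data.data
y = (data.target == 0).astype(int)  # recode: 1 = malignant, 0 = benign

candidates = {
    # Scaling matters for the linear model, so it stays inside the pipeline.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    # Tree ensembles are insensitive to feature scale, so no scaler here.
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Stratified folds keep the benign/malignant ratio roughly equal in each fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scoring = ["accuracy", "recall", "f1", "roc_auc"]

results = {}
for name, estimator in candidates.items():
    scores = cross_validate(estimator, X, y, cv=cv, scoring=scoring)
    # cross_validate stores each metric's per-fold values under "test_<metric>".
    results[name] = {m: scores["test_" + m].mean() for m in scoring}
    summary = ", ".join(
        f"{m}={scores['test_' + m].mean():.3f} (sd {scores['test_' + m].std():.3f})"
        for m in scoring
    )
    print(f"{name}: {summary}")

# Rough feature importance: coefficients of the logistic model fit on
# standardized features. Positive values push the prediction toward malignant.
lr_pipeline = candidates["logistic_regression"].fit(X, y)
coefs = lr_pipeline[-1].coef_[0]
top = sorted(zip(data.feature_names, coefs), key=lambda t: abs(t[1]), reverse=True)[:5]
for feature, coef in top:
    print(f"{feature:>25}: {coef:+.3f}")
```

Treat coefficient magnitudes here only as a rough guide. Several features, such as radius, perimeter, and area, are strongly correlated, and this can make individual coefficients unstable. Permutation importance (`sklearn.inspection.permutation_importance`) is a common model-agnostic alternative.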
Created:
2025-12-10



