ML/DL

Breast Cancer Classification: Comparative ML Analysis

Compared 7 ML models for breast cancer detection using clinical biomarkers, achieving 87% accuracy with Random Forest (AUC: 0.91).

2025 ML/DL

About This Project

Built and compared 7 supervised ML models to classify breast cancer from routine blood biomarkers and anthropometric data (age, BMI) in the Breast Cancer Coimbra Dataset (116 women: 64 with cancer, 52 controls). Models evaluated: Naive Bayes, LDA, KNN, Random Forest, Gradient Boosting, SVM, and a Deep Neural Network. Preprocessing applied z-score normalization and factor encoding; evaluation used stratified train-test splits and repeated 10-fold cross-validation for robust estimates. Hyperparameter tuning included a sweep over k for KNN and mtry optimization for Random Forest. Key insight: Glucose, Resistin, and Adiponectin were the top predictors, suggesting that routine blood tests can assist early cancer detection.
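
The preprocessing and resampling steps described above can be sketched in a few lines of plain Python. This is an illustration only, not the project's code (the mention of mtry and factor encoding suggests the original work was done in R). The function names, the seed, and the choice of 3 repeats are assumptions; only the class sizes (64 cancer, 52 control) and the 10 folds come from the description.

```python
import random
import statistics


def z_normalize(column):
    """Scale one feature column to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample SD (n - 1 denominator)
    return [(x - mean) / sd for x in column]


def stratified_folds(labels, k, rng):
    """Assign every sample index to one of k folds, balancing the classes."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal class share.
        for pos, idx in enumerate(indices):
            folds[(pos + offset) % k].append(idx)
        offset += len(indices)  # keeps total fold sizes even across classes
    return folds


def repeated_stratified_cv(labels, k=10, repeats=3, seed=0):
    """Yield (train_indices, test_indices) for each fold of each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = stratified_folds(labels, k, rng)
        for i, test in enumerate(folds):
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield train, test


# Class sizes from the dataset description; repeats=3 is an assumed value.
labels = ["cancer"] * 64 + ["control"] * 52
splits = list(repeated_stratified_cv(labels, k=10, repeats=3, seed=0))
```

In a setup like this, the normalization statistics would normally be computed on each training fold and then applied to the held-out fold, so that no information from the test samples leaks into preprocessing.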

Key Features

  • Compared 7 models: Naive Bayes, LDA, KNN, RF, GBM, SVM, DNN
  • Best result: Random Forest (accuracy 87%, F1 0.88, AUC 0.91)
  • Sensitivity 85%, specificity 90%: a balanced trade-off between missed cancers and false alarms
  • Repeated 10-fold cross-validation with stratified splits
  • Hyperparameter tuning: KNN-k sweep, RF mtry optimization
  • Top predictors: Glucose, Resistin, Adiponectin
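
For reference, the threshold-based metrics listed above (accuracy, sensitivity, specificity, F1) all derive from the four cells of a binary confusion matrix, with cancer as the positive class. The sketch below shows those relationships; the function name and the counts passed to it are hypothetical and are not the project's actual test-set results.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: share of cancers detected
    specificity = tn / (tn + fp)   # share of controls correctly cleared
    precision = tp / (tp + fp)     # share of positive calls that are correct
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts chosen only to illustrate the formulas.
metrics = classification_metrics(tp=17, fn=3, tn=18, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is not included because it is computed from ranked prediction scores rather than from a single confusion matrix.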
