Breast Cancer Diagnostic Support System
Transforming FNA cytology data into clinically accountable, explainable AI decisions — 98.2% accuracy with full SHAP transparency.
Breast cancer is the most commonly diagnosed cancer in women globally. Early, accurate classification of a tumor as benign or malignant is the pivotal moment in a patient's care pathway — catching malignancy early dramatically improves survival outcomes while avoiding unnecessary surgical interventions for benign cases.
This project delivers a production-grade clinical decision support system that classifies breast tumors in real-time from 30 nuclear morphology measurements derived from Fine Needle Aspiration (FNA) cytology.
The system wraps its prediction inside a fully transparent, explainable, and auditable platform — making it suitable for medical researchers and clinical informaticians who need to trust and interpret every decision.
Design philosophy: A prediction is only clinically useful if the clinician can understand, challenge, and reproduce the reasoning behind it. Accuracy alone is insufficient — transparency is mandatory.
Pathologists face a recurring challenge: manual interpretation of FNA cytology is subjective, time-consuming, and prone to inter-observer variability. Existing AI tools compound the problem with black-box outputs and no operational transparency.
Problem Statement: How can we build an AI system that not only classifies breast tumors with near-perfect accuracy, but is also transparent, operationally monitored, clinically actionable, and ethically documented — such that a clinician can trust and use it with confidence?
| Pain Point | Legacy Tools | This System |
|---|---|---|
| Black-box predictions | ✗ No explanation for the decision | ✓ SHAP feature attribution per prediction |
| Fixed threshold | ✗ 0.50 default, ignores clinical stakes | ✓ Adjustable live threshold slider |
| No drift monitoring | ✗ Silent model degradation | ✓ Z-score drift detection per feature |
| No audit trail | ✗ No persistent prediction log | ✓ Timestamped CSV audit log |
| No PDF report | ✗ Manual copy-paste documentation | ✓ Auto-generated structured PDF |
| No robustness check | ✗ Single-point probability estimate | ✓ Tree-level variance + 95% CI |
| No research utility | ✗ Static production tool only | ✓ Synthetic data + PCA research lab |
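The per-feature z-score drift check in the table above can be sketched as follows. This is an illustrative reconstruction, not the project's actual code: the function name `drift_z_scores`, the 3-sigma alert threshold, and the use of the standard error of the batch mean are all assumptions.

```python
import numpy as np

def drift_z_scores(reference, batch):
    """Z-score of each feature's batch mean against the training reference.
    Illustrative sketch; the project's exact drift logic is not shown here."""
    mu = reference.mean(axis=0)
    se = reference.std(axis=0) / np.sqrt(len(batch))  # standard error of a batch mean
    return np.abs(batch.mean(axis=0) - mu) / se

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(500, 30))  # stands in for training features
incoming = rng.normal(0.5, 1.0, size=(50, 30))    # batch whose means have drifted
z = drift_z_scores(reference, incoming)
flagged = np.where(z > 3.0)[0]  # features breaching a 3-sigma alert
```

A drifted incoming batch raises z-scores on the shifted features, which can then be surfaced as per-feature alerts in the monitoring tab.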
The Wisconsin Diagnostic Breast Cancer (WDBC) dataset represents digitized nuclear morphology measurements from Fine Needle Aspiration imaging, sourced from the UCI Machine Learning Repository.
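For experimentation, the same WDBC dataset the project sources from UCI also ships with scikit-learn, which makes it a convenient stand-in for loading the 30 nuclear morphology features:

```python
from sklearn.datasets import load_breast_cancer

# scikit-learn bundles the UCI WDBC dataset; target 0 = malignant, 1 = benign.
data = load_breast_cancer()
X, y = data.data, data.target
print(X.shape)  # (569, 30): 569 FNA samples, 30 morphology features
```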
An 11-module utility layer feeds a Streamlit multi-tab frontend. Each module is independently testable and swappable, enabling seamless model upgrades, feature additions, and compliance audits.
The Random Forest ensemble achieves near-perfect discrimination, validated across all standard clinical AI metrics on a held-out stratified 20% test set.
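A minimal sketch of the evaluation setup described above: a stratified 20% hold-out and a Random Forest, plus the per-tree probability spread behind the robustness interval. Hyperparameters and the percentile-based interval are illustrative assumptions, not the project's exact configuration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
# Stratified 20% hold-out, as described above (hyperparameters illustrative).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
clf = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))

# Tree-level spread of the benign-class probability gives a robustness
# interval per case (here a 95% percentile interval over the ensemble).
tree_probs = np.stack([t.predict_proba(X_te)[:, 1] for t in clf.estimators_])
ci_low, ci_high = np.percentile(tree_probs, [2.5, 97.5], axis=0)
```

A wide interval for a given case signals ensemble disagreement and flags the prediction for closer human review.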
Note: SHAP values are approximate illustrative representations from ensemble analysis.
The system redefines the clinical workflow from manual, subjective, and time-intensive to real-time, explainable, and audit-ready — across every dimension that matters to clinicians and institutions.
| Impact Dimension | Without AI | With This System |
|---|---|---|
| Time to classification | Hours to days (manual) | Seconds (real-time) |
| Sensitivity at default threshold | ~85–90% (manual cytology) | ~97% at T=0.50 |
| Explainability | Subjective operator judgment | SHAP-backed, quantified attribution |
| Documentation time | Manual write-up | Auto-generated PDF in one click |
| Model degradation visibility | None — silent failure | Z-score drift alerts per feature |
| Second opinion access | Requires specialist availability | Instant AI second opinion |
| Research feasibility | Requires new patient data | Gaussian synthetic generation |
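The "Gaussian synthetic generation" row above can be approximated by sampling from the per-class empirical mean and covariance. This is a hedged sketch of the idea only: the helper name `synth` is invented here, and real cytology features are bounded and correlated, so such samples suit research exploration rather than clinical use.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(7)

def synth(X_class, n):
    """Draw n synthetic feature vectors from a class-conditional Gaussian."""
    mu = X_class.mean(axis=0)
    cov = np.cov(X_class, rowvar=False)
    return rng.multivariate_normal(mu, cov, size=n)

synthetic_benign = synth(X[y == 1], 100)  # 100 benign-like synthetic samples
```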
Clinical AI demands more than accuracy. This system ships with a structured Model Card covering intended use, known limitations, fairness disclosures, and explicit accountability documentation.
Building a clinical AI tool surfaces priorities that pure ML benchmarking obscures. These are the five most important architectural and design insights from this project.
The current system provides a robust, production-ready foundation. The following enhancements would extend its clinical utility, fairness coverage, and institutional applicability.
| Enhancement | Rationale |
|---|---|
| Multi-class support | Extension to BIRADS multi-category classification (Benign, Prob. Benign, Suspicious, Malignant) — providing more granular clinical mapping. |
| EHR Integration (FHIR/HL7) | Direct integration with hospital electronic health records — enabling seamless clinical orders and result ingestion. |
| Demographic Stratification | Benchmarking across larger, contemporary multi-ethnic datasets — improving the system's global reliability and fairness validation. |
| Federated Learning | Multi-institution model training without data sharing — enabling broader datasets while preserving patient privacy. |
| Model Versioning (MLflow) | Production-grade experiment tracking and automatic model retraining pipelines — ensuring the AI remains state-of-the-art. |
| LIME Comparison | Adding LIME explanations alongside SHAP — providing clinicians with multiple cross-validated perspectives on AI reasoning. |