Cardiovascular diseases remain the leading cause of global mortality. We present CardioSense AI, a state-of-the-art system that integrates an optimized XGBoost model with multi-modal interpretability layers (SHAP + LIME), ACC/AHA safety guardrails, and a novel Least Effort Path (LEP) optimization algorithm. Our results suggest that high performance and full interpretability are not mutually exclusive.
Cardiovascular medicine is inherently data-rich. Traditional risk calculators — Framingham Score, ASCVD Estimator — rely on linear assumptions that fail to capture high-dimensional non-linear dependencies. AI offers a solution, but deployment is hampered by the "Black Box" problem.
"A High Risk notification from a model, without a supporting clinical rationale, is often viewed with scepticism. The clinician is ethically and legally responsible for every diagnosis they provide." A passive model that provides a risk score without explanation — and without intervention guidance — is clinically inert.
CardioSense AI is trained and validated on the internationally recognized UCI Cleveland Heart Disease dataset — 303 patient records, 13 clinical features, binary cardiovascular disease target. VIF analysis confirms all features exhibit VIF < 2.5 (low multicollinearity).
| Feature | Description | Clinical Significance | Range / Type |
|---|---|---|---|
| age | Patient age in years | Primary risk factor for vascular decay | 29–77 yrs |
| sex | Biological sex | Biological variance in coronary anatomy | 0: Female, 1: Male |
| cp | Chest pain type (1–4) | Qualitative indicator of ischemic stress | Categorical |
| trestbps | Resting systolic BP | Hemodynamic marker of vascular pressure | 94–200 mmHg |
| chol | Serum cholesterol | Risk factor for lipid-driven plaque formation | 126–564 mg/dl |
| fbs | Fasting blood sugar > 120 mg/dl | Metabolic indicator of diabetic risk | 0/1 Boolean |
| restecg | Resting ECG results | Electric signal evidence of hypertrophy/ischemia | 0, 1, 2 |
| thalach | Maximum heart rate achieved | Marker of cardiac reserve and fitness | 71–202 bpm |
| exang | Exercise induced angina | Direct evidence of coronary insufficiency | 0/1 Boolean |
| oldpeak | ST depression induced by exercise relative to rest | Metric for myocardial repolarization delay | 0.0–6.2 |
| slope | Peak exercise ST slope | Clinical indicator of ischemia severity | Upsloping / Flat / Downsloping |
| ca | Number of major vessels (0–3) | Structural marker of coronary calcification | 0–3 |
| thal | Thallium stress test result | Marker of myocardial perfusion defects | 3: Normal, 6: Fixed defect, 7: Reversible defect |
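
For reference, the VIF check quoted above the table is conventionally defined per feature as shown here. This is the textbook definition, not a formula taken from the CardioSense AI source, and the symbol \(R_j^2\) is introduced only for this illustration.

\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\]

Here \(R_j^2\) is the coefficient of determination obtained by regressing feature \(j\) on the remaining features. Under this definition, a threshold of \(\mathrm{VIF}_j < 2.5\) corresponds to \(R_j^2 < 0.6\).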
CardioSense AI's clinical intelligence rests on four mathematical pillars: a robust preprocessing pipeline, the XGBoost objective, Bayesian hyperparameter optimization, and sigmoid probability calibration.
Numerical vitals \((x_{\text{num}} \in \{\text{age, trestbps, chol, thalach, oldpeak}\})\) are standardised via Z-score normalisation. Categorical features are transformed via One-Hot Encoding. Parameters are fitted exclusively on training data and persisted in preprocessor.joblib to eliminate training-serving skew.
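
The fit-on-training-only rule can be illustrated with a minimal standard-library sketch. It is not the project's actual pipeline: the helper names (`fit_zscore`, `apply_zscore`, `fit_one_hot`, `apply_one_hot`) and all sample values are hypothetical, and persistence to preprocessor.joblib is omitted.

```python
from statistics import mean, pstdev

def fit_zscore(train_values):
    """Learn mean and population standard deviation from training data only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    if sigma == 0:
        sigma = 1.0  # avoid division by zero for constant columns
    return mu, sigma

def apply_zscore(values, mu, sigma):
    """Scale any data (training or serving) with the stored training parameters."""
    return [(v - mu) / sigma for v in values]

def fit_one_hot(train_values):
    """Record the category vocabulary seen during training."""
    return sorted(set(train_values))

def apply_one_hot(value, categories):
    """Encode one value; categories unseen in training map to all zeros."""
    return [1 if value == c else 0 for c in categories]

# Illustrative values only, not taken from the dataset.
train_chol = [200.0, 240.0, 280.0]
mu, sigma = fit_zscore(train_chol)
test_scaled = apply_zscore([240.0, 320.0], mu, sigma)

cp_categories = fit_one_hot([1, 2, 4, 4])
cp_encoded = apply_one_hot(2, cp_categories)
```

Because the serving path reuses `mu`, `sigma`, and `cp_categories` rather than recomputing them, a value such as 320 is scaled against the training distribution, which is the property the persisted preprocessor is meant to guarantee.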
XGBoost optimises a second-order Taylor expansion of the loss function, enabling rapid convergence on the \(N=303\) clinical dataset while controlling overfitting through explicit regularisation of tree structure.
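
For reference, the second-order approximation is usually written as follows. The notation comes from the general XGBoost literature rather than from this document, so the symbols below are introduced here only for illustration.

\[
\mathcal{L}^{(t)} \approx \sum_{i=1}^{N} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2
\]

Here \(f_t\) is the tree added at boosting round \(t\), \(g_i\) and \(h_i\) are the first and second derivatives of the loss with respect to the previous prediction \(\hat{y}_i^{(t-1)}\), \(T\) is the number of leaves, and \(w_j\) are the leaf weights. The penalty \(\Omega\) is the explicit regularisation of tree structure mentioned above.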
Unlike grid or random search, Optuna's Tree-structured Parzen Estimator (TPE) uses the results of completed trials to propose the next hyperparameter configuration. We executed 50 trials, each scored with 5-fold Stratified Cross-Validation.
The scale_pos_weight is automatically set as \(N_{\text{neg}}/N_{\text{pos}}\) to address inherent class imbalance in cardiac datasets.
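
The class-weight ratio and the idea behind stratified folds can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the project's code: the label counts are made up, the helper names are hypothetical, and the fold assignment is a simple unshuffled round-robin rather than a library implementation.

```python
from collections import Counter

def scale_pos_weight(labels):
    """Ratio N_neg / N_pos for binary labels encoded as 0 and 1."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive samples")
    return counts[0] / counts[1]

def stratified_folds(labels, n_splits=5):
    """Assign each sample to a fold, round-robin within each class,
    so every fold keeps roughly the overall class ratio."""
    fold_of = []
    seen = Counter()
    for y in labels:
        fold_of.append(seen[y] % n_splits)
        seen[y] += 1
    return fold_of

# Illustrative labels only: 10 negatives and 5 positives.
labels = [0] * 10 + [1] * 5
weight = scale_pos_weight(labels)
folds = stratified_folds(labels, n_splits=5)
```

With these toy labels every fold receives two negatives and one positive, so each validation split sees the same 2:1 ratio that the weight compensates for.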
Raw XGBoost probabilities are pushed away from 0 and 1 due to the boosting process. We apply Sigmoid Calibration via CalibratedClassifierCV. A predicted 20% risk should correspond to an actual 20% frequency in the clinical population.
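
One way to check the "20% predicted means 20% observed" property is a reliability table together with the Brier score. The sketch below uses only the standard library; the function names and the sample predictions are hypothetical and chosen so that the toy model is perfectly calibrated.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def reliability_table(y_true, y_prob, n_bins=5):
    """Per non-empty bin: (mean predicted probability, observed event rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for _, p in members) / len(members)
            rate = sum(y for y, _ in members) / len(members)
            rows.append((mean_p, rate, len(members)))
    return rows

# Illustrative predictions only: 1 of 5 events at p=0.2, 4 of 5 at p=0.8.
y_prob = [0.2] * 5 + [0.8] * 5
y_true = [1, 0, 0, 0, 0] + [1, 1, 1, 1, 0]
score = brier_score(y_true, y_prob)
table = reliability_table(y_true, y_prob)
```

In a well-calibrated model the first two entries of each row are close to each other; large gaps in any bin indicate that the predicted probabilities should not be read as frequencies.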
A dual-engine interpretability layer ensures every prediction is explainable from multiple mathematical perspectives — global consistency via SHAP and local sensitivity via LIME. Neither technique alone is sufficient for clinical trust.
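
The attribution idea behind SHAP can be illustrated with an exact Shapley computation on a toy model. This sketch does not use the shap library and is not the system's explainer: `toy_risk`, the patient vector, and the baseline are all hypothetical, and brute-force enumeration is only practical for a handful of features.

```python
from itertools import permutations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley attribution: average each feature's marginal
    contribution over every ordering in which features are switched
    from the baseline value to the patient's value."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]
            value = model(current)
            phi[j] += value - prev
            prev = value
    return [p / factorial(n) for p in phi]

def toy_risk(features):
    """Hypothetical risk score with one interaction term (age x exang)."""
    age, chol, exang = features
    return 0.01 * age + 0.001 * chol + 0.2 * exang + 0.002 * age * exang

x = [60, 250, 1]          # illustrative patient
baseline = [50, 200, 0]   # illustrative reference patient
phi = shapley_values(toy_risk, x, baseline)
```

The attributions sum to the difference between the patient's score and the baseline score, which is the consistency property that makes SHAP values readable as a per-feature breakdown of a single prediction.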
In medical AI, "Black Box" models are clinically unusable. CardioSense AI implements six interlocking layers of trust — from hard-stop clinical overrides to cryptographic audit hashes.
usedforsecurity=False flagged for audit compliance). This allows clinicians to verify that the decision support engine has not been altered since its last validated training run.
The most significant innovation in CardioSense AI is the move from passive prediction to active intervention planning. The LEP algorithm identifies the minimum patient effort required to reach a clinician-set target risk level.
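
The source does not spell out how LEP searches for a plan, so the following is only one possible reading, sketched as an exhaustive search over a small set of candidate interventions. Everything here is an assumption made for illustration: the function name `least_effort_path`, the effort units, the intervention list, and the linear `toy_risk` stand-in for the calibrated model.

```python
from itertools import combinations

def least_effort_path(risk_fn, patient, interventions, target_risk):
    """Return (total_effort, chosen_names, new_risk) for the cheapest
    combination of interventions that brings risk to the target, or None.
    Each intervention is (name, feature, new_value, effort) and is assumed
    to act on a different feature."""
    best = None
    for k in range(len(interventions) + 1):
        for plan in combinations(interventions, k):
            candidate = dict(patient)
            for _name, feature, new_value, _effort in plan:
                candidate[feature] = new_value
            risk = risk_fn(candidate)
            total_effort = sum(step[3] for step in plan)
            if risk <= target_risk and (best is None or total_effort < best[0]):
                best = (total_effort, [step[0] for step in plan], risk)
    return best

def toy_risk(p):
    """Hypothetical risk model, not the calibrated XGBoost classifier."""
    return 0.1 + 0.002 * (p["trestbps"] - 120) + 0.001 * (p["chol"] - 200)

patient = {"trestbps": 160, "chol": 280}
interventions = [
    ("lower BP to 130", "trestbps", 130, 2.0),
    ("lower cholesterol to 200", "chol", 200, 3.0),
]
plan = least_effort_path(toy_risk, patient, interventions, target_risk=0.19)
```

In this toy setting the blood-pressure change alone is cheaper but does not reach the target, so the search selects the cholesterol change. Exhaustive enumeration grows exponentially with the number of candidate interventions, so a real implementation would likely need a smarter search.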
Validated using a Hold-Out Test Set (20%) and Stratified 5-Fold Cross-Validation during optimization. Metrics represent the system's state after Sigmoid Calibration and Target-Enriched Optuna optimization.
| Metric | Score | Professional Interpretation |
|---|---|---|
| Model Version | v2.4.0 | Professional Optuna-calibrated clinical ensemble |
| Clinical Accuracy | 88.52% | High fidelity across all diagnostic classes |
| ROC-AUC Score | 0.9621 | Exceptional class discrimination power |
| PR-AUC Score | 0.9553 | Precise performance in unbalanced medical sets |
| Recall (Sensitivity) | 92.86% | Critical safety metric — minimising false negatives |
| F1-Score | 0.8814 | Robust harmonic balance of precision and recall |
| Brier Score | 0.0814 | Strong probability calibration (Platt Scaling verified) |
| Test Coverage | 63.00% | 40 verified clinical scenarios across core logic |
| Security Audit | 100% Pass | Bandit (SAST) & Safety (SCA) verified release |
| Data Drift | Monitored | Adaptive Evidently AI K-S monitoring gateway |
Clinical Priority: We prioritize Recall (Sensitivity) in senior and female populations to ensure no high-risk patient is "missed" due to algorithmic bias. The system maintains 100% Recall for the Senior (≥65) cohort — the highest baseline risk population.
| Demographic Group | N | Accuracy | Recall (Sensitivity) | F1-Score |
|---|---|---|---|---|
| Gender: Female | 20 | 95.00% | 85.71% | 92.31% |
| Gender: Male | 41 | 87.80% | 95.24% | 88.89% |
| Age: Young (<45) | 13 | 100.0% | 100.0% | 100.0% |
| Age: Middle (45–64) | 42 | 90.48% | 90.91% | 90.91% |
| Age: Senior (≥65) ★ | 6 | 66.67% | 100.0% ★ | 75.00% |
★ Recall of 100% for the Senior (≥65) population is clinically vital. The accuracy dip is explained by small sample size (N=6) and deliberate prioritisation of sensitivity over specificity in the highest-risk cohort.
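
The Senior row can be reproduced from a single confusion matrix. The counts below (3 true positives, 2 false positives, 1 true negative, 0 false negatives) are inferred from the reported accuracy, recall, and F1 for N=6; they are not stated in the source and should be read as a reconstruction. The helper name `subgroup_metrics` is hypothetical.

```python
def subgroup_metrics(tp, fp, tn, fn):
    """Accuracy, recall, precision and F1 from raw confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}

# Counts inferred from the Senior (>=65) row of the table, N = 6.
senior = subgroup_metrics(tp=3, fp=2, tn=1, fn=0)
```

Under this reconstruction both errors in the Senior cohort are false positives, which is what a sensitivity-first operating point would be expected to produce. With only six patients, each single prediction shifts accuracy by about 17 percentage points.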
The system uses a four-layer decoupled architecture for scalability and auditability. Each layer is independently testable and integrates via well-defined interfaces.
CardioSense AI exposes a production-grade FastAPI REST interface for seamless integration with Electronic Health Record (EHR) systems. Every request receives a unique X-Request-ID for full clinical audit traceability.
| Method | Endpoint | Description |
|---|---|---|
| POST | /predict | Primary inference endpoint — submits patient vitals, returns risk probability with X-Request-ID audit trace |
| GET | /monitoring/status | Returns data drift (Evidently K-S) and concept drift (Recall Stability) summary with timestamps |
| POST | /feedback/{id} | Clinician endpoint for ground-truth outcome labeling — feeds the Concept Drift monitoring loop |
| GET | /health | System health check — returns model version, uptime heartbeat, and artifact load status |
| GET | /docs | Interactive Swagger UI for live endpoint testing and documentation |
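
A client call to the /predict endpoint might look like the sketch below. The request schema is an assumption: the source lists the 13 feature names but not the JSON shape, the base URL is hypothetical, the patient values are made up, and reading X-Request-ID from the response headers is a guess about where the audit trace is returned.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical deployment address

def build_predict_request(patient: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the /predict endpoint."""
    body = json.dumps(patient).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(patient: dict):
    """Send the request; return (parsed JSON body, X-Request-ID header)."""
    req = build_predict_request(patient)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp), resp.headers.get("X-Request-ID")

# Illustrative patient; field names follow the feature table.
patient = {
    "age": 54, "sex": 1, "cp": 4, "trestbps": 140, "chol": 250, "fbs": 0,
    "restecg": 0, "thalach": 150, "exang": 1, "oldpeak": 1.5, "slope": 2,
    "ca": 0, "thal": 7,
}
request = build_predict_request(patient)
```

Calling `predict(patient)` requires a running server. Storing the returned X-Request-ID alongside the clinical record is what would let a later /feedback/{id} call be matched to the original prediction.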
A medical CDSS must remain accurate as the underlying patient population evolves. CardioSense AI integrates adaptive monitoring that detects both distributional drift and predictive performance decay in real time.
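
The K-S check used for data drift compares the empirical distribution of a feature in recent traffic against a reference window. The sketch below computes the two-sample K-S statistic with the standard library only; it does not call Evidently, and the sample values and the function name `ks_statistic` are hypothetical.

```python
from bisect import bisect_right

def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical
    gap between the two empirical cumulative distribution functions."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for v in ref + cur:
        cdf_ref = bisect_right(ref, v) / len(ref)
        cdf_cur = bisect_right(cur, v) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap

# Illustrative resting blood pressure samples only.
reference_bp = [120, 130, 140, 150]
same_bp = [120, 130, 140, 150]
shifted_bp = [160, 170, 180, 190]

no_drift = ks_statistic(reference_bp, same_bp)
full_drift = ks_statistic(reference_bp, shifted_bp)
```

A statistic near 0 means the two windows look alike, and a value near 1 means they barely overlap. How the statistic is turned into an alert (for example, through a p-value threshold) is a monitoring configuration choice that the source does not specify.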
CardioSense AI demonstrates that the perceived tradeoff between accuracy and interpretability is a false dichotomy. Post-hoc attribution (SHAP) alongside a high-capacity model (XGBoost) achieves state-of-the-art accuracy with full clinical transparency.
§11.1 Interpretability-Accuracy Tradeoff: By using SHAP as a post-hoc attribution layer over XGBoost, we achieve clinical transparency without sacrificing predictive capacity. The LIME sensitivity analysis further ensures that boundary-case patients are flagged — not silently misclassified.