# Known Limitations

This page documents the current limits of TrustLens so that users can interpret its outputs correctly.

## Scope

TrustLens currently targets classification reliability workflows. Regression is not a first-class path in the core analysis pipeline.

## Probability Dependency

Calibration and several failure diagnostics require valid probability outputs.

- If your model has no `predict_proba` method, you must provide `y_prob` manually to access the full diagnostics.
- **Degraded Mode**: As of v0.4.0, TrustLens can run without probabilities. In this case, confidence-based metrics (calibration, ECE) are skipped and the report is labeled "Degraded".
- Low-quality probability estimates weaken any trust conclusions drawn from them.

## Dataset Size Effects

Small validation sets can make calibration and subgroup diagnostics unstable.

- Very small samples may produce noisy ECE and subgroup gap values.
- Interpret fairness metrics with caution when subgroup counts are low.

## Fairness Constraints

The current equalized-odds logic assumes a binary target and meaningful subgroup diversity.

- If these conditions are not met, equalized-odds analysis is skipped.
- A skipped fairness output is not evidence of fairness.

## Representation Constraints

Representation analysis is optional and depends on embedding quality.

- If no embeddings are provided, no representation sub-score is computed.
- Poorly aligned embeddings can make separability results misleading.

## Threshold and Penalty Design

Some trust-score thresholds and penalty boundaries are expert-designed heuristics.

- They are practical defaults, not universal constants.
- Validate them for your domain before using them as hard release gates.

## Not a Causal Fairness Auditor

TrustLens surfaces statistical disparities. On its own, it does not establish causality or policy compliance.

- Human review and domain-specific policy checks are still required.
- Regulatory and legal conclusions should rest on additional evidence.
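The statistical disparities mentioned above are computed from observed outcomes only. As an illustration, and not TrustLens's actual implementation, the minimal Python sketch below computes equalized-odds gaps (the largest between-group differences in true-positive and false-positive rates) for a binary target using NumPy. The function name `equalized_odds_gap` is hypothetical, and its skip rules (return `None` for a non-binary target, fewer than two groups, or a group lacking positives or negatives) are assumptions chosen to mirror the skip behavior described under Fairness Constraints.

```python
import numpy as np


def equalized_odds_gap(y_true, y_pred, groups):
    """Hypothetical helper (not a TrustLens API): max-min gaps in TPR and FPR.

    Returns None when the assumed preconditions are not met, mirroring a
    skipped analysis. A None result is not evidence of fairness.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)

    # Assumed precondition: binary {0, 1} target with both classes present.
    if not np.array_equal(np.unique(y_true), [0, 1]):
        return None
    unique_groups = np.unique(groups)
    # Assumed precondition: at least two subgroups to compare.
    if len(unique_groups) < 2:
        return None

    tprs, fprs = [], []
    for g in unique_groups:
        in_group = groups == g
        pos = in_group & (y_true == 1)
        neg = in_group & (y_true == 0)
        # TPR or FPR is undefined for a group with no positives or no negatives.
        if pos.sum() == 0 or neg.sum() == 0:
            return None
        tprs.append(y_pred[pos].mean())  # share of true positives predicted 1
        fprs.append(y_pred[neg].mean())  # share of true negatives predicted 1

    return {
        "tpr_gap": float(max(tprs) - min(tprs)),
        "fpr_gap": float(max(fprs) - min(fprs)),
    }
```

Nothing in this computation explains why the rates differ: a nonzero gap is a disparity to investigate, not a causal finding.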
## Recommended Mitigations

- Pair score-based gating with manual review for high-impact applications.
- Validate thresholds on your own datasets before strict automation.
- Track score behavior over time instead of relying on one run.
- Preserve full report artifacts for auditability.
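One way to validate a calibration threshold on your own data, and to gauge the small-sample noise described under Dataset Size Effects, is to bootstrap the metric before using it as a gate. The sketch below is a minimal Python example assuming NumPy. `expected_calibration_error` and `bootstrap_ece` are hypothetical helpers, not TrustLens functions, and the ECE shown is a binary variant (equal-width bins on the positive-class probability, comparing each bin's mean probability with its observed positive rate) whose details may differ from TrustLens's own ECE.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Hypothetical helper: binary ECE with equal-width probability bins."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Map each probability to a bin in [0, n_bins); p == 1.0 goes to the last bin.
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            # Bin weight times |mean predicted probability - observed positive rate|.
            ece += in_bin.mean() * abs(y_prob[in_bin].mean() - y_true[in_bin].mean())
    return float(ece)


def bootstrap_ece(y_true, y_prob, n_boot=1000, n_bins=10, seed=0):
    """Hypothetical helper: 95% percentile interval of ECE over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    n = len(y_true)
    scores = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample n rows with replacement
        scores[i] = expected_calibration_error(y_true[idx], y_prob[idx], n_bins)
    low, high = np.percentile(scores, [2.5, 97.5])
    return float(low), float(high)


if __name__ == "__main__":
    # Synthetic labels drawn from the predicted probabilities, so the model is
    # calibrated by construction; only the sample size differs between runs.
    rng = np.random.default_rng(42)
    for n in (50, 5000):
        p = rng.uniform(size=n)
        y = (rng.uniform(size=n) < p).astype(int)
        print(n, bootstrap_ece(y, p))
```

A wide interval suggests the validation set is too small to support a hard ECE threshold. Binned ECE is also biased upward on small samples, so even the calibrated synthetic data above will typically show a higher ECE range at `n=50` than at `n=5000`.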