TrustLens evaluates model reliability beyond accuracy — and produces a deployment decision backed by evidence, not instinct.
Aggregate accuracy hides the most dangerous failures. TrustLens isolates "Confidently Wrong" samples: cases where the model is at least 95% certain yet incorrect. These are the predictions most likely to bypass human review and cause production incidents.
| # | Sample | True | Pred | Confidence | Danger |
|---|--------|------|------|------------|--------|
| 1 | 234 | 1 | 0 | 96.7% | CRITICAL |
| 2 | 659 | 1 | 0 | 96.7% | CRITICAL |
| 3 | 740 | 1 | 0 | 95.7% | CRITICAL |
> [Insight]: High-confidence mistakes detected.
> The model is certain it is right, but it is wrong.
> Overconfidence detected — consider calibration.
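The underlying check is simple to reproduce. Here is a minimal sketch of the idea (not the library's internal code), assuming NumPy arrays of true labels, predicted labels, and the model's confidence in each prediction:

```python
import numpy as np

def confidently_wrong(y_true, y_pred, y_conf, threshold=0.95):
    """Indices of misclassified samples with confidence >= threshold,
    most confident (most dangerous) first."""
    # y_conf can come from model.predict_proba(X_test).max(axis=1)
    mask = (y_pred != y_true) & (y_conf >= threshold)
    idx = np.where(mask)[0]
    return idx[np.argsort(-y_conf[idx])]
```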
Accuracy alone is a shallow basis for choosing between models. TrustLens lets you benchmark multiple candidates across calibration, failure risk, and bias, so you ship the most reliable model rather than simply the most accurate one. A sample comparison verdict:
> All candidates triggered critical diagnostic blocks. While Model C has the highest trust score, its fairness violations exceed safety thresholds. Recommendation: Retrain with class-weighted loss and bias mitigation.
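The same verdict can gate candidate selection in code. A hedged sketch, assuming each report exposes a numeric `trust_score` attribute alongside the `is_blocked` flag documented in the capability table below:

```python
from trustlens import compare

# Render the side-by-side comparison across candidates.
compare([report_a, report_b, report_c])

# Programmatic selection: best trust score among unblocked candidates.
# NOTE: `trust_score` is an assumed attribute name for illustration;
# `is_blocked` is the documented deployment-gate flag.
viable = [r for r in (report_a, report_b, report_c) if not r.is_blocked]
best = max(viable, key=lambda r: r.trust_score) if viable else None
```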
You trained a model. It hits 92% accuracy. You ship it. Three months later, a minority-class user gets consistently wrong predictions, the model is 90% confident on its worst mistakes, and when a regulator asks "why did it make that decision?" you have no answer.
Accuracy tells you how often your model is right.
It tells you nothing about when it fails, why it fails, or who it fails for. TrustLens makes those failures visible before they reach production and turns raw diagnostics into a deployment decision.
TrustLens is zero-friction. If your model has .predict() and .predict_proba(), you're ready to go. One function call. One report.
```bash
# Install once, run anywhere
$ pip install trustlens
```
```python
from trustlens import analyze

# 1. Run the audit — that's it.
report = analyze(
    model,
    X_test,
    y_test,
    y_prob=model.predict_proba(X_test),
    sensitive_features={
        "gender": gender_test,
        "age_group": age_test,
    },
)

# 2. Inspect the verdict.
report.show()

# 3. Compare candidates.
from trustlens import compare
compare([report_a, report_b, report_c])

# 4. Export artifacts.
report.save("trust_report")  # JSON + plots
report.save("report.txt")    # human-readable text

# No model yet? Try the built-in demo dataset.
from trustlens import quick_analyze
quick_analyze(dataset="breast_cancer")
```
The Trust Score combines diagnostic modules with explicit, auditable logic. Weights, penalties, and blockers are all visible, so you can trace every deduction back to a specific metric failure.
When the Representation module is unavailable, the remaining weights are renormalized to sum to 1.0.
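The renormalization is plain proportional rescaling. A minimal sketch of the idea; the module names and weights below are illustrative, not the library's actual configuration:

```python
def renormalize(weights, available):
    """Rescale the weights of available modules so they sum to 1.0."""
    active = {name: w for name, w in weights.items() if name in available}
    total = sum(active.values())
    return {name: w / total for name, w in active.items()}

# Illustrative only -- real module names/weights come from TrustLens itself.
weights = {"calibration": 0.3, "failure": 0.3, "bias": 0.3, "representation": 0.1}
print(renormalize(weights, {"calibration", "failure", "bias"}))
# {'calibration': 0.333..., 'failure': 0.333..., 'bias': 0.333...}
```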
Each module diagnoses a different failure mode. Together they produce one deployment-oriented verdict while preserving full diagnostic detail for root-cause analysis.
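Full diagnostic detail stays accessible per module; `report.results["bias"]` is used this way in the plotting example further down. A minimal sketch of drilling into every module's raw results:

```python
# Per-module results are keyed by module name (e.g. "bias").
# Iterate over all of them for root-cause analysis.
for module_name, result in report.results.items():
    print(f"=== {module_name} ===")
    print(result)
```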
Pass multiple sensitive features and TrustLens generates per-feature plots for every visualization type; no feature is silently dropped. Filenames are automatically sanitized, and features are processed in sorted order so output is deterministic.
```python
# Batch save — one call, all per-feature files
from trustlens.visualization import plot_module

plot_module(
    "bias",
    report.results["bias"],
    save_dir="plots/",
)

# Or via the report object
report.plot_bias(mode="all")
```
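Neither guarantee requires machinery beyond the standard library. A hedged sketch of the general approach (not TrustLens's actual implementation), assuming one plot file per feature:

```python
import re

def safe_filename(feature_name):
    """Replace characters that are unsafe in filenames with underscores."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", feature_name)

# Sorted iteration makes output order (and CI diffs) deterministic.
for feature in sorted(["age_group", "gender"]):
    path = f"plots/bias_{safe_filename(feature)}.png"
    ...  # render and save the per-feature plot here
```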
Zero-friction by design. If your model has .predict(), you're ready. No TrustLens-specific concepts to learn before getting results.
Each layer has a single responsibility. Metric computation, scoring logic, and report interpretation are decoupled, independently testable, and swappable. The plugin system means new capabilities never touch core files; a sketch of a custom plugin follows below.
New contributors read this before writing any code. These principles guide every API decision, every tradeoff, every line.
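The capability table below names a `BasePlugin` ABC as the extension point. As a hedged sketch of what a custom diagnostic plugin might look like (the import path and method signature here are assumptions for illustration):

```python
from trustlens.plugins import BasePlugin  # import path assumed

class LatencyPlugin(BasePlugin):
    """Hypothetical plugin: flag models too slow for a serving budget."""

    name = "latency"

    def run(self, model, X, y):  # signature is an assumption
        import time
        start = time.perf_counter()
        model.predict(X)
        ms_per_sample = (time.perf_counter() - start) * 1000 / len(X)
        return {"ms_per_sample": ms_per_sample, "passed": ms_per_sample < 1.0}
```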
Most evaluation pipelines stop at one or two dimensions. TrustLens combines them all into a single deployment decision with traceable reasoning.
| Capability | sklearn metrics | fairlearn | TrustLens |
|---|---|---|---|
| Calibration (ECE, Brier) | ✓ partial | ✗ | ✓ Full + penalties |
| Failure / confidence gap analysis | ✗ | ✗ | ✓ Core module |
| Subgroup fairness + equalized odds | ✗ | ✓ | ✓ + multi-feature |
| Embedding representation analysis | ✗ | ✗ | ✓ Optional module |
| Composite Trust Score + verdict | ✗ | ✗ | ✓ 0–100 with grade |
| Deployment blockers | ✗ | ✗ | ✓ Hard-stop rules |
| Multi-candidate model comparison | ✗ | ✗ | ✓ compare() |
| Plugin extensibility | ✗ | ✗ | ✓ BasePlugin ABC |
| Exportable reports (JSON, TXT, plots) | ✗ | ✗ | ✓ report.save() |
| CI/CD gate integration | ✗ | ✗ | ✓ is_blocked flag |
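The `is_blocked` flag in the last row turns TrustLens into a CI/CD gate. A minimal sketch, assuming the flag is exposed as a boolean attribute on the report:

```python
import sys

from trustlens import analyze

report = analyze(model, X_test, y_test, y_prob=model.predict_proba(X_test))

# Hard stop: fail the pipeline if any deployment blocker fired.
if report.is_blocked:
    print("Deployment blocked by TrustLens diagnostics.")
    sys.exit(1)
```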
If TrustLens contributed to your work, cite it so others can find it. Even a GitHub star helps with discovery.
```bash
# Option 1: Install and audit
$ pip install trustlens

# Option 2: Full demo
$ python demo.py

# Option 3: Comprehensive audit
$ python examples/comprehensive_audit.py
```