TrustLens Design Principles

These principles guide every technical decision in TrustLens. New contributors should read this before writing any code.

1. Simplicity > Complexity

A correct library that is never used has zero impact.

TrustLens competes for the user’s time. If the API is confusing, users will write a 10-line sklearn script instead. Every function must be usable without reading the docs.

In practice:

The primary API is a single function: analyze(). No configuration classes, no session objects, no builders.
Default parameters should produce a useful result for 80% of users.
Errors must be actionable: “y_prob is required when model does not expose predict_proba()” not “ValueError: array mismatch.”
We prefer explicit over implicit: pass arrays in, get numbers out.
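
The error-message rule above can be illustrated with a minimal sketch. `resolve_probabilities` and `LabelOnlyModel` are hypothetical names used only for illustration and are not part of the TrustLens API; only the message text is taken from the example above.

```python
def resolve_probabilities(model, X, y_prob=None):
    """Return predicted probabilities, or fail with an actionable message.

    Hypothetical helper for illustration; not the TrustLens implementation.
    """
    if y_prob is not None:
        return y_prob
    if hasattr(model, "predict_proba"):
        return model.predict_proba(X)
    # Name the missing argument and the reason, so the fix is obvious.
    raise ValueError(
        "y_prob is required when model does not expose predict_proba()"
    )


class LabelOnlyModel:
    """Stand-in for a model that only predicts hard labels."""

    def predict(self, X):
        return [0 for _ in X]


try:
    resolve_probabilities(LabelOnlyModel(), X=[[0.1], [0.9]])
except ValueError as exc:
    message = str(exc)
```

The message tells the user which argument to supply and why, which is the difference between an actionable error and a generic "array mismatch."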

What this rules out: Overengineered abstractions and “framework” patterns that require users to learn TrustLens-specific concepts before getting results.

2. Modular by Design

Every feature is independently importable.

Users who only want Brier Score should not pay the import cost of Grad-CAM. Researchers who want only CKA should not need scikit-learn installed.

In practice:

Each analysis area (calibration, failure, bias, explainability, representation) lives in its own module.
Modules have no circular imports.
Deep learning dependencies (torch, shap, captum) are optional extras.
The plugin system ensures new capabilities extend the core rather than couple to it.
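
One common way to keep heavy dependencies optional is to defer the import until the feature is actually called. The sketch below assumes this pattern; `require_optional`, `grad_cam_backend`, and the extra name `"explain"` are hypothetical and do not describe how TrustLens itself is packaged.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency, or explain what is missing.

    Hypothetical helper; the name of the extra is supplied by the caller.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is an optional dependency; "
            f"install the '{extra}' extra to use this feature."
        ) from exc


def grad_cam_backend():
    # The heavy import happens only when this function is called, so
    # importing lightweight metrics never pulls in torch.
    return require_optional("torch", extra="explain")
```

With this shape, a user who only computes a Brier score never triggers the torch import, and a user who calls a deep learning feature without the dependency gets a message that names the fix.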

What this rules out: Monolithic files, cross-module imports that create hidden dependencies, mandatory heavy dependencies for lightweight features.

3. Visual-First Outputs

Numbers without context mislead. Visuals reveal structure.

An ECE (expected calibration error) of 0.042 means nothing until you see the reliability diagram and notice the model is overconfident specifically for high-confidence predictions.
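
To make the point concrete, here is a minimal pure-Python sketch of ECE with equal-width bins that also returns the per-bin statistics a reliability diagram would plot. The function name, the return shape, and the toy data are assumptions for illustration, not the TrustLens implementation.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Return (ece, per_bin) using equal-width confidence bins.

    per_bin holds (bin_lower, bin_upper, mean_confidence, accuracy, count).
    """
    n = len(confidences)
    if n == 0:
        return float("nan"), []
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece, per_bin = 0.0, []
    for i, members in enumerate(bins):
        if not members:
            continue  # empty bins contribute nothing to the sum
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
        per_bin.append(
            (i / n_bins, (i + 1) / n_bins, mean_conf, accuracy, len(members))
        )
    return ece, per_bin


# Toy data: mildly off at 0.55 confidence, badly overconfident at 0.95.
confidences = [0.55, 0.55, 0.95, 0.95, 0.95, 0.95]
correct = [1, 0, 1, 1, 0, 0]
ece, per_bin = expected_calibration_error(confidences, correct)
gaps = [(lower, mean_conf - acc) for lower, _, mean_conf, acc, _ in per_bin]
```

The scalar `ece` averages the bins together; `gaps` shows that nearly all of the miscalibration sits in the highest-confidence bin, which is the structure a reliability diagram makes visible.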

In practice:

Every metric has a corresponding visualization in trustlens.visualization.
Plots are annotated with metric values so they are self-contained.
Each visualization is designed to answer a specific question (e.g., “Is my model overconfident?”), not just display data.
Plots can be saved as publication-quality PNGs (150 DPI minimum).
Dark mode support and accessible color palettes are roadmap items.

What this rules out: Raw number dumps with no visual companion, non-informative plots that don’t annotate the metric they’re meant to convey.

4. Research + Practical Balance

A library used only in papers or only in production is half a library.

TrustLens sits at the intersection: rigorous enough for researchers (proper citations, correct math, edge-case handling), accessible enough for practitioners (sklearn API, sensible defaults, fast runtimes).

In practice:

Every metric links to its original paper in the module docstring.
Mathematical formulas are included in NumPy-style docstrings using LaTeX notation.
Metrics handle edge cases (empty bins, single-class inputs, NaN propagation) without crashing.
Performance matters: expensive metrics subsample large datasets, and the subsampling behavior is documented.
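
The docstring and edge-case conventions above might look like the following for a simple metric. This is a sketch under stated assumptions: the function is a plain-Python stand-in, and the choice to return NaN for empty input is an illustrative policy, not necessarily the one TrustLens uses.

```python
import math


def brier_score(y_true, y_prob):
    r"""Brier score for binary outcomes.

    .. math::

        \mathrm{BS} = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)^2

    Parameters
    ----------
    y_true : sequence of int
        Binary labels in {0, 1}.
    y_prob : sequence of float
        Predicted probability of the positive class.

    Returns
    -------
    float
        Mean squared difference between probabilities and labels.
        NaN for empty input (assumed policy for this sketch).

    References
    ----------
    Brier, G. W. (1950). Verification of forecasts expressed in terms
    of probability. Monthly Weather Review, 78(1), 1-3.
    """
    if len(y_true) != len(y_prob):
        raise ValueError(
            f"y_true has {len(y_true)} items but y_prob has {len(y_prob)}"
        )
    if len(y_true) == 0:
        return math.nan  # avoid ZeroDivisionError on empty input
    # NaN probabilities propagate to a NaN result through the arithmetic.
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)
```

The raw docstring (`r"""`) keeps the LaTeX backslashes intact, and each edge case either returns a documented value or raises an error that names both inputs.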

What this rules out: Metrics that are implemented incorrectly for the sake of simplicity, or metrics so theoretically correct they’re unusably slow.

5. Extensibility Without Fragility

New capabilities should not break existing ones.

As TrustLens grows, new metrics, visualizations, and integrations must not require touching core files.

In practice:

The plugin system is the primary extension mechanism.
analyze() dispatches to modules by name string, so adding a new module never changes the function signature.
Visualization functions accept dicts (not TrustReport objects), so they stay decoupled from the report class.
Backward compatibility is maintained within a major version.
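
Name-based dispatch can be sketched with a small registry. Everything here is hypothetical: `register`, `analyze_sketch`, its parameters, and the two example modules are illustrative only and do not reflect the real `analyze()` signature or the TrustLens plugin system.

```python
_MODULES = {}


def register(name):
    """Decorator that makes an analysis function available by name."""
    def decorator(func):
        if name in _MODULES:
            raise ValueError(f"module {name!r} is already registered")
        _MODULES[name] = func
        return func
    return decorator


def analyze_sketch(y_true, y_prob, modules=("accuracy",)):
    """Run the requested modules and return a plain dict of results."""
    results = {}
    for name in modules:
        if name not in _MODULES:
            known = ", ".join(sorted(_MODULES))
            raise ValueError(f"unknown module {name!r}; available: {known}")
        results[name] = _MODULES[name](y_true, y_prob)
    return results


@register("accuracy")
def _accuracy(y_true, y_prob):
    hits = sum((p >= 0.5) == bool(y) for y, p in zip(y_true, y_prob))
    return {"accuracy": hits / len(y_true)}


# Adding a module later touches neither analyze_sketch() nor its signature.
@register("mean_confidence")
def _mean_confidence(y_true, y_prob):
    total = sum(max(p, 1 - p) for p in y_prob)
    return {"mean_confidence": total / len(y_prob)}
```

Because results are plain dicts keyed by module name, a plotting function can consume them without importing any report class, which matches the decoupling described above.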

What this rules out: Tight coupling between modules, hardcoded module lists that must be updated on every addition, breaking API changes without a deprecation cycle.

6. Test Everything

Code without tests is a liability, not an asset.

TrustLens is a trust tool — it must itself be trustworthy.

In practice:

Minimum 80% branch coverage for all modules.
All metrics must have at least one test for the “perfect predictor” case and one for the “random predictor” case.
Edge cases (empty input, single class, NaN, infinite values) are explicitly tested.
Integration tests verify the full analyze() → TrustReport → save() pipeline.
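
The “perfect predictor” and “random predictor” rule could be met with tests along these lines. This is a sketch: the local `brier_score` is a stand-in so the example is self-contained, and the test names, seed, and tolerance are assumptions rather than TrustLens conventions.

```python
import random


def brier_score(y_true, y_prob):
    """Stand-in metric so the tests below run on their own."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)


def test_perfect_predictor_scores_zero():
    y_true = [0, 1, 1, 0, 1]
    y_prob = [float(y) for y in y_true]
    assert brier_score(y_true, y_prob) == 0.0


def test_random_predictor_matches_expected_value():
    rng = random.Random(0)  # fixed seed keeps the test deterministic
    n = 10_000
    y_true = [rng.randint(0, 1) for _ in range(n)]
    y_prob = [rng.random() for _ in range(n)]
    # Uniform probabilities that are independent of balanced labels have
    # an expected Brier score of 1/3.
    assert abs(brier_score(y_true, y_prob) - 1 / 3) < 0.02
```

Both cases have a known analytic answer, so a failure points at the metric rather than at the test data.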

What this rules out: Merging untested code, skipping tests for “obviously correct” utility functions.

7. Documentation as a Feature

If it isn’t documented, it doesn’t exist.

In practice:

Every public function has a complete NumPy-style docstring.
Every module has a module-level docstring explaining purpose, metrics implemented, and references.
The examples directory contains runnable, realistic scripts (no toy np.random examples as primary usage).
The README is updated with every public API change.

8. Performance is Not Optional

A tool that takes 10 minutes to run won’t be run.

In practice:

Silhouette score uses subsampling for n > 5000.
Faithfulness tests support configurable n_steps to trade resolution for speed.
Plots use matplotlib’s Agg backend (no display required) for CI/server compatibility.
Expensive operations are lazy: they run only when the corresponding module is requested.
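
The subsampling rule can be sketched as a small, reproducible index selector. The 5000 threshold comes from the silhouette rule above; the helper name, the seed parameter, and the choice to make the subsample size equal to the threshold are assumptions for illustration.

```python
import random

SUBSAMPLE_THRESHOLD = 5000  # threshold from the silhouette rule above


def subsample_indices(n, max_samples=SUBSAMPLE_THRESHOLD, seed=0):
    """Return sorted indices of a reproducible subsample, or all indices.

    Hypothetical helper; the subsample size is assumed to equal the
    threshold, which the rule above does not specify.
    """
    if n <= max_samples:
        return list(range(n))
    rng = random.Random(seed)  # local RNG leaves global random state alone
    return sorted(rng.sample(range(n), max_samples))
```

A fixed seed makes the subsample, and therefore the reported score, repeatable across runs, which is part of what "documented behavior" needs to cover.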