System Architecture

TrustLens is designed as a layered pipeline so diagnostic computation, scoring logic, and report interpretation remain decoupled and maintainable.

Why This Design Matters

The architecture aims to solve three practical needs:

  • stable and testable metric computation

  • explicit decision logic with traceable rules

  • clear extension points for contributors

High-Level Data Flow

This diagram shows how inputs move from orchestration through framework-specific resolvers to the agnostic metrics pipeline.

        graph TD
    Input[Input: model, X, y_true, y_prob?, sensitive_features?] --> API[api.analyze Orchestration]
    API --> Registry[Backend Registry]

    subgraph "Prediction Resolution Layer"
        Registry --> Detect[Framework Detection]
        Detect --> Resolver[Framework Resolver: Sklearn/XGBoost]
        Resolver --> Bundle[PredictionBundle: y_pred, y_prob, metadata]
    end

    Bundle --> Pipeline[Core Pipeline: Agnostic]

    subgraph "Agnostic Metrics Engine"
        Pipeline --> Calib[Calibration: Brier, ECE]
        Pipeline --> Fail[Failure: Confidence Gap]
        Pipeline --> Bias[Bias: Subgroup, Equalized Odds]
        Pipeline --> Rep[Representation: Silhouette]
    end

    Calib --> Results[Results Dict]
    Fail --> Results
    Bias --> Results
    Rep --> Results

    Results --> Scoring["trust_score.py: base_score vs Penalties"]
    Scoring --> Report[TrustReport: Narrative & Interpretation]
    Report --> Output[Console / Plots / Saved Files]
    

Implementation note: analyze() uses the Backend Registry to automatically detect the framework and resolve predictions into a standardized PredictionBundle. This ensures the core metrics pipeline remains 100% framework-agnostic.

Component Interactions

The system is decoupled into four primary layers.

        graph LR
    API["api.py: Orchestration"] -- "model" --> Backends["backends/: Framework Resolvers"]
    Backends -- "PredictionBundle" --> Pipeline["core/pipeline.py: Execution Engine"]
    Pipeline -- "dict: results" --> Report["report.py: TrustReport"]
    Report -- "dict: results" --> Score["trust_score.py: Scoring Engine"]
    Score -- "TrustScoreResult" --> Report
    

Data Contracts

  1. API to Backends: Model reference, feature matrix X, and optional manual overrides (y_pred, y_prob).

  2. Backends to Pipeline: PredictionBundle containing standardized numpy arrays, framework identifier, and audit metadata.

  3. Pipeline to Metrics: Normalized arrays (y_true, y_pred, y_prob) and optional metadata.

  4. Pipeline to Report: Consolidated results plus audit provenance (framework version, resolver details).

  5. Report to Scorer: Score computation from results including penalties and blockers.

Execution Sequence

This sequence shows one analyze() call from input to TrustReport.

        sequenceDiagram
    participant User
    participant API as api.analyze
    participant Registry as Backend Registry
    participant Backend as backends/
    participant Pipeline as core.pipeline
    participant Report as TrustReport

    User->>API: model, X, y_true
    API->>Registry: get_resolver(model)
    Registry-->>API: Resolver function
    API->>Backend: resolve(model, X)
    Backend-->>API: PredictionBundle
    API->>Pipeline: _run_analysis_pipeline(Bundle)
    Pipeline->>Pipeline: Dispatch Agnostic Modules
    Pipeline->>Report: Initialize(results, Bundle.metadata)
    Report-->>User: TrustReport Object
    

Layer Responsibilities

Orchestration Layer (api.py)

  • Coordinates framework detection and prediction resolution.

  • Validates high-level input shapes and data consistency.

  • Delegates to the framework-agnostic core pipeline.

Framework Resolvers (backends/)

  • Handles framework-specific prediction logic (e.g., predict_proba).

  • Normalizes output shapes (e.g., binary (n,)(n, 2)).

  • Captures audit metadata (framework version, model type).

  • Blocks unsupported tasks (e.g., regression or ranking).

Core Pipeline (trustlens/core/pipeline.py)

  • Framework-agnostic execution engine.

  • Dispatches enabled metrics modules.

  • Manages progress tracking and resource isolation.

Metrics Layer (trustlens/metrics/)

  • Pure math/stat functions for diagnostics.

  • Independent of models; operates entirely on labels and probabilities.

Reporting Layer (trustlens/report.py)

  • Packages diagnostic outputs with full backend provenance.

  • Generates textual and visual summaries.

  • Supports unified JSON serialization for experiment trackers.

Comparison Layer (comparison.py)

  • compares candidate TrustReport objects

  • filters blocked candidates

  • provides deployment recommendation rationale

Extensibility Path

To add a new metric module:

  1. implement the metric under trustlens/metrics/

  2. wire dispatch logic in analyze()

  3. ensure return format follows existing results conventions

  4. add tests and documentation

  5. optionally integrate into trust score weighting

Architectural Constraints

  • optimized for classification evaluation

  • fairness checks are data- and task-dependent

  • representation diagnostics require embeddings

  • some threshold logic is currently heuristic by design