Bias and Fairness Metrics

Bias and fairness metrics expose subgroup disparities so teams can evaluate equity risk before release.

Why This Matters

A model can look strong in aggregate while underperforming for specific groups. Fairness diagnostics make those gaps visible and actionable.

When to Use

  • when model decisions affect people across demographic or policy-sensitive segments

  • when governance requires subgroup performance reporting

  • when release policy includes fairness checks

Inputs and Assumptions

  • y_true: ground-truth labels

  • y_pred: predicted labels

  • sensitive_features: dictionary mapping each feature name to a 1-D array of group labels, with one entry per sample in y_true

  • equalized-odds analysis assumes binary target labels
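
The alignment requirement can be checked before any analysis runs. The following is a minimal sketch, assuming the inputs are plain sequences; check_alignment is a hypothetical helper written for illustration and is not part of TrustLens.

```python
def check_alignment(y_true, y_pred, sensitive_features):
    """Raise ValueError unless every input has one entry per sample."""
    n = len(y_true)
    if n == 0:
        raise ValueError("y_true is empty")
    if len(y_pred) != n:
        raise ValueError(f"y_pred has {len(y_pred)} entries, expected {n}")
    for name, groups in sensitive_features.items():
        if len(groups) != n:
            raise ValueError(
                f"sensitive feature {name!r} has {len(groups)} entries, expected {n}"
            )
    return n


# Four samples, one sensitive feature with one group label per sample.
n_samples = check_alignment(
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    {"gender": ["f", "m", "f", "m"]},
)
```

Running a check like this first turns a silent misalignment into an explicit error.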

Output and Interpretation

Key outputs include:

  • Class imbalance report: class distribution risk context

  • Subgroup performance: per-group metrics and gap summaries

  • Equalized odds summary: TPR/FPR disparity severity across groups

Large subgroup gaps or severe equalized-odds violations should be treated as release blockers in high-impact domains.
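
One way to turn this policy into an automated gate is sketched below. It assumes a results dictionary in the shape documented for equalized_odds in the API reference (feature, then a __summary__ entry with tpr_violation and fpr_violation). The function name blocking_features and the example data are hypothetical, and the choice to block only on "severe" is an assumption about release policy rather than TrustLens behavior.

```python
def blocking_features(equalized_odds_results):
    """Return the sorted feature names whose TPR or FPR violation is 'severe'."""
    blocked = []
    for feature, groups in equalized_odds_results.items():
        summary = groups.get("__summary__", {})
        levels = (summary.get("tpr_violation"), summary.get("fpr_violation"))
        if "severe" in levels:
            blocked.append(feature)
    return sorted(blocked)


# Hand-written example data, not real model output.
example_results = {
    "gender": {
        "__summary__": {"tpr_violation": "severe", "fpr_violation": "moderate"}
    },
    "age_group": {
        "__summary__": {"tpr_violation": "acceptable", "fpr_violation": "acceptable"}
    },
}

blocked = blocking_features(example_results)
release_allowed = not blocked
```

A CI job could fail the release step whenever release_allowed is False and list the blocked features in its log.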

Visualization

TrustLens provides five visualization modes through report.plot_bias(mode="...") to help interpret fairness diagnostics:

  • "summary" (Default): Combines key fairness signals into a single diagnostic view.

  • "subgroup": Detailed performance metric comparison (e.g., accuracy, precision) across all groups.

  • "equalized_odds": Visualizes TPR and FPR side-by-side to identify specific types of disparity.

  • "gap": High-level summary of the maximum demographic parity or opportunity gaps.

  • "all": Generates and returns all three diagnostic plots for a full audit.

These views make computed fairness gaps visible at a glance, so they can be reviewed and acted on before release.

# Generate all diagnostic plots for a full audit
plots = report.plot_bias(mode="all")

Multi-Feature Fairness Visualization

When multiple sensitive features are provided, TrustLens generates per-feature plots for every visualization type — no feature is silently dropped.

Usage

Pass multiple sensitive features to analyze():

from trustlens import analyze

report = analyze(
    model, X_test, y_test,
    sensitive_features={
        "gender": gender_array,
        "age_group": age_array,
        "income level": income_array,
    },
)

Generating Per-Feature Plots

Option 1 — Multi wrappers (direct):

from trustlens.visualization.fairness import (
    plot_subgroup_performance_multi,
    plot_equalized_odds_multi,
    plot_fairness_gap_multi,
)

bias_data = report.results["bias"]

figs = plot_subgroup_performance_multi(
    bias_data["subgroup_performance"], save_dir="plots/", show=False
)
# → plots/subgroup_performance_age_group.png
# → plots/subgroup_performance_gender.png
# → plots/subgroup_performance_income_level.png

Option 2 — Orchestrated via plot_module (recommended for batch saving):

from trustlens.visualization import plot_module

plot_module("bias", report.results["bias"], save_dir="plots/")

When class_imbalance is present, this saves a single bias_plot.png. When only fairness metrics are present, it saves per-feature files with standardized names:

plots/
├── bias_subgroup_age_group.png
├── bias_subgroup_gender.png
├── bias_subgroup_income_level.png
├── bias_equalized_odds_age_group.png
├── bias_equalized_odds_gender.png
├── bias_equalized_odds_income_level.png
├── bias_fairness_gap_age_group.png
├── bias_fairness_gap_gender.png
└── bias_fairness_gap_income_level.png

Filename Safety

Feature names with spaces or special characters are automatically sanitized:

Feature Name      Filename Component
gender            gender
age_group         age_group
income level      income_level
race/ethnicity    race_ethnicity
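
The mapping above can be approximated with a single regular expression. This is a sketch only: sanitize_feature_name is a hypothetical name, and the exact rule TrustLens applies is not shown in this document, so the character class below is an assumption chosen to reproduce the four rows of the table.

```python
import re


def sanitize_feature_name(name):
    """Replace each run of characters outside [A-Za-z0-9_-] with one underscore."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", name)


# Build the same mapping as the table for the four example feature names.
sanitized = {
    name: sanitize_feature_name(name)
    for name in ["gender", "age_group", "income level", "race/ethnicity"]
}
```

Note that two distinct feature names can collapse to the same component under a rule like this (for example "income level" and "income/level"), so distinct names are safer when saving plots.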

Design Notes

  • Features are processed in sorted order for deterministic output.

  • plot_module is the sole owner of file saving — no duplicate writes.

  • All returned figures are independent and can be saved or displayed individually.

Limitations and Caveats

  • fairness metrics are sensitive to subgroup sample size

  • a skipped equalized-odds check reflects an input constraint (for example, non-binary target labels), not a fairness clearance

  • outputs are statistical diagnostics, not causal proof
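
To illustrate the sample-size caveat, the sketch below computes an approximate 95% Wilson score interval for a subgroup's accuracy. It is independent of TrustLens; wilson_interval is a hypothetical helper, and the counts are made up for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Same observed accuracy (80%), very different subgroup sizes.
small_group = wilson_interval(8, 10)
large_group = wilson_interval(800, 1000)

small_width = small_group[1] - small_group[0]
large_width = large_group[1] - large_group[0]
```

The interval for the 10-sample group is far wider than for the 1000-sample group, so an apparent gap involving a small subgroup may be largely noise.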

API Reference

trustlens.metrics.bias.

Bias and fairness detection.

Bias in ML manifests as systematically worse performance for certain subgroups (demographic, geographic, temporal, etc.). TrustLens surfaces these disparities without making causal claims — the responsibility to act lies with the practitioner.

Metrics implemented

  • class_imbalance_report: distribution statistics for label classes.

  • subgroup_performance: per-subgroup accuracy/F1 breakdown.

  • equalized_odds: per-subgroup TPR/FPR with gap severity.

trustlens.metrics.bias.class_imbalance_report(y_true: ndarray) → dict

Summarize the class distribution in y_true.

Reports absolute counts, relative frequencies, and an imbalance ratio (majority class count / minority class count).

Parameters:

y_true (np.ndarray) – Ground-truth labels.

Returns:

A dict with the following keys:

  • class_counts — dict mapping class → sample count

  • class_frequencies — dict mapping class → relative frequency

  • imbalance_ratio — max_count / min_count (1.0 = perfectly balanced)

  • minority_class — class with fewest samples

  • majority_class — class with most samples

Return type:

dict

Examples

>>> report = class_imbalance_report(y_true)
>>> print(f"Imbalance ratio: {report['imbalance_ratio']:.2f}x")
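
The statistics described above can be reproduced with the standard library. The sketch below is not the TrustLens implementation; class_imbalance_sketch is a hypothetical function that returns the same documented keys for a small hand-made label list.

```python
from collections import Counter


def class_imbalance_sketch(y_true):
    """Compute counts, frequencies, and majority/minority ratio for labels."""
    counts = Counter(y_true)
    if not counts:
        raise ValueError("y_true is empty")
    total = sum(counts.values())
    majority_class, max_count = max(counts.items(), key=lambda item: item[1])
    minority_class, min_count = min(counts.items(), key=lambda item: item[1])
    return {
        "class_counts": dict(counts),
        "class_frequencies": {label: k / total for label, k in counts.items()},
        "imbalance_ratio": max_count / min_count,
        "minority_class": minority_class,
        "majority_class": majority_class,
    }


# Three samples of class 0 and one of class 1.
summary = class_imbalance_sketch([0, 0, 0, 1])
```

Here the majority class has three samples and the minority class has one, so the imbalance ratio is 3.0.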

trustlens.metrics.bias.subgroup_performance(y_true: ndarray, y_pred: ndarray, sensitive_features: dict[str, ndarray], metrics: list[str] | None = None) → dict

Compute model performance broken down by sensitive subgroups.

For each feature in sensitive_features, TrustLens computes per-group accuracy and macro-F1 scores, then derives the performance gap between the best- and worst-performing groups.

Parameters:
  • y_true (np.ndarray) – Ground-truth labels.

  • y_pred (np.ndarray) – Model predictions.

  • sensitive_features (dict) – Mapping of feature name → 1-D array of group labels. Example: {"gender": gender_array}.

  • metrics (list[str], optional) – Which metrics to compute. Supports "accuracy" and "f1". Default: ["accuracy", "f1"].

Returns:

Nested dict: feature → group → metric values + summary.

Return type:

dict

Examples

>>> results = subgroup_performance(
...   y_true, y_pred,
...   sensitive_features={"gender": gender_array},
... )
>>> print(results["gender"]["performance_gap"])
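
The idea of a per-group breakdown and a best-to-worst gap can be sketched without TrustLens. The code below covers accuracy only; accuracy_by_group is a hypothetical helper, the data are made up, and defining the gap as max minus min is an assumption about how the gap is derived.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy}, with groups in sorted order."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {group: hits[group] / totals[group] for group in sorted(totals)}


per_group = accuracy_by_group(
    y_true=[1, 0, 1, 0, 1, 0, 1, 0],
    y_pred=[1, 0, 1, 0, 0, 1, 1, 0],
    groups=["a", "a", "a", "a", "b", "b", "b", "b"],
)

# Gap between the best- and worst-performing groups.
accuracy_gap = max(per_group.values()) - min(per_group.values())
```

Group "a" is predicted perfectly while group "b" is right on half of its samples, so the overall accuracy of 0.75 hides a gap of 0.5.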

trustlens.metrics.bias.equalized_odds(y_true: ndarray, y_pred: ndarray, sensitive_features: dict[str, ndarray], severe_threshold: float = 0.15, moderate_threshold: float = 0.05) → dict

Compute Equalized Odds fairness metrics broken down by sensitive subgroups.

Equalized Odds requires that TPR (True Positive Rate) and FPR (False Positive Rate) are equal across all subgroups. Large gaps indicate that the model treats certain groups systematically differently.

Reference: Hardt et al., “Equality of Opportunity in Supervised Learning”, NeurIPS 2016.

Parameters:
  • y_true (np.ndarray) – Ground-truth binary labels (0 or 1).

  • y_pred (np.ndarray) – Model predictions (binary, 0 or 1).

  • sensitive_features (dict) – Mapping of feature name → 1-D array of group labels. Example: {"gender": gender_array}.

  • severe_threshold (float, optional) – Gap above which a violation is classified as "severe". Default: 0.15.

  • moderate_threshold (float, optional) – Gap above which a violation is classified as "moderate". Must be less than severe_threshold. Default: 0.05.

Returns:

Nested dict: feature → group → metric values + __summary__.

Per-group keys:
  • n_samples — number of samples in the group

  • tpr — True Positive Rate (recall)

  • fpr — False Positive Rate (FP / (FP + TN))

Summary keys (under __summary__):
  • tpr_gap — max(tpr) - min(tpr) across groups

  • fpr_gap — max(fpr) - min(fpr) across groups

  • tpr_violation — severity of TPR gap

  • fpr_violation — severity of FPR gap

  • best_tpr_group — group with highest TPR

  • worst_tpr_group — group with lowest TPR

Violation levels:
  • "severe": gap > severe_threshold (default 0.15)

  • "moderate": moderate_threshold < gap ≤ severe_threshold

  • "acceptable": gap ≤ moderate_threshold (default 0.05)
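
The mapping from a gap to a violation level can be sketched as a small function. classify_gap is a hypothetical name, and how a gap exactly equal to a threshold is handled is an assumption here (it falls into the lower level, following the "gap above which" wording of the parameter descriptions).

```python
def classify_gap(gap, severe_threshold=0.15, moderate_threshold=0.05):
    """Map a TPR or FPR gap to 'severe', 'moderate', or 'acceptable'."""
    if moderate_threshold >= severe_threshold:
        raise ValueError("moderate_threshold must be less than severe_threshold")
    if gap > severe_threshold:
        return "severe"
    if gap > moderate_threshold:
        return "moderate"
    return "acceptable"


# Classify three example gaps with the default thresholds.
levels = [classify_gap(g) for g in (0.5, 0.10, 0.01)]
```

Tightening the thresholds, for example classify_gap(0.10, severe_threshold=0.08, moderate_threshold=0.02), moves the same gap from "moderate" to "severe".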

Return type:

dict

Raises:
  • ValueError – If y_true and y_pred have different lengths.

  • ValueError – If any sensitive_features array has a different length than y_true.

  • ValueError – If moderate_threshold >= severe_threshold.

  • ValueError – If y_true or y_pred is empty.

Examples

>>> import numpy as np
>>> from trustlens.metrics.bias import equalized_odds
>>>
>>> y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
>>> y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
>>> gender  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
>>>
>>> results = equalized_odds(y_true, y_pred, {"gender": gender})
>>> results["gender"]["0"]
{'n_samples': 4, 'tpr': 0.5, 'fpr': 0.0}
>>> results["gender"]["1"]
{'n_samples': 4, 'tpr': 1.0, 'fpr': 0.5}
>>> results["gender"]["__summary__"]
{'tpr_gap': 0.5, 'fpr_gap': 0.5, 'tpr_violation': 'severe', 'fpr_violation': 'severe', 'best_tpr_group': '1', 'worst_tpr_group': '0'}
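
The numbers in the example above can be checked by hand. The sketch below recomputes TPR and FPR per group from the same three arrays using plain Python lists; tpr_fpr is a hypothetical helper, not the TrustLens implementation, and returning None for an undefined rate is an assumption.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels; a rate is None if undefined."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else None
    fpr = fp / (fp + tn) if (fp + tn) else None
    return tpr, fpr


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
gender = [0, 0, 0, 0, 1, 1, 1, 1]

# Split the samples by group and compute both rates for each group.
per_group = {}
for group in sorted(set(gender)):
    idx = [i for i, g in enumerate(gender) if g == group]
    per_group[group] = tpr_fpr([y_true[i] for i in idx], [y_pred[i] for i in idx])

tpr_gap = max(t for t, _ in per_group.values()) - min(t for t, _ in per_group.values())
fpr_gap = max(f for _, f in per_group.values()) - min(f for _, f in per_group.values())
```

Group 0 has one true positive and one false negative (TPR 0.5) and no false positives (FPR 0.0). Group 1 catches both positives (TPR 1.0) but flags one of its two negatives (FPR 0.5). Both gaps are therefore 0.5, matching the summary shown above.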