Bias and Fairness Metrics¶
Bias and fairness metrics expose subgroup disparities so teams can evaluate equity risk before release.
Why This Matters¶
A model can look strong in aggregate while underperforming for specific groups. Fairness diagnostics make those gaps visible and actionable.
When to Use¶
when model decisions affect people across demographic or policy-sensitive segments
when governance requires subgroup performance reporting
when release policy includes fairness checks
Inputs and Assumptions¶
y_true: ground-truth labels
y_pred: predicted labels
sensitive_features: dictionary of aligned subgroup arrays
equalized-odds analysis assumes binary target labels
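The alignment and binary-label assumptions above can be checked before running any analysis. The following is a minimal sketch, not TrustLens code: the helper name `check_inputs` is hypothetical, and it assumes the inputs are plain sequences (lists or 1-D arrays) and that binary labels are encoded as 0 and 1.

```python
def check_inputs(y_true, y_pred, sensitive_features):
    """Hypothetical pre-flight check (not part of TrustLens).

    Raises ValueError on empty or misaligned inputs and returns True
    when the labels are binary (0/1), which equalized-odds analysis assumes.
    """
    n = len(y_true)
    if n == 0:
        raise ValueError("y_true is empty")
    if len(y_pred) != n:
        raise ValueError(f"y_pred has {len(y_pred)} rows, expected {n}")
    for name, groups in sensitive_features.items():
        if len(groups) != n:
            raise ValueError(
                f"sensitive feature {name!r} has {len(groups)} rows, expected {n}"
            )
    # True only if every label is 0 or 1.
    return set(y_true) <= {0, 1}


binary_ok = check_inputs([1, 0, 1], [1, 1, 1], {"gender": ["a", "b", "a"]})
```

When the check returns False, the subgroup metrics can still be computed, but equalized-odds results should not be expected.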
Output and Interpretation¶
Key outputs include:
Class imbalance report: class distribution risk context
Subgroup performance: per-group metrics and gap summaries
Equalized odds summary: TPR/FPR disparity severity across groups
Large subgroup gaps or severe equalized-odds violations should be treated as release blockers in high-impact domains.
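One way to turn this guidance into an automated check is a small gate over the equalized-odds results. This is a sketch under assumptions: the function name `release_blockers` and the choice of which levels block a release are hypothetical, and the input is assumed to follow the nested layout described in the API reference (feature → `__summary__` → `tpr_violation` / `fpr_violation`).

```python
def release_blockers(eo_results, blocking_levels=("severe",)):
    """Hypothetical gate: list (feature, key) pairs whose violation level blocks release."""
    blockers = []
    for feature in sorted(eo_results):
        summary = eo_results[feature].get("__summary__", {})
        for key in ("tpr_violation", "fpr_violation"):
            if summary.get(key) in blocking_levels:
                blockers.append((feature, key))
    return blockers


# Summary values taken from the equalized_odds example in the API reference.
example = {
    "gender": {
        "__summary__": {
            "tpr_gap": 0.5,
            "fpr_gap": 0.5,
            "tpr_violation": "severe",
            "fpr_violation": "severe",
        }
    }
}
blockers = release_blockers(example)
```

An empty result only means that no computed summary was flagged. A feature with no summary at all is not reported by this sketch and should be reviewed separately.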
Visualization¶
TrustLens provides comprehensive visualization modes via report.plot_bias(mode="...") to help interpret fairness diagnostics:
"summary" (default): Combines key fairness signals into a single diagnostic view.
"subgroup": Detailed performance metric comparison (e.g., accuracy, precision) across all groups.
"equalized_odds": Visualizes TPR and FPR side-by-side to identify specific types of disparity.
"gap": High-level summary of the maximum demographic parity or opportunity gaps.
"all": Generates and returns all three diagnostic plots for a full audit.
Plotting the diagnostics makes subgroup gaps visible at a glance, so they are easier to spot and act on than raw metric tables.
# Generate all diagnostic plots for a full audit
plots = report.plot_bias(mode="all")
Multi-Feature Fairness Visualization¶
When multiple sensitive features are provided, TrustLens generates per-feature plots for every visualization type — no feature is silently dropped.
Usage¶
Pass multiple sensitive features to analyze():
from trustlens import analyze

report = analyze(
    model, X_test, y_test,
    sensitive_features={
        "gender": gender_array,
        "age_group": age_array,
        "income level": income_array,
    },
)
Generating Per-Feature Plots¶
Option 1 — Multi wrappers (direct):
from trustlens.visualization.fairness import (
    plot_subgroup_performance_multi,
    plot_equalized_odds_multi,
    plot_fairness_gap_multi,
)

bias_data = report.results["bias"]
figs = plot_subgroup_performance_multi(
    bias_data["subgroup_performance"], save_dir="plots/", show=False
)
# → plots/subgroup_performance_age_group.png
# → plots/subgroup_performance_gender.png
# → plots/subgroup_performance_income_level.png
Option 2 — Orchestrated via plot_module (recommended for batch saving):
from trustlens.visualization import plot_module
plot_module("bias", report.results["bias"], save_dir="plots/")
When class_imbalance is present, this saves a single bias_plot.png.
When only fairness metrics are present, it saves per-feature files with standardized names:
plots/
├── bias_subgroup_age_group.png
├── bias_subgroup_gender.png
├── bias_subgroup_income_level.png
├── bias_equalized_odds_age_group.png
├── bias_equalized_odds_gender.png
├── bias_equalized_odds_income_level.png
├── bias_fairness_gap_age_group.png
├── bias_fairness_gap_gender.png
└── bias_fairness_gap_income_level.png
Filename Safety¶
Feature names with spaces or special characters are automatically sanitized:
| Feature Name | Filename Component |
|---|---|
| income level | income_level |
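The exact sanitization rule is not spelled out here, so the following is only an illustrative sketch. The helper `sanitize_component` is hypothetical and assumes that each run of characters other than letters, digits, underscores, and hyphens is replaced by a single underscore, which is consistent with "income level" becoming income_level in the filenames above.

```python
import re


def sanitize_component(name):
    """Hypothetical sanitizer (an assumption, not TrustLens's actual rule)."""
    # Collapse each run of unsafe characters into one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", name).strip("_")
    # Fall back to a placeholder if nothing usable remains.
    return cleaned or "feature"


component = sanitize_component("income level")
filename = f"bias_subgroup_{component}.png"
```

Names that are already filename-safe, such as age_group, pass through unchanged under this rule.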
Design Notes¶
Features are processed in sorted order for deterministic output.
plot_module is the sole owner of file saving, so there are no duplicate writes.
All returned figures are independent and can be saved or displayed individually.
Limitations and Caveats¶
fairness metrics are sensitive to subgroup sample size
skipped equalized-odds checks are input constraints, not fairness clearance
outputs are statistical diagnostics, not causal proof
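The sample-size caveat can be illustrated with a confidence interval on a per-group rate. The sketch below uses the Wilson score interval, a standard technique that is independent of TrustLens; the function name and the 95% level (z = 1.96) are choices made for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Same observed accuracy (80%), very different certainty.
small_lo, small_hi = wilson_interval(8, 10)
large_lo, large_hi = wilson_interval(800, 1000)
small_width = small_hi - small_lo
large_width = large_hi - large_lo
```

With the same observed rate, the 10-sample group yields a much wider interval than the 1000-sample group, so a gap involving a tiny subgroup may reflect noise rather than a real disparity.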
API Reference¶
trustlens.metrics.bias¶
Bias and fairness detection.
Bias in ML manifests as systematically worse performance for certain subgroups (demographic, geographic, temporal, etc.). TrustLens surfaces these disparities without making causal claims — the responsibility to act lies with the practitioner.
Metrics implemented¶
class_imbalance_report: distribution statistics for label classes.
subgroup_performance: per-subgroup accuracy/F1 breakdown.
- trustlens.metrics.bias.class_imbalance_report(y_true: ndarray) → dict¶
Summarize the class distribution in y_true.
Reports absolute counts, relative frequencies, and an imbalance ratio (majority class count / minority class count).
- Parameters:
y_true (np.ndarray) – Ground-truth labels.
- Returns:
dict with keys:
class_counts: dict mapping class → sample count
class_frequencies: dict mapping class → relative frequency
imbalance_ratio: max_count / min_count (1.0 = perfectly balanced)
minority_class: class with fewest samples
majority_class: class with most samples
- Return type:
dict
Examples
>>> report = class_imbalance_report(y_true)
>>> print(f"Imbalance ratio: {report['imbalance_ratio']:.2f}x")
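To make the returned quantities concrete, here is a standard-library sketch that computes the same five keys described above. It is an illustration only, not the TrustLens implementation; the function name `imbalance_sketch` is hypothetical, and tie-breaking between equally sized classes is an arbitrary choice of this sketch.

```python
from collections import Counter


def imbalance_sketch(y_true):
    """Illustrative re-computation of the documented keys (not TrustLens code)."""
    counts = Counter(y_true)
    if not counts:
        raise ValueError("y_true is empty")
    total = sum(counts.values())
    majority, max_count = counts.most_common(1)[0]
    minority, min_count = min(counts.items(), key=lambda kv: kv[1])
    return {
        "class_counts": dict(counts),
        "class_frequencies": {c: k / total for c, k in counts.items()},
        "imbalance_ratio": max_count / min_count,  # 1.0 = perfectly balanced
        "minority_class": minority,
        "majority_class": majority,
    }


sketch = imbalance_sketch([0, 0, 0, 1])
```

For three samples of class 0 and one of class 1, the ratio is 3.0: the majority class has three times as many samples as the minority class.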
- trustlens.metrics.bias.subgroup_performance(y_true: ndarray, y_pred: ndarray, sensitive_features: dict[str, ndarray], metrics: list[str] | None = None) → dict¶
Compute model performance broken down by sensitive subgroups.
For each feature in sensitive_features, TrustLens computes per-group accuracy and macro-F1 scores, then derives the performance gap between best and worst performing groups.
- Parameters:
y_true (np.ndarray) – Ground-truth labels.
y_pred (np.ndarray) – Model predictions.
sensitive_features (dict) – Mapping of feature name → 1-D array of group labels. Example: {"gender": gender_array}.
metrics (list[str], optional) – Which metrics to compute. Supports "accuracy" and "f1". Default: ["accuracy", "f1"].
- Returns:
Nested dict: feature → group → metric values + summary.
- Return type:
dict
Examples
>>> results = subgroup_performance(
...     y_true, y_pred,
...     sensitive_features={"gender": gender_array},
... )
>>> print(results["gender"]["performance_gap"])
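The per-group breakdown and best-versus-worst gap can be sketched for the accuracy metric using only the standard library. This is an illustration of the idea rather than the library's code: `subgroup_accuracy` is a hypothetical helper, it handles a single sensitive feature, and its return shape (a dict of accuracies plus a gap) is a simplification of the nested result described above.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, groups):
    """Hypothetical helper: per-group accuracy and the best-minus-worst gap."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    accuracy = {g: hits[g] / totals[g] for g in sorted(totals)}
    gap = max(accuracy.values()) - min(accuracy.values())
    return accuracy, gap


accuracy, gap = subgroup_accuracy(
    y_true=[1, 0, 1, 0, 1, 0],
    y_pred=[1, 0, 1, 1, 0, 0],
    groups=["a", "a", "a", "b", "b", "b"],
)
```

Here group "a" is classified perfectly while group "b" gets one of three samples right, so the aggregate accuracy of 4/6 hides a large gap between the two groups.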
- trustlens.metrics.bias.equalized_odds(y_true: ndarray, y_pred: ndarray, sensitive_features: dict[str, ndarray], severe_threshold: float = 0.15, moderate_threshold: float = 0.05) → dict¶
Compute Equalized Odds fairness metrics broken down by sensitive subgroups.
Equalized Odds requires that TPR (True Positive Rate) and FPR (False Positive Rate) are equal across all subgroups. Large gaps indicate that the model treats certain groups systematically differently.
Reference: Hardt et al., “Equality of Opportunity in Supervised Learning”, NeurIPS 2016.
- Parameters:
y_true (np.ndarray) – Ground-truth binary labels (0 or 1).
y_pred (np.ndarray) – Model predictions (binary, 0 or 1).
sensitive_features (dict) – Mapping of feature name → 1-D array of group labels. Example: {"gender": gender_array}.
severe_threshold (float, optional) – Gap above which a violation is classified as "severe". Default: 0.15.
moderate_threshold (float, optional) – Gap above which a violation is classified as "moderate". Must be less than severe_threshold. Default: 0.05.
- Returns:
Nested dict: feature → group → metric values + __summary__.
- Per-group keys:
n_samples: number of samples in the group
tpr: True Positive Rate (recall)
fpr: False Positive Rate (FP / (FP + TN))
- Summary keys (under __summary__):
tpr_gap: max(tpr) - min(tpr) across groups
fpr_gap: max(fpr) - min(fpr) across groups
tpr_violation: severity of TPR gap
fpr_violation: severity of FPR gap
best_tpr_group: group with highest TPR
worst_tpr_group: group with lowest TPR
- Violation levels:
"severe": gap > severe_threshold (default 0.15)
"moderate": gap between moderate_threshold and severe_threshold
"acceptable": gap < moderate_threshold (default 0.05)
- Return type:
dict
- Raises:
ValueError – If y_true and y_pred have different lengths.
ValueError – If any sensitive_features array has a different length than y_true.
ValueError – If moderate_threshold >= severe_threshold.
ValueError – If y_true or y_pred is empty.
Examples
>>> import numpy as np
>>> from trustlens.metrics.bias import equalized_odds
>>>
>>> y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
>>> y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
>>> gender = np.array([0, 0, 0, 0, 1, 1, 1, 1])
>>>
>>> results = equalized_odds(y_true, y_pred, {"gender": gender})
>>> results["gender"]["0"]
{'n_samples': 4, 'tpr': 0.5, 'fpr': 0.0}
>>> results["gender"]["1"]
{'n_samples': 4, 'tpr': 1.0, 'fpr': 0.5}
>>> results["gender"]["__summary__"]
{'tpr_gap': 0.5, 'fpr_gap': 0.5, 'tpr_violation': 'severe', 'fpr_violation': 'severe', 'best_tpr_group': '1', 'worst_tpr_group': '0'}
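The numbers in this example can be reproduced by hand with a short standard-library sketch. It is not the TrustLens implementation: `rates_by_group` and `severity` are hypothetical helpers, and two behaviours are assumptions of the sketch, namely that a gap exactly equal to a threshold falls into the lower level and that a rate with a zero denominator is reported as None.

```python
def rates_by_group(y_true, y_pred, groups):
    """Hypothetical helper: per-group sample count, TPR and FPR for 0/1 labels."""
    counts = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts.setdefault(str(group), {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if truth == 1:
            c["tp" if pred == 1 else "fn"] += 1
        else:
            c["fp" if pred == 1 else "tn"] += 1
    result = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        result[group] = {
            "n_samples": positives + negatives,
            "tpr": c["tp"] / positives if positives else None,
            "fpr": c["fp"] / negatives if negatives else None,
        }
    return result


def severity(gap, severe_threshold=0.15, moderate_threshold=0.05):
    """Map a gap to a violation level (boundary handling is an assumption)."""
    if gap > severe_threshold:
        return "severe"
    if gap > moderate_threshold:
        return "moderate"
    return "acceptable"


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
gender = [0, 0, 0, 0, 1, 1, 1, 1]

rates = rates_by_group(y_true, y_pred, gender)
tprs = [r["tpr"] for r in rates.values() if r["tpr"] is not None]
fprs = [r["fpr"] for r in rates.values() if r["fpr"] is not None]
tpr_gap = max(tprs) - min(tprs)
fpr_gap = max(fprs) - min(fprs)
levels = (severity(tpr_gap), severity(fpr_gap))
```

Group 0 catches one of its two positives and raises no false alarms, while group 1 catches both positives but misclassifies one of its two negatives. Both gaps come out at 0.5, well above the 0.15 default for a severe violation.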