Failure Analysis Metrics

Failure analysis metrics identify where errors are concentrated and whether error confidence creates operational risk.

Why This Matters

Two models with similar error rates can have very different risk profiles. A model that is highly confident when it is wrong is usually harder to monitor, because its confidence scores do not flag the errors, so it is safer to block it early.

When to Use

when confidently wrong predictions carry a high incident cost
when investigating model behavior beyond aggregate accuracy
when selecting between candidates with close top-line metrics

Inputs and Assumptions

y_true: ground-truth labels
y_pred: predicted labels
y_prob: predicted probabilities
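
The three inputs are related: y_pred is usually derived from y_prob, and a per-sample confidence can be read from the same array. The sketch below illustrates one way to do this in NumPy. It assumes that a 1-D y_prob holds the positive-class probability, that a 2-D y_prob has one column per class, and that labels are the integers 0..n_classes-1. The helper name `confidence_from_prob` is hypothetical and not part of trustlens.

```python
import numpy as np


def confidence_from_prob(y_prob: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (predicted labels, confidence) from predicted probabilities.

    Hypothetical helper, not part of trustlens. Assumes a 1-D array holds
    the positive-class probability and a 2-D array holds one column per class.
    """
    y_prob = np.asarray(y_prob, dtype=float)
    if y_prob.ndim == 1:
        # Binary case: threshold at 0.5; confidence is the probability
        # assigned to whichever class was predicted.
        y_pred = (y_prob >= 0.5).astype(int)
        confidence = np.where(y_pred == 1, y_prob, 1.0 - y_prob)
    else:
        # Multi-class case: the predicted class is the arg-max column.
        y_pred = y_prob.argmax(axis=1)
        confidence = y_prob.max(axis=1)
    return y_pred, confidence


# Binary example: four samples, probability of class 1.
y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.1, 0.8, 0.3, 0.9])
y_pred, confidence = confidence_from_prob(y_prob)
```

If the model's own decision rule differs from a 0.5 threshold or arg-max, pass its actual predictions as y_pred rather than deriving them this way.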

Output and Interpretation

Key outputs include:

Misclassification summary: per-class support, error counts, and error rates
Confidence gap: separation between confidence on correct versus incorrect predictions
High-confidence error patterns: useful for targeted inspection

A narrow or negative confidence gap is a warning signal in most operational contexts.
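
To make the confidence gap concrete, the sketch below computes it as the mean confidence on correct predictions minus the mean confidence on incorrect ones. This is one plausible definition chosen for illustration, not necessarily what trustlens computes; the function name `mean_confidence_gap` and the `confidence` array (the probability assigned to the predicted class) are assumptions.

```python
import numpy as np


def mean_confidence_gap(y_true, y_pred, confidence):
    """Mean confidence on correct predictions minus mean on incorrect ones.

    Illustrative only; not the trustlens implementation.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    confidence = np.asarray(confidence, dtype=float)
    correct = y_true == y_pred
    if correct.all() or not correct.any():
        # The gap is undefined when one of the two groups is empty.
        return float("nan")
    return float(confidence[correct].mean() - confidence[~correct].mean())


y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 1, 1])
confidence = np.array([0.9, 0.8, 0.7, 0.9, 0.6])
gap = mean_confidence_gap(y_true, y_pred, confidence)
```

In this toy data the gap is negative (about -0.03): the model is slightly more confident on its two errors than on its three correct predictions, which is the warning pattern described above.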

Limitations and Caveats

confidence signals depend on the quality of upstream probability calibration
low error counts can make distribution-level interpretation noisy
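
One way to see how noisy a gap estimate is when errors are rare is to bootstrap it. The sketch below is not a trustlens feature; it uses a plain percentile bootstrap, and the function name `bootstrap_gap_interval`, its parameters, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def bootstrap_gap_interval(correct, confidence, n_resamples=1000, seed=0):
    """Percentile bootstrap interval (2.5%, 97.5%) for the mean confidence gap.

    Illustrative only; not part of trustlens.
    """
    correct = np.asarray(correct, dtype=bool)
    confidence = np.asarray(confidence, dtype=float)
    rng = np.random.default_rng(seed)
    n = correct.size
    gaps = []
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        c, conf = correct[idx], confidence[idx]
        if c.all() or not c.any():
            continue  # gap is undefined when one group is empty
        gaps.append(conf[c].mean() - conf[~c].mean())
    if not gaps:
        return float("nan"), float("nan")
    low, high = np.percentile(gaps, [2.5, 97.5])
    return float(low), float(high)


# Synthetic data: 200 samples, only 4 of them errors.
rng = np.random.default_rng(42)
correct = np.ones(200, dtype=bool)
correct[:4] = False
confidence = rng.uniform(0.5, 1.0, size=200)
low, high = bootstrap_gap_interval(correct, confidence)
```

A wide interval, or one that straddles zero, suggests the observed gap should not drive a decision on its own.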

API Reference

trustlens.metrics.failure

Failure-mode analysis: where and how does a model fail?

Metrics implemented

misclassification_summary — per-class error rates and high-confidence mistakes.
confidence_gap — distribution of confidence for correct vs. incorrect predictions.

trustlens.metrics.failure.misclassification_summary(y_true: ndarray, y_pred: ndarray, y_prob: ndarray) → dict

Build a comprehensive misclassification summary.

For each class, reports:

* total support (ground-truth count)
* number of misclassified samples
* error rate
* average confidence of misclassified samples (overconfident mistakes)
* indices of the most confident misclassifications

Parameters:

y_true (np.ndarray) – Ground-truth labels, shape (n_samples,).
y_pred (np.ndarray) – Model predictions, shape (n_samples,).
y_prob (np.ndarray) – Predicted probabilities, shape (n_samples,) for binary or (n_samples, n_classes) for multi-class.

Returns:

Nested dictionary keyed by class label.

Return type:

dict

Examples

>>> from trustlens.metrics.failure import misclassification_summary
>>> summary = misclassification_summary(y_true, y_pred, y_prob)
>>> print(summary[1]["error_rate"])  # error rate for class 1
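
The per-class fields listed above can be approximated in plain NumPy, which may help when reading the returned dictionary. The sketch below is not the trustlens implementation. Only the "error_rate" key appears in the documented example; the function name `per_class_error_summary`, the `top_k` parameter, and every other key name are hypothetical. It also assumes integer class labels and a precomputed per-sample `confidence` array.

```python
import numpy as np


def per_class_error_summary(y_true, y_pred, confidence, top_k=3):
    """Per-class error statistics; a sketch, not the trustlens implementation.

    Only the "error_rate" key is taken from the documented example; the
    other key names are hypothetical.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    confidence = np.asarray(confidence, dtype=float)
    summary = {}
    for label in np.unique(y_true):
        in_class = np.flatnonzero(y_true == label)
        wrong = in_class[y_pred[in_class] != label]
        # Sort misclassified samples by descending confidence.
        ranked = wrong[np.argsort(-confidence[wrong])]
        summary[int(label)] = {
            "support": int(in_class.size),
            "n_misclassified": int(wrong.size),
            "error_rate": wrong.size / in_class.size,
            "mean_error_confidence": (
                float(confidence[wrong].mean()) if wrong.size else float("nan")
            ),
            "top_confident_error_indices": ranked[:top_k].tolist(),
        }
    return summary


y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 0, 0])
confidence = np.array([0.9, 0.6, 0.8, 0.95, 0.7])
summary = per_class_error_summary(y_true, y_pred, confidence)
```

Ranking misclassified samples by confidence puts the most overconfident mistakes first, which is where targeted inspection usually starts.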

trustlens.metrics.failure.confidence_gap(y_true: ndarray, y_pred: ndarray, y_prob: ndarray, n_bins: int = 20) → dict

Measure the confidence gap — how much more confident is the model on correct predictions than on incorrect ones?
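
The n_bins parameter suggests that confidence is histogrammed separately for correct and incorrect predictions. The return structure of confidence_gap is not shown here, so the sketch below is only an assumption about what such a comparison could look like: the function name `confidence_histograms` and the dictionary keys are hypothetical, and `confidence` is taken to be the probability assigned to the predicted class.

```python
import numpy as np


def confidence_histograms(y_true, y_pred, confidence, n_bins=20):
    """Histogram confidence separately for correct and incorrect predictions.

    Illustrative only; the keys below are hypothetical and do not describe
    the real confidence_gap return value.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    confidence = np.asarray(confidence, dtype=float)
    correct = y_true == y_pred
    # Shared bin edges over [0, 1] so the two histograms are comparable.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    correct_counts, _ = np.histogram(confidence[correct], bins=edges)
    incorrect_counts, _ = np.histogram(confidence[~correct], bins=edges)
    return {
        "bin_edges": edges,
        "correct_counts": correct_counts,
        "incorrect_counts": incorrect_counts,
    }


y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 1, 1])
confidence = np.array([0.92, 0.81, 0.72, 0.93, 0.61])
hist = confidence_histograms(y_true, y_pred, confidence, n_bins=10)
```

Using the same bin edges for both groups keeps the two histograms directly comparable, so a pile-up of incorrect predictions in the top bins is easy to spot.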