Failure Analysis Metrics

Failure analysis metrics identify where errors are concentrated and whether error confidence creates operational risk.

Why This Matters

Two models with similar error rates can have very different risk profiles. A model that is highly confident when wrong is usually harder to monitor, which makes it safer to block early.

When to Use

  • when confidently wrong predictions carry a high incident cost

  • when investigating model behavior beyond aggregate accuracy

  • when selecting between candidates with close top-line metrics

Inputs and Assumptions

  • y_true: ground-truth labels

  • y_pred: predicted labels

  • y_prob: predicted probabilities
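The metrics in this section compare predictions against the confidence attached to them, so it helps to see how a per-sample confidence can be derived from these three inputs. The sketch below is illustrative only: prediction_confidence is a hypothetical helper, not part of trustlens, and it assumes that a 1-D y_prob holds the probability of class 1 and that multi-class labels are integers indexing the columns of y_prob.

```python
import numpy as np


def prediction_confidence(y_pred, y_prob):
    """Probability the model assigned to the class it predicted (hypothetical helper)."""
    y_pred = np.asarray(y_pred)
    y_prob = np.asarray(y_prob, dtype=float)
    if y_prob.ndim == 1:
        # Assumption: a 1-D y_prob holds P(class == 1) for a binary problem.
        return np.where(y_pred == 1, y_prob, 1.0 - y_prob)
    # Assumption: labels are integers that index the columns of y_prob.
    return y_prob[np.arange(len(y_pred)), y_pred]


y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.2, 0.9, 0.4, 0.7])  # P(class == 1)
y_pred = (y_prob >= 0.5).astype(int)     # [0, 1, 0, 1]
confidence = prediction_confidence(y_pred, y_prob)  # [0.8, 0.9, 0.6, 0.7]
```

If the model uses a different label encoding or decision threshold, the mapping from y_prob to confidence would need to change accordingly.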

Output and Interpretation

Key outputs include:

  • Misclassification summary: class-wise error context

  • Confidence gap: separation between confidence on correct versus incorrect predictions

  • High-confidence error patterns: useful for targeted inspection

A narrow or negative confidence gap means that confidence scores do little to separate correct from incorrect predictions, which is a warning signal in most operational contexts.
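As a rough illustration of the confidence gap, the following sketch computes the mean confidence on correct predictions minus the mean confidence on incorrect ones. The function name confidence_gap_sketch and its confidence argument (one score per sample for the predicted class) are assumptions made for this example; it is not the library's implementation.

```python
import numpy as np


def confidence_gap_sketch(y_true, y_pred, confidence):
    """Mean confidence on correct predictions minus mean confidence on errors."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    confidence = np.asarray(confidence, dtype=float)
    correct = y_true == y_pred
    if correct.all() or not correct.any():
        # The gap is undefined when one of the two groups is empty.
        return float("nan")
    return float(confidence[correct].mean() - confidence[~correct].mean())


y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 0]
confidence = [0.9, 0.8, 0.95, 0.7, 0.9]

gap = confidence_gap_sketch(y_true, y_pred, confidence)
print(f"{gap:.3f}")  # -0.125: the model is more confident on its errors
```

In this toy data the two errors carry confidences of 0.95 and 0.9, higher than any correct prediction, so the gap is negative.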

Limitations and Caveats

  • confidence signals depend on the quality of the upstream probability calibration

  • low error counts can make distribution-level interpretation noisy
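One way to gauge how noisy a gap estimate is when errors are rare is to resample the data and look at the spread of the recomputed gap. The sketch below uses a percentile bootstrap. It is an assumption about how one might check this, not something trustlens is documented to provide, and bootstrap_gap_interval and its parameters are hypothetical.

```python
import numpy as np


def bootstrap_gap_interval(correct, confidence, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the confidence gap (hypothetical helper)."""
    correct = np.asarray(correct, dtype=bool)
    confidence = np.asarray(confidence, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(correct)
    gaps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample indices with replacement
        c, conf = correct[idx], confidence[idx]
        if c.all() or not c.any():
            continue  # gap is undefined when a resample has no errors (or only errors)
        gaps.append(conf[c].mean() - conf[~c].mean())
    if not gaps:
        return float("nan"), float("nan")
    lo, hi = np.quantile(gaps, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Toy data: 40 samples with only 3 errors, all at high confidence.
correct = np.array([True] * 37 + [False] * 3)
confidence = np.linspace(0.6, 0.99, 40)
lo, hi = bootstrap_gap_interval(correct, confidence)
print(f"95% interval for the gap: [{lo:.3f}, {hi:.3f}]")
```

A wide interval suggests that the point estimate of the gap should not be over-interpreted until more errors have been observed.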

API Reference

trustlens.metrics.failure

Failure-mode analysis: where and how does a model fail?

Metrics implemented

  • misclassification_summary — per-class error rates and high-confidence mistakes.

  • confidence_gap — distribution of confidence for correct vs. incorrect predictions.

trustlens.metrics.failure.misclassification_summary(y_true: ndarray, y_pred: ndarray, y_prob: ndarray) -> dict

Build a comprehensive misclassification summary.

For each class, reports:

  • total support (ground-truth count)

  • number of misclassified samples

  • error rate

  • average confidence of misclassified samples (overconfident mistakes)

  • indices of the most confident misclassifications

Parameters:
  • y_true (np.ndarray) – Ground-truth labels, shape (n_samples,).

  • y_pred (np.ndarray) – Model predictions, shape (n_samples,).

  • y_prob (np.ndarray) – Predicted probabilities, shape (n_samples,) for binary or (n_samples, n_classes) for multi-class.

Returns:

Nested dictionary keyed by class label.

Return type:

dict
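To make the nested per-class structure concrete, here is a minimal sketch of how such a summary could be assembled with NumPy. Only the error_rate key appears in the documented example. The other key names, the top_k parameter, and the function per_class_error_sketch itself are hypothetical and may differ from what trustlens returns.

```python
import numpy as np


def per_class_error_sketch(y_true, y_pred, confidence, top_k=3):
    """Per-class error statistics keyed by class label (illustrative only)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    confidence = np.asarray(confidence, dtype=float)
    summary = {}
    for label in np.unique(y_true):
        in_class = np.flatnonzero(y_true == label)    # samples whose true class is label
        wrong = in_class[y_pred[in_class] != label]   # the misclassified ones
        ranked = wrong[np.argsort(-confidence[wrong])]  # most confident errors first
        summary[label.item()] = {
            "support": int(in_class.size),
            "n_misclassified": int(wrong.size),
            "error_rate": wrong.size / in_class.size,
            "mean_error_confidence": (
                float(confidence[wrong].mean()) if wrong.size else float("nan")
            ),
            "top_confident_errors": ranked[:top_k].tolist(),
        }
    return summary


y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 0, 0]
confidence = [0.9, 0.6, 0.8, 0.95, 0.7]

summary = per_class_error_sketch(y_true, y_pred, confidence)
print(summary[0]["error_rate"])            # 0.5
print(summary[1]["top_confident_errors"])  # [3, 4]
```

Ranking the misclassified indices by confidence puts the most overconfident mistakes first, which is where targeted inspection tends to be most useful.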

Examples

>>> from trustlens.metrics.failure import misclassification_summary
>>> summary = misclassification_summary(y_true, y_pred, y_prob)
>>> print(summary[1]["error_rate"])  # error rate for class 1

trustlens.metrics.failure.confidence_gap(y_true: ndarray, y_pred: ndarray, y_prob: ndarray, n_bins: int = 20) -> dict

Measure the confidence gap — how much more confident is the model on correct predictions than on incorrect ones?

Returns:

Dictionary with the following keys:

  • correct_confidence: confidence distribution for correct predictions

  • incorrect_confidence: confidence distribution for incorrect predictions

  • gap: mean(correct_conf) - mean(incorrect_conf)

  • histogram_bins: bin edges for the confidence histogram

  • correct_hist: histogram counts for correct predictions

  • incorrect_hist: histogram counts for incorrect predictions

Return type:

dict

Examples

>>> from trustlens.metrics.failure import confidence_gap
>>> gap_data = confidence_gap(y_true, y_pred, y_prob)
>>> print(f"Confidence gap: {gap_data['gap']:.3f}")
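The histogram outputs can be pictured as two sets of counts over a shared set of bin edges. The sketch below shows one plausible way to build them with np.histogram. It assumes confidences lie in [0, 1] and that the bins are equally spaced, neither of which is confirmed by the reference above, and confidence_histograms_sketch is a hypothetical name.

```python
import numpy as np


def confidence_histograms_sketch(correct, confidence, n_bins=20):
    """Histogram counts of confidence for correct and incorrect predictions."""
    correct = np.asarray(correct, dtype=bool)
    confidence = np.asarray(confidence, dtype=float)
    # Assumption: equally spaced bins over [0, 1], shared by both histograms.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    correct_hist, _ = np.histogram(confidence[correct], bins=edges)
    incorrect_hist, _ = np.histogram(confidence[~correct], bins=edges)
    return edges, correct_hist, incorrect_hist


correct = [True, True, False, True, False]
confidence = [0.9, 0.8, 0.95, 0.7, 0.9]

edges, correct_hist, incorrect_hist = confidence_histograms_sketch(
    correct, confidence, n_bins=4
)
print(correct_hist.tolist())    # [0, 0, 1, 2]
print(incorrect_hist.tolist())  # [0, 0, 0, 2]
```

Sharing the bin edges keeps the two histograms directly comparable, so incorrect predictions piling up in the highest-confidence bins stand out immediately.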