# The Problem Nobody Ships Around You trained a model. It hits **92% accuracy** on your validation set. You ship it. Three months later: * A minority-class user gets consistently wrong predictions. * The model is **90% confident on its worst mistakes**. * A regulator asks *"why did it make that decision?"* — and you have no answer. Sound familiar? You're not alone. **Accuracy tells you how often your model is right.** **It tells you nothing about *when* it fails, *why* it fails, or *who* it fails.** TrustLens makes those failures visible — before they reach production. Beyond standard metrics, machine learning practitioners need to understand the "certainty of failure" and the distribution of errors across subgroups. ## Why standard metrics fall short Most ML pipelines rely on Accuracy, F1, or RMSE. While useful, these metrics are aggregate scores that hide systematic flaws: - **Miscalibration**: A model saying "I'm 99% sure" when it's only right 60% of the time. - **Silent Bias**: High overall accuracy that masks significant performance drops for minority classes. - **Representation Fragility**: Latent spaces where classes are so closely packed that slight noise causes classification flips. **Traditional metrics tell you how the model performs, but they don't tell you if the model is safe to deploy.** TrustLens bridges the gap between **raw metrics and deployment decisions**. It transforms aggregate diagnostics into explainable narratives, providing machine learning practitioners with the evidence needed to approve (or block) a model for production. Learn how these issues are measured in [Features & Modules](features.md).