From static spreadsheets to an interactive end-to-end analytics workflow — statistics, clustering, regression, constrained optimisation, and reproducible exports in one cohesive Streamlit application.
Regional sales data for hardware categories and Services revenue is typically locked in static spreadsheets. Stakeholders needed comparative statistics, segmentation, predictive models, and repeatable exports — without standing up a full BI stack.
| Objective | How It's Addressed |
|---|---|
| Single source of truth | One file: apple_sales_2024.csv, loaded and column-normalised in src/data/loader.py |
| Reusable analytics | Logic lives in src/ (statistics, models, charts, insights), not embedded in button callbacks |
| Interactive exploration | Streamlit sidebar filters (regions, min iPhone volume) across 11 focused tab views |
| Trust & reproducibility | Validation tab with cross-validation; Export tab writes joblib + JSON + CSV under src/models/saved/ |
| Guided presentation | "Launch Guided Demo" in app/components/demo.py locks navigation and walks through the workflow |
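As a rough illustration of the Export row above, the sketch below writes a JSON metadata file and a CSV of predictions into a target directory. The function name `export_run`, the file names, and the record fields are hypothetical, and the joblib model dump is left out so the example runs on the standard library alone.

```python
import csv
import json
from pathlib import Path


def export_run(out_dir, metadata, predictions):
    """Write run metadata (JSON) and predictions (CSV) under out_dir.

    Hypothetical helper: the real Export tab also writes a joblib file,
    which is omitted here.
    """
    if not predictions:
        raise ValueError("predictions must contain at least one row")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Sorted keys keep the JSON byte-for-byte stable between runs.
    meta_path = out / "model_metadata.json"
    meta_path.write_text(
        json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8"
    )

    # Column order follows the keys of the first prediction record.
    pred_path = out / "predictions.csv"
    with pred_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(predictions[0]))
        writer.writeheader()
        writer.writerows(predictions)

    return meta_path, pred_path
```

Writing the metadata with sorted keys is one simple way to make repeated exports easy to diff.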
A deterministic, reproducible pipeline from raw CSV to enriched dataframe — cached at every layer to avoid redundant I/O across Streamlit reruns.
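A minimal sketch of what a cached, column-normalising loader might look like. It assumes nothing about the real src/data/loader.py: the function names, the normalisation rule, and the contents of the quality dict are illustrative, and `functools.lru_cache` stands in for whatever Streamlit caching decorator the application actually uses.

```python
import csv
import io
import re
from functools import lru_cache


def normalise_column(name: str) -> str:
    """Lower-case a header and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def parse_csv(text: str) -> tuple[list[dict[str, str]], dict[str, int]]:
    """Parse CSV text into rows with normalised keys plus a small quality summary."""
    reader = csv.DictReader(io.StringIO(text), restval="")
    rows = [
        {
            normalise_column(key): (value or "").strip()
            for key, value in row.items()
            if key is not None  # ignore surplus cells beyond the header
        }
        for row in reader
    ]
    quality = {
        "rows": len(rows),
        "missing_values": sum(
            1 for row in rows for value in row.values() if value == ""
        ),
    }
    return rows, quality


@lru_cache(maxsize=None)
def load_sales(path: str) -> tuple[list[dict[str, str]], dict[str, int]]:
    """Read the file once per path; repeated calls reuse the cached result."""
    with open(path, newline="", encoding="utf-8") as handle:
        return parse_csv(handle.read())
```

Because the cached result is shared between calls, downstream code should treat the returned rows as read-only and build filtered copies instead of mutating them.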
Each module shares the same filtered dataframe. Switching filters in the sidebar propagates consistently across every tab — no stale state, no divergent views.
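One way to get this consistency is to compute the filtered view once from the sidebar state and pass that same object to every tab. The sketch below uses plain dictionaries rather than a dataframe so it stays self-contained; `Filters`, `apply_filters`, and the field names `region` and `iphone_units` are assumptions, not names taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filters:
    """Hypothetical sidebar state: selected regions and a minimum iPhone volume."""

    regions: frozenset[str]
    min_iphone_units: float = 0.0


def apply_filters(rows: list[dict], filters: Filters) -> list[dict]:
    """Return a new list of matching rows; the input list is never mutated."""
    return [
        row
        for row in rows
        if row["region"] in filters.regions
        and row["iphone_units"] >= filters.min_iphone_units
    ]
```

Since the function is pure, two tabs given the same `Filters` value always see the same rows.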
A full statistical and machine learning pipeline implemented in a clean src/ layer: decoupled from the UI, independently testable, and ready for extension.
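To show what "decoupled from the UI" can mean in practice, here is a sketch of two statistics written as pure functions that a unit test can call without starting the app. The function names are hypothetical, and the Welch (unequal-variance) form of the t statistic is an assumption; the source only says "t-test".

```python
import math
from statistics import mean, variance


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for two independent samples (assumed variant)."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two observations")
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error
```

Neither function imports Streamlit, so a tab module can call them and handle only the rendering.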
A strict separation between presentation (app/) and analytics (src/) keeps the codebase maintainable, independently testable, and ready for extension.
| Module | Responsibility |
|---|---|
| main.py | Entry point: load → features → sidebar → KPIs → tab router |
| sidebar.py | Filters, nav, demo launcher |
| kpi.py | KPI card components |
| styles.py | Global dark CSS system |
| demo.py | Guided walkthrough overlay |
| tabs/* | One module per page (11 total) |
| data/loader.py | CSV load + quality dict + @cache |
| features/builder.py | Derived feature engineering |
| analysis/statistics.py | t-test, ANOVA, Pearson correlation |
| analysis/insights.py | IF/THEN recommendation engine |
| models/regression.py | Linear, polynomial, random forest, gradient boosting; cross-validation |
| models/clustering.py | K-Means, agglomerative, DBSCAN |
| visualization/charts.py | Plotly dark theme system |

Seven packages, each with a single responsibility. No bloat, no redundancy: every dependency earns its place.
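The IF/THEN recommendation engine listed for analysis/insights.py could be structured as a table of rules, each pairing a condition with a message. The sketch below is illustrative only: the `Rule` type, the metric names, the thresholds, and the recommendation texts are all invented for the example and do not come from the project.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

Metrics = Mapping[str, float]


@dataclass(frozen=True)
class Rule:
    """One IF/THEN pair: if condition(metrics) holds, emit the recommendation."""

    name: str
    condition: Callable[[Metrics], bool]
    recommendation: str


# Hypothetical rules and thresholds, chosen only to demonstrate the pattern.
RULES = (
    Rule(
        name="services_underweight",
        condition=lambda m: m["services_share"] < 0.15,
        recommendation="Services share is low; consider bundling offers.",
    ),
    Rule(
        name="iphone_dependence",
        condition=lambda m: m["iphone_share"] > 0.60,
        recommendation="Revenue leans heavily on iPhone; review category mix.",
    ),
)


def recommend(metrics: Metrics, rules: tuple[Rule, ...] = RULES) -> list[str]:
    """Return the recommendations whose conditions hold, in rule order."""
    return [rule.recommendation for rule in rules if rule.condition(metrics)]
```

Keeping the rules as data means new recommendations can be added without touching the evaluation loop, and each rule can be tested in isolation.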
A linear workflow through six stages — each building on the last, with reproducible artifacts at every step.
Intellectual honesty is part of the design. These boundaries are explicit in the code documentation, not buried in fine print.