Mathematical Foundations
Deep Dives into Machine Learning & Optimization
Probability Theory
The mathematical foundation beneath every algorithm. Measure theory, concentration inequalities, and the probabilistic soul of ML.
Information Theory
The mathematics of uncertainty and learning. Explore Shannon entropy, KL divergence, cross-entropy, and the maximum entropy principle.
Matrix Calculus
∂y/∂x for every shape. Differentials, Jacobians, trace tricks, and full derivations of linear regression, PCA, and backprop.
Linear Regression
The foundation of predictive modeling. A complete mathematical derivation of Ordinary Least Squares, the normal equations, and the assumptions behind them.
Bias-Variance Trade-off
The fundamental trade-off between underfitting and overfitting: how model complexity shifts prediction error between bias and variance.
Gradient Descent
The workhorse of machine learning optimization. Understand partial derivatives, learning rates, and convergence behavior from first principles.
Logistic Regression
Moving from continuous to categorical. Explore sigmoid functions, maximum likelihood estimation, and cross-entropy loss gradients.
Lagrange Multipliers
Constrained optimization unlocked. A deep dive into the method of Lagrange multipliers, dual problems, and their geometric intuition.
Convex Optimization
Convex sets, Jensen's inequality, duality, KKT conditions, proximal methods — the rigorous bridge between gradient descent and Lagrange multipliers.
Singular Value Decomposition
One of the most powerful factorizations in linear algebra. Exists for every matrix, reveals hidden geometry, and underlies PCA and compression.
Principal Component Analysis
From variance maximization to SVD equivalence. A definitive guide to understanding PCA's mathematical machinery from the ground up.
Bayesian Machine Learning
From the philosophical divide between frequentist and Bayesian thinking, through Bayes' theorem, to priors, posteriors, and conjugate families.
Neural Networks
The mathematical foundations of deep learning. Explore forward propagation, backpropagation derivations, and the universal approximation theorem.
Backpropagation
The algorithm that trains nearly every neural network. Computational graphs, reverse-mode AD, Jacobians, VJPs, matrix calculus, and PyTorch autograd, all derived from first principles.
Transformers
From the failure modes of RNNs, through the mathematical derivation of attention, to multi-head attention and positional encoding.
DL Optimization
Loss landscapes, saddle points, flat vs sharp minima, SGD noise as implicit regularization, scaling laws, and grokking.