Articles | Shahid Ul Islam

01

Mathematics

[ PROBABILITY // V1.0 ]

Probability Theory

The mathematical foundation beneath every algorithm. Measure theory, concentration inequalities, and the probabilistic soul of ML.

25 min read

02

Information Theory

[ MATHEMATICS // V1.0 ]

Information Theory

The mathematics of uncertainty and learning. Explore Shannon entropy, KL divergence, cross-entropy, and the maximum entropy principle.

18 min read

03

Mathematics

[ CALCULUS // V1.0 ]

Matrix Calculus

∂y/∂x for every shape. Differentials, Jacobians, trace tricks, and full derivations of linear regression, PCA, and backprop.

40 min read

04

Supervised Learning

[ REGRESSION // V1.1 ]

Linear Regression

The foundation of predictive modeling. A complete mathematical derivation of ordinary least squares, the normal equations, and the model's underlying assumptions.

12 min read

05

Optimization

[ CALCULUS // V1.2 ]

Bias-Variance Trade-off

The fundamental trade-off between underfitting and overfitting: how model complexity shifts prediction error between bias and variance.

15 min read

06

Optimization

[ OPTIMIZATION // V1.0 ]

Gradient Descent

The workhorse of machine learning optimization. Understand partial derivatives, learning rates, and convergence behavior from first principles.

10 min read

07

Classification

[ CLASSIFICATION // V2.0 ]

Logistic Regression

Moving from continuous to categorical. Explore sigmoid functions, maximum likelihood estimation, and cross-entropy loss gradients.

14 min read

08

Optimization

[ CALCULUS // V1.2 ]

Lagrange Multipliers

Constrained optimization unlocked. A deep dive into the method of Lagrange multipliers, dual problems, and their geometric intuition.

15 min read

09

Optimization

[ CONVEX // V1.0 ]

Convex Optimization

Convex sets, Jensen's inequality, duality, KKT conditions, proximal methods — the rigorous bridge between gradient descent and Lagrange multipliers.

35 min read

10

Linear Algebra

[ MATHEMATICS // V1.0 ]

Singular Value Decomposition

One of the most broadly useful factorizations in linear algebra. It exists for every matrix, reveals the geometry hidden in a linear map, and underlies PCA and low-rank compression.

22 min read

11

Dimensionality Reduction

[ UNSUPERVISED // V1.0 ]

Principal Component Analysis

From variance maximization to SVD equivalence. A definitive guide to understanding PCA's mathematical machinery from the ground up.

20 min read

12

Probabilistic ML

[ BAYESIAN // V1.0 ]

Bayesian Machine Learning

From the philosophical divide between frequentist and Bayesian thinking, through Bayes' theorem, to priors, posteriors, and conjugate families.

25 min read

13

Deep Learning

[ NEURAL NETS // V1.0 ]

Neural Networks

The mathematical foundations of deep learning. Explore forward propagation, backpropagation derivations, and the universal approximation theorem.

25 min read

14

Deep Learning

[ BACKPROP // V1.0 ]

Backpropagation

The algorithm that trains every neural network. Computational graphs, reverse-mode AD, Jacobians, VJPs, matrix calculus, and PyTorch autograd — derived from first principles.

45 min read

15

Deep Learning

[ ATTENTION // V1.0 ]

Transformers

From the failure modes of RNNs, through the mathematical derivation of attention, to multi-head attention and positional encoding.

28 min read

16

Deep Learning

[ OPTIMIZATION // V1.0 ]

DL Optimization

Loss landscapes, saddle points, flat vs sharp minima, SGD noise as implicit regularization, scaling laws, and grokking.

50 min read