The foundational algorithm for training neural networks — it efficiently computes the gradient of the loss function with respect to every weight in the network, enabling gradient-based optimization.
In Depth
Backpropagation is the algorithm that makes neural networks trainable. After a forward pass — where input data flows through the network to produce a prediction — backpropagation computes how a small change in each weight would change the prediction error. It does this by applying the chain rule of calculus backward through the network, from the output layer to the input layer, yielding the gradient of the loss with respect to every parameter.
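The chain rule walk can be made concrete with a minimal sketch (not from the source): a one-hidden-unit network with scalar weights `w1` and `w2`, a tanh activation, and a squared-error loss. The backward pass reuses quantities from the forward pass and multiplies local derivatives together, layer by layer:

```python
import math

def forward_backward(x, t, w1, w2):
    """Toy example: forward pass, then chain rule applied backward."""
    # Forward pass: input -> hidden -> output -> loss
    h = math.tanh(w1 * x)        # hidden activation
    y = w2 * h                   # prediction
    loss = 0.5 * (y - t) ** 2    # squared error against target t

    # Backward pass: chain rule from the loss back to each weight
    dL_dy = y - t                      # d(loss)/d(y)
    dL_dw2 = dL_dy * h                 # since y = w2 * h
    dL_dh = dL_dy * w2                 # propagate into the hidden layer
    dL_dw1 = dL_dh * (1 - h ** 2) * x  # tanh'(z) = 1 - tanh(z)^2
    return loss, dL_dw1, dL_dw2
```

Note how `dL_dy` and `dL_dh` are computed once and reused for the weights upstream of them; in a deep network this reuse is exactly what makes the backward pass cheap.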
The key insight of backpropagation is efficiency. A neural network with millions of parameters could in principle have its gradients computed by finite differences — slightly perturbing each weight and measuring the effect on loss — but this would require millions of forward passes per training step. Backpropagation computes all gradients in a single backward pass, making training tractable even for networks with billions of parameters.
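The cost of the finite-difference alternative is easy to see in a sketch (an illustration, not a recommended method): estimating the gradient numerically needs one extra forward pass per parameter, so a model with N weights costs N + 1 forward passes per step, versus one forward and one backward pass for backpropagation.

```python
def numerical_gradient(loss_fn, params, eps=1e-6):
    """Finite-difference gradients: one extra loss evaluation per parameter."""
    base = loss_fn(params)
    grads = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += eps                          # perturb one weight
        grads.append((loss_fn(bumped) - base) / eps)
    return grads
```

This one-evaluation-per-parameter loop is tolerable for a gradient check on a handful of weights, and hopeless for millions of them — which is the point of the paragraph above.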
Once gradients are computed by backpropagation, an optimizer — typically a variant of gradient descent like Adam or SGD — uses them to update the weights, nudging each parameter in the direction that reduces the loss. This forward-backward-update cycle, repeated across millions of training examples, is the process by which neural networks learn. Modern deep learning frameworks (PyTorch, JAX, TensorFlow) implement automatic differentiation, making backpropagation largely invisible to practitioners — but understanding it is essential for debugging and architectural design.
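The forward-backward-update cycle can be sketched in a few lines of plain Python (a toy, not a framework API): fit the single weight of y = w * x to data generated by y = 2x, using the hand-derived gradient and vanilla gradient descent.

```python
# Toy training loop: forward pass, backward pass, weight update.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # targets follow y = 2x
w, lr = 0.0, 0.05                            # initial weight, learning rate

for step in range(200):
    for x, t in data:
        y = w * x              # forward pass: prediction
        grad = (y - t) * x     # backward pass: d/dw of 0.5 * (y - t)^2
        w -= lr * grad         # update: step against the gradient (SGD)
```

After training, `w` has converged to roughly 2.0. A framework like PyTorch performs the same three steps, with the backward pass generated by automatic differentiation instead of written by hand.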
Backpropagation is how neural networks assign credit — or blame — to every parameter for every prediction error, making it possible to efficiently train systems with millions or billions of weights.

