A regularization technique that randomly deactivates a fraction of neurons during each training step, forcing the network to learn more robust, distributed representations and reducing overfitting.
In Depth
Dropout, introduced by Srivastava et al. in 2014, is one of the most effective and widely used regularization techniques for neural networks. During each training step, each neuron is independently deactivated (set to zero) with probability p (the dropout rate, typically 0.2-0.5). The network must learn to produce correct outputs despite having only a random subset of its neurons active at any time.
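To make the mechanics concrete, here is a minimal NumPy sketch of a dropout layer's forward pass. It uses the now-common "inverted dropout" convention, in which the surviving activations are rescaled by 1/(1 - p) during training so that nothing needs to change at inference; the function name and array shapes are purely illustrative rather than taken from any particular library.
```python
import numpy as np

def dropout_forward(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p during training,
    and scale the survivors by 1/(1 - p) so inference needs no rescaling."""
    if not training or p == 0.0:
        return x
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

# Example: a batch of 4 activation vectors with 6 units each.
activations = np.ones((4, 6))
print(dropout_forward(activations, p=0.5))           # roughly half the units zeroed
print(dropout_forward(activations, training=False))  # unchanged at inference
```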
The intuition behind dropout's effectiveness is that it prevents neurons from co-adapting — developing complex interdependencies where one neuron compensates for another's errors. By randomly removing neurons, dropout forces the network to develop redundant, independent representations of the same features. The result is an ensemble of many different sub-networks, averaged together at inference time (when dropout is turned off and outputs are scaled by the retention probability).
At inference time, all neurons are active and their outputs are scaled by the keep probability so that expected activations match those seen during training. Modern interpretations also view dropout as approximate Bayesian inference: the randomness during training corresponds to sampling from an approximate posterior distribution over the model's weights. Techniques like Monte Carlo Dropout exploit this by deliberately keeping dropout active at inference and averaging several stochastic forward passes, yielding uncertainty estimates that can support calibrated predictions.
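As a rough illustration of that idea, the sketch below applies Monte Carlo Dropout to a toy one-layer model: dropout stays active at prediction time, many stochastic forward passes are averaged, and the spread across passes serves as a crude uncertainty estimate. The model, weights, dropout rate, and sample count here are arbitrary choices made for the example.
```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(6, 3))  # toy single-layer "model"

def stochastic_forward(x, p=0.3):
    """One forward pass with dropout left ON (the Monte Carlo Dropout trick)."""
    hidden = x @ weights
    mask = rng.random(hidden.shape) >= p  # drop each unit with probability p
    return hidden * mask / (1.0 - p)      # inverted-dropout scaling

def mc_dropout_predict(x, n_samples=100):
    """Average many stochastic passes; the spread across passes is a rough
    per-output uncertainty estimate."""
    samples = np.stack([stochastic_forward(x) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

x = np.ones((1, 6))
mean_prediction, uncertainty = mc_dropout_predict(x)
print("prediction:", mean_prediction)
print("uncertainty:", uncertainty)
```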
Dropout is a powerful and elegant regularizer: by randomly silencing neurons during training, it forces networks to develop robust, redundant representations that generalize better to unseen data.

