Deep Learning · Intermediate

Dropout

Definition

A regularization technique that randomly deactivates a fraction of neurons during each training step, forcing the network to learn more robust, distributed representations and reducing overfitting.

In Depth

Dropout, introduced by Srivastava et al. in 2014, is one of the most effective and widely used regularization techniques for neural networks. During each training step, each neuron is independently deactivated (set to zero) with probability p (the dropout rate, typically 0.2-0.5). The network must learn to produce correct outputs despite having only a random subset of its neurons active at any time.
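
The mechanics are easy to see in code. Below is a minimal NumPy sketch of a dropout forward pass (the function name and shapes are illustrative); it uses the "inverted" scaling discussed further down, so no rescaling is needed at inference:

```python
import numpy as np

def dropout_forward(x, p=0.5, training=True):
    """Apply inverted dropout to activations x with dropout rate p."""
    if not training or p == 0.0:
        return x  # at inference, activations pass through unchanged
    # Bernoulli mask: each unit is kept independently with probability 1 - p
    mask = (np.random.rand(*x.shape) >= p).astype(x.dtype)
    # Scale survivors by 1 / (1 - p) so expected activations match inference
    return x * mask / (1.0 - p)

# Roughly half the activations are zeroed on each training call
activations = np.ones((2, 8))
print(dropout_forward(activations, p=0.5, training=True))
```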

The intuition behind dropout's effectiveness is that it prevents neurons from co-adapting, that is, developing complex interdependencies where one neuron compensates for another's errors. By randomly removing neurons, dropout forces the network to learn redundant representations of the same features, spread across many units. Training can then be viewed as sampling from an exponentially large ensemble of sub-networks that share weights, and inference (with dropout turned off) approximates averaging the predictions of that ensemble.

At inference time, all neurons are active and their outputs are scaled by the keep probability (1 - p) so that expected activations match those seen during training. Most modern frameworks implement the equivalent "inverted dropout" instead: surviving activations are scaled by 1/(1 - p) during training, so no adjustment is needed at inference. A complementary interpretation views dropout as approximate Bayesian inference, where the randomness during training corresponds to sampling from an approximate posterior distribution over the model weights. Techniques like Monte Carlo Dropout exploit this by deliberately keeping dropout active at inference and averaging multiple stochastic forward passes, yielding uncertainty estimates for calibrated predictions.
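
As a sketch of the Monte Carlo Dropout idea in PyTorch (the toy model, layer sizes, and number of samples are illustrative placeholders, not from the source):

```python
import torch
import torch.nn as nn

# Hypothetical toy classifier; sizes chosen only for illustration
model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.3),
    nn.Linear(64, 3),
)

def mc_dropout_predict(model, x, n_samples=50):
    """Average several stochastic forward passes with dropout left on."""
    model.train()  # keeps nn.Dropout active (model.eval() would disable it)
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    # Mean = prediction; std across samples = a simple uncertainty signal
    return probs.mean(dim=0), probs.std(dim=0)

x = torch.randn(4, 16)  # a batch of 4 dummy inputs
mean_probs, uncertainty = mc_dropout_predict(model, x)
```

Note that model.train() also switches other layers (such as batch normalization) into training mode, so real implementations typically re-enable only the dropout modules.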

Key Takeaway

Dropout is a powerful and elegant regularizer — by randomly silencing neurons during training, it forces networks to develop robust, redundant representations that generalize far better to unseen data.

Real-World Applications

01 Computer vision: dropout layers after fully-connected layers in CNNs to reduce overfitting on image classification datasets.
02 NLP classification: dropout applied to BERT embeddings during fine-tuning to prevent memorization on small labeled datasets.
03 Medical AI: using Monte Carlo Dropout to produce uncertainty estimates alongside model predictions for clinical decision support.
04 Speech recognition: dropout applied to recurrent layers to improve generalization of acoustic models.
05 Any deep network trained on small datasets: dropout is a first-line defense against overfitting whenever data is limited.