One complete pass through the entire training dataset during model training — a unit of training progress used to track how many times every training example has been seen by the model.
In Depth
An epoch is a fundamental unit of training measurement in machine learning. During one epoch, the model processes every training example exactly once: computing predictions, calculating the loss, deriving gradients via backpropagation, and updating weights. After each epoch, the model has 'seen' the full dataset once, and its parameters have been updated multiple times (once per batch). Training typically requires many epochs before convergence, the point at which the loss stops meaningfully decreasing.
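As a concrete illustration, the sketch below shows this epoch/batch structure in a PyTorch-style training loop; the toy dataset, model architecture, and hyperparameters are placeholders chosen for the example, not prescribed values.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Illustrative stand-ins: a toy regression dataset and a small model.
X, y = torch.randn(1000, 20), torch.randn(1000, 1)
train_loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

num_epochs = 5
for epoch in range(num_epochs):      # one epoch = one full pass over the dataset
    for xb, yb in train_loader:      # one iteration = one batch
        preds = model(xb)            # forward pass: predictions
        loss = loss_fn(preds, yb)    # compute the loss
        optimizer.zero_grad()
        loss.backward()              # backpropagation: compute gradients
        optimizer.step()             # update weights (one step per batch)
    print(f"epoch {epoch + 1}: last batch loss {loss.item():.4f}")
```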
The number of epochs to train for is a hyperparameter. Too few epochs and the model underfits — it hasn't learned enough from the data. Too many epochs and the model overfits — it has memorized the training data including noise, and performance on validation data degrades. Early stopping is the standard solution: monitor validation loss after each epoch and stop training when it begins to increase, keeping the model checkpoint that achieved the best validation performance.
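A minimal early-stopping loop might look like the sketch below; the `train_one_epoch` and `evaluate` helpers, along with the patience of 5 epochs, are assumptions made for illustration rather than fixed recommendations.

```python
import copy

def train_with_early_stopping(model, train_one_epoch, evaluate,
                              max_epochs=100, patience=5):
    """Illustrative early stopping: `train_one_epoch` runs one pass over the
    training data; `evaluate` returns the validation loss after that epoch."""
    best_loss = float("inf")
    best_model = None
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)
        val_loss = evaluate(model)            # validation loss after this epoch

        if val_loss < best_loss:              # improved: keep this checkpoint
            best_loss = val_loss
            best_model = copy.deepcopy(model)
            epochs_without_improvement = 0
        else:                                 # no improvement this epoch
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                         # validation loss stopped improving

    return best_model, best_loss
```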
Epochs interact with batch size and learning rate. Training for 10 epochs with a batch size of 32 involves many more gradient update steps than training for 10 epochs with a batch size of 512, even though both see the same data the same number of times. Larger batches compute more stable gradient estimates but take fewer update steps per epoch. This interaction — along with learning rate scheduling — means that epoch count alone is insufficient to characterize a training run; it must be considered alongside the full training configuration.
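To make the interaction concrete, the snippet below counts gradient updates for two batch sizes over the same number of epochs; the dataset size of 50,000 examples is an assumed figure used only for illustration.

```python
import math

def updates_per_run(num_examples, batch_size, num_epochs):
    """Total gradient update steps in a run (one update per batch)."""
    return math.ceil(num_examples / batch_size) * num_epochs

n = 50_000  # assumed dataset size for the example
print(updates_per_run(n, batch_size=32, num_epochs=10))   # 15630 updates
print(updates_per_run(n, batch_size=512, num_epochs=10))  # 980 updates
```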
An epoch is how we measure a model's exposure to data — repeated full passes through training examples are how neural networks refine their internal representations from coarse pattern-matching to nuanced understanding.
Real-World Applications
Frequently Asked Questions
How many epochs should I train for?
There's no universal answer — it depends on dataset size, model complexity, and the task. Common practice: start with a generous number (50-100 epochs) and use early stopping to halt when validation performance stops improving. For LLM pre-training, models typically see each data point only 1-4 times (1-4 epochs). Overtrained models overfit; undertrained models underfit. Monitor validation loss to find the sweet spot.
What is the difference between an epoch, a batch, and an iteration?
An epoch is one complete pass through the entire training dataset. A batch is a subset of training examples processed together in one forward/backward pass. An iteration is the processing of one batch. If you have 10,000 samples and a batch size of 100, then 1 epoch = 100 iterations. These three concepts define the fundamental rhythm of model training.
Can more epochs always improve a model?
No. After a certain point, additional epochs lead to overfitting — the model memorizes training data and performs worse on new data. The learning curve (training vs. validation loss over epochs) typically shows validation loss decreasing then increasing. The optimal epoch count is where validation loss is minimized. Early stopping automates this by halting training when validation loss stops decreasing.
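As a small sketch, choosing the optimal epoch from a recorded validation-loss curve amounts to taking its minimum; the loss values below are hypothetical and only illustrate the typical U-shaped curve.

```python
# Hypothetical per-epoch validation losses: decreasing, then rising as the model overfits.
val_losses = [0.92, 0.71, 0.58, 0.51, 0.49, 0.50, 0.53, 0.58]

best_epoch = min(range(len(val_losses)), key=lambda e: val_losses[e])
print(f"best epoch: {best_epoch + 1} (val loss {val_losses[best_epoch]:.2f})")
# -> best epoch: 5 (val loss 0.49)
```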