A model evaluation technique that divides data into multiple subsets, repeatedly training on some and testing on the remainder, to obtain a more reliable, less biased estimate of model performance than a single train-test split provides.
In Depth
Cross-validation solves a fundamental problem in model evaluation: if you evaluate a model on the same data used to train it, you get an optimistic, misleading picture of its real-world performance. The standard solution is to hold out a separate test set — but with limited data, dedicating a large portion to testing is expensive. Cross-validation resolves this tension by using all available data for both training and evaluation.
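For reference, a single hold-out split might look like the following sketch. It assumes scikit-learn is available and uses a synthetic dataset and logistic regression purely as placeholders:

```python
# Minimal hold-out evaluation sketch (scikit-learn assumed; data is synthetic).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)  # stand-in dataset

# Reserve 25% of the data purely for testing; it never influences training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))
```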
In k-fold cross-validation, the most common variant, the dataset is divided into k equal parts (folds). The model is trained k times — each time using k-1 folds for training and the remaining fold for evaluation. The k evaluation scores are averaged to produce the final performance estimate. With k=5 or k=10, this estimate is considerably more robust than a single train-test split, because every example is used for evaluation exactly once and for training in the other k-1 runs.
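A minimal sketch of 5-fold cross-validation, assuming scikit-learn; the dataset and model are illustrative placeholders:

```python
# 5-fold cross-validation sketch (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000)

# cross_val_score fits the model k times, each time holding out one fold for scoring.
scores = cross_val_score(model, X, y, cv=5)
print("fold accuracies:", scores)
print("mean accuracy:  ", scores.mean())
```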
Cross-validation is especially important for Hyperparameter Tuning. By evaluating each hyperparameter configuration on cross-validation folds rather than a single test set, practitioners avoid 'overfitting to the test set' — a subtle error where repeated evaluation on the same test data inflates apparent performance. Leave-one-out cross-validation (LOOCV), where each data point serves as the test set exactly once, is the extreme case (k = n): nearly unbiased, but computationally expensive for large datasets because the model must be trained n times.
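A sketch of both ideas, assuming scikit-learn: GridSearchCV scores each candidate hyperparameter by cross-validation, and LeaveOneOut implements LOOCV. The SVC model and parameter grid are arbitrary choices for illustration:

```python
# Hyperparameter tuning with cross-validation, plus LOOCV (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Each candidate C is scored by 5-fold cross-validation, not a single test set.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
search.fit(X, y)
print("best C:", search.best_params_, "CV score:", search.best_score_)

# Leave-one-out: each of the n examples is the held-out set exactly once (n fits).
loo_scores = cross_val_score(SVC(C=search.best_params_["C"]), X, y, cv=LeaveOneOut())
print("LOOCV accuracy:", loo_scores.mean())
```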
Cross-validation is how you honestly evaluate a machine learning model — providing a realistic performance estimate by ensuring every data point contributes to both training and testing across multiple runs.
Frequently Asked Questions
How does k-fold cross-validation work?
The dataset is divided into k equal subsets (folds). The model is trained k times, each time using k-1 folds for training and 1 fold for testing. The final performance metric is the average across all k tests. This gives a more reliable estimate than a single train-test split, because every data point is used for both training and evaluation.
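To make the rotation of folds explicit, here is a minimal manual k-fold loop, assuming scikit-learn and NumPy, with placeholder data and model:

```python
# Manual k-fold loop showing the train/test rotation (scikit-learn and NumPy assumed).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, random_state=0)
scores = []

for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])                  # train on k-1 folds
    scores.append(model.score(X[test_idx], y[test_idx]))   # test on the held-out fold

print("per-fold scores:", np.round(scores, 3))
print("average score:  ", np.mean(scores))
```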
How many folds should you use?
5-fold and 10-fold are the most common choices and work well in practice. More folds give less biased estimates but are more computationally expensive. For very small datasets, leave-one-out (k = n) maximizes training data per fold. For very large datasets, even 3-fold may suffice. The optimal k depends on dataset size and available compute.
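An illustrative comparison of fold counts, assuming scikit-learn; the exact numbers depend entirely on the dataset and model:

```python
# Comparing different numbers of folds (scikit-learn assumed; results are illustrative).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000)

for k in (3, 5, 10):
    scores = cross_val_score(model, X, y, cv=k)
    print(f"{k}-fold: mean={scores.mean():.3f}, std={scores.std():.3f}")
```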
Does cross-validation prevent overfitting?
Cross-validation doesn't directly prevent overfitting — it detects it. By evaluating on multiple held-out folds, it reveals whether the model generalizes beyond training data. If cross-validation scores are much lower than training scores, overfitting is present. This insight then guides model tuning, regularization, or data collection decisions to address the problem.
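A sketch of this diagnostic, assuming scikit-learn: cross_validate can report training-fold scores alongside validation-fold scores, and a large gap between the two signals overfitting. The unpruned decision tree here is simply a convenient example of a flexible model:

```python
# Detecting overfitting by comparing training and validation scores (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_informative=5, random_state=0)

# An unpruned tree can memorize the training folds.
cv = cross_validate(DecisionTreeClassifier(random_state=0), X, y,
                    cv=5, return_train_score=True)

print("mean train score:", cv["train_score"].mean())
print("mean CV score:   ", cv["test_score"].mean())
# A large gap between the two suggests overfitting; regularization (e.g. limiting
# tree depth) or more data would be the usual next steps.
```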