The process of systematically optimizing the configuration settings of a machine learning algorithm (values chosen before training rather than learned from the data) to maximize model performance.
In Depth
Every machine learning model has two types of values. Parameters are learned from training data — the weights of a neural network, the split points of a decision tree. Hyperparameters are set before training begins and control the learning process itself — the learning rate, number of layers, regularization strength, or tree depth. Hyperparameter Tuning is the process of finding the combination of these settings that produces the best model.
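A minimal scikit-learn sketch of the distinction (the dataset and model choice are illustrative assumptions): C is a hyperparameter fixed before fitting, while coef_ and intercept_ hold parameters learned from the data.

```python
# Sketch: hyperparameters vs. learned parameters in scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameter: chosen before training, controls the learning process.
model = LogisticRegression(C=0.5, max_iter=1000)

# Parameters: learned from the training data during fit().
model.fit(X, y)
print(model.coef_)       # learned weights (parameters)
print(model.intercept_)  # learned biases (parameters)
```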
The most basic approach is grid search: exhaustively testing every combination of hyperparameter values across a predefined grid. It is simple but exponentially expensive as the number of hyperparameters grows. Random search, which samples random combinations from the hyperparameter space, is often more efficient and surprisingly effective. Bayesian Optimization goes further — it builds a probabilistic model of the hyperparameter-to-performance mapping and intelligently chooses the next configuration to evaluate based on expected improvement.
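As a sketch, scikit-learn's GridSearchCV and RandomizedSearchCV implement the first two approaches; the model and search ranges below are illustrative assumptions, not recommendations.

```python
# Sketch: grid search vs. random search with scikit-learn.
from scipy.stats import randint, uniform
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Grid search: exhaustively evaluates every combination in the grid (3 x 3 = 9 configs).
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]},
    cv=5,
)
grid.fit(X, y)

# Random search: samples a fixed budget of configurations from distributions.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(50, 300),
        "max_depth": randint(2, 12),
        "max_features": uniform(0.1, 0.9),
    },
    n_iter=20,
    cv=5,
    random_state=0,
)
rand.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```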
Modern AutoML systems (Google AutoML, H2O AutoML) and optimization frameworks such as Optuna automate hyperparameter tuning, searching vast spaces of architectures and settings in parallel. But even automated tuning requires human judgment: defining sensible search spaces, choosing appropriate evaluation metrics, and understanding that the 'best' hyperparameters for a validation set may not always be the best for production data.
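A minimal Optuna sketch of this kind of guided search (Optuna's default sampler, TPE, is a Bayesian-style method; the model and search ranges here are illustrative assumptions):

```python
# Sketch: model-guided sequential tuning with Optuna (TPE sampler by default).
import optuna
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

def objective(trial):
    # Each trial proposes a configuration informed by the results of previous trials.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 2, 8),
    }
    model = GradientBoostingClassifier(**params, random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params, study.best_value)
```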
Hyperparameters are the dials of a machine learning system — Hyperparameter Tuning is the systematic process of finding the optimal settings, turning a functional model into a high-performing one.
Frequently Asked Questions
What is the difference between parameters and hyperparameters?
Parameters are learned during training (e.g., neural network weights). Hyperparameters are set before training and control the learning process (e.g., learning rate, number of layers, batch size). Parameters are optimized by the algorithm; hyperparameters are optimized by the data scientist — often through systematic tuning methods.
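An illustrative PyTorch sketch of the same split (the data and numbers are arbitrary assumptions): the values set up front are hyperparameters, while everything returned by model.parameters() is learned by the optimizer.

```python
# Sketch: hyperparameters are chosen up front; parameters are what training updates.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters: set by the practitioner, not learned.
learning_rate = 1e-3
batch_size = 32
hidden_units = 64  # architecture choices are hyperparameters too

# Toy data purely for illustration.
X = torch.randn(256, 10)
y = torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)

model = nn.Sequential(nn.Linear(10, hidden_units), nn.ReLU(), nn.Linear(hidden_units, 2))

# Parameters: the weights in model.parameters() are optimized from the data.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
loss_fn = nn.CrossEntropyLoss()

for xb, yb in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    optimizer.step()
```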
What is the best hyperparameter tuning method?
Bayesian optimization (tools like Optuna, HyperOpt) is generally the most efficient — it learns from previous trials to focus on promising regions. Random search is a strong baseline that outperforms grid search in most cases. For quick experiments, start with random search; for production models, use Bayesian optimization. AutoML frameworks like Auto-sklearn automate the entire process.
Which hyperparameters matter most?
For neural networks: learning rate (most impactful), batch size, number of layers/units, dropout rate, and optimizer choice. For tree-based models: number of trees, max depth, minimum samples per leaf, and learning rate (for gradient boosting). Always start with the learning rate — it has the largest effect on training dynamics and final performance.
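For the tree-based case, a sketch of a random-search space covering exactly those knobs (the dataset and ranges are illustrative assumptions, not recommendations):

```python
# Sketch: a random-search space over the gradient-boosting hyperparameters named above.
from scipy.stats import loguniform, randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions={
        "learning_rate": loguniform(1e-3, 0.3),  # usually the most impactful knob
        "n_estimators": randint(100, 1000),      # number of trees
        "max_depth": randint(2, 8),              # max depth
        "min_samples_leaf": randint(1, 20),      # minimum samples per leaf
    },
    n_iter=25,
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```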