The process of rescaling data features to a common range or distribution — ensuring that no single feature dominates model training simply because of its larger numerical scale.
In Depth
Normalization (or feature scaling) is a preprocessing step that transforms numerical features to a common scale before training a model. Without normalization, features with larger numerical ranges — such as salary (thousands of dollars) vs. age (tens of years) — can disproportionately influence the model, not because they are more important, but simply because their values are larger. Normalization ensures that all features contribute proportionally to the learning process.
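To make the scale problem concrete, here is a minimal sketch using made-up numbers: with raw units, the Euclidean distance between two people is driven almost entirely by the salary difference, and a 30-year age gap barely registers. The specific values and the crude divisors used for rescaling are illustrative assumptions only.

```python
import numpy as np

# Hypothetical records: [age in years, salary in dollars]
person_a = np.array([25, 50_000])
person_b = np.array([55, 52_000])

# Raw distance: sqrt(30^2 + 2000^2) is about 2000.2,
# so the age difference contributes almost nothing.
print(np.linalg.norm(person_a - person_b))

# After rescaling both features to comparable ranges (divisors chosen only
# for illustration), the age difference dominates the comparison instead.
scaled_a = np.array([25 / 100, 50_000 / 200_000])
scaled_b = np.array([55 / 100, 52_000 / 200_000])
print(np.linalg.norm(scaled_a - scaled_b))
```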
The two most common normalization techniques are Min-Max Scaling and Standardization (Z-score normalization). Min-Max Scaling rescales values to a fixed range, typically [0, 1], by subtracting the minimum and dividing by the range (maximum minus minimum). Standardization transforms values to have zero mean and unit variance by subtracting the mean and dividing by the standard deviation. Both are linear transformations, so both preserve the shape of the original distribution, but they respond differently to outliers: a single extreme value sets the range for Min-Max Scaling and squeezes every other value into a narrow band, whereas Standardization is less distorted by outliers and is also the usual choice when an algorithm assumes roughly normally distributed features.
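The sketch below shows both techniques with scikit-learn; the toy salary column is made up for illustration. MinMaxScaler implements (x - min) / (max - min) and StandardScaler implements (x - mean) / std, so the same code could be written with a few lines of NumPy instead.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One feature (salary), as a column vector; the 120k value acts as an outlier.
salaries = np.array([[35_000.0], [48_000.0], [52_000.0], [61_000.0], [120_000.0]])

# Min-Max Scaling: maps values into [0, 1] (the default feature_range).
# The outlier pins the top of the range and compresses the other values.
min_max = MinMaxScaler()
print(min_max.fit_transform(salaries).ravel())

# Standardization: zero mean, unit variance. The outlier still stands out,
# but it does not squeeze the remaining values into a narrow band.
standard = StandardScaler()
print(standard.fit_transform(salaries).ravel())
```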
Not all algorithms require normalization. Tree-based models (Decision Trees, Random Forests, Gradient Boosted Trees) are unaffected by feature scaling because they split on per-feature thresholds, and rescaling does not change the ordering of values within a feature. Algorithms that rely on distances or gradient-based optimization, however, are sensitive to feature scale and virtually always benefit from normalization: K-Nearest Neighbors, Support Vector Machines, neural networks, and logistic regression. Understanding when and how to normalize, as sketched below, is a fundamental data preprocessing skill.
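As a hedged sketch of the usual workflow for a scale-sensitive model (K-Nearest Neighbors here), the scaler is fit inside a Pipeline so it only ever learns its statistics from the training data and then applies them to the test data. The choice of the scikit-learn wine dataset is an assumption made purely for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Without scaling: distances are dominated by the widest-range features.
unscaled = KNeighborsClassifier().fit(X_train, y_train)

# With scaling: StandardScaler is fit on X_train only, then applied to X_test.
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier()).fit(X_train, y_train)

print("unscaled accuracy:", unscaled.score(X_test, y_test))
print("scaled accuracy:  ", scaled.score(X_test, y_test))
```

Wrapping the scaler and the model in a single Pipeline also keeps cross-validation honest, since each fold refits the scaler on that fold's training portion only.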
Normalization rescales features to a common range so that no feature dominates model training simply because of its numerical scale — a crucial preprocessing step for most machine learning algorithms.