The process of rescaling data features to a common range or distribution — ensuring that no single feature dominates model training simply because of its larger numerical scale.
In Depth
Normalization (or feature scaling) is a preprocessing step that transforms numerical features to a common scale before training a model. Without normalization, features with larger numerical ranges — such as salary (thousands of dollars) vs. age (tens of years) — can disproportionately influence the model, not because they are more important, but simply because their values are larger. Normalization ensures that all features contribute proportionally to the learning process.
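To make the scale problem concrete, here is a minimal sketch using made-up numbers: with raw units, the Euclidean distance between two people is driven almost entirely by the salary difference, and a 30-year age gap barely registers. The specific values and the crude divisors used for rescaling are illustrative assumptions only.

```python
import numpy as np

# Hypothetical records: [age in years, salary in dollars]
person_a = np.array([25, 50_000])
person_b = np.array([55, 52_000])

# Raw distance: sqrt(30^2 + 2000^2) is about 2000.2,
# so the age difference contributes almost nothing.
print(np.linalg.norm(person_a - person_b))

# After rescaling both features to comparable ranges (divisors chosen only
# for illustration), the age difference dominates the comparison instead.
scaled_a = np.array([25 / 100, 50_000 / 200_000])
scaled_b = np.array([55 / 100, 52_000 / 200_000])
print(np.linalg.norm(scaled_a - scaled_b))
```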
The two most common normalization techniques are Min-Max Scaling and Standardization (Z-score normalization). Min-Max Scaling rescales values to a fixed range, typically [0, 1], by subtracting the minimum and dividing by the range (maximum minus minimum). Standardization transforms values to have zero mean and unit variance by subtracting the mean and dividing by the standard deviation. Both are linear transformations, so both preserve the shape of the original distribution, but they respond differently to outliers: a single extreme value sets the range for Min-Max Scaling and squeezes every other value into a narrow band, whereas Standardization is less distorted by outliers and is also the usual choice when an algorithm assumes roughly normally distributed features.
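The sketch below shows both techniques with scikit-learn; the toy salary column is made up for illustration. MinMaxScaler implements (x - min) / (max - min) and StandardScaler implements (x - mean) / std, so the same code could be written with a few lines of NumPy instead.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One feature (salary), as a column vector; the 120k value acts as an outlier.
salaries = np.array([[35_000.0], [48_000.0], [52_000.0], [61_000.0], [120_000.0]])

# Min-Max Scaling: maps values into [0, 1] (the default feature_range).
# The outlier pins the top of the range and compresses the other values.
min_max = MinMaxScaler()
print(min_max.fit_transform(salaries).ravel())

# Standardization: zero mean, unit variance. The outlier still stands out,
# but it does not squeeze the remaining values into a narrow band.
standard = StandardScaler()
print(standard.fit_transform(salaries).ravel())
```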
Not all algorithms require normalization. Tree-based models (Decision Trees, Random Forests, Gradient Boosted Trees) are unaffected by feature scaling because they split on per-feature thresholds, and rescaling does not change the ordering of values within a feature. Algorithms that rely on distances or gradient-based optimization, however, are sensitive to feature scale and virtually always benefit from normalization: K-Nearest Neighbors, Support Vector Machines, neural networks, and logistic regression. Understanding when and how to normalize, as sketched below, is a fundamental data preprocessing skill.
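As a hedged sketch of the usual workflow for a scale-sensitive model (K-Nearest Neighbors here), the scaler is fit inside a Pipeline so it only ever learns its statistics from the training data and then applies them to the test data. The choice of the scikit-learn wine dataset is an assumption made purely for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Without scaling: distances are dominated by the widest-range features.
unscaled = KNeighborsClassifier().fit(X_train, y_train)

# With scaling: StandardScaler is fit on X_train only, then applied to X_test.
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier()).fit(X_train, y_train)

print("unscaled accuracy:", unscaled.score(X_test, y_test))
print("scaled accuracy:  ", scaled.score(X_test, y_test))
```

Wrapping the scaler and the model in a single Pipeline also keeps cross-validation honest, since each fold refits the scaler on that fold's training portion only.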
Normalization rescales features to a common range so that no feature dominates model training simply because of its numerical scale — a crucial preprocessing step for most machine learning algorithms.