The process of selecting, transforming, or creating input variables (features) from raw data to make machine learning algorithms work more effectively on a given problem.
In Depth
Feature Engineering is the art and science of shaping raw data into a form that machine learning algorithms can learn from effectively. Raw data rarely arrives in a machine-learnable format: dates need to be decomposed, categories need to be encoded numerically, outliers need to be handled, and domain-specific combinations of variables often carry more signal than the raw variables themselves.
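The date-decomposition step mentioned above can be sketched in a few lines. This is an illustrative example only; the feature names chosen here are hypothetical, not a standard.

```python
# Sketch: expanding one raw timestamp into several model-ready numeric
# features (illustrative; column names are hypothetical).
from datetime import datetime

def date_features(ts: datetime) -> dict:
    """Decompose a timestamp into numeric features a model can use."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day_of_week": ts.weekday(),        # 0 = Monday, 6 = Sunday
        "hour": ts.hour,
        "is_weekend": int(ts.weekday() >= 5),
    }

# 2024-03-16 was a Saturday, so is_weekend comes out as 1
feats = date_features(datetime(2024, 3, 16, 14, 30))
print(feats)
```

A real pipeline would apply this row-wise over a datetime column (e.g. with pandas), but the transformation itself is exactly this simple mapping.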
Common feature engineering techniques include normalization and standardization (scaling numerical features to comparable ranges); one-hot encoding (converting categorical variables to binary columns); creating interaction terms (multiplying two variables together to capture their joint effect); log transformations (compressing skewed distributions); and temporal features (extracting day-of-week, time-since-event, or rolling averages from time series).
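Several of the techniques listed above can be shown on a toy dataset. This sketch uses pure Python for self-containment; in practice one would reach for pandas or scikit-learn (e.g. `StandardScaler`, `OneHotEncoder`). The column values here are invented for illustration.

```python
# Sketch of common feature engineering techniques on toy data.
import math

incomes = [30_000.0, 45_000.0, 250_000.0]   # right-skewed numeric feature
ages    = [25, 40, 60]                       # second numeric feature
colors  = ["red", "blue", "red"]            # categorical feature

# Standardization: rescale to zero mean and unit variance
mean = sum(incomes) / len(incomes)
std = (sum((x - mean) ** 2 for x in incomes) / len(incomes)) ** 0.5
standardized = [(x - mean) / std for x in incomes]

# Log transform: compress the skewed tail (log1p also handles zeros)
logged = [math.log1p(x) for x in incomes]

# One-hot encoding: one binary column per observed category
categories = sorted(set(colors))             # ["blue", "red"]
one_hot = [[int(c == cat) for cat in categories] for c in colors]

# Interaction term: product of two features to capture a joint effect
income_x_age = [i * a for i, a in zip(incomes, ages)]
```

Note that the one-hot columns are derived from the categories seen in training data; unseen categories at prediction time need an explicit policy (ignore, error, or a catch-all column), which libraries like scikit-learn expose as options.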
The impact of Feature Engineering on model performance can be dramatic, often larger than switching from one algorithm to another. In many real-world ML projects, the majority of the effort, commonly cited at 70-80% of project time, goes into data collection and Feature Engineering rather than model development. The rise of deep learning has partially automated feature learning for unstructured data (images, text), but Feature Engineering remains essential for structured/tabular data, which dominates enterprise ML applications.
Feature Engineering is what separates a model that works from one that works well — the quality of input features often matters more than the sophistication of the algorithm applied to them.

