A generative model that learns to encode data into a structured, continuous latent space and decode it back — enabling generation of new, similar data points by sampling from that learned space.
In Depth
A Variational Autoencoder consists of two neural networks: an encoder that maps input data to a probability distribution in a low-dimensional latent space, and a decoder that reconstructs data from samples drawn from that distribution. Unlike a standard autoencoder, which learns a deterministic mapping, the VAE's encoder outputs a mean and variance for each latent dimension, and samples are drawn from this distribution during training. Sampling is made differentiable via the reparameterization trick: z = μ + σ·ε with ε ~ N(0, I), so gradients can flow through μ and σ. This stochasticity forces the latent space to be smooth and continuous — a property essential for generative use.
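The encode-then-sample step can be sketched in a few lines. This is a minimal NumPy illustration, not a trained model: the "encoder" here is just a random linear map standing in for a real network, and the names (`encode`, `reparameterize`, `INPUT_DIM`, `LATENT_DIM`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoder": random linear maps producing a mean and a log-variance
# per latent dimension. (Random weights are stand-ins for a trained network.)
INPUT_DIM, LATENT_DIM = 8, 2
W_mu = rng.normal(size=(INPUT_DIM, LATENT_DIM))
W_logvar = rng.normal(size=(INPUT_DIM, LATENT_DIM))

def encode(x):
    """Map an input to the parameters (mu, log sigma^2) of a diagonal Gaussian."""
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps, eps ~ N(0, I).
    Writing the sample this way keeps it differentiable w.r.t. mu and logvar."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

x = rng.normal(size=INPUT_DIM)
mu, logvar = encode(x)
z = reparameterize(mu, logvar)
print(z.shape)  # (2,)
```

Note that the randomness lives entirely in `eps`; the network outputs (`mu`, `logvar`) enter the sample deterministically, which is exactly what makes training by backpropagation possible.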
The VAE's training objective balances two terms: the reconstruction loss (how accurately the decoder reconstructs the original input from the latent sample) and the KL divergence (how closely the learned latent distribution matches a standard Gaussian prior). The KL term prevents the model from collapsing each input to a single point — it regularizes the latent space to be densely populated and semantically meaningful. Points close in latent space decode to semantically similar outputs.
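The two-term objective above has a simple closed form when the posterior is a diagonal Gaussian and the prior is standard normal. The sketch below assumes a Gaussian decoder (so the reconstruction term is a squared error); the function name `vae_loss` is illustrative.

```python
import numpy as np

def vae_loss(x, x_recon, mu, logvar):
    """Negative ELBO: reconstruction error plus KL divergence to N(0, I).
    For diagonal Gaussians the KL term has the closed form
    KL = -0.5 * sum(1 + logvar - mu^2 - exp(logvar))."""
    recon = np.sum((x - x_recon) ** 2)                       # reconstruction term
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))  # regularization term
    return recon + kl

# When the posterior exactly matches the prior (mu = 0, logvar = 0),
# the KL term vanishes; a perfect reconstruction zeroes the other term.
mu = np.zeros(2)
logvar = np.zeros(2)
x = np.array([1.0, 2.0, 3.0])
print(vae_loss(x, x, mu, logvar))  # 0.0
```

Pulling the posterior's mean away from the prior (or shrinking its variance toward zero) makes the KL term grow, which is how the objective keeps latent codes packed near the origin rather than scattered into isolated points.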
The smooth, interpolable latent space is the VAE's defining advantage. Unlike GANs, which can generate sharp images but whose latent spaces are typically less organized (nothing in the adversarial objective encourages smoothness), VAEs provide interpretable latent representations where arithmetic makes sense: interpolating between two faces in latent space produces a plausible blend, and moving along a latent direction can change one attribute (age, pose, expression) smoothly. This makes VAEs particularly useful for anomaly detection, data compression, and scientific applications where latent structure is the goal.
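Latent-space interpolation is just a line between two codes. A minimal sketch, assuming `z1` and `z2` are the encoded latent vectors of two inputs (e.g., two faces); in a real VAE each intermediate point would be passed through the decoder.

```python
import numpy as np

def interpolate(z1, z2, steps=5):
    """Linear interpolation between two latent codes.
    Decoding each intermediate z yields a smooth blend of the two inputs."""
    ts = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - t) * z1 + t * z2 for t in ts])

# Hypothetical latent codes for two encoded inputs.
z1 = np.array([-1.0, 0.5])
z2 = np.array([1.0, -0.5])
path = interpolate(z1, z2)
print(path.shape)  # (5, 2): five points from z1 to z2
```

The same idea underlies latent "attribute arithmetic": adding a fixed direction vector to a code (e.g., an estimated "smile" direction) shifts that attribute in the decoded output.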
VAEs learn a structured map of data's underlying variation — a latent space where you can interpolate, sample, and navigate, making them powerful for both generation and understanding what makes data points similar or different.

