A neural network architecture designed for sequential data that maintains a hidden state — a form of memory — allowing it to incorporate context from previous inputs when processing each new element.
In Depth
Standard neural networks process each input independently — they have no memory. A Recurrent Neural Network breaks this limitation by maintaining a hidden state: an internal vector that summarizes information from all previous inputs in the sequence. At each time step, the network receives the current input and the previous hidden state, combines them, and produces a new hidden state. This feedback loop gives RNNs a form of short-term memory that makes them natural for sequential tasks.
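The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the dimensions, initialization, and tanh activation are common conventions assumed here rather than taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the text)
input_size, hidden_size = 4, 8

# Parameters: input-to-hidden weights, hidden-to-hidden weights, bias
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One time step: combine the current input with the previous hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Thread the hidden state through a sequence of 5 inputs
sequence = rng.normal(size=(5, input_size))
h = np.zeros(hidden_size)          # initial hidden state
for x_t in sequence:
    h = rnn_step(x_t, h)           # h now summarizes all inputs seen so far
```

The key point is the loop: the same `rnn_step` function and the same weights are applied at every time step, with the output hidden state fed back in as input to the next step.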
RNNs were the dominant architecture for Natural Language Processing tasks in the 2010s — language translation, text generation, sentiment analysis — precisely because language is sequential. Each word's meaning depends on the words before it. RNNs capture this dependency by threading the hidden state through every word in the sentence. However, they suffer from the vanishing gradient problem: as sequences grow longer, gradients shrink exponentially during backpropagation, making it hard to learn dependencies spanning many time steps.
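The vanishing gradient problem can be demonstrated numerically. During backpropagation through time, the gradient is multiplied by (roughly) the recurrent Jacobian once per time step; the sketch below drops the tanh-derivative factor (which is at most 1 and only shrinks things further) and uses a recurrent matrix scaled to a spectral norm of 0.9, an illustrative choice rather than anything from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 8

# Recurrent weight matrix scaled so its largest singular value is 0.9
# (values above 1 cause the opposite problem: exploding gradients)
W_hh = rng.normal(size=(hidden_size, hidden_size))
W_hh *= 0.9 / np.linalg.svd(W_hh, compute_uv=False)[0]

# Simulate backpropagation through 50 time steps: each step multiplies
# the gradient by the transpose of the recurrent weight matrix.
grad = np.ones(hidden_size)
norms = []
for step in range(50):
    grad = W_hh.T @ grad
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[-1])  # the gradient norm decays roughly geometrically
```

After 50 steps the gradient norm has shrunk by orders of magnitude, so any error signal from step 50 barely reaches the weights that processed step 1; this is why plain RNNs struggle to learn long-range dependencies.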
LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) networks were developed to address this limitation, using gating mechanisms to selectively retain or forget information over long sequences. However, even LSTMs have been largely superseded in NLP by the Transformer architecture, which processes entire sequences in parallel using attention mechanisms and scales far more efficiently on modern hardware. RNNs and LSTMs still see use in embedded systems, audio processing, and applications requiring low-latency sequential inference.
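The gating idea can be sketched as a single LSTM step. This is a simplified illustration of the standard formulation: biases are omitted for brevity, and the dimensions and initialization are assumptions for the example, not details from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, each acting on the concatenated [x_t; h_prev]
def make_weights():
    return rng.normal(scale=0.1, size=(hidden_size, input_size + hidden_size))

W_f, W_i, W_o, W_c = (make_weights() for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    """One LSTM step: gates decide what to forget, what to write, what to expose."""
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(W_f @ z)           # forget gate: keep or erase old cell state
    i = sigmoid(W_i @ z)           # input gate: admit new information
    o = sigmoid(W_o @ z)           # output gate: how much cell state to expose
    c_tilde = np.tanh(W_c @ z)     # candidate values to write
    c = f * c_prev + i * c_tilde   # additive update helps gradients survive
    h = o * np.tanh(c)
    return h, c

x = rng.normal(size=input_size)
h, c = lstm_step(x, np.zeros(hidden_size), np.zeros(hidden_size))
```

The additive cell-state update `c = f * c_prev + i * c_tilde` is the crucial difference from the plain RNN: when the forget gate stays near 1, information (and gradient) can flow across many time steps without being repeatedly squashed through a weight matrix.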
RNNs gave neural networks memory — the ability to use context from the past when interpreting the present — unlocking sequential tasks like language, audio, and time series modeling.

