A neural network architecture designed for sequential data that maintains a hidden state — a form of memory — allowing it to incorporate context from previous inputs when processing each new element.
In Depth
Standard neural networks process each input independently — they have no memory. A Recurrent Neural Network breaks this limitation by maintaining a hidden state: an internal vector that summarizes information from all previous inputs in the sequence. At each time step, the network receives the current input and the previous hidden state, combines them, and produces a new hidden state. This feedback loop gives RNNs a form of short-term memory that makes them natural for sequential tasks.
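The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the dimensions, initialization, and tanh activation are common conventions assumed here rather than taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the text)
input_size, hidden_size = 4, 8

# Parameters: input-to-hidden weights, hidden-to-hidden weights, bias
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One time step: combine the current input with the previous hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Thread the hidden state through a sequence of 5 inputs
sequence = rng.normal(size=(5, input_size))
h = np.zeros(hidden_size)          # initial hidden state
for x_t in sequence:
    h = rnn_step(x_t, h)           # h now summarizes all inputs seen so far
```

The key point is the loop: the same `rnn_step` function and the same weights are applied at every time step, with the output hidden state fed back in as input to the next step.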
RNNs were the dominant architecture for Natural Language Processing tasks in the 2010s — language translation, text generation, sentiment analysis — precisely because language is sequential. Each word's meaning depends on the words before it. RNNs capture this dependency by threading the hidden state through every word in the sentence. However, they suffer from the vanishing gradient problem: as sequences grow longer, gradients shrink exponentially during backpropagation, making it hard to learn dependencies spanning many time steps.
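The vanishing gradient problem can be demonstrated numerically. During backpropagation through time, the gradient is multiplied by (roughly) the recurrent Jacobian once per time step; the sketch below drops the tanh-derivative factor (which is at most 1 and only shrinks things further) and uses a recurrent matrix scaled to a spectral norm of 0.9, an illustrative choice rather than anything from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 8

# Recurrent weight matrix scaled so its largest singular value is 0.9
# (values above 1 cause the opposite problem: exploding gradients)
W_hh = rng.normal(size=(hidden_size, hidden_size))
W_hh *= 0.9 / np.linalg.svd(W_hh, compute_uv=False)[0]

# Simulate backpropagation through 50 time steps: each step multiplies
# the gradient by the transpose of the recurrent weight matrix.
grad = np.ones(hidden_size)
norms = []
for step in range(50):
    grad = W_hh.T @ grad
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[-1])  # the gradient norm decays roughly geometrically
```

After 50 steps the gradient norm has shrunk by orders of magnitude, so any error signal from step 50 barely reaches the weights that processed step 1; this is why plain RNNs struggle to learn long-range dependencies.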
LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) networks were developed to address this limitation, using gating mechanisms to selectively retain or forget information over long sequences. However, even LSTMs have been largely superseded in NLP by the Transformer architecture, which processes entire sequences in parallel using attention mechanisms and scales far more efficiently on modern hardware. RNNs and LSTMs still see use in embedded systems, audio processing, and applications requiring low-latency sequential inference.
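The gating idea can be sketched as a single LSTM step. This is a simplified illustration of the standard formulation: biases are omitted for brevity, and the dimensions and initialization are assumptions for the example, not details from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, each acting on the concatenated [x_t; h_prev]
def make_weights():
    return rng.normal(scale=0.1, size=(hidden_size, input_size + hidden_size))

W_f, W_i, W_o, W_c = (make_weights() for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    """One LSTM step: gates decide what to forget, what to write, what to expose."""
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(W_f @ z)           # forget gate: keep or erase old cell state
    i = sigmoid(W_i @ z)           # input gate: admit new information
    o = sigmoid(W_o @ z)           # output gate: how much cell state to expose
    c_tilde = np.tanh(W_c @ z)     # candidate values to write
    c = f * c_prev + i * c_tilde   # additive update helps gradients survive
    h = o * np.tanh(c)
    return h, c

x = rng.normal(size=input_size)
h, c = lstm_step(x, np.zeros(hidden_size), np.zeros(hidden_size))
```

The additive cell-state update `c = f * c_prev + i * c_tilde` is the crucial difference from the plain RNN: when the forget gate stays near 1, information (and gradient) can flow across many time steps without being repeatedly squashed through a weight matrix.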
RNNs gave neural networks memory — the ability to use context from the past when interpreting the present — unlocking sequential tasks like language, audio, and time series modeling.

