Bidirectional Encoder Representations from Transformers — Google's landmark language model that reads text bidirectionally, capturing richer contextual understanding than left-to-right models, and established the pre-train-then-fine-tune paradigm in NLP.
In Depth
BERT, introduced by Google in 2018, revolutionized NLP by demonstrating that bidirectional pre-training of Transformers produces dramatically better language representations than previous unidirectional approaches. While GPT reads text left-to-right, BERT reads in both directions simultaneously — understanding each word in the context of all words that come before and after it. This gives BERT a richer, more accurate representation of meaning, particularly for tasks like question answering where the relationship between words across a sentence is critical.
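The contrast between the two reading directions can be sketched as attention masks over a short sequence. This is a minimal NumPy illustration of the idea, not BERT's actual implementation, and `attention_mask` is a name chosen here:

```python
import numpy as np

def attention_mask(seq_len: int, causal: bool) -> np.ndarray:
    """mask[i, j] is True when position i is allowed to attend to position j."""
    if causal:
        # Left-to-right (GPT-style): token i sees only positions 0..i.
        return np.tril(np.ones((seq_len, seq_len), dtype=bool))
    # Bidirectional (BERT-style): every token sees the whole sequence.
    return np.ones((seq_len, seq_len), dtype=bool)

# For a 4-token sentence, the causal mask exposes 10 of the 16 token pairs;
# the bidirectional mask exposes all 16.
print(attention_mask(4, causal=True).sum(), attention_mask(4, causal=False).sum())
```

The extra visible pairs are exactly why BERT can condition each word's representation on its right-hand context as well as its left.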
BERT is pre-trained using two objectives. Masked Language Modeling (MLM) randomly selects 15% of input tokens (most of which are replaced with a [MASK] token) and trains the model to predict the original tokens from surrounding context — a task that requires deep bidirectional understanding, since the missing word may depend on context on either side. Next Sentence Prediction (NSP) trains the model to determine whether two sentences appear consecutively in the source text — capturing discourse-level relationships. Together, these objectives create representations that encode syntax, semantics, and pragmatics from vast text corpora.
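BERT's masking recipe is slightly subtler than pure masking: of the selected positions, 80% are replaced with [MASK], 10% with a random token, and 10% left unchanged, so the model cannot rely on [MASK] always marking a prediction target. A minimal sketch of that corruption step over a whitespace-split token list (`mlm_corrupt` is a name invented here, not BERT's code):

```python
import random

def mlm_corrupt(tokens, vocab, select_rate=0.15, seed=0):
    """Sketch of BERT's MLM corruption. Returns (corrupted, labels), where
    labels[i] is the original token at each selected position and None
    elsewhere (non-selected positions contribute no loss)."""
    rng = random.Random(seed)
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < select_rate:
            labels[i] = tok                       # model must predict this token
            roll = rng.random()
            if roll < 0.8:
                corrupted[i] = "[MASK]"           # 80%: replace with [MASK]
            elif roll < 0.9:
                corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
            # remaining 10%: keep the original token unchanged
    return corrupted, labels

tokens = "the cat sat on the mat".split()
corrupted, labels = mlm_corrupt(tokens, vocab=tokens)
```

Because some targets are left unchanged, the model must build a useful representation of every position, not just the visibly masked ones.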
BERT's impact on NLP was immediate and profound. Fine-tuning BERT on downstream tasks (question answering, sentiment analysis, named entity recognition, textual entailment) produced state-of-the-art results on virtually every benchmark of the time, often surpassing prior specialized architectures. It established fine-tuning of pre-trained Transformers as the dominant NLP paradigm. Variants extended BERT's influence further: RoBERTa refined the pre-training recipe (more data, longer training, dropping NSP), DistilBERT distilled BERT into a smaller, faster model, and ALBERT shared parameters across layers for efficiency.
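The fine-tuning recipe itself is simple: keep the pre-trained encoder and train a small task head on labelled data. A toy sketch of that shape, with the encoder stubbed as a frozen random projection rather than real BERT (all names and numbers here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" encoder, stubbed as a frozen random projection plus tanh.
ENC = rng.normal(size=(8, 4))
def encode(X):
    return np.tanh(X @ ENC)

# Tiny labelled downstream task: predict the sign of the first raw feature.
X = rng.normal(size=(64, 8))
y = (X[:, 0] > 0).astype(float)

def log_loss(p, y):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Task head (logistic regression) trained from scratch with gradient descent,
# while the encoder stays frozen.
w, b = np.zeros(4), 0.0
H = encode(X)
initial = log_loss(1 / (1 + np.exp(-(H @ w + b))), y)
for _ in range(300):
    p = 1 / (1 + np.exp(-(H @ w + b)))
    g = p - y
    w -= 0.5 * H.T @ g / len(y)
    b -= 0.5 * g.mean()
final = log_loss(1 / (1 + np.exp(-(H @ w + b))), y)
```

In practice the encoder is usually updated too (with a small learning rate), but the pattern — reuse the representation, add a thin task-specific head — is the same one BERT popularized.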
BERT showed that reading a sentence in both directions simultaneously provides fundamentally richer understanding than left-to-right reading — a simple insight that set new state-of-the-art results on eleven NLP tasks when it was released in 2018.

