Generative AI · Intermediate · Also: GPT-4, Generative Pre-trained Transformer

GPT (Generative Pre-trained Transformer)

Definition

A family of large language models developed by OpenAI using the Transformer decoder architecture, pre-trained on massive text datasets to predict the next token — forming the foundation for ChatGPT and many AI applications.

In Depth

GPT stands for Generative Pre-trained Transformer — three words that summarize its approach. Generative: it generates text by predicting the next token in a sequence. Pre-trained: it is first trained on a massive, general text corpus before any task-specific fine-tuning. Transformer: it uses the Transformer decoder architecture, processing context through stacked layers of masked self-attention. The combination of this architecture with massive scale proved transformative.
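The "masked self-attention" in the decoder can be made concrete with a short sketch. This is a minimal single-head illustration (not OpenAI's implementation): the causal mask is what forces each position to attend only to earlier tokens, which is what makes next-token prediction well-defined.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head masked self-attention over a sequence x of shape (T, d).

    The upper-triangular mask blocks attention to future positions, so
    position t is computed only from tokens 0..t (the causal property)."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)                     # (T, T) attention logits
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    scores[mask] = -np.inf                            # exp(-inf) -> weight 0
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # (T, d) contextual outputs
```

Because of the mask, the first position can only attend to itself, so its output is exactly its own value projection — a quick sanity check on the causal behavior.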

The GPT series traces a remarkable trajectory of scale and capability. GPT-1 (2018, 117M parameters) demonstrated that language model pre-training followed by fine-tuning worked for NLP tasks. GPT-2 (2019, 1.5B parameters) generated such coherent text that OpenAI initially withheld the full model citing misuse concerns. GPT-3 (2020, 175B parameters) introduced few-shot and zero-shot learning at scale — the model could perform tasks it was never explicitly trained on, given only a few examples in the prompt. GPT-4 (2023) added multimodal input (text and images) and significantly improved reasoning.

The GPT approach established the pre-training paradigm now standard across the industry: train a huge model on general data, then adapt it. This is why BERT, LLaMA, Gemini, Claude, and essentially every major LLM uses a variant of this approach. GPT-3's emergent few-shot capabilities also revealed 'scaling laws' — predictable improvements in performance as a function of model size, dataset size, and compute — which continue to guide frontier AI development.
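The scaling laws mentioned above take the form of a power law in model size. The sketch below uses the parameter-count law from Kaplan et al. (2020), L(N) = (N_c / N)^α; the constants are the illustrative published fits, not values that transfer to any particular model family.

```python
def power_law_loss(n_params, n_c=8.8e13, alpha=0.076):
    """Kaplan-style scaling law: predicted pre-training loss as a
    function of non-embedding parameter count N.

    n_c and alpha are the illustrative fitted constants from
    Kaplan et al. (2020); real frontier labs fit their own curves."""
    return (n_c / n_params) ** alpha
```

The key property is monotonic, predictable improvement: every 10x increase in parameters lowers the predicted loss by a fixed multiplicative factor, which is what lets labs forecast capability before committing compute.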

Key Takeaway

GPT proved that pre-training a Transformer on internet-scale text, then fine-tuning for specific applications, is a general recipe for powerful AI — a paradigm that has defined the entire field since 2020.

Real-World Applications

01 ChatGPT: the most widely used AI assistant, built on GPT-series models (initially GPT-3.5, later GPT-4) with RLHF alignment.
02 GitHub Copilot: code generation and completion originally powered by Codex, a GPT model fine-tuned on source code.
03 API access: developers building AI-powered applications using OpenAI's GPT API for summarization, extraction, classification, and generation.
04 Research: GPT models used as reasoning engines, knowledge bases, and writing assistants in academic and scientific workflows.
05 Enterprise automation: GPT-powered workflows for document analysis, customer communication, and knowledge management.

Frequently Asked Questions

How does GPT generate text?

GPT is an autoregressive model — it generates text one token at a time, where each new token is predicted based on all previous tokens. The model processes input through layers of self-attention and feedforward networks (the Transformer decoder), producing a probability distribution over the entire vocabulary for the next token. A token is sampled from this distribution, appended, and the process repeats.

What is the difference between GPT-3, GPT-4, and GPT-4o?

Each generation represents a significant leap in capability. GPT-3 (175B parameters) demonstrated impressive few-shot learning. GPT-4 dramatically improved reasoning, factuality, and multimodal capabilities (text + images). GPT-4o (omni) added native multimodal input/output (text, vision, audio) with reduced latency and cost. Each generation also improved safety, alignment, and instruction following.

What does 'pre-trained' mean in GPT?

Pre-trained means the model first learns general language understanding from a massive unlabeled text corpus by predicting the next word — a self-supervised task. This pre-training gives the model a broad foundation of language, facts, and reasoning patterns. It is then fine-tuned on specific tasks (instruction following, dialogue) and aligned with human preferences, adapting its general knowledge to specific applications.
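Why this objective is "self-supervised" becomes obvious when you build the training pairs: the labels are simply the text shifted by one position, so no human annotation is required. A minimal sketch of pair construction (a simplified version; real pipelines batch fixed-length windows):

```python
def next_token_pairs(token_ids, context_len):
    """Build (context, target) pairs for next-token prediction, the
    self-supervised objective used in GPT pre-training.

    Each target is just the token that follows its context window,
    so the raw corpus supplies its own labels."""
    pairs = []
    for i in range(1, len(token_ids)):
        context = token_ids[max(0, i - context_len) : i]
        pairs.append((context, token_ids[i]))
    return pairs
```

Fine-tuning and preference alignment then reuse the same architecture and weights, only swapping in curated data (instructions, dialogues, human rankings) in place of raw corpus pairs.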