A machine learning paradigm in which an agent learns to make decisions by interacting with an environment, receiving rewards for good actions and penalties for bad ones, with the goal of maximizing cumulative reward over time.
In Depth
Reinforcement Learning (RL) takes inspiration from how humans and animals learn: through trial, error, and feedback. An RL agent exists within an environment. At each step, it observes the current state, takes an action, and receives a reward signal — positive if the action was beneficial, negative if harmful. Over millions of such interactions, the agent learns a policy: a strategy for choosing actions that maximizes long-term cumulative reward.
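The state-action-reward loop can be made concrete with tabular Q-learning, one classic RL algorithm. The sketch below is illustrative, not any library's API: the toy "corridor" environment, the constants, and the table `Q` are all assumptions chosen to keep the example self-contained. The agent starts at state 0, earns +1 for reaching state 4, and gradually learns a policy of always moving right.

```python
import random

# Toy environment (an assumption for this sketch): states 0..4 on a line,
# actions move right (+1) or left (-1), reward +1 only at the goal state 4.
N_STATES = 5
ACTIONS = [+1, -1]
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate
Q = {(s, a): 0.0 for s in range((N_STATES)) for a in ACTIONS}

random.seed(0)
for episode in range(200):
    s = 0
    while s != N_STATES - 1:                      # episode ends at the goal
        if random.random() < EPSILON:             # occasionally explore...
            a = random.choice(ACTIONS)
        else:                                     # ...otherwise act greedily
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Q-learning update: nudge Q toward reward + discounted best future value
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next

# The learned greedy policy: the best action from each non-terminal state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
print(policy)   # -> {0: 1, 1: 1, 2: 1, 3: 1}: move right from every state
```

Note that the agent is never told "move right"; the policy emerges purely from the reward signal, which is the essence of the paradigm.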
RL has produced some of AI's most spectacular achievements. DeepMind's AlphaGo defeated the world's top Go players, and its successor AlphaZero mastered Go and Chess purely through self-play — millions of games against itself — without any human strategic guidance. OpenAI Five beat professional Dota 2 teams. These systems learned emergent strategies that no human player had discovered, purely by optimizing for reward.
Beyond games, RL is increasingly central to real-world applications. RLHF (Reinforcement Learning from Human Feedback) is a technique used to align Large Language Models like ChatGPT and Claude: human raters compare model responses, and their preferences train a reward signal that steers the model toward helpful, accurate, and safe behavior. In robotics, RL allows physical agents to learn dexterous manipulation and locomotion in simulation before being deployed to hardware.
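The reward-modeling step of RLHF can be sketched with the commonly used Bradley-Terry-style preference loss: given a pair of responses where a human preferred one over the other, the reward model is trained to score the chosen response higher. The function name and scalar rewards below are illustrative assumptions; real systems compute these rewards with a learned neural network.

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Loss for one human comparison (illustrative, not a library API):
    -log(sigmoid(r_chosen - r_rejected)), which shrinks as the reward
    model more strongly agrees with the human's preference."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss falls as the model's scores agree with the human rater.
print(round(preference_loss(2.0, 0.0), 4))   # confident agreement: small loss
print(round(preference_loss(0.0, 0.0), 4))   # undecided: log(2) ~= 0.6931
```

Minimizing this loss over many human comparisons yields a reward model, which then plays the role of the environment's reward signal when the language model is fine-tuned with RL.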
Reinforcement Learning is the paradigm of learning through experience — the agent doesn't need labeled examples or a human teacher, just a reward signal and enough interactions to discover what works.

