The phrase nobody used two years ago
If you've been paying attention to LLM engineering circles over the past few months, you'll have noticed a quiet terminology shift. "Prompt engineering" is being replaced — in job titles, in blog posts, in internal docs — with a broader term: context engineering.
This isn't just rebranding. It reflects a real change in what building with LLMs actually looks like in 2026. The prompt is no longer the interesting part. The interesting part is everything around it.
What context engineering actually covers
A modern LLM request isn't just a user message anymore. It's a carefully assembled package that might include:
- A system instruction defining the model's role and constraints
- Retrieved documents from a vector store or hybrid search
- Prior conversation turns, possibly summarized
- Tool definitions and their schemas
- Tool invocation results from earlier steps
- Structured state from a scratchpad or memory store
- Few-shot examples selected dynamically based on the input
- Output format specifications
Every one of those elements is a design decision. How much of each to include, in what order, with what level of compression, at what token budget — these choices determine whether the model succeeds or fails more than the wording of any single instruction.
Prompt engineering asks: what should I say to the model? Context engineering asks: what should the model see when it starts thinking?
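To make the shift concrete, the component list above can be sketched as an explicit data structure rather than ad-hoc string concatenation. This is a minimal illustration, not a prescribed schema — the field names and assembly order are assumptions:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: each element of the context becomes an explicit,
# ordered field, so inclusion and ordering are visible design decisions.
@dataclass
class ContextPackage:
    system_instruction: str
    retrieved_docs: list[str] = field(default_factory=list)
    history: list[str] = field(default_factory=list)
    tool_definitions: list[str] = field(default_factory=list)
    tool_results: list[str] = field(default_factory=list)
    scratchpad: str = ""
    few_shot_examples: list[str] = field(default_factory=list)
    output_spec: str = ""

    def assemble(self) -> str:
        """Concatenate components in a deliberate order, skipping empties."""
        parts = [
            self.system_instruction,
            *self.tool_definitions,
            *self.few_shot_examples,
            *self.retrieved_docs,
            *self.history,
            *self.tool_results,
            self.scratchpad,
            self.output_spec,
        ]
        return "\n\n".join(p for p in parts if p)
```

Once the package is a structured object, every question from the paragraph above — how much, in what order, at what budget — becomes a testable property of `assemble()` rather than something buried in string formatting.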
The three hard problems
Problem 1 — Token budget allocation
Even with million-token context windows available from most frontier providers, you rarely want to fill them. Longer contexts are slower, more expensive, and — despite what the marketing says — they degrade attention on the details that actually matter. Context engineering is, in large part, the discipline of deciding what not to include.
The teams doing this well treat token budget like a scarce resource. They measure how much context each component consumes, prune ruthlessly, and instrument the pipeline so they can see when context is bloating over time.
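A budget-aware pipeline can be sketched in a few lines. The priority scheme and the rough 4-characters-per-token estimate below are assumptions for illustration — a real pipeline would use the provider's tokenizer and measured priorities:

```python
# Minimal sketch: drop the lowest-priority components first until the
# assembled context fits the budget. Names and priorities are invented.
def estimate_tokens(text: str) -> int:
    return len(text) // 4 + 1  # crude heuristic, not a real tokenizer

def fit_to_budget(components: list[tuple[str, int, str]], budget: int) -> list[str]:
    """components: (name, priority, text); lower priority is dropped first.
    Returns the surviving texts, highest-priority first."""
    ranked = sorted(components, key=lambda c: -c[1])  # most important first
    result, used = [], 0
    for name, _priority, text in ranked:
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # dropped — instrument this branch to spot bloat over time
        result.append(text)
        used += cost
    return result
```

The useful part isn't the pruning itself but the measurement: logging which components get dropped, and how often, is exactly the instrumentation the paragraph above describes.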
Problem 2 — Retrieval that's actually relevant
Most RAG systems retrieve too much and rank poorly. The result is a context window padded with marginally relevant passages that dilute the model's focus. Good context engineering treats retrieval as a precision problem, not a recall problem — the goal is the five chunks that directly answer the question, not the top fifty by cosine similarity.
This often means layering a second-stage reranker, filtering by metadata before vector search, or using the query to generate a structured search plan rather than a single embedding lookup.
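The shape of that layered pipeline — metadata filter, then a wide recall stage, then a narrow precision stage — can be sketched with toy scoring functions. Everything here is a stand-in: a real system would use a proper vector index and a cross-encoder reranker rather than these placeholder callables:

```python
import math

# Sketch of precision-oriented retrieval: filter by metadata first,
# take a wide candidate set by cosine similarity, then rerank and
# keep only a handful of chunks.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, chunks, *, source=None, k_candidates=50, k_final=5, rerank=None):
    """chunks: dicts with 'vec', 'text', 'source'. rerank: optional
    scoring callable standing in for a second-stage cross-encoder."""
    # 1. Metadata filter before any vector math.
    pool = [c for c in chunks if source is None or c["source"] == source]
    # 2. Recall stage: wide top-k by cosine similarity.
    pool.sort(key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    candidates = pool[:k_candidates]
    # 3. Precision stage: rerank, then keep only a few chunks.
    if rerank is not None:
        candidates.sort(key=rerank, reverse=True)
    return candidates[:k_final]
```

The key design choice is that `k_final` is small: the output is sized for the model's attention, not for the index's recall.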
Problem 3 — Dynamic assembly
Static prompts are easy. Dynamic contexts — where what you include depends on the input, the previous steps, and the state of the world — are hard. You need templating, you need guardrails to prevent runaway growth, and you need observability to understand what the model actually saw when a response went wrong.
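A dynamic assembler with those three properties — conditional inclusion, a growth guardrail, and a record of what went in — might look like the following sketch. The section names and the `MAX_SECTIONS` cap are illustrative assumptions:

```python
# Hedged sketch of dynamic context assembly: sections are included
# conditionally based on state, growth is capped, and the list of
# included sections is returned for observability.
MAX_SECTIONS = 20  # guardrail against runaway context growth

def assemble_context(state: dict) -> tuple[str, list[str]]:
    sections, included = [], []

    def add(name: str, text: str) -> None:
        if len(sections) >= MAX_SECTIONS:
            raise RuntimeError(f"context growth guardrail hit at {name!r}")
        sections.append(text)
        included.append(name)  # record what the model actually saw

    add("system", state["system"])
    if state.get("docs"):              # only when retrieval returned something
        add("docs", "\n".join(state["docs"]))
    if state.get("tool_results"):      # only after a tool step has run
        add("tool_results", "\n".join(state["tool_results"]))
    return "\n\n".join(sections), included
```

Returning `included` alongside the assembled string is the observability hook: when a response goes wrong, you can see exactly which sections were present for that request.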
The new toolkit
The tooling for context engineering is maturing quickly. A year ago, most teams were hand-rolling their own context assembly logic. Today, the stack typically includes:
- Prompt templating engines that handle conditional inclusion and token budgeting
- Observability platforms that log and visualize the full assembled context for every request
- Retrieval orchestration layers that manage rerankers, filters, and query rewriting
- Context compression tools that summarize or distill lower-priority content
None of these are exotic. They're the same kinds of primitives any mature software system uses — config management, logging, caching — applied to the specific problem of feeding a language model.
What this means for team structure
The rise of context engineering is changing how AI-facing teams organize. "Prompt engineer" as a standalone role is giving way to something closer to "applied AI engineer" — someone who understands retrieval, evaluation, observability, and model behavior as a single system, not a collection of parts.
The teams that separate these concerns — one person writes prompts, another manages retrieval, a third handles evals — tend to ship worse products than teams where one engineer owns the full pipeline end to end. The feedback loops are too tight to split.
Where to start
If you're retrofitting an existing LLM product, the highest-leverage move is almost always the same: log every component of your context, for every request, in a queryable store. Until you can see what the model is actually receiving, you can't engineer the context — you're just guessing.
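A queryable store doesn't require new infrastructure to get started — the sketch below uses stdlib SQLite. The schema and field names are assumptions, not a standard:

```python
import sqlite3
import time

# Minimal sketch of a queryable context log: one row per component
# per request, so bloat and drift are visible with plain SQL.
def init_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS context_log ("
        "  request_id TEXT, ts REAL, component TEXT,"
        "  token_estimate INTEGER, content TEXT)"
    )
    return conn

def log_context(conn: sqlite3.Connection, request_id: str,
                components: dict[str, str]) -> None:
    """Log every component of the assembled context for one request."""
    now = time.time()
    for name, text in components.items():
        conn.execute(
            "INSERT INTO context_log VALUES (?, ?, ?, ?, ?)",
            (request_id, now, name, len(text) // 4, text),  # rough token count
        )
    conn.commit()
```

With rows like these, "is context bloating over time" becomes a one-line query — average `token_estimate` per component, grouped by day.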
Once that's in place, the rest follows. You'll spot the retrieval bugs, the token bloat, the instructions that contradict each other. And you'll start making the boring, unglamorous improvements that add up to a system that actually works.