The skill gap nobody warned PMs about
The conversations I have most often with product leaders these days aren't about which model to use or which framework to adopt. They're about people — specifically, about experienced product managers who are struggling to ship effective AI features despite being very good at their jobs.
These aren't bad PMs. They're often the best PMs in the organization — the ones who learned the craft on traditional software products and built reputations for shipping fast, reasoning about tradeoffs, and making sharp prioritization calls. The skills that made them great are the same skills making this transition hard. AI products reward a different set of instincts, and some of the old instincts actively get in the way.
What's different about AI products
The shortest way to put it: in traditional software, the behavior of the product is something you specify. In AI products, the behavior of the product is something you discover — and then shape, imperfectly, through evaluation and iteration.
That changes almost everything about how product decisions get made:
- Specs are less useful. You can write a detailed spec for a button. You can't write a spec for "the model should understand customer frustration." You have to show examples, build evals, and iterate.
- The uncertainty budget is bigger. Traditional software either does what you wrote or has a bug. AI products exist on a quality spectrum that never reaches 100%, and shipping decisions have to account for that.
- User research looks different. What users say they want from an AI feature and what they actually use often diverge dramatically. Watching real sessions matters more than survey data.
- The competitive landscape moves weekly. A new model can reshape what's possible overnight. Roadmaps that look stable on Monday can be irrelevant by Friday.
What experienced PMs need to unlearn
The spec as source of truth
The instinct to pin down requirements in a clear, complete document is deeply trained in good PMs. It produces predictable projects and accountable teams. In AI products, it produces specs that are either too vague to be useful or too precise to be achievable.
The replacement is the eval suite. Instead of "the model should answer customer questions helpfully," you have 200 labeled examples showing what helpful answers look like and what unhelpful ones look like, with a rubric that makes the distinction concrete. That's the real spec. Everything written in prose is commentary.
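To make "the eval suite is the real spec" concrete, here is a minimal sketch. All names here (`EvalCase`, `run_suite`, the toy grading function) are illustrative, not a real framework, and the two cases stand in for the ~200 labeled examples:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str      # the input a user might send
    reference: str   # what a good answer should contain
    rubric: str      # the concrete distinction a grader applies

def run_suite(cases: list[EvalCase],
              model: Callable[[str], str],
              grade: Callable[[str, EvalCase], bool]) -> float:
    """Return the fraction of cases the model passes under the rubric."""
    passed = sum(grade(model(c.prompt), c) for c in cases)
    return passed / len(cases)

# Two toy cases standing in for a few hundred labeled examples.
cases = [
    EvalCase("Where is my order?", "tracking link",
             "Helpful answers point the customer to order tracking."),
    EvalCase("Cancel my subscription", "cancellation confirmed",
             "Helpful answers confirm the cancellation."),
]
```

In a real suite the grader is often a rubric-driven model call rather than a string check; the point is that the labeled cases plus the grading function, not the prose document, define what "helpful" means.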
"If the engineers can't explain it, it shouldn't ship"
Another well-earned instinct that backfires. In classical software, a feature you can't explain is usually a feature you shouldn't ship. In AI products, almost nothing can be fully explained — the model does what it does, and your understanding of why is approximate at best.
The useful question isn't "can we explain how this works?" It's "can we measure whether it works, and catch it when it stops working?" If the answer is yes, the fact that the internals are opaque is manageable. If the answer is no, no amount of explainability will save you.
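One way to operationalize "catch it when it stops working" is a simple regression gate on the eval pass rate, run after every model or prompt change. A minimal sketch; the baseline and tolerance values are assumptions you would tune per feature:

```python
def regression_alert(current_pass_rate: float,
                     baseline_pass_rate: float,
                     tolerance: float = 0.02) -> bool:
    """Flag when measured quality drops more than `tolerance`
    below the last accepted baseline pass rate."""
    return current_pass_rate < baseline_pass_rate - tolerance
```

A gate like this doesn't explain why quality dropped, but it turns "the internals are opaque" from a blocker into a monitoring problem.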
Roadmaps as commitments
Six-month roadmaps that specify which features will ship when were already fraying in traditional software. In AI, they break outright. The best teams treat roadmaps as hypotheses that will be revised every few weeks as capabilities change, evals reveal surprises, and user behavior shifts.
PMs who need roadmap certainty to feel in control of a project will struggle. PMs who can hold loose plans while maintaining tight operational discipline will thrive. The skill isn't planning harder — it's replanning more gracefully.
The new skills that matter
Some skills become dramatically more valuable in AI product roles:
Evaluation literacy
Evaluation literacy means understanding what an eval suite is, how to build one, and how to read the results. This is table stakes now. AI PMs who can't reason about evals are effectively blind to whether their product is improving.
Prompt and context intuition
You don't need to be an engineer to have good intuition about why a prompt is failing or what additional context would help. The PMs who build this intuition by actually reading through real interactions — dozens of them, weekly — consistently ship better AI features than the ones who stay abstract.
Cost awareness at the feature level
Every AI feature has a per-interaction cost, and that cost is visible in ways infrastructure costs usually aren't. PMs need to think about unit economics for each feature, not just the product as a whole. A feature that works great but costs $0.50 per use might be a better or worse idea than one that works 80% as well for $0.02. Making that call is a product decision, not an engineering one.
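One simple lens on that call is cost per successful interaction rather than cost per call. The prices below are the article's illustrative numbers, and mapping "works 80% as well" onto a success rate is a deliberate simplification:

```python
def cost_per_success(cost_per_call: float, success_rate: float) -> float:
    """Cost of one *successful* interaction, treating failed calls
    as wasted spend that the successes have to absorb."""
    return cost_per_call / success_rate

premium = cost_per_success(0.50, 0.95)  # assumed: "works great" ~ 95% success
budget = cost_per_success(0.02, 0.76)   # assumed: 80% as good ~ 0.8 * 0.95
```

Under these assumed rates the cheaper feature is still roughly 20x cheaper per success — which is exactly why the decision is a product call about whether the quality gap hurts the use case, not just arithmetic.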
Comfort with probabilistic outcomes
A traditional PM asks "does it work?" An AI PM asks "what fraction of the time does it work, and is that fraction good enough for the use case?" This shift is harder than it sounds. It requires holding both "we should ship" and "it will sometimes fail" in mind simultaneously, without flinching at the second part.
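"Is that fraction good enough" also depends on how precisely you've measured it: a pass rate from a few hundred eval cases carries real statistical uncertainty. A sketch using the standard Wilson score interval:

```python
import math

def wilson_interval(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a pass rate measured
    on n eval cases (z=1.96 gives a 95% interval)."""
    p = passes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```

For example, 170 passes out of 200 cases gives an interval of roughly 0.79 to 0.89 — wide enough that an "85% vs 87%" difference between two runs may be noise rather than improvement.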
The teams that are figuring this out
The product organizations I've seen navigate this transition well all do a few things in common. They pair PMs with ML-literate engineers early and often, so that evaluation practice and product intuition develop together. They invest in eval infrastructure before it feels necessary. They encourage their PMs to spend time reading production logs, not just reviewing metrics. And they reward the unglamorous work of tending an eval suite the way they reward shipping features.
The best AI PMs aren't the ones who learned machine learning. They're the ones who learned to operate in a world where product behavior is discovered, not designed — and who built the habits that make that world tractable.
Where to start
If you're an experienced PM trying to make this shift, the most valuable thing you can do this week is spend an hour reading real production interactions from an AI feature. Not summaries, not metrics — actual inputs, outputs, and user follow-ups. A hundred samples is enough to start noticing patterns. A thousand is enough to build real intuition.
Everything else — the frameworks, the evals, the unit economics — becomes more concrete once you've seen how the product actually behaves for real users. That grounding is the thing traditional PM training doesn't teach, and it's the thing that most reliably separates PMs who ship good AI products from PMs who ship impressive demos that users quietly abandon.
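If your team logs interactions as JSONL (an assumption — adapt to your own logging setup, including the file path and record shape), pulling a readable sample can be as simple as:

```python
import json
import random

def sample_interactions(log_path: str, k: int = 100, seed: int = 0) -> list[dict]:
    """Draw k random interaction records from a JSONL log for manual reading.
    The path and record fields are assumptions about your logging setup."""
    with open(log_path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    rng = random.Random(seed)  # fixed seed so a review session is reproducible
    return rng.sample(records, min(k, len(records)))
```

Resist the urge to aggregate what comes back: the point of the exercise is reading the raw inputs and outputs yourself, not turning them into another metric.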