The promise and the asterisk
Structured outputs — the ability to guarantee that an LLM returns valid JSON matching a specific schema — were one of the quietly transformative features of the past two years. Before they existed, every team using LLMs for anything beyond chat had to write defensive parsing code, retry logic for malformed responses, and validation layers that often did more work than the model itself. After they existed, a lot of that machinery became unnecessary.
That's the marketing version. The reality, when you actually deploy these features across multiple providers and at production scale, has more asterisks than the announcements led anyone to expect. "Guaranteed JSON" is a real thing, but the guarantees stop short of what most teams assume when they read the docs. The differences between providers matter, and they matter most exactly when you're trying to build something portable across them.
What "structured output" actually means at each provider
All major providers now offer some form of structured output. The terminology varies — JSON mode, structured outputs, response format, function calling output schemas — but the underlying capability is roughly the same: you give the model a JSON schema, and the model promises to return output that conforms to it.
The differences show up in three places.
What gets guaranteed
The strongest providers offer schema-constrained generation: the decoding process itself is constrained at the token level so that the model cannot produce output that violates the schema. The output is mathematically guaranteed to parse and match the schema. This is the version most people imagine when they hear "guaranteed JSON."
A weaker version is schema-prompted generation: the model is told the schema and trained to follow it, but no token-level constraints are applied. The output usually matches, but rare edge cases produce malformed responses. You still need a parsing layer with retries.
The weakest version is JSON mode without schema: the model is constrained to produce valid JSON, but the structure of that JSON is whatever the model decides. This is barely an improvement over carefully prompted free-text generation.
If you're not sure which version your provider gives you, the test is simple: use a schema with a deeply nested or unusual constraint and watch whether the model ever violates it. If it does, you don't have token-level constraints, regardless of what the docs say.
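That probe can be sketched as a small loop. Everything here is hypothetical: `generate` stands in for whatever call your provider SDK exposes, and `violates` encodes whichever unusual constraint you picked for the test.

```python
import json

def probe_constrained_decoding(generate, violates, trials=50):
    """Run the model repeatedly against a tricky schema and count
    outputs that break it. Any violation means the provider is not
    doing token-level constrained decoding for that feature."""
    violations = 0
    for _ in range(trials):
        raw = generate()  # hypothetical: one model call returning a string
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            violations += 1
            continue
        if violates(parsed):
            violations += 1
    return violations

# Example predicate for a schema requiring a nested object whose
# innermost "code" field must come from a fixed enum.
def violates(parsed):
    try:
        return parsed["outer"]["middle"]["code"] not in {"A", "B", "C"}
    except (KeyError, TypeError):
        return True
```

A handful of violations over fifty trials is strong evidence you're getting schema-prompted generation, not schema-constrained decoding.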
Which schema features are supported
This is where portability breaks. The JSON Schema spec is large, and no provider supports all of it. The common subset that works everywhere covers basic types, required fields, and enums. Beyond that, you start hitting differences:
- Recursive schemas work on some providers and not others
- oneOf/anyOf/allOf support varies wildly
- String patterns and format constraints are partially supported, and the partial support is provider-specific
- Maximum nesting depth is limited everywhere, but the limits differ
- Schema size has hard caps that vary by provider
If you write a schema that uses any feature beyond the basic subset, you're locking yourself to one provider. The teams that learn this the hard way end up rewriting schemas during a migration that should have been straightforward.
How errors surface
When generation fails — and it does, occasionally, even with token-level constraints — providers handle it differently. Some return an error response. Some return a "best effort" output that may not match the schema. Some retry internally and may take longer than expected. Some silently truncate output if it grows too long mid-generation.
This matters because your error handling logic needs to know what to expect. A system designed for one provider's failure mode often breaks subtly when pointed at another's.
The patterns that work across providers
Use the lowest common denominator schema
If you need portability, write schemas that work everywhere: basic types, required fields, enums for known value sets, no recursion, modest nesting depth. Save the fancy stuff for cases where you've explicitly committed to a provider. This is annoying — the fancy features are useful — but it's the price of being able to swap providers later.
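As a rough illustration (the exact portable subset depends on which providers you target), here is a schema that sticks to the safe subset, plus a small lint pass that flags the features most likely to break portability. The field names and depth limit are invented for the example.

```python
# A schema restricted to the widely supported subset: basic types,
# required fields, and an enum. No oneOf/anyOf, no $ref, no patterns.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        "estimate_hours": {"type": "number"},
    },
    "required": ["title", "priority"],
}

# Keywords supported inconsistently across providers.
RISKY_KEYWORDS = {"oneOf", "anyOf", "allOf", "$ref", "pattern", "format", "not"}

def portability_warnings(schema, depth=0, max_depth=4):
    """Walk a schema and collect features likely to break when
    moving between providers."""
    warnings = []
    if isinstance(schema, dict):
        if depth > max_depth:
            warnings.append(f"nesting deeper than {max_depth} levels")
        for key, value in schema.items():
            if key in RISKY_KEYWORDS:
                warnings.append(f"uses '{key}'")
            warnings.extend(portability_warnings(value, depth + 1, max_depth))
    elif isinstance(schema, list):
        for item in schema:
            warnings.extend(portability_warnings(item, depth, max_depth))
    return warnings
```

Running a check like this in CI is one way to keep the "fancy features are locked behind an explicit decision" rule enforceable rather than aspirational.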
Validate after parsing, always
Even with token-level guarantees, validate the parsed output against your schema in your own code. The reasons are pragmatic: the guarantees occasionally break, the parsing can hit edge cases the docs don't mention, and your validation code becomes the place where you handle business logic that goes beyond the schema (value ranges, cross-field constraints, sanity checks). Treat the model's "guarantee" as a strong hint, not a contract.
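A sketch of that post-parse layer, using only the standard library. The field names and ranges are invented; the point is that the range check and the cross-field check live here, because no provider's schema support expresses them.

```python
import json

def validate_estimate(raw: str) -> dict:
    """Parse a model response and enforce the checks the schema
    can't express: value ranges and a cross-field constraint."""
    parsed = json.loads(raw)  # may still raise, even with "guaranteed" JSON
    for field in ("task", "optimistic_hours", "pessimistic_hours"):
        if field not in parsed:
            raise ValueError(f"missing required field: {field}")
    opt, pess = parsed["optimistic_hours"], parsed["pessimistic_hours"]
    if not (0 < opt <= 1000):
        raise ValueError("optimistic_hours outside sane range")
    if pess < opt:  # cross-field sanity check beyond the schema
        raise ValueError("pessimistic estimate below optimistic estimate")
    return parsed
```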
Keep schemas close to the use case
Resist the urge to build a single "universal" schema for all your structured outputs. Different features need different schemas, and trying to share schemas across them creates coupling that hurts more than it helps. A small, tight schema per feature is easier to evolve, easier to debug, and easier to migrate.
Plan for graceful degradation
When a structured output fails, what does your system do? The answers worth having:
- Retry once with a slight prompt variation — catches transient failures
- Fall back to a parsing-tolerant mode — accept partial output, fill defaults for missing fields
- Escalate to a different model — sometimes the smaller model can't handle the schema and the bigger one can
- Surface a clean error to the user — better than silently producing garbage
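The cascade above can be sketched as a single function. `generate` and the model names are stand-ins for your own client code; the defaults-merging behavior is one possible choice for the parsing-tolerant mode, not the only one.

```python
import json

def structured_call(generate, prompt, defaults, models=("small-model", "big-model")):
    """Try each model in order, retrying once with a nudged prompt,
    then fall back to defaults merged with the best partial output.
    Raises only when every rung of the ladder fails."""
    last_partial = {}
    for model in models:
        for attempt_prompt in (prompt, prompt + "\nRespond with valid JSON only."):
            raw = generate(model, attempt_prompt)  # hypothetical provider call
            try:
                parsed = json.loads(raw)
            except json.JSONDecodeError:
                continue  # retry with the varied prompt, or escalate
            if all(k in parsed for k in defaults):
                return parsed  # complete structured output
            last_partial = parsed  # remember the best partial result
    if last_partial:
        return {**defaults, **last_partial}  # tolerant mode: fill defaults
    raise RuntimeError("structured output failed on all models")  # clean error
```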
The teams with the most reliable systems aren't the ones whose models never fail. They're the ones whose systems handle failures gracefully when they happen.
The thing nobody warns you about
Structured outputs change the model's behavior in ways that go beyond formatting. When you constrain a model to produce a specific schema, you're also changing what it can express. The model can't tell you it doesn't know the answer if your schema has a required answer field. It can't ask a clarifying question if your schema only allows for direct responses. The constraints shape the content, not just the form.
The fix is to design schemas that include the model's escape hatches — optional confidence fields, clarification_needed flags, unable_to_answer enum values. Without these, you'll get confidently wrong responses where you wanted thoughtful uncertainty.
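One way that might look, with invented field names. Only the escape hatch itself is required, so the model is never forced to fabricate an answer just to satisfy the schema.

```python
# A response schema with explicit escape hatches: "status" lets the
# model decline or ask for clarification instead of forcing an answer.
RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {
            "type": "string",
            "enum": ["answered", "clarification_needed", "unable_to_answer"],
        },
        "answer": {"type": "string"},
        "clarifying_question": {"type": "string"},
        "confidence": {"type": "number"},  # optional self-reported 0.0-1.0
    },
    "required": ["status"],  # only the escape hatch is mandatory
}

def handle(parsed):
    """Branch on the escape hatch instead of trusting 'answer' blindly."""
    status = parsed.get("status")
    if status == "answered":
        return parsed.get("answer", "")
    if status == "clarification_needed":
        return f"Model asks: {parsed.get('clarifying_question', '?')}"
    return "Model declined to answer."
```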
Structured outputs solve the parsing problem, not the truthfulness problem. A response that perfectly matches your schema can still be perfectly wrong. Build accordingly.
Where this is going
The trajectory for structured outputs is clear: support is improving, schema coverage is expanding, and the differences between providers are slowly narrowing. The teams that benefit most are the ones who treat structured outputs as a useful but imperfect tool — relying on them for the work they do well, validating their outputs anyway, and designing systems that survive when the guarantees occasionally break. The teams that treat them as a magic parser are the ones writing post-mortems after their first production incident.