2026-05-11
Structured LLM output with Zod: streaming, validation, and recovery
TypeScript patterns to validate structured LLM output, handle streaming, and recover when JSON or schemas fail.
When an LLM must return JSON your app consumes directly, the most common failure is not an HTTP error: it is a string that looks syntactically fine but is semantically useless—truncated JSON, wrong key casing, numbers serialized as strings, empty arrays where objects are required. Streaming makes it worse: chunks arrive before the document can be closed and passed to JSON.parse.
This post assumes Node.js 20+, TypeScript 5.x, Zod 3.x (.safeParse), and a client that exposes a token stream (common with the Vercel AI SDK streamText plus OpenAI or Anthropic as documented for “JSON mode” or structured output). APIs move—pin and verify your @ai-sdk/* or official client versions.
The problem: valid tokens, unusable JSON
Three recurring failure classes:
1. Parse error: interrupted stream, unbalanced brackets, stray commas.
2. Schema error: valid JSON that breaks your contract (wrong types, missing keys, enums out of range).
3. Weak semantics: JSON passes Zod but values are hollow ("description": "N/A") because the model padded fields with placeholders.
This article addresses (1) and (2) in code. (3) needs business rules (minimum lengths, cross-field checks) or a second validation pass.
Internal contract with Zod
Define one schema for your internal API, not for the raw model prompt. The model may return a superset (extra keys): in Zod 3, z.object() strips unknown keys by default (.strip() restores that after .passthrough()), and .pick() carves stable sub-shapes.
```ts
import { z } from 'zod'

export const ExtractSchema = z.object({
  title: z.string().min(1).max(200),
  tags: z.array(z.string().min(1)).max(10),
  confidence: z.number().min(0).max(1),
})

export type Extract = z.infer<typeof ExtractSchema>
```
After JSON.parse (run only on a complete string), always validate with safeParse, not parse, on hot paths: it returns structured errors instead of throwing.
```ts
const parsed = ExtractSchema.safeParse(unknownJson)
if (!parsed.success) {
  // recovery (below); don't fall through to parsed.data
  throw new Error('schema validation failed')
}
return parsed.data
```
Streaming: buffer and when to stop
While streaming, do not JSON.parse every delta. Minimal pattern:
- Append each text delta to a string buffer.
- Optional: if the provider mixes reasoning or preambles with the payload, isolate the JSON (some models wrap it in Markdown code fences tagged `json`; strip the fences before parsing).
- Only when the stream ends (or an end marker arrives), call `JSON.parse` on the full buffer.
If the provider offers native structured output that guarantees valid JSON at end-of-stream, steps 2–3 get easier but Zod stays: the model can still break semantic constraints.
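The buffering steps above can be sketched against any token source exposed as an `AsyncIterable<string>` (an assumption; adapt to your client's actual stream type). The fence-stripping regex is a heuristic for the single-fenced-block case:

```ts
// Collect all deltas, strip an optional markdown fence, parse once at the end.
async function collectJson(stream: AsyncIterable<string>): Promise<unknown> {
  let buffer = ''
  for await (const delta of stream) {
    buffer += delta // never JSON.parse here; the document is still open
  }
  // Heuristic: some models wrap the payload in ```json … ``` fences.
  const fenced = buffer.match(/```(?:json)?\s*([\s\S]*?)```/)
  const payload = fenced ? fenced[1] : buffer
  return JSON.parse(payload.trim()) // may throw: hand off to the repair path
}
```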
When to skip streaming for tabular output: if the UI expects row-by-row updates but the model emits one large JSON blob, partial updates confuse users. Often you stream free text for UX and run a second non-streaming JSON call, or stream NDJSON (one line = one record) with a simpler per-line schema.
Recovery: failed parse or failed Zod
Layered strategy with a hard attempt budget (e.g. two repairs, then fail):
- Light repair: if `JSON.parse` throws, trimming dangling characters only helps with a trustworthy heuristic; it is usually safer to send the model a short repair prompt ("This JSON is invalid. Error: …. Return only corrected JSON for this schema.") with the truncated buffer attached.
- Zod failure: feed `parsed.error.flatten()` or `issues` into that repair prompt so the model knows what to fix.
- Degrade: after N attempts, return a business error (`422`) or enqueue human review. No infinite loops; they burn tokens and rate limits.
```ts
const MAX_REPAIR = 2

async function extractWithRepair(
  raw: string,
  attempt: number,
): Promise<Extract> {
  let json: unknown
  try {
    json = JSON.parse(raw)
  } catch {
    if (attempt >= MAX_REPAIR) throw new SyntaxError('Invalid JSON after repairs')
    const fixed = await repairJsonWithLlm(raw, 'syntax')
    return extractWithRepair(fixed, attempt + 1)
  }
  const r = ExtractSchema.safeParse(json)
  if (r.success) return r.data
  if (attempt >= MAX_REPAIR) throw new Error('Schema validation failed')
  const fixed = await repairJsonWithLlm(raw, JSON.stringify(r.error.flatten()))
  return extractWithRepair(fixed, attempt + 1)
}
```
repairJsonWithLlm is yours to implement: preferably non-streaming, low temperature, and with known pricing.
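One possible shape, with the provider call injected so it stays testable. This is a sketch, not the article's canonical implementation; the injected third parameter is an assumption for testability, where the snippet above would instead capture the client in module scope:

```ts
type Complete = (prompt: string) => Promise<string>

// Build a terse repair prompt and delegate to an injected
// non-streaming completion function (e.g. your client at temperature 0).
async function repairJsonWithLlm(
  raw: string,
  errorDetail: string,
  complete: Complete,
): Promise<string> {
  const prompt = [
    'This JSON is invalid or violates the schema.',
    `Error: ${errorDetail}`,
    'Return only the corrected JSON. No prose, no code fences.',
    '---',
    raw.slice(0, 8000), // hard cap so each repair attempt has a known cost
  ].join('\n')
  return complete(prompt)
}
```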
Telemetry: what to log without PII
At minimum log:
- Model id (e.g. `gpt-4.1`, `claude-3-5-sonnet-*`) and `finish_reason` when exposed.
- End-to-end latency and tokens in/out if the client surfaces them (OpenAI/Anthropic response bodies or headers, per their docs).
- Repair counters and failure kind (`syntax` vs `zod`).
- A hash (e.g. SHA-256) of system + user prompt, without storing raw user text unless required; it helps trace regressions when prompts change.
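A minimal sketch of that fingerprint with Node's built-in crypto module (`promptHash` is an illustrative name):

```ts
import { createHash } from 'node:crypto'

// Fingerprint prompts for regression tracing without logging raw text.
// The NUL separator keeps ('ab','c') and ('a','bc') from colliding.
function promptHash(systemPrompt: string, userPrompt: string): string {
  return createHash('sha256')
    .update(systemPrompt)
    .update('\u0000')
    .update(userPrompt)
    .digest('hex')
}
```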
Opinion, not law: logging raw JSON is fine in staging; in production, sample or mask it for compliance and cost.
Verified sources: Structured Outputs, JSON mode, AI SDK
- OpenAI — Structured Outputs vs JSON mode: official docs separate JSON mode (valid JSON syntax) from Structured Outputs tied to a JSON Schema: the latter constrains the emitted object where supported, cutting structural failures compared with “please return JSON” prompting. See Structured model outputs and the JSON mode guide. Keep Zod (or equivalent) as your portability layer across providers, for business-only rules, and when you still call models without structured-output support.
- Vercel AI SDK: current guidance is `Output.object({ schema: z.object({…}) })` with `streamText`/`generateText`, consuming typed partial objects via `partialOutputStream`. Reference: Generating structured data. Pin and read release notes for your installed major (`ai`, `@ai-sdk/openai`, …); the surface changes frequently.
- Hard limits: even with schema guarantees, handle truncated streams, timeouts, and model refusals. In the AI SDK, streaming errors may surface through stream events (`onError`) rather than only thrown exceptions; verify against the version you ship.
Operational summary
- Zod after parse, always `safeParse`.
- Stream = single buffer, parse at end (unless you designed for NDJSON).
- Repairs with a fixed budget and metrics on failures.
- Prefer non-streaming for critical monolithic JSON.
Keep handy: OpenAI Structured Outputs / JSON mode, Anthropic’s structured-output docs, and Zod safeParse + flatten.