2026-05-11
Structured LLM output with Zod: streaming, validation, and recovery
TypeScript patterns to validate structured LLM output, handle streaming, and recover when JSON or schemas fail.
When an LLM must return JSON your app consumes directly, the most common failure is not an HTTP error: it is a string that looks syntactically fine but is semantically useless—truncated JSON, wrong key casing, numbers serialized as strings, empty arrays where objects are required. Streaming makes it worse: chunks arrive before the document can be closed and passed to JSON.parse.
This post assumes Node.js 20+, TypeScript 5.x, Zod 3.x (.safeParse), and a client that exposes a token stream (common with the Vercel AI SDK streamText plus OpenAI or Anthropic as documented for “JSON mode” or structured output). APIs move—pin and verify your @ai-sdk/* or official client versions.
The problem: valid tokens, unusable JSON
Three recurring failure classes:
1. Parse error: interrupted stream, unbalanced brackets, stray commas.
2. Schema error: valid JSON that breaks your contract (wrong types, missing keys, enums out of range).
3. Weak semantics: JSON passes Zod but values are hollow ("description": "N/A") because the model padded fields with placeholders.
This article addresses (1) and (2) in code. (3) needs business rules (minimum lengths, cross-field checks) or a second validation pass.
Internal contract with Zod
Define one schema for your internal API, not for the raw model prompt. The model may return a superset (extra keys): in Zod 3, z.object() strips unknown keys by default (.strip() restores that after .passthrough()), and .pick() carves stable sub-shapes.
```ts
import { z } from 'zod'

export const ExtractSchema = z.object({
  title: z.string().min(1).max(200),
  tags: z.array(z.string().min(1)).max(10),
  confidence: z.number().min(0).max(1),
})

export type Extract = z.infer<typeof ExtractSchema>
```
After JSON.parse (run only on a complete string), always validate with safeParse, not parse, on hot paths: it returns structured errors instead of throwing.
```ts
const parsed = ExtractSchema.safeParse(unknownJson)
if (!parsed.success) {
  // recovery (below); don't fall through to parsed.data
  throw new Error('schema validation failed')
}
return parsed.data
```
Streaming: buffer and when to stop
While streaming, do not JSON.parse every delta. Minimal pattern:
- Append each text delta to a string buffer.
- Optional: if the provider mixes reasoning or preambles with the payload, isolate the JSON (some models wrap it in Markdown code fences tagged `json`; strip the fences before parsing).
- Only when the stream ends (or an end marker arrives), call `JSON.parse` on the full buffer.
If the provider offers native structured output that guarantees valid JSON at end-of-stream, steps 2–3 get easier but Zod stays: the model can still break semantic constraints.
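The buffering steps above can be sketched against any token source exposed as an `AsyncIterable<string>` (an assumption; adapt to your client's actual stream type). The fence-stripping regex is a heuristic for the single-fenced-block case:

```ts
// Collect all deltas, strip an optional markdown fence, parse once at the end.
async function collectJson(stream: AsyncIterable<string>): Promise<unknown> {
  let buffer = ''
  for await (const delta of stream) {
    buffer += delta // never JSON.parse here; the document is still open
  }
  // Heuristic: some models wrap the payload in ```json … ``` fences.
  const fenced = buffer.match(/```(?:json)?\s*([\s\S]*?)```/)
  const payload = fenced ? fenced[1] : buffer
  return JSON.parse(payload.trim()) // may throw: hand off to the repair path
}
```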
When to skip streaming for tabular output: if the UI expects row-by-row updates but the model emits one large JSON blob, partial updates confuse users. Often you stream free text for UX and run a second non-streaming JSON call, or stream NDJSON (one line = one record) with a simpler per-line schema.
Recovery: failed parse or failed Zod
Layered strategy with a hard attempt budget (e.g. two repairs, then fail):
- Light repair: if `JSON.parse` throws, trimming dangling characters only helps with a trustworthy heuristic; it is usually safer to send the model a short repair prompt ("This JSON is invalid. Error: …. Return only corrected JSON for this schema.") with the truncated buffer attached.
- Zod failure: feed `parsed.error.flatten()` or `issues` into that repair prompt so the model knows what to fix.
- Degrade: after N attempts, return a business error (`422`) or enqueue human review. No infinite loops; they burn tokens and rate limits.
```ts
const MAX_REPAIR = 2

async function extractWithRepair(
  raw: string,
  attempt: number,
): Promise<Extract> {
  let json: unknown
  try {
    json = JSON.parse(raw)
  } catch {
    if (attempt >= MAX_REPAIR) throw new SyntaxError('Invalid JSON after repairs')
    const fixed = await repairJsonWithLlm(raw, 'syntax')
    return extractWithRepair(fixed, attempt + 1)
  }
  const r = ExtractSchema.safeParse(json)
  if (r.success) return r.data
  if (attempt >= MAX_REPAIR) throw new Error('Schema validation failed')
  const fixed = await repairJsonWithLlm(raw, JSON.stringify(r.error.flatten()))
  return extractWithRepair(fixed, attempt + 1)
}
```
repairJsonWithLlm is yours to implement: preferably non-streaming, low temperature, and with known pricing.
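One possible shape, with the provider call injected so it stays testable. This is a sketch, not the article's canonical implementation; the injected third parameter is an assumption for testability, where the snippet above would instead capture the client in module scope:

```ts
type Complete = (prompt: string) => Promise<string>

// Build a terse repair prompt and delegate to an injected
// non-streaming completion function (e.g. your client at temperature 0).
async function repairJsonWithLlm(
  raw: string,
  errorDetail: string,
  complete: Complete,
): Promise<string> {
  const prompt = [
    'This JSON is invalid or violates the schema.',
    `Error: ${errorDetail}`,
    'Return only the corrected JSON. No prose, no code fences.',
    '---',
    raw.slice(0, 8000), // hard cap so each repair attempt has a known cost
  ].join('\n')
  return complete(prompt)
}
```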
Telemetry: what to log without PII
At minimum log:
- Model id (e.g. `gpt-4.1`, `claude-3-5-sonnet-*`) and `finish_reason` when exposed.
- End-to-end latency and tokens in/out if the client surfaces them (OpenAI/Anthropic response bodies or headers, per their docs).
- Repair counters and failure kind (`syntax` vs `zod`).
- A hash (e.g. SHA-256) of system + user prompt, without storing raw user text unless required; it helps trace regressions when prompts change.
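A minimal sketch of that fingerprint with Node's built-in crypto module (`promptHash` is an illustrative name):

```ts
import { createHash } from 'node:crypto'

// Fingerprint prompts for regression tracing without logging raw text.
// The NUL separator keeps ('ab','c') and ('a','bc') from colliding.
function promptHash(systemPrompt: string, userPrompt: string): string {
  return createHash('sha256')
    .update(systemPrompt)
    .update('\u0000')
    .update(userPrompt)
    .digest('hex')
}
```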
Opinion, not law: logging raw JSON is fine in staging; in production, sample or mask it for compliance and cost.
Verified sources: Structured Outputs, JSON mode, AI SDK
- OpenAI — Structured Outputs vs JSON mode: official docs separate JSON mode (valid JSON syntax) from Structured Outputs tied to a JSON Schema: the latter constrains the emitted object where supported, cutting structural failures compared with “please return JSON” prompting. See Structured model outputs and the JSON mode guide. Keep Zod (or equivalent) as your portability layer across providers, for business-only rules, and when you still call models without structured-output support.
- Vercel AI SDK: current guidance is `Output.object({ schema: z.object({…}) })` with `streamText`/`generateText`, consuming typed partial objects via `partialOutputStream`. Reference: Generating structured data. Pin and read release notes for your installed major (`ai`, `@ai-sdk/openai`, …); the surface changes frequently.
- Hard limits: even with schema guarantees, handle truncated streams, timeouts, and model refusals. In the AI SDK, streaming errors may surface through stream events (`onError`) rather than only thrown exceptions; verify against the version you ship.
Operational summary
- Zod after parse, always `safeParse`.
- Stream = single buffer, parse at end (unless you designed for NDJSON).
- Repairs with a fixed budget and metrics on failures.
- Prefer non-streaming for critical monolithic JSON.
Keep handy: OpenAI Structured Outputs / JSON mode, Anthropic’s structured-output docs, and Zod safeParse + flatten.