AI SDK CoreGenerating Structured Data

Generating Structured Data

While text generation can be useful, your usecase will likely call for generating structured data. For example, you might want to extract information from text, classify data, or generate synthetic data.

Many language models are capable of generating structured data, often defined as using "JSON modes" or "tools". However, you need to manually provide schemas and then validate the generated data as LLMs can produce incorrect or incomplete structured data.

The Vercel AI SDK standardises structured object generation across model providers with the generateObject function.

The generateObject function uses Zod schemas or JSON schemas to specify the shape of the data that you want, and the AI model will generate data that conforms to that structure. The schema is also used to validate the generated data, ensuring type safety and correctness.

import { generateObject } from 'ai';
import { z } from 'zod';
const { object } = await generateObject({
model: yourModel,
schema: z.object({
recipe: z.object({
name: z.string(),
ingredients: z.array(z.object({ name: z.string(), amount: z.string() })),
steps: z.array(z.string()),
}),
}),
prompt: 'Generate a lasagna recipe.',
});

Specifying Generation Mode

While some models (like OpenAI) natively support object generation, others require alternative methods, like modified tool calling. The generateObject function allows you to specify the method it will use to return structured data.

  • auto: The provider will choose the best mode for the model. This recommended mode is used by default.
  • tool: A tool with the JSON schema as parameters is provided and the provider is instructed to use it.
  • json: The JSON schema and an instruction is injected into the prompt. If the provider supports JSON mode, it is enabled. If the provider supports JSON grammars, the grammar is used.
Please note that not every provider supports all generation modes.

Streaming Objects

Given the added complexity of returning structured data, model response time can be unacceptable for your interactive use case. With the streamObject function, you can stream the model's response as it is generated.

import { streamObject } from 'ai';
const { partialObjectStream } = await streamObject({
// ...
});
// use partialObjectStream as an async iterable
for await (const partialObject of partialObjectStream) {
console.log(partialObject);
}

You can use streamObject to stream generated UIs in combination with React Server Components (see Generative UI)) or the useObject hook.

Schema Writing Tips

The mapping from Zod schemas to LLM inputs (typically JSON schema) is not always straightforward, since the mapping is not one-to-one. Please checkout the following tips and the Prompt Engineering with Tools guide.

Generating Arrays

Most models require an object as the top-level schema. If you want to generate an array, you can wrap it in an object with a single descriptive key and use destructuring to access the array.

const {
object: { users },
} = await generateObject({
model: yourModel,
schema: z.object({
users: z.array(
z.object({
login: z.string(),
fullName: z.string(),
age: z.number(),
}),
),
}),
prompt: 'Generate a list of fake user profiles for testing.',
});
console.log('users', users);

Dates

Zod expects JavaScript Date objects, but most models return dates as strings. You can use the z.string().datetime() method to specify and validate datetime strings.

const result = await generateObject({
model: openai('gpt-4o'),
schema: z.object({
user: z.object({
login: z.string(),
lastSeen: z
.string()
.datetime()
.describe('Last time the user was seen (ISO 8601 date string().'),
}),
}),
prompt: 'Generate a fake user profile for testing.',
});

Error Handling

When you use generateObject, errors are thrown when the model fails to generate proper JSON (JSONParseError) or when the generated JSON does not match the schema (TypeValidationError). Both error types contain additional information, e.g. the generated text or the invalid value.

You can use this to e.g. design a function that safely process the result object and also returns values in error cases:

import { openai } from '@ai-sdk/openai';
import { JSONParseError, TypeValidationError, generateObject } from 'ai';
import { z } from 'zod';
const recipeSchema = z.object({
recipe: z.object({
name: z.string(),
ingredients: z.array(z.object({ name: z.string(), amount: z.string() })),
steps: z.array(z.string()),
}),
});
type Recipe = z.infer<typeof recipeSchema>;
async function generateRecipe(
food: string,
): Promise<
| { type: 'success'; recipe: Recipe }
| { type: 'parse-error'; text: string }
| { type: 'validation-error'; value: unknown }
| { type: 'unknown-error'; error: unknown }
> {
try {
const result = await generateObject({
model: openai('gpt-4-turbo'),
schema: recipeSchema,
prompt: `Generate a ${food} recipe.`,
});
return { type: 'success', recipe: result.object };
} catch (error) {
if (TypeValidationError.isTypeValidationError(error)) {
return { type: 'validation-error', value: error.value };
} else if (JSONParseError.isJSONParseError(error)) {
return { type: 'parse-error', text: error.text };
} else {
return { type: 'unknown-error', error };
}
}
}

Examples

You can see generateObject and streamObject in action using various frameworks in the following examples:

generateObject

streamObject