Generating Structured Data
While text generation can be useful, your use case will likely call for generating structured data. For example, you might want to extract information from text, classify data, or generate synthetic data.
Many language models are capable of generating structured data, often defined as using "JSON modes" or "tools". However, you need to manually provide schemas and then validate the generated data as LLMs can produce incorrect or incomplete structured data.
The AI SDK standardises structured object generation across model providers
with the generateObject
and streamObject
functions.
You can use both functions with different output strategies, e.g. array
, object
, or no-schema
,
and with different generation modes, e.g. auto
, tool
, or json
.
You can use Zod schemas or JSON schemas to specify the shape of the data that you want,
and the AI model will generate data that conforms to that structure.
Generate Object
The generateObject
generates structured data from a prompt.
The schema is also used to validate the generated data, ensuring type safety and correctness.
import { generateObject } from 'ai';import { z } from 'zod';
const { object } = await generateObject({ model: yourModel, schema: z.object({ recipe: z.object({ name: z.string(), ingredients: z.array(z.object({ name: z.string(), amount: z.string() })), steps: z.array(z.string()), }), }), prompt: 'Generate a lasagna recipe.',});
Stream Object
Given the added complexity of returning structured data, model response time can be unacceptable for your interactive use case.
With the streamObject
function, you can stream the model's response as it is generated.
import { streamObject } from 'ai';
const { partialObjectStream } = await streamObject({ // ...});
// use partialObjectStream as an async iterablefor await (const partialObject of partialObjectStream) { console.log(partialObject);}
You can use streamObject
to stream generated UIs in combination with React Server Components (see Generative UI)) or the useObject
hook.
Output Strategy
You can use both functions with different output strategies, e.g. array
, object
, or no-schema
.
Object
The default output strategy is object
, which returns the generated data as an object.
You don't need to specify the output strategy if you want to use the default.
Array
If you want to generate an array of objects, you can set the output strategy to array
.
When you use the array
output strategy, the schema specifies the shape of an array element.
With streamObject
, you can also stream the generated array elements using elementStream
.
import { openai } from '@ai-sdk/openai';import { streamObject } from 'ai';import { z } from 'zod';
const { elementStream } = await streamObject({ model: openai('gpt-4-turbo'), output: 'array', schema: z.object({ name: z.string(), class: z .string() .describe('Character class, e.g. warrior, mage, or thief.'), description: z.string(), }), prompt: 'Generate 3 hero descriptions for a fantasy role playing game.',});
for await (const hero of elementStream) { console.log(hero);}
Enum
If you want to generate a specific enum value, e.g. for classification tasks,
you can set the output strategy to enum
and provide a list of possible values in the enum
parameter.
generateObject
.import { generateObject } from 'ai';
const { object } = await generateObject({ model: yourModel, output: 'enum', enum: ['action', 'comedy', 'drama', 'horror', 'sci-fi'], prompt: 'Classify the genre of this movie plot: ' + '"A group of astronauts travel through a wormhole in search of a ' + 'new habitable planet for humanity."',});
No Schema
In some cases, you might not want to use a schema,
for example when the data is a dynamic user request.
You can use the output
setting to set the output format to no-schema
in those cases
and omit the schema parameter.
import { openai } from '@ai-sdk/openai';import { generateObject } from 'ai';
const { object } = await generateObject({ model: openai('gpt-4-turbo'), output: 'no-schema', prompt: 'Generate a lasagna recipe.',});
Generation Mode
While some models (like OpenAI) natively support object generation, others require alternative methods, like modified tool calling. The generateObject
function allows you to specify the method it will use to return structured data.
auto
: The provider will choose the best mode for the model. This recommended mode is used by default.tool
: A tool with the JSON schema as parameters is provided and the provider is instructed to use it.json
: The response format is set to JSON when supported by the provider, e.g. via json modes or grammar-guided generation. If grammar-guided generation is not supported, the JSON schema and instructions to generate JSON that conforms to the schema are injected into the system prompt.
Please note that not every provider supports all generation modes. Some providers do not support object generation at all.
Schema Name and Description
You can optionally specify a name and description for the schema. These are used by some providers for additional LLM guidance, e.g. via tool or schema name.
import { generateObject } from 'ai';import { z } from 'zod';
const { object } = await generateObject({ model: yourModel, schemaName: 'Recipe', schemaDescription: 'A recipe for a dish.', schema: z.object({ name: z.string(), ingredients: z.array(z.object({ name: z.string(), amount: z.string() })), steps: z.array(z.string()), }), prompt: 'Generate a lasagna recipe.',});
Schema Writing Tips
The mapping from Zod schemas to LLM inputs (typically JSON schema) is not always straightforward, since the mapping is not one-to-one. Please checkout the following tips and the Prompt Engineering with Tools guide.
Dates
Zod expects JavaScript Date objects, but models return dates as strings.
You can specify and validate the date format using z.string().datetime()
or z.string().date()
,
and then use a Zod transformer to convert the string to a Date object.
const result = await generateObject({ model: openai('gpt-4-turbo'), schema: z.object({ events: z.array( z.object({ event: z.string(), date: z .string() .date() .transform(value => new Date(value)), }), ), }), prompt: 'List 5 important events from the the year 2000.',});
Error Handling
When you use generateObject
, errors are thrown when the model fails to generate proper JSON (JSONParseError
)
or when the generated JSON does not match the schema (TypeValidationError
).
Both error types contain additional information, e.g. the generated text or the invalid value.
You can use this to e.g. design a function that safely process the result object and also returns values in error cases:
import { openai } from '@ai-sdk/openai';import { JSONParseError, TypeValidationError, generateObject } from 'ai';import { z } from 'zod';
const recipeSchema = z.object({ recipe: z.object({ name: z.string(), ingredients: z.array(z.object({ name: z.string(), amount: z.string() })), steps: z.array(z.string()), }),});
type Recipe = z.infer<typeof recipeSchema>;
async function generateRecipe( food: string,): Promise< | { type: 'success'; recipe: Recipe } | { type: 'parse-error'; text: string } | { type: 'validation-error'; value: unknown } | { type: 'unknown-error'; error: unknown }> { try { const result = await generateObject({ model: openai('gpt-4-turbo'), schema: recipeSchema, prompt: `Generate a ${food} recipe.`, });
return { type: 'success', recipe: result.object }; } catch (error) { if (TypeValidationError.isTypeValidationError(error)) { return { type: 'validation-error', value: error.value }; } else if (JSONParseError.isJSONParseError(error)) { return { type: 'parse-error', text: error.text }; } else { return { type: 'unknown-error', error }; } }}
More Examples
You can see generateObject
and streamObject
in action using various frameworks in the following examples: