Agents

AI agents let the language model execute a series of steps in a non-deterministic way. The model can make tool-calling decisions based on the context of the conversation, the user's input, and previous tool calls and results.

One approach to implementing agents is to allow the LLM to choose the next step in a loop. With generateText, you can combine tools with maxSteps. This makes it possible to implement agents that reason at each step and make decisions based on the context.

Example

This example demonstrates how to create an agent that solves math problems. It has a calculator tool (using math.js) that it can call to evaluate mathematical expressions.

import { openai } from '@ai-sdk/openai';
import { generateText, tool } from 'ai';
import * as mathjs from 'mathjs';
import { z } from 'zod';

const { text: answer } = await generateText({
  model: openai('gpt-4o-2024-08-06', { structuredOutputs: true }),
  tools: {
    calculate: tool({
      description:
        'A tool for evaluating mathematical expressions. ' +
        'Example expressions: ' +
        "'1.2 * (2 + 4.5)', '12.7 cm to inch', 'sin(45 deg) ^ 2'.",
      parameters: z.object({ expression: z.string() }),
      execute: async ({ expression }) => mathjs.evaluate(expression),
    }),
  },
  maxSteps: 10,
  system:
    'You are solving math problems. ' +
    'Reason step by step. ' +
    'Use the calculator when necessary. ' +
    'When you give the final answer, ' +
    'provide an explanation for how you arrived at it.',
  prompt:
    'A taxi driver earns $9461 per 1-hour of work. ' +
    'If he works 12 hours a day and in 1 hour ' +
    'he uses 12 liters of petrol with a price of $134 for 1 liter. ' +
    'How much money does he earn in one day?',
});

console.log(`ANSWER: ${answer}`);

Structured Answers

You can use an answer tool and the toolChoice: 'required' setting to force the LLM to answer with a structured output that matches the schema of the answer tool. The answer tool has no execute function, so invoking it will terminate the agent.

Alternatively, you can use the experimental_output setting for generateText to generate structured outputs.
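A minimal sketch of the experimental_output approach is shown below; it assumes the Output.object helper exported from the ai package, and the schema fields (answer, explanation) are purely illustrative:

import { openai } from '@ai-sdk/openai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

// sketch: describe the expected output shape with a Zod schema
const { experimental_output } = await generateText({
  model: openai('gpt-4o-2024-08-06', { structuredOutputs: true }),
  experimental_output: Output.object({
    schema: z.object({
      answer: z.string(),
      explanation: z.string(),
    }),
  }),
  prompt: 'What is 12 * 9461? Explain your reasoning.',
});

console.log(experimental_output.answer);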

Example

import { openai } from '@ai-sdk/openai';
import { generateText, tool } from 'ai';
import 'dotenv/config';
import * as mathjs from 'mathjs';
import { z } from 'zod';

const { toolCalls } = await generateText({
  model: openai('gpt-4o-2024-08-06', { structuredOutputs: true }),
  tools: {
    calculate: tool({
      description:
        'A tool for evaluating mathematical expressions. Example expressions: ' +
        "'1.2 * (2 + 4.5)', '12.7 cm to inch', 'sin(45 deg) ^ 2'.",
      parameters: z.object({ expression: z.string() }),
      execute: async ({ expression }) => mathjs.evaluate(expression),
    }),
    // answer tool: the LLM will provide a structured answer
    answer: tool({
      description: 'A tool for providing the final answer.',
      parameters: z.object({
        steps: z.array(
          z.object({
            calculation: z.string(),
            reasoning: z.string(),
          }),
        ),
        answer: z.string(),
      }),
      // no execute function - invoking it will terminate the agent
    }),
  },
  toolChoice: 'required',
  maxSteps: 10,
  system:
    'You are solving math problems. ' +
    'Reason step by step. ' +
    'Use the calculator when necessary. ' +
    'The calculator can only do simple additions, subtractions, multiplications, and divisions. ' +
    'When you give the final answer, provide an explanation for how you got it.',
  prompt:
    'A taxi driver earns $9461 per 1-hour of work. ' +
    'If he works 12 hours a day and in 1 hour ' +
    'he uses 14 liters of petrol with a price of $134 for 1 liter. ' +
    'How much money does he earn in one day?',
});

console.log(`FINAL TOOL CALLS: ${JSON.stringify(toolCalls, null, 2)}`);
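Because the answer tool has no execute function, the run ends with that tool call, and its arguments follow the answer schema. A small sketch of pulling the structured answer out of toolCalls:

// the final step ends with the 'answer' tool call; its args match the answer schema
for (const toolCall of toolCalls) {
  if (toolCall.toolName === 'answer') {
    console.log(`ANSWER: ${toolCall.args.answer}`);
    for (const step of toolCall.args.steps) {
      console.log(`${step.calculation} - ${step.reasoning}`);
    }
  }
}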

Accessing all steps

Calling generateText with maxSteps can result in several calls to the LLM (steps). You can access information from all steps by using the steps property of the response.

import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const { steps } = await generateText({
  model: openai('gpt-4-turbo'),
  maxSteps: 10,
  // ...
});

// extract all tool calls from the steps:
const allToolCalls = steps.flatMap(step => step.toolCalls);
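Each step also carries its own token usage, so you can aggregate it yourself if needed. A small sketch, assuming each step exposes a usage object with promptTokens and completionTokens (as generateText results do):

// sum token usage across all steps:
const totalUsage = steps.reduce(
  (acc, step) => ({
    promptTokens: acc.promptTokens + step.usage.promptTokens,
    completionTokens: acc.completionTokens + step.usage.completionTokens,
  }),
  { promptTokens: 0, completionTokens: 0 },
);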

Getting notified on each completed step

You can use the onStepFinish callback to get notified on each completed step. It is triggered when a step is finished, i.e. all text deltas, tool calls, and tool results for the step are available.

import { generateText } from 'ai';

const result = await generateText({
  model: yourModel,
  maxSteps: 10,
  onStepFinish({ text, toolCalls, toolResults, finishReason, usage }) {
    // your own logic, e.g. for saving the chat history or recording usage
  },
  // ...
});
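For example, the callback can keep a running log of each completed step. A minimal sketch that records a compact summary in memory (yourModel stands in for any configured model, as above):

import { generateText } from 'ai';

const stepLog: { text: string; toolCallCount: number; totalTokens: number }[] = [];

const result = await generateText({
  model: yourModel,
  maxSteps: 10,
  onStepFinish({ text, toolCalls, usage }) {
    // record a compact summary of each completed step
    stepLog.push({
      text,
      toolCallCount: toolCalls.length,
      totalTokens: usage.totalTokens,
    });
  },
  // ...
});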