# Agents
AI agents let the language model execute a series of steps in a non-deterministic way. The model can make tool calling decisions based on the context of the conversation, the user's input, and previous tool calls and results.
One approach to implementing agents is to allow the LLM to choose the next step in a loop.
With `generateText`, you can combine tools with `maxSteps`.
This makes it possible to implement agents that reason at each step and make decisions based on the context.
## Example
This example demonstrates how to create an agent that solves math problems. It has a calculator tool (using math.js) that it can call to evaluate mathematical expressions.
```ts
import { openai } from '@ai-sdk/openai';
import { generateText, tool } from 'ai';
import * as mathjs from 'mathjs';
import { z } from 'zod';

const { text: answer } = await generateText({
  model: openai('gpt-4o-2024-08-06', { structuredOutputs: true }),
  tools: {
    calculate: tool({
      description:
        'A tool for evaluating mathematical expressions. ' +
        'Example expressions: ' +
        "'1.2 * (2 + 4.5)', '12.7 cm to inch', 'sin(45 deg) ^ 2'.",
      parameters: z.object({ expression: z.string() }),
      execute: async ({ expression }) => mathjs.evaluate(expression),
    }),
  },
  maxSteps: 10,
  system:
    'You are solving math problems. ' +
    'Reason step by step. ' +
    'Use the calculator when necessary. ' +
    'When you give the final answer, ' +
    'provide an explanation for how you arrived at it.',
  prompt:
    'A taxi driver earns $9461 per 1-hour of work. ' +
    'If he works 12 hours a day and in 1 hour ' +
    'he uses 12 liters of petrol with a price of $134 for 1 liter. ' +
    'How much money does he earn in one day?',
});

console.log(`ANSWER: ${answer}`);
```
## Structured Answers
You can use an **answer tool** and the `toolChoice: 'required'` setting to force
the LLM to answer with a structured output that matches the schema of the answer tool.
The answer tool has no `execute` function, so invoking it will terminate the agent.
Alternatively, you can use the `experimental_output` setting for `generateText` to generate structured outputs.
### Example
```ts
import { openai } from '@ai-sdk/openai';
import { generateText, tool } from 'ai';
import * as mathjs from 'mathjs';
import 'dotenv/config';
import { z } from 'zod';

const { toolCalls } = await generateText({
  model: openai('gpt-4o-2024-08-06', { structuredOutputs: true }),
  tools: {
    calculate: tool({
      description:
        'A tool for evaluating mathematical expressions. Example expressions: ' +
        "'1.2 * (2 + 4.5)', '12.7 cm to inch', 'sin(45 deg) ^ 2'.",
      parameters: z.object({ expression: z.string() }),
      execute: async ({ expression }) => mathjs.evaluate(expression),
    }),
    // answer tool: the LLM will provide a structured answer
    answer: tool({
      description: 'A tool for providing the final answer.',
      parameters: z.object({
        steps: z.array(
          z.object({
            calculation: z.string(),
            reasoning: z.string(),
          }),
        ),
        answer: z.string(),
      }),
      // no execute function - invoking it will terminate the agent
    }),
  },
  toolChoice: 'required',
  maxSteps: 10,
  system:
    'You are solving math problems. ' +
    'Reason step by step. ' +
    'Use the calculator when necessary. ' +
    'The calculator can only do simple additions, subtractions, multiplications, and divisions. ' +
    'When you give the final answer, provide an explanation for how you got it.',
  prompt:
    'A taxi driver earns $9461 per 1-hour work. ' +
    'If he works 12 hours a day and in 1 hour he uses 14-liters petrol with price $134 for 1-liter. ' +
    'How much money does he earn in one day?',
});

console.log(`FINAL TOOL CALLS: ${JSON.stringify(toolCalls, null, 2)}`);
```
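Because the answer tool has no `execute` function, its structured output arrives as the arguments of the final tool call rather than as a tool result. The sketch below shows one way to pull the answer out of `toolCalls`; the `toolCalls` array here is hypothetical sample data shaped like the schema above, not real model output.

```ts
// Hypothetical sample of what `toolCalls` might look like after the run above;
// the real values come from the model.
const toolCalls = [
  {
    toolName: 'answer',
    args: {
      steps: [
        { calculation: '9461 * 12', reasoning: 'Gross earnings for 12 hours.' },
        { calculation: '12 * 14 * 134', reasoning: 'Daily petrol cost.' },
        { calculation: '113532 - 22512', reasoning: 'Net daily earnings.' },
      ],
      answer: 'The taxi driver earns $91020 per day.',
    },
  },
];

// The final answer is carried by the `answer` tool call,
// not by a tool result (the tool has no execute function).
const answerCall = toolCalls.find(call => call.toolName === 'answer');
console.log(answerCall?.args.answer); // The taxi driver earns $91020 per day.
```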
## Accessing all steps
Calling `generateText` with `maxSteps` can result in several calls to the LLM (steps).
You can access information from all steps by using the `steps` property of the response.
```ts
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const { steps } = await generateText({
  model: openai('gpt-4-turbo'),
  maxSteps: 10,
  // ...
});

// extract all tool calls from the steps:
const allToolCalls = steps.flatMap(step => step.toolCalls);
```
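Each step also carries its own metadata, such as token usage, so the same pattern works for aggregating other per-step information. A minimal sketch, assuming the step shape includes a `usage` object with `totalTokens` (the `steps` array here is mocked rather than returned by a real `generateText` call):

```ts
// Mocked steps array; in practice this comes from `await generateText(...)`.
const steps = [
  {
    toolCalls: [{ toolName: 'calculate' }],
    usage: { promptTokens: 120, completionTokens: 30, totalTokens: 150 },
  },
  {
    toolCalls: [],
    usage: { promptTokens: 180, completionTokens: 60, totalTokens: 240 },
  },
];

// sum token usage across all steps:
const totalTokens = steps.reduce((sum, step) => sum + step.usage.totalTokens, 0);

// collect all tool calls across all steps:
const allToolCalls = steps.flatMap(step => step.toolCalls);

console.log(totalTokens); // 390
console.log(allToolCalls.length); // 1
```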
## Getting notified on each completed step
You can use the `onStepFinish` callback to get notified on each completed step.
It is triggered when a step is finished,
i.e. when all text deltas, tool calls, and tool results for the step are available.
```ts
import { generateText } from 'ai';

const result = await generateText({
  model: yourModel,
  maxSteps: 10,
  onStepFinish({ text, toolCalls, toolResults, finishReason, usage }) {
    // your own logic, e.g. for saving the chat history or recording usage
  },
  // ...
});
```
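One common use of the callback is appending each finished step to an in-memory history. The sketch below defines such a handler and invokes it directly on hypothetical step data to show what accumulates; in a real run, `generateText` would invoke it for you once per step.

```ts
// Shape of the fields this handler reads from each finished step.
type StepRecord = { text: string; finishReason: string; totalTokens: number };

const history: StepRecord[] = [];

// The same style of handler you would pass as `onStepFinish`:
function onStepFinish({
  text,
  finishReason,
  usage,
}: {
  text: string;
  finishReason: string;
  usage: { totalTokens: number };
}) {
  history.push({ text, finishReason, totalTokens: usage.totalTokens });
}

// Simulate two completed steps with hypothetical data:
onStepFinish({ text: '', finishReason: 'tool-calls', usage: { totalTokens: 150 } });
onStepFinish({
  text: 'The driver earns $91020 per day.',
  finishReason: 'stop',
  usage: { totalTokens: 240 },
});

console.log(history.length); // 2
console.log(history[1].finishReason); // stop
```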