Generating and Streaming Text
Large language models (LLMs) can generate text in response to a prompt, which can contain instructions and information to process. For example, you can ask a model to come up with a recipe, draft an email, or summarize a document.
The AI SDK Core provides two functions to generate text and stream it from LLMs:
generateText: Generates text for a given prompt and model.
streamText: Streams text from a given prompt and model.
Advanced LLM features such as tool calling and structured data generation are built on top of text generation.
generateText
You can generate text using the generateText function. This function is ideal for non-interactive use cases where you need to write text (e.g. drafting emails or summarizing web pages) and for agents that use tools.
import { generateText } from 'ai';

const { text } = await generateText({
  model: yourModel,
  prompt: 'Write a vegetarian lasagna recipe for 4 people.',
});
You can use more advanced prompts to generate text with more complex instructions and content:
import { generateText } from 'ai';

const { text } = await generateText({
  model: yourModel,
  system:
    'You are a professional writer. ' +
    'You write simple, clear, and concise content.',
  prompt: `Summarize the following article in 3-5 sentences: ${article}`,
});
The result object of generateText contains several properties that are available once the call resolves:
result.text: The generated text.
result.finishReason: The reason the model finished generating text.
result.usage: The usage of the model during text generation.
streamText
Depending on your model and prompt, it can take a large language model (LLM) up to a minute to finish generating its response. This delay can be unacceptable for interactive use cases such as chatbots or real-time applications, where users expect immediate responses.
AI SDK Core provides the streamText function, which simplifies streaming text from LLMs:
import { streamText } from 'ai';

const result = streamText({
  model: yourModel,
  prompt: 'Invent a new holiday and describe its traditions.',
});

// example: use textStream as an async iterable
for await (const textPart of result.textStream) {
  console.log(textPart);
}
result.textStream is both a ReadableStream and an AsyncIterable.
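Because the stream exposes both interfaces, you can read it with a reader or with for await. The sketch below shows both styles using only standard web stream APIs. makeTextStream, readWithReader, and readWithForAwait are hypothetical helpers written for this illustration, and makeTextStream merely stands in for result.textStream; none of them are part of the AI SDK.

```typescript
// Stand-in for result.textStream: a ReadableStream that emits text parts.
// (Hypothetical helper for illustration only.)
function makeTextStream(parts: string[]): ReadableStream<string> {
  let index = 0;
  return new ReadableStream<string>({
    pull(controller) {
      if (index < parts.length) {
        controller.enqueue(parts[index]);
        index += 1;
      } else {
        controller.close();
      }
    },
  });
}

// Style 1: consume through the ReadableStream reader interface.
async function readWithReader(stream: ReadableStream<string>): Promise<string> {
  const reader = stream.getReader();
  let text = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += value;
  }
  return text;
}

// Style 2: consume through the AsyncIterable interface.
async function readWithForAwait(stream: AsyncIterable<string>): Promise<string> {
  let text = '';
  for await (const part of stream) {
    text += part;
  }
  return text;
}
```

A stream can only be consumed once, so pick one style per stream rather than mixing them.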
You can use streamText on its own or in combination with AI SDK UI and AI SDK RSC.
The result object contains several helper functions to make the integration into AI SDK UI easier:
result.toDataStreamResponse(): Creates a data stream HTTP response (with tool calls etc.) that can be used in a Next.js App Router API route.
result.pipeDataStreamToResponse(): Writes data stream delta output to a Node.js response-like object.
result.toTextStreamResponse(): Creates a simple text stream HTTP response.
result.pipeTextStreamToResponse(): Writes text delta output to a Node.js response-like object.
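To make the idea of a "simple text stream HTTP response" concrete, here is a minimal sketch that wraps an async iterable of text parts in a standard Response. It uses only web platform APIs; toPlainTextResponse is a hypothetical helper, and the Content-Type header is an assumption. It is not the implementation of result.toTextStreamResponse().

```typescript
// Hypothetical helper (not part of the AI SDK): turns text parts into a
// streaming HTTP response with a plain-text body.
function toPlainTextResponse(textParts: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const iterator = textParts[Symbol.asyncIterator]();

  const body = new ReadableStream<Uint8Array>({
    // Pull the next text part only when the response body is being read.
    async pull(controller) {
      const { done, value } = await iterator.next();
      if (done) {
        controller.close();
      } else {
        controller.enqueue(encoder.encode(value));
      }
    },
  });

  return new Response(body, {
    // Assumed header for this sketch.
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}
```

In a real route handler you would normally return the helper provided by the result object instead of building the response by hand.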
streamText uses backpressure and only generates tokens as they are requested. You need to consume the stream in order for it to finish.
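The effect of backpressure can be sketched with a plain pull-based ReadableStream: nothing is produced until a consumer reads. createLazyTokenStream and countProducedPerRead are hypothetical names for this illustration, and the highWaterMark of 0 is an assumption chosen so that the source never buffers ahead. This shows the general mechanism, not how streamText is implemented.

```typescript
// Hypothetical token source that counts how many tokens it has produced.
function createLazyTokenStream(tokens: string[]) {
  let produced = 0;
  const stream = new ReadableStream<string>(
    {
      // pull() runs only when a consumer asks for more data.
      pull(controller) {
        if (produced < tokens.length) {
          controller.enqueue(tokens[produced]);
          produced += 1;
        } else {
          controller.close();
        }
      },
    },
    // highWaterMark 0: do not produce ahead of the consumer.
    { highWaterMark: 0 },
  );
  return { stream, producedCount: () => produced };
}

// Records the produced-token count before reading and after each of two reads.
async function countProducedPerRead(): Promise<number[]> {
  const { stream, producedCount } = createLazyTokenStream(['a', 'b', 'c', 'd']);
  const counts = [producedCount()];
  const reader = stream.getReader();
  await reader.read();
  counts.push(producedCount());
  await reader.read();
  counts.push(producedCount());
  return counts;
}
```

The remaining tokens are never produced unless something keeps reading, which is why an unconsumed stream does not finish.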
The result object also provides several promises that resolve when the stream is finished:
result.text: The generated text.
result.finishReason: The reason the model finished generating text.
result.usage: The usage of the model during text generation.
onChunk callback
When using streamText, you can provide an onChunk callback that is triggered for each chunk of the stream.
It receives the following chunk types:
text-delta
tool-call
tool-result
tool-call-streaming-start (when experimental_toolCallStreaming is enabled)
tool-call-delta (when experimental_toolCallStreaming is enabled)
import { streamText } from 'ai';

const result = streamText({
  model: yourModel,
  prompt: 'Invent a new holiday and describe its traditions.',
  onChunk({ chunk }) {
    // implement your own logic here, e.g.:
    if (chunk.type === 'text-delta') {
      console.log(chunk.text);
    }
  },
});
onFinish callback
When using streamText, you can provide an onFinish callback that is triggered when the stream is finished (see the API Reference).
It contains the text, usage information, finish reason, messages, and more:
import { streamText } from 'ai';

const result = streamText({
  model: yourModel,
  prompt: 'Invent a new holiday and describe its traditions.',
  onFinish({ text, finishReason, usage, response }) {
    // your own logic, e.g. for saving the chat history or recording usage

    const messages = response.messages; // messages that were generated
  },
});
fullStream property
You can read a stream with all events using the fullStream property.
This can be useful if you want to implement your own UI or handle the stream in a different way.
Here is an example of how to use the fullStream property:
import { streamText } from 'ai';
import { z } from 'zod';

const result = streamText({
  model: yourModel,
  tools: {
    cityAttractions: {
      parameters: z.object({ city: z.string() }),
      execute: async ({ city }) => ({
        attractions: ['attraction1', 'attraction2', 'attraction3'],
      }),
    },
  },
  prompt: 'What are some San Francisco tourist attractions?',
});

for await (const part of result.fullStream) {
  switch (part.type) {
    case 'text-delta': {
      // handle text delta here
      break;
    }
    case 'tool-call': {
      switch (part.toolName) {
        case 'cityAttractions': {
          // handle tool call here
          break;
        }
      }
      break;
    }
    case 'tool-result': {
      switch (part.toolName) {
        case 'cityAttractions': {
          // handle tool result here
          break;
        }
      }
      break;
    }
    case 'finish': {
      // handle finish here
      break;
    }
    case 'error': {
      // handle error here
      break;
    }
  }
}
Stream transformation
You can use the experimental_transform option to transform the stream.
This is useful, for example, for filtering, changing, or smoothing the text stream.
The transformations are applied before the callbacks are invoked and the promises are resolved.
For example, if a transformation changes all text to uppercase, the onFinish callback will receive the transformed text.
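The uppercase case can be sketched with a standard TransformStream. The StreamPart type below is a simplified stand-in defined for this illustration (the SDK's own stream part types carry more variants and fields), and upperCaseTransform and collectText are hypothetical helpers. How such a transform is wired into experimental_transform is not shown here; check the API reference for the exact signature.

```typescript
// Simplified stand-in for stream parts (assumption, not the SDK's type).
type StreamPart =
  | { type: 'text-delta'; text: string }
  | { type: 'finish' };

// Upper-cases text deltas and passes every other part through unchanged.
function upperCaseTransform(): TransformStream<StreamPart, StreamPart> {
  return new TransformStream<StreamPart, StreamPart>({
    transform(part, controller) {
      controller.enqueue(
        part.type === 'text-delta'
          ? { ...part, text: part.text.toUpperCase() }
          : part,
      );
    },
  });
}

// Pipes the given parts through the transform and joins the resulting text.
async function collectText(parts: StreamPart[]): Promise<string> {
  const source = new ReadableStream<StreamPart>({
    start(controller) {
      for (const part of parts) controller.enqueue(part);
      controller.close();
    },
  });
  const reader = source.pipeThrough(upperCaseTransform()).getReader();
  let text = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (value.type === 'text-delta') text += value.text;
  }
  return text;
}
```

Anything that reads the stream after the transform, such as collectText here, only ever sees the transformed text.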
Smoothing streams
The AI SDK Core provides a smoothStream function that can be used to smooth out text streaming.
import { smoothStream, streamText } from 'ai';

const result = streamText({
  model,
  prompt,
  experimental_transform: smoothStream(),
});
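One way to picture smoothing is re-chunking: irregular text fragments are buffered and re-emitted as whole words. The sketch below does only that, on plain strings, with a standard TransformStream. wordChunker and rechunk are hypothetical helpers, and the sketch assumes that smoothing means word-level chunking; it is not smoothStream's implementation and adds no delay between chunks.

```typescript
// Hypothetical transform: buffers fragments and emits one word
// (plus its trailing whitespace) at a time.
function wordChunker(): TransformStream<string, string> {
  let buffer = '';
  return new TransformStream<string, string>({
    transform(fragment, controller) {
      buffer += fragment;
      // Emit every complete word currently in the buffer.
      let match = /^\s*\S+\s+/.exec(buffer);
      while (match !== null) {
        controller.enqueue(match[0]);
        buffer = buffer.slice(match[0].length);
        match = /^\s*\S+\s+/.exec(buffer);
      }
    },
    flush(controller) {
      // Emit whatever is left when the source ends.
      if (buffer.length > 0) controller.enqueue(buffer);
    },
  });
}

// Runs fragments through the chunker and returns the emitted chunks.
async function rechunk(fragments: string[]): Promise<string[]> {
  const source = new ReadableStream<string>({
    start(controller) {
      for (const fragment of fragments) controller.enqueue(fragment);
      controller.close();
    },
  });
  const reader = source.pipeThrough(wordChunker()).getReader();
  const chunks: string[] = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  return chunks;
}
```

A word is held back until whitespace follows it, so a fragment boundary in the middle of a word never reaches the consumer.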
Generating Long Text
Most language models have an output limit that is much shorter than their context window. This means that you cannot generate long text in one go, but it is possible to add responses back to the input and continue generating to create longer text.
generateText and streamText support such continuations for long text generation using the experimental continueSteps setting:
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const {
  text, // combined text
  usage, // combined usage of all steps
} = await generateText({
  model: openai('gpt-4o'), // 4096 output tokens
  maxSteps: 5, // enable multi-step calls
  experimental_continueSteps: true,
  prompt:
    'Write a book about Roman history, ' +
    'from the founding of the city of Rome ' +
    'to the fall of the Western Roman Empire. ' +
    'Each chapter MUST HAVE at least 1000 words.',
});
When experimental_continueSteps is enabled, only full words are streamed in streamText, and both generateText and streamText might drop the trailing tokens of some calls to prevent whitespace issues.
Some models might not always stop correctly on their own and keep generating until maxSteps is reached. You can hint the model to stop by, for example, using a system message such as "Stop when sufficient information was provided."
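The continuation idea described in this section (add the response back to the input and keep generating) can be sketched as a simple loop. StepResult, GenerateStep, and generateLongText are hypothetical names for this illustration. The sketch assumes each step reports a finish reason of 'length' when it was cut off by the output limit. It does not show how the SDK implements continueSteps.

```typescript
// Assumed shape of a single generation step (not an SDK type).
type StepResult = { text: string; finishReason: 'stop' | 'length' };
type GenerateStep = (input: string) => Promise<StepResult>;

// Calls generateStep repeatedly, feeding back the text generated so far,
// until the model stops on its own or maxSteps is reached.
async function generateLongText(
  prompt: string,
  generateStep: GenerateStep,
  maxSteps: number,
): Promise<{ text: string; steps: number }> {
  let text = '';
  let steps = 0;
  while (steps < maxSteps) {
    // The next input is the original prompt plus everything generated so far.
    const result = await generateStep(prompt + text);
    text += result.text;
    steps += 1;
    // Continue only when the step was cut off by the output limit.
    if (result.finishReason !== 'length') break;
  }
  return { text, steps };
}
```

The maxSteps bound matters for the reason given above: a model that never stops on its own would otherwise loop indefinitely.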
Examples
You can see generateText and streamText in action using various frameworks in the following examples: