OpenAI Provider
The OpenAI provider contains language model support for the OpenAI chat and completion APIs and embedding model support for the OpenAI embeddings API.
Setup
The OpenAI provider is available in the @ai-sdk/openai module. You can install it with:
pnpm add @ai-sdk/openai
Provider Instance
You can import the default provider instance openai from @ai-sdk/openai:
import { openai } from '@ai-sdk/openai';
If you need a customized setup, you can import createOpenAI from @ai-sdk/openai and create a provider instance with your settings:
import { createOpenAI } from '@ai-sdk/openai';

const openai = createOpenAI({
  // custom settings, e.g.
  compatibility: 'strict', // strict mode, enable when using the OpenAI API
});
You can use the following optional settings to customize the OpenAI provider instance:
- baseURL string
  Use a different URL prefix for API calls, e.g. to use proxy servers. The default prefix is https://api.openai.com/v1.
- apiKey string
  API key that is sent using the Authorization header. It defaults to the OPENAI_API_KEY environment variable.
- name string
  The provider name. You can set this when using OpenAI-compatible providers to change the model provider property. Defaults to openai.
- organization string
  OpenAI organization.
- project string
  OpenAI project.
- headers Record<string,string>
  Custom headers to include in the requests.
- fetch (input: RequestInfo, init?: RequestInit) => Promise<Response>
  Custom fetch implementation. Defaults to the global fetch function. You can use it as middleware to intercept requests, or to provide a custom fetch implementation, e.g. for testing.
- compatibility "strict" | "compatible"
  OpenAI compatibility mode. Should be set to strict when using the OpenAI API, and compatible when using third-party providers. In compatible mode, newer information such as streamOptions is not sent, resulting in NaN token usage. Defaults to 'compatible'.
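To make the settings above more concrete, the following sketch shows one plausible way the baseURL, apiKey, organization, project, and headers settings could be turned into a request URL and request headers. The names ProviderSettingsSketch, buildRequestHeaders, and resolveUrl are hypothetical and not part of @ai-sdk/openai. OpenAI-Organization and OpenAI-Project are header names that the OpenAI HTTP API accepts, but how the provider assembles them internally (including the override order) is an assumption.

```typescript
// Hypothetical settings shape mirroring the options listed above.
interface ProviderSettingsSketch {
  baseURL?: string;
  apiKey: string;
  organization?: string;
  project?: string;
  headers?: Record<string, string>;
}

// Assumption: custom headers are applied last, so they can override defaults.
function buildRequestHeaders(
  settings: ProviderSettingsSketch,
): Record<string, string> {
  const result: Record<string, string> = {
    Authorization: 'Bearer ' + settings.apiKey,
  };
  if (settings.organization !== undefined) {
    result['OpenAI-Organization'] = settings.organization;
  }
  if (settings.project !== undefined) {
    result['OpenAI-Project'] = settings.project;
  }
  const custom = settings.headers || {};
  for (const key of Object.keys(custom)) {
    result[key] = custom[key];
  }
  return result;
}

// Joins the URL prefix and a path, tolerating a trailing slash on the prefix.
function resolveUrl(settings: ProviderSettingsSketch, path: string): string {
  const prefix = (settings.baseURL || 'https://api.openai.com/v1').replace(
    /\/+$/,
    '',
  );
  return prefix + path;
}
```

With a proxy prefix such as https://proxy.example.com/v1/ (a made-up URL), resolveUrl would produce https://proxy.example.com/v1/chat/completions for the path /chat/completions.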
Language Models
The OpenAI provider instance is a function that you can invoke to create a language model:
const model = openai('gpt-4-turbo');
It automatically selects the correct API based on the model id. You can also pass additional settings in the second argument:
const model = openai('gpt-4-turbo', {
  // additional settings
});
The available options depend on the API that's automatically chosen for the model (see below).
If you want to explicitly select a specific model API, you can use .chat or .completion.
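The exact selection rule is not spelled out in this document. As a rough mental model only, the sketch below assumes that gpt-3.5-turbo-instruct (the only completion model this page mentions) is routed to the completions API and every other model id to the chat API. The function selectApiForModel is hypothetical and is not an export of @ai-sdk/openai.

```typescript
type OpenAIApiKind = 'chat' | 'completion';

// Assumption: only the completion model named in this document is routed
// to the completions API; every other model id is treated as a chat model.
function selectApiForModel(modelId: string): OpenAIApiKind {
  return modelId === 'gpt-3.5-turbo-instruct' ? 'completion' : 'chat';
}

console.log(selectApiForModel('gpt-4-turbo'));
console.log(selectApiForModel('gpt-3.5-turbo-instruct'));
```

If the automatic choice is not what you want, the .chat and .completion factory methods make the selection explicit.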
Example
You can use OpenAI language models to generate text with the generateText function:
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const { text } = await generateText({
  model: openai('gpt-4-turbo'),
  prompt: 'Write a vegetarian lasagna recipe for 4 people.',
});
OpenAI language models can also be used in the streamText, generateObject, streamObject, and streamUI functions (see AI SDK Core and AI SDK RSC).
Chat Models
You can create models that call the OpenAI chat API using the .chat() factory method. The first argument is the model id, e.g. gpt-4. The OpenAI chat models support tool calls and some have multi-modal capabilities.
const model = openai.chat('gpt-3.5-turbo');
OpenAI chat models also support some model-specific settings that are not part of the standard call settings. You can pass them as an options argument:
const model = openai.chat('gpt-3.5-turbo', {
  logitBias: {
    // optional likelihood for specific tokens
    '50256': -100,
  },
  user: 'test-user', // optional unique user identifier
});
The following optional settings are available for OpenAI chat models:
- logitBias Record<number, number>
  Modifies the likelihood of specified tokens appearing in the completion.
  Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. You can use this tokenizer tool to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
  As an example, you can pass {"50256": -100} to prevent the <|endoftext|> token from being generated.
- logProbs boolean | number
  Return the log probabilities of the tokens. Including logprobs will increase the response size and can slow down response times. However, it can be useful to better understand how the model is behaving.
  Setting to true will return the log probabilities of the tokens that were generated.
  Setting to a number will return the log probabilities of the top n tokens that were generated.
- parallelToolCalls boolean
  Whether to enable parallel function calling during tool use. Defaults to true.
- useLegacyFunctionCalls boolean
  Whether to use legacy function calling. Defaults to false.
  Required by some open source inference engines which do not support the tools API. May also provide a workaround for parallelToolCalls resulting in the provider buffering tool calls, which causes streamObject to be non-streaming.
  Prefer setting parallelToolCalls: false over this option.
- structuredOutputs boolean
  Whether to use structured outputs. Defaults to false.
  When enabled, tool calls and object generation will be strict and follow the provided schema.
- user string
  A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse. Learn more.
- downloadImages boolean
  Automatically download images and pass the image as data to the model. OpenAI supports image URLs for public models, so this is only needed for private models or when the images are not publicly accessible. Defaults to false.
- simulateStreaming boolean
  Simulates streaming by using a normal generate call and returning it as a stream. Enable this if the model that you are using does not support streaming. Defaults to false.
- reasoningEffort 'low' | 'medium' | 'high'
  Reasoning effort for reasoning models. Defaults to medium. If you use experimental_providerMetadata to set the reasoningEffort option, this model setting will be ignored.
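The logitBias description above says that the bias is added to the logits before sampling. The following toy sketch illustrates that arithmetic on a made-up three-token vocabulary. The functions softmax and applyLogitBias are illustrative helpers, not part of the SDK, and real models work with far larger vocabularies and their own sampling logic.

```typescript
// Converts raw logits to probabilities (numerically stable softmax).
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((value) => Math.exp(value - max));
  const total = exps.reduce((sum, value) => sum + value, 0);
  return exps.map((value) => value / total);
}

// Adds each bias to the logit of its token id before sampling.
// Token ids are used directly as array indices in this toy vocabulary.
function applyLogitBias(
  logits: number[],
  logitBias: Record<number, number>,
): number[] {
  return logits.map((logit, tokenId) => logit + (logitBias[tokenId] ?? 0));
}

const logits = [2.0, 1.0, 0.5]; // toy vocabulary of three tokens
const biased = applyLogitBias(logits, { 1: -100 });
const probabilities = softmax(biased);

// Token 1 ends up with a probability that is effectively zero.
console.log(probabilities);
```

A bias of -100 pushes the token's probability so close to zero that it is effectively banned, which matches the behavior described for {"50256": -100}.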
Structured Outputs
You can enable OpenAI structured outputs by setting the structuredOutputs option to true. Structured outputs are a form of grammar-guided generation. The JSON schema is used as a grammar and the outputs will always conform to the schema.
import { openai } from '@ai-sdk/openai';
import { generateObject } from 'ai';
import { z } from 'zod';

const result = await generateObject({
  model: openai('gpt-4o-2024-08-06', {
    structuredOutputs: true,
  }),
  schemaName: 'recipe',
  schemaDescription: 'A recipe for lasagna.',
  schema: z.object({
    name: z.string(),
    ingredients: z.array(
      z.object({
        name: z.string(),
        amount: z.string(),
      }),
    ),
    steps: z.array(z.string()),
  }),
  prompt: 'Generate a lasagna recipe.',
});

console.log(JSON.stringify(result.object, null, 2));
OpenAI structured outputs have several limitations, in particular around the supported schemas, and are therefore opt-in. For example, optional schema properties are not supported. You need to change Zod .nullish() and .optional() to .nullable().
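To illustrate what the optional-to-nullable change means at the JSON schema level, here is a minimal sketch that rewrites a simplified object schema so that every property is required and previously optional properties accept null instead. The ObjectSchemaSketch type and the toStrictSchema function are hypothetical helpers for illustration; they are not how the SDK or Zod performs the conversion, and real JSON schemas have many more features than this handles.

```typescript
// Minimal JSON-schema-like shape used only for this sketch.
interface ObjectSchemaSketch {
  type: 'object';
  properties: Record<string, { type: string | string[] }>;
  required: string[];
}

// Returns a copy where every property is required and previously optional
// properties accept null instead of being omitted.
function toStrictSchema(schema: ObjectSchemaSketch): ObjectSchemaSketch {
  const properties: Record<string, { type: string | string[] }> = {};
  const names = Object.keys(schema.properties);
  for (const name of names) {
    const property = schema.properties[name];
    const isRequired = schema.required.indexOf(name) !== -1;
    const types = Array.isArray(property.type)
      ? property.type
      : [property.type];
    const alreadyNullable = types.indexOf('null') !== -1;
    properties[name] = {
      type: isRequired || alreadyNullable ? property.type : types.concat('null'),
    };
  }
  return { type: 'object', properties, required: names };
}

const strictSchema = toStrictSchema({
  type: 'object',
  properties: { name: { type: 'string' }, note: { type: 'string' } },
  required: ['name'], // "note" was optional
});
console.log(JSON.stringify(strictSchema, null, 2));
```

In Zod terms, this corresponds to declaring note as z.string().nullable() rather than z.string().optional().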
Predicted Outputs
OpenAI supports predicted outputs for gpt-4o and gpt-4o-mini. Predicted outputs help you reduce latency by allowing you to specify a base text that the model should modify. You can enable predicted outputs by adding the prediction option to the experimental_providerMetadata.openai object:
const result = streamText({
  model: openai('gpt-4o'),
  messages: [
    {
      role: 'user',
      content: 'Replace the Username property with an Email property.',
    },
    {
      role: 'user',
      content: existingCode,
    },
  ],
  experimental_providerMetadata: {
    openai: {
      prediction: {
        type: 'content',
        content: existingCode,
      },
    },
  },
});
Image Detail
You can use the openai provider metadata to set the image input detail to high, low, or auto:
const result = await generateText({
  model: openai('gpt-4o'),
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe the image in detail.' },
        {
          type: 'image',
          image:
            'https://github.com/vercel/ai/blob/main/examples/ai-core/data/comic-cat.png?raw=true',

          // OpenAI specific extension - image detail:
          experimental_providerMetadata: {
            openai: { imageDetail: 'low' },
          },
        },
      ],
    },
  ],
});
Distillation
OpenAI supports model distillation for some models. If you want to store a generation for use in the distillation process, you can add the store option to the experimental_providerMetadata.openai object. This will save the generation to the OpenAI platform for later use in distillation.
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
import 'dotenv/config';

async function main() {
  const { text, usage } = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: 'Who worked on the original macintosh?',
    experimental_providerMetadata: {
      openai: {
        store: true,
        metadata: {
          custom: 'value',
        },
      },
    },
  });

  console.log(text);
  console.log();
  console.log('Usage:', usage);
}

main().catch(console.error);
Reasoning Models
OpenAI has introduced the o1 series of reasoning models. Currently, o1, o1-mini, and o1-preview are available.
Reasoning models currently only generate text, have several limitations, and are only supported using generateText and streamText.
Reasoning models support additional settings and response metadata:
- You can use experimental_providerMetadata to set
  - the maxCompletionTokens option, which determines the maximum number of both reasoning and output tokens that the model generates.
  - the reasoningEffort option (or alternatively the reasoningEffort model setting), which determines the amount of reasoning the model performs.
- You can use response experimental_providerMetadata to access the number of reasoning tokens that the model generated.
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const { text, usage, experimental_providerMetadata } = await generateText({
  model: openai('o1-mini'),
  prompt: 'Invent a new holiday and describe its traditions.',
  experimental_providerMetadata: {
    openai: {
      reasoningEffort: 'low',
      maxCompletionTokens: 1000,
    },
  },
});

console.log(text);
console.log('Usage:', {
  ...usage,
  reasoningTokens: experimental_providerMetadata?.openai?.reasoningTokens,
});
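As a small illustration of how the reasoning token count can be read alongside the usage figures, the sketch below splits the completion tokens into reasoning tokens and visible output tokens. It assumes that completionTokens includes the reasoning tokens, which is consistent with maxCompletionTokens covering both. UsageSketch and splitCompletionTokens are hypothetical names, and the numbers are invented.

```typescript
// Hypothetical subset of the usage object returned by generateText.
interface UsageSketch {
  promptTokens: number;
  completionTokens: number;
}

// Assumption: completionTokens includes the reasoning tokens, so the visible
// output is whatever remains after subtracting them.
function splitCompletionTokens(
  usage: UsageSketch,
  reasoningTokens: number | undefined,
): { reasoningTokens: number; visibleOutputTokens: number } {
  const reasoning = reasoningTokens ?? 0;
  return {
    reasoningTokens: reasoning,
    visibleOutputTokens: Math.max(0, usage.completionTokens - reasoning),
  };
}

// Invented numbers: 900 completion tokens, 640 of them spent on reasoning.
const tokenSplit = splitCompletionTokens(
  { promptTokens: 20, completionTokens: 900 },
  640,
);
console.log(tokenSplit);
```

A split like this can help when choosing a maxCompletionTokens value, since a budget that is too small may be consumed largely by reasoning.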
Reasoning models like o1-mini and o1-preview require additional runtime inference to complete their reasoning phase before generating a response. This introduces longer latency compared to other models, with o1-preview exhibiting significantly more inference time than o1-mini.
OpenAI has introduced a new developer message type for reasoning models. System messages are automatically converted to developer messages by OpenAI, so you can pass system messages to reasoning models and set the system instruction as usual.
The o1 reasoning model currently does not support streaming. You can use the simulateStreaming option to simulate streaming.
Prompt Caching
OpenAI has introduced Prompt Caching for supported models including gpt-4o, gpt-4o-mini, o1-preview, and o1-mini.
- Prompt caching is automatically enabled for these models when the prompt is 1024 tokens or longer. It does not need to be explicitly enabled.
- You can use response experimental_providerMetadata to access the number of prompt tokens that were a cache hit.
- Note that caching behavior is dependent on load on OpenAI's infrastructure. Prompt prefixes generally remain in the cache for 5-10 minutes of inactivity before they are evicted, but during off-peak periods they may persist for up to an hour.
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const { text, usage, experimental_providerMetadata } = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: `A 1024-token or longer prompt...`,
});

console.log(`usage:`, {
  ...usage,
  cachedPromptTokens: experimental_providerMetadata?.openai?.cachedPromptTokens,
});
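Once you have the cachedPromptTokens value, a simple ratio can show how much of a prompt was served from the cache. The sketch below is an illustration only: cacheHitRatio and isCacheEligible are hypothetical helpers, and the 1024-token threshold is taken from the description above rather than from any SDK constant.

```typescript
// Threshold stated in this document: prompts of 1024 tokens or longer.
const MIN_CACHEABLE_PROMPT_TOKENS = 1024;

// Whether a prompt of this length is long enough to be cached at all.
function isCacheEligible(promptTokens: number): boolean {
  return promptTokens >= MIN_CACHEABLE_PROMPT_TOKENS;
}

// Fraction of prompt tokens served from the cache, in the range [0, 1].
// A missing cachedPromptTokens value is treated as zero cache hits.
function cacheHitRatio(
  promptTokens: number,
  cachedPromptTokens: number | undefined,
): number {
  if (promptTokens <= 0) {
    return 0;
  }
  return Math.min(1, (cachedPromptTokens ?? 0) / promptTokens);
}

// Invented numbers: a 2048-token prompt with 1024 tokens read from the cache.
console.log(isCacheEligible(2048), cacheHitRatio(2048, 1024));
```

Because cache retention depends on load, a low ratio on a repeated prompt does not necessarily indicate a problem in your code.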
Audio Input
With the gpt-4o-audio-preview model, you can pass audio files to the model.
The gpt-4o-audio-preview model is currently in preview and requires at least some audio input. It will not work with non-audio data.
import fs from 'node:fs';
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const result = await generateText({
  model: openai('gpt-4o-audio-preview'),
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What is the audio saying?' },
        {
          type: 'file',
          mimeType: 'audio/mpeg',
          // read the audio file from disk and pass it as binary data
          data: fs.readFileSync('./data/galileo.mp3'),
        },
      ],
    },
  ],
});
Completion Models
You can create models that call the OpenAI completions API using the .completion() factory method. The first argument is the model id. Currently only gpt-3.5-turbo-instruct is supported.
const model = openai.completion('gpt-3.5-turbo-instruct');
OpenAI completion models also support some model-specific settings that are not part of the standard call settings. You can pass them as an options argument:
const model = openai.completion('gpt-3.5-turbo-instruct', {
  echo: true, // optional, echo the prompt in addition to the completion
  logitBias: {
    // optional likelihood for specific tokens
    '50256': -100,
  },
  suffix: 'some text', // optional suffix that comes after a completion of inserted text
  user: 'test-user', // optional unique user identifier
});
The following optional settings are available for OpenAI completion models:
- echo boolean
  Echo back the prompt in addition to the completion.
- logitBias Record<number, number>
  Modifies the likelihood of specified tokens appearing in the completion.
  Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. You can use this tokenizer tool to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
  As an example, you can pass {"50256": -100} to prevent the <|endoftext|> token from being generated.
- logProbs boolean | number
  Return the log probabilities of the tokens. Including logprobs will increase the response size and can slow down response times. However, it can be useful to better understand how the model is behaving.
  Setting to true will return the log probabilities of the tokens that were generated.
  Setting to a number will return the log probabilities of the top n tokens that were generated.
- suffix string
  The suffix that comes after a completion of inserted text.
- user string
  A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse. Learn more.
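When logProbs is enabled, the returned values are logarithms rather than plain probabilities. The sketch below shows how such values could be converted, assuming natural logarithms as used by the OpenAI API. The helper names are hypothetical and the input numbers are computed locally for illustration, not taken from a real response.

```typescript
// Converts a natural-log probability into a plain probability in [0, 1].
function logProbToProbability(logProb: number): number {
  return Math.exp(logProb);
}

// Joint probability of a token sequence: log probabilities add up,
// so the sum is exponentiated once at the end.
function sequenceProbability(logProbs: number[]): number {
  const total = logProbs.reduce((sum, value) => sum + value, 0);
  return Math.exp(total);
}

// Two tokens that each had a probability of 0.5 when they were sampled.
const exampleLogProbs = [Math.log(0.5), Math.log(0.5)];
console.log(sequenceProbability(exampleLogProbs));
```

Summing log probabilities before exponentiating avoids the underflow that multiplying many small probabilities would cause for long sequences.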
Model Capabilities
Model | Image Input | Audio Input | Object Generation | Tool Usage |
---|---|---|---|---|
gpt-4o | ||||
gpt-4o-mini | ||||
gpt-4o-audio-preview | ||||
gpt-4-turbo | ||||
gpt-4 | ||||
gpt-3.5-turbo | ||||
o1 | ||||
o1-mini | ||||
o1-preview |
The table above lists popular models. Please see the OpenAI docs for a full list of available models. You can also pass any available provider model ID as a string if needed.
Embedding Models
You can create models that call the OpenAI embeddings API using the .embedding() factory method.
const model = openai.embedding('text-embedding-3-large');
OpenAI embedding models support several additional settings. You can pass them as an options argument:
const model = openai.embedding('text-embedding-3-large', {
  dimensions: 512, // optional, number of dimensions for the embedding
  user: 'test-user', // optional unique user identifier
});
The following optional settings are available for OpenAI embedding models:
- dimensions number
  The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
- user string
  A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse. Learn more.
Model Capabilities
Model | Default Dimensions | Custom Dimensions |
---|---|---|
text-embedding-3-large | 3072 | |
text-embedding-3-small | 1536 | |
text-embedding-ada-002 | 1536 |
Image Models
You can create models that call the OpenAI image generation API using the .image() factory method.
const model = openai.image('dall-e-3');
Model Capabilities
Model | Supported Sizes |
---|---|
dall-e-3 | 1024x1024, 1792x1024, 1024x1792 |
dall-e-2 | 256x256, 512x512, 1024x1024 |
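The size table above can be turned into a small lookup that rejects unsupported sizes before a request is made. This is a sketch only: SUPPORTED_IMAGE_SIZES and isSupportedImageSize are hypothetical names, the data is copied from the table, and the SDK may perform its own validation or none at all.

```typescript
// Supported sizes copied from the table above.
const SUPPORTED_IMAGE_SIZES: Record<string, string[]> = {
  'dall-e-3': ['1024x1024', '1792x1024', '1024x1792'],
  'dall-e-2': ['256x256', '512x512', '1024x1024'],
};

// Returns true only for model ids and sizes that appear in the table.
function isSupportedImageSize(modelId: string, size: string): boolean {
  const sizes = SUPPORTED_IMAGE_SIZES[modelId];
  return sizes !== undefined && sizes.indexOf(size) !== -1;
}

console.log(isSupportedImageSize('dall-e-3', '1792x1024'));
console.log(isSupportedImageSize('dall-e-2', '1792x1024'));
```

Checking the size up front gives a clearer error than waiting for the API to reject the request.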