generateSpeech()
generateSpeech is an experimental feature.
Generates speech audio from text.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
import { readFile } from 'fs/promises';

const { audio } = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello from the AI SDK!',
});
console.log(audio);
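To keep the result, the audio bytes can be written to disk. The following is a minimal sketch, not part of the SDK: AudioLike, audioFileName, and saveAudio are hypothetical names, and AudioLike only mirrors the uint8Array and format fields documented under Returns.

```typescript
import { writeFile } from 'node:fs/promises';
import { join } from 'node:path';

// Hypothetical local type: only the two documented fields this sketch needs.
interface AudioLike {
  uint8Array: Uint8Array;
  format: string; // e.g. "mp3"
}

// Build a file name such as "hello.mp3" from a base name and the audio format.
function audioFileName(baseName: string, audio: AudioLike): string {
  return `${baseName}.${audio.format}`;
}

// Write the raw audio bytes into `dir` and return the full path of the new file.
async function saveAudio(dir: string, baseName: string, audio: AudioLike): Promise<string> {
  const path = join(dir, audioFileName(baseName, audio));
  await writeFile(path, audio.uint8Array);
  return path;
}
```

With the audio value from the example above, `await saveAudio('.', 'hello', audio)` would write hello.mp3 if the provider reports the format as "mp3".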
Import
import { experimental_generateSpeech as generateSpeech } from 'ai';
API Signature
Parameters
model:
SpeechModelV1
The speech model to use.
text:
string
The text to generate the speech from.
voice?:
string
The voice to use for the speech.
outputFormat?:
string
The output format of the generated audio, e.g. "mp3" or "wav".
instructions?:
string
Instructions for the speech generation.
speed?:
number
The speed of the generated speech.
providerOptions?:
Record<string, Record<string, JSONValue>>
Additional provider-specific options.
maxRetries?:
number
Maximum number of retries. Default: 2.
abortSignal?:
AbortSignal
An optional abort signal to cancel the call.
headers?:
Record<string, string>
Additional HTTP headers for the request.
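The optional settings above can be collected into one object and combined with a timeout through abortSignal. This is a sketch under assumptions: SpeechCallSettings and withTimeout are hypothetical names, the type only restates some of the documented parameters (it is not exported by the SDK), and the voice and speed values are placeholders that a given provider may not support.

```typescript
// Hypothetical local type restating some of the documented parameters.
interface SpeechCallSettings {
  text: string;
  voice?: string;
  outputFormat?: string;
  instructions?: string;
  speed?: number;
  maxRetries?: number;
  abortSignal?: AbortSignal;
  headers?: Record<string, string>;
}

// Attach an AbortSignal that fires after `ms` milliseconds.
// Returns the extended settings plus a `cancel` function that clears the timer.
function withTimeout(settings: SpeechCallSettings, ms: number) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  return {
    settings: { ...settings, abortSignal: controller.signal },
    cancel: () => clearTimeout(timer),
  };
}

const { settings, cancel } = withTimeout(
  {
    text: 'Hello from the AI SDK!',
    voice: 'alloy', // placeholder voice name
    outputFormat: 'mp3',
    speed: 1.0, // placeholder value
    maxRetries: 2, // the documented default
  },
  30_000,
);

// In real use, call cancel() in a `finally` block after the request completes;
// here it runs immediately so the sketch leaves no pending timer behind.
cancel();
```

The resulting object could then be spread into the call, for example `generateSpeech({ model: openai.speech('tts-1'), ...settings })`.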
Returns
audio:
GeneratedAudioFile
The generated audio.
GeneratedAudioFile
base64:
string
Audio as a base64 encoded string.
uint8Array:
Uint8Array
Audio as a Uint8Array.
mimeType:
string
MIME type of the audio (e.g. "audio/mpeg").
format:
string
Format of the audio (e.g. "mp3").
warnings:
SpeechWarning[]
Warnings from the model provider (e.g. unsupported settings).
responses:
Array<SpeechModelResponseMetadata>
Response metadata from the provider. There may be multiple responses if multiple calls were made to the model.
SpeechModelResponseMetadata
timestamp:
Date
Timestamp for the start of the generated response.
modelId:
string
The ID of the response model that was used to generate the response.
headers?:
Record<string, string>
Response headers.
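The returned fields can be inspected together after a call. The sketch below is illustrative only: SpeechResultLike, ResponseMetaLike, summarizeSpeechResult, and toDataUrl are hypothetical names, and the local types merely mirror the fields listed above rather than importing the SDK's own types.

```typescript
// Hypothetical local types mirroring the documented result fields used below.
interface ResponseMetaLike {
  timestamp: Date;
  modelId: string;
  headers?: Record<string, string>;
}

interface SpeechResultLike {
  audio: { base64: string; uint8Array: Uint8Array; mimeType: string; format: string };
  warnings: unknown[]; // the shape of SpeechWarning is not restated here
  responses: ResponseMetaLike[];
}

// Summarize a result: audio size and type, warning count, and the model IDs used.
function summarizeSpeechResult(result: SpeechResultLike): string {
  const { audio, warnings, responses } = result;
  const models = responses.map((r) => r.modelId).join(', ');
  return (
    `${audio.uint8Array.byteLength} bytes of ${audio.mimeType} (${audio.format}); ` +
    `${warnings.length} warning(s); ${responses.length} response(s) from: ${models}`
  );
}

// Build a data URL from the base64 payload and MIME type.
function toDataUrl(audio: { base64: string; mimeType: string }): string {
  return `data:${audio.mimeType};base64,${audio.base64}`;
}
```

Checking warnings before using the audio is a reasonable habit, since a provider may silently ignore a setting such as voice or speed and report that only there.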