`generateSpeech()`

generateSpeech is an experimental feature.

Generates speech audio from text.

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';
import { readFile } from 'fs/promises';

const { audio } = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello from the AI SDK!',
});

console.log(audio);

Import

import { experimental_generateSpeech as generateSpeech } from "ai"

API Signature

Parameters

model:

SpeechModelV1

The speech model to use.

text:

string

The text to generate the speech from.

voice?:

string

The voice to use for the speech.

outputFormat?:

string

The output format to use for the speech e.g. "mp3", "wav", etc.

instructions?:

string

Instructions for the speech generation.

speed?:

number

The speed of the speech generation.

providerOptions?:

Record<string, Record<string, JSONValue>>

Additional provider-specific options.

maxRetries?:

number

Maximum number of retries. Default: 2.

abortSignal?:

AbortSignal

An optional abort signal to cancel the call.

headers?:

Record<string, string>

Additional HTTP headers for the request.

Returns

audio:

GeneratedAudioFile

The generated audio.

GeneratedAudioFile

base64:

string

Audio as a base64 encoded string.

uint8Array:

Uint8Array

Audio as a Uint8Array.

mimeType:

string

MIME type of the audio (e.g. "audio/mpeg").

format:

string

Format of the audio (e.g. "mp3").

warnings:

SpeechWarning[]

Warnings from the model provider (e.g. unsupported settings).

responses:

Array<SpeechModelResponseMetadata>

Response metadata from the provider. There may be multiple responses if we made multiple calls to the model.

SpeechModelResponseMetadata

timestamp:

Date

Timestamp for the start of the generated response.

modelId:

string

The ID of the response model that was used to generate the response.

headers?:

Record<string, string>

Response headers.