transcribe()

transcribe is an experimental feature.

Generates a transcript from an audio file.

import { experimental_transcribe as transcribe } from 'ai';
import { openai } from '@ai-sdk/openai';
import { readFile } from 'fs/promises';

// The result exposes the transcript as `text`; rename it while destructuring.
const { text: transcript } = await transcribe({
  model: openai.transcription('whisper-1'),
  audio: await readFile('audio.mp3'),
});

console.log(transcript);

Import

import { experimental_transcribe as transcribe } from 'ai';

API Signature

Parameters

model:

TranscriptionModelV1
The transcription model to use.

audio:

DataContent (string | Uint8Array | ArrayBuffer | Buffer) | URL
The audio file to generate the transcript from.

providerOptions?:

Record<string, Record<string, JSONValue>>
Additional provider-specific options.

maxRetries?:

number
Maximum number of retries. Default: 2.

abortSignal?:

AbortSignal
An optional abort signal to cancel the call.

headers?:

Record<string, string>
Additional HTTP headers for the request.
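
The optional call settings above can be prepared as a plain object and spread into the call. The following is a minimal sketch, not the SDK's own code: `CallSettings` is a local stand-in type written for this example, and the header name is hypothetical. It uses the standard `AbortController` to produce the `abortSignal` value.

```typescript
// Local stand-in for the optional settings documented above; this type is
// an assumption of the sketch and is not imported from the 'ai' package.
type CallSettings = {
  maxRetries?: number;
  abortSignal?: AbortSignal;
  headers?: Record<string, string>;
};

// A controller lets the caller cancel the request later.
const controller = new AbortController();

const settings: CallSettings = {
  maxRetries: 0, // disable retries instead of using the default of 2
  abortSignal: controller.signal,
  headers: { 'X-Request-Source': 'docs-example' }, // hypothetical header name
};

// In real code the settings would be passed along with model and audio:
//   await transcribe({ model, audio, ...settings });

// Cancelling, for example from a "stop" button handler:
controller.abort();
console.log(controller.signal.aborted); // true
```

Keeping the controller outside the settings object means the code that starts the transcription and the code that cancels it only need to share the controller.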

Returns

text:

string
The complete transcribed text from the audio input.

segments:

Array<{ text: string; startSecond: number; endSecond: number }>
An array of transcript segments, each containing a portion of the transcribed text along with its start and end times in seconds.

language:

string | undefined
The language of the transcript in ISO 639-1 format, e.g. "en" for English.

durationInSeconds:

number | undefined
The duration of the transcribed audio in seconds.

warnings:

TranscriptionWarning[]
Warnings from the model provider (e.g. unsupported settings).

responses:

Array<TranscriptionModelResponseMetadata>
Response metadata from the provider. There may be multiple responses if multiple calls were made to the model.

TranscriptionModelResponseMetadata

timestamp:

Date
Timestamp for the start of the generated response.

modelId:

string
The ID of the response model that was used to generate the response.

headers?:

Record<string, string>
Response headers.
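
As an illustration of how the returned fields might be consumed, the sketch below formats `segments` as timestamped lines, handles an undefined `durationInSeconds`, and summarizes `responses`. The types are local stand-ins that mirror the shapes documented above, and all helper names are hypothetical. Falling back to the last segment's end time is an assumption of this example, not SDK behavior.

```typescript
// Local types mirroring the documented return fields; they are stand-ins
// for this sketch and are not imported from the 'ai' package.
type TranscriptSegment = { text: string; startSecond: number; endSecond: number };
type ResponseMetadata = {
  timestamp: Date;
  modelId: string;
  headers?: Record<string, string>;
};
type TranscriptionResultLike = {
  segments: TranscriptSegment[];
  durationInSeconds: number | undefined;
  responses: ResponseMetadata[];
};

// Left-pads a non-negative integer with zeros to the given width.
function pad(value: number, width: number): string {
  let s = String(value);
  while (s.length < width) s = '0' + s;
  return s;
}

// Formats a second count as mm:ss.mmm (minutes are not capped at 59).
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const minutes = Math.floor(totalMs / 60000);
  const secs = Math.floor((totalMs % 60000) / 1000);
  return pad(minutes, 2) + ':' + pad(secs, 2) + '.' + pad(totalMs % 1000, 3);
}

// Produces one line per segment in the form "[start - end] text".
function formatSegments(segments: TranscriptSegment[]): string[] {
  return segments.map(
    (s) => '[' + formatTimestamp(s.startSecond) + ' - ' + formatTimestamp(s.endSecond) + '] ' + s.text.trim()
  );
}

// Uses durationInSeconds when present; otherwise falls back to the latest
// segment end time (an assumption of this sketch, not SDK behavior).
function estimateDuration(result: TranscriptionResultLike): number | undefined {
  if (result.durationInSeconds !== undefined) return result.durationInSeconds;
  if (result.segments.length === 0) return undefined;
  return result.segments.reduce((max, s) => Math.max(max, s.endSecond), 0);
}

// Produces one line per provider call: model ID and response start time.
function describeResponses(responses: ResponseMetadata[]): string[] {
  return responses.map((r) => r.modelId + ' @ ' + r.timestamp.toISOString());
}

// Sample data shaped like a transcription result (values are made up).
const sample: TranscriptionResultLike = {
  segments: [
    { text: ' Hello there.', startSecond: 0, endSecond: 1.25 },
    { text: ' How are you?', startSecond: 1.25, endSecond: 7.5 },
  ],
  durationInSeconds: undefined,
  responses: [{ timestamp: new Date('2024-01-01T00:00:00.000Z'), modelId: 'whisper-1' }],
};

console.log(formatSegments(sample.segments).join('\n'));
console.log(estimateDuration(sample));
console.log(describeResponses(sample.responses).join('\n'));
```

Since `headers` is optional on each response entry, code that reads it should check for `undefined` first.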