AssemblyAI Provider

The AssemblyAI provider contains transcription model support for the AssemblyAI transcription API.

Setup

The AssemblyAI provider is available in the @ai-sdk/assemblyai module. You can install it with pnpm, npm, or yarn:

pnpm add @ai-sdk/assemblyai
npm install @ai-sdk/assemblyai
yarn add @ai-sdk/assemblyai

Provider Instance

You can import the default provider instance assemblyai from @ai-sdk/assemblyai:

import { assemblyai } from '@ai-sdk/assemblyai';

If you need a customized setup, you can import createAssemblyAI from @ai-sdk/assemblyai and create a provider instance with your settings:

import { createAssemblyAI } from '@ai-sdk/assemblyai';

const assemblyai = createAssemblyAI({
  // custom settings, e.g.
  fetch: customFetch,
});

You can use the following optional settings to customize the AssemblyAI provider instance:

apiKey string

API key that is sent in the Authorization header. It defaults to the ASSEMBLYAI_API_KEY environment variable.
headers Record<string,string>

Custom headers to include in the requests.
fetch (input: RequestInfo, init?: RequestInit) => Promise<Response>

Custom fetch implementation. Defaults to the global fetch function. You can use it as middleware to intercept requests, or to supply a custom implementation, e.g. for testing.
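
As a rough illustration of the middleware use of the fetch setting, the sketch below wraps a fetch function so that each request's method and URL are reported before the request is sent. The helper name withRequestLogging and its log callback are hypothetical and are not part of @ai-sdk/assemblyai; only createAssemblyAI and its fetch setting come from the provider.

```typescript
// Hypothetical helper (not part of @ai-sdk/assemblyai): wraps a fetch
// implementation and reports each request before delegating to it.
type FetchFn = typeof fetch;

function withRequestLogging(
  baseFetch: FetchFn,
  log: (message: string) => void = console.log,
): FetchFn {
  return async (input, init) => {
    // The input can be a URL string, a URL object, or a Request.
    const url =
      typeof input === 'string'
        ? input
        : input instanceof URL
          ? input.href
          : input.url;
    const method =
      init?.method ?? (input instanceof Request ? input.method : 'GET');
    log(`${method} ${url}`);
    // Delegate unchanged so the provider sees the original response.
    return baseFetch(input, init);
  };
}

// Assumed usage with the provider (requires @ai-sdk/assemblyai):
//   const assemblyai = createAssemblyAI({ fetch: withRequestLogging(fetch) });
```

Because the wrapper takes the base fetch as an argument, the same helper can also wrap a stub in tests instead of the global fetch function.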

Transcription Models

You can create models that call the AssemblyAI transcription API using the .transcription() factory method.

The first argument is the model ID, e.g. best.

const model = assemblyai.transcription('best');

You can also pass additional provider-specific options using the providerOptions argument. For example, setting the contentSafety option to true enables content safety filtering.

import { experimental_transcribe as transcribe } from 'ai';
import { assemblyai } from '@ai-sdk/assemblyai';
import { readFile } from 'fs/promises';

const result = await transcribe({
  model: assemblyai.transcription('best'),
  audio: await readFile('audio.mp3'),
  providerOptions: { assemblyai: { contentSafety: true } },
});

The following provider options are available:

audioEndAt number

End time of the audio in milliseconds. Optional.
audioStartFrom number

Start time of the audio in milliseconds. Optional.
autoChapters boolean

Whether to automatically generate chapters for the transcription. Optional.
autoHighlights boolean

Whether to automatically generate highlights for the transcription. Optional.
boostParam enum

How strongly to weight the words listed in wordBoost. Allowed values: 'low', 'default', 'high'. Optional.
contentSafety boolean

Whether to enable content safety filtering. Optional.
contentSafetyConfidence number

Confidence threshold for content safety filtering (25-100). Optional.
customSpelling array of objects

Custom spelling rules for the transcription. Each object has from (array of strings) and to (string) properties. Optional.
disfluencies boolean

Whether to include disfluencies (um, uh, etc.) in the transcription. Optional.
entityDetection boolean

Whether to detect entities in the transcription. Optional.
filterProfanity boolean

Whether to filter profanity in the transcription. Optional.
formatText boolean

Whether to format the text in the transcription. Optional.
iabCategories boolean

Whether to include IAB categories in the transcription. Optional.
languageCode string

Language code for the audio. Supports numerous ISO-639-1 and ISO-639-3 language codes. Optional.
languageConfidenceThreshold number

Confidence threshold for language detection. Optional.
languageDetection boolean

Whether to enable language detection. Optional.
multichannel boolean

Whether to process multiple audio channels separately. Optional.
punctuate boolean

Whether to add punctuation to the transcription. Optional.
redactPii boolean

Whether to redact personally identifiable information. Optional.
redactPiiAudio boolean

Whether to redact PII in the audio file. Optional.
redactPiiAudioQuality enum

Quality of the redacted audio file. Allowed values: 'mp3', 'wav'. Optional.
redactPiiPolicies array of enums

Policies for PII redaction, specifying which types of information to redact. Supports numerous types like 'person_name', 'phone_number', etc. Optional.
redactPiiSub enum

Substitution method for redacted PII. Allowed values: 'entity_name', 'hash'. Optional.
sentimentAnalysis boolean

Whether to perform sentiment analysis on the transcription. Optional.
speakerLabels boolean

Whether to label different speakers in the transcription. Optional.
speakersExpected number

Expected number of speakers in the audio. Optional.
speechThreshold number

Threshold for speech detection (0-1). Optional.
summarization boolean

Whether to generate a summary of the transcription. Optional.
summaryModel enum

Model to use for summarization. Allowed values: 'informative', 'conversational', 'catchy'. Optional.
summaryType enum

Type of summary to generate. Allowed values: 'bullets', 'bullets_verbose', 'gist', 'headline', 'paragraph'. Optional.
topics array of strings

List of topics to detect in the transcription. Optional.
webhookAuthHeaderName string

Name of the authentication header for webhook requests. Optional.
webhookAuthHeaderValue string

Value of the authentication header for webhook requests. Optional.
webhookUrl string

URL to send webhook notifications to. Optional.
wordBoost array of strings

List of words to boost in the transcription. Optional.
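
To make the option list above more concrete, here is a minimal sketch of an options object that combines several of the documented options, together with a small check for the two numeric ranges stated above (contentSafetyConfidence 25-100, speechThreshold 0-1). The interface AssemblyAIOptionsSketch and the function checkOptionRanges are hypothetical names used only for illustration; the provider has its own types and may validate differently.

```typescript
// Hypothetical subset of the options listed above, typed for illustration.
interface AssemblyAIOptionsSketch {
  languageCode?: string;
  speakerLabels?: boolean;
  speakersExpected?: number;
  contentSafety?: boolean;
  contentSafetyConfidence?: number; // documented range: 25-100
  speechThreshold?: number; // documented range: 0-1
  customSpelling?: Array<{ from: string[]; to: string }>;
  wordBoost?: string[];
  boostParam?: 'low' | 'default' | 'high';
}

// Returns a list of problems; an empty list means the ranges look fine.
function checkOptionRanges(options: AssemblyAIOptionsSketch): string[] {
  const problems: string[] = [];
  const { contentSafetyConfidence: csc, speechThreshold: st } = options;
  if (csc !== undefined && (csc < 25 || csc > 100)) {
    problems.push('contentSafetyConfidence must be between 25 and 100');
  }
  if (st !== undefined && (st < 0 || st > 1)) {
    problems.push('speechThreshold must be between 0 and 1');
  }
  return problems;
}

const options: AssemblyAIOptionsSketch = {
  languageCode: 'en',
  speakerLabels: true,
  speakersExpected: 2,
  contentSafety: true,
  contentSafetyConfidence: 60,
  customSpelling: [{ from: ['assembly ai', 'assemblyAI'], to: 'AssemblyAI' }],
  wordBoost: ['AssemblyAI'],
  boostParam: 'high',
};

// Assumed usage: pass as providerOptions: { assemblyai: options } to transcribe.
console.log(checkOptionRanges(options)); // prints []
```

Checking ranges before the call is a design choice for this sketch: it surfaces an out-of-range value locally instead of waiting for the API to reject the request.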

Model Capabilities

Model	Transcription	Duration	Segments	Language
`best`
`nano`
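
The column names in the table above correspond to fields on the result of a transcription call. As a hedged sketch, assuming the result exposes text, segments (each with text, startSecond, and endSecond), language, and durationInSeconds as in the AI SDK transcription result, a small formatter might look like the following. The TranscriptLike type and the summarizeTranscript function are hypothetical names, and the sample data is made up for illustration.

```typescript
// Assumed shape of the fields a transcription result exposes.
interface TranscriptLike {
  text: string;
  segments: Array<{ text: string; startSecond: number; endSecond: number }>;
  language?: string;
  durationInSeconds?: number;
}

// Hypothetical helper: renders a header line plus one line per segment.
function summarizeTranscript(result: TranscriptLike): string {
  const duration =
    result.durationInSeconds === undefined
      ? 'unknown'
      : `${result.durationInSeconds}s`;
  const header = `language=${result.language ?? 'unknown'} duration=${duration}`;
  const lines = result.segments.map(
    (s) => `[${s.startSecond.toFixed(2)}-${s.endSecond.toFixed(2)}] ${s.text}`,
  );
  return [header, ...lines].join('\n');
}

// Made-up sample data, not real API output.
const sample: TranscriptLike = {
  text: 'Hello world',
  segments: [
    { text: 'Hello', startSecond: 0, endSecond: 0.5 },
    { text: 'world', startSecond: 0.5, endSecond: 1 },
  ],
  language: 'en',
  durationInSeconds: 1,
};

console.log(summarizeTranscript(sample));
```

The language and duration fields are treated as optional here so that the formatter still works when a model does not report them.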