Gladia Provider
The Gladia provider contains language model support for the Gladia transcription API.
Setup
The Gladia provider is available in the @ai-sdk/gladia
module. You can install it with
pnpm add @ai-sdk/gladia
Provider Instance
You can import the default provider instance gladia
from @ai-sdk/gladia
:
import { gladia } from '@ai-sdk/gladia';
If you need a customized setup, you can import createGladia
from @ai-sdk/gladia
and create a provider instance with your settings:
import { createGladia } from '@ai-sdk/gladia';
const gladia = createGladia({ // custom settings, e.g. fetch: customFetch,});
You can use the following optional settings to customize the Gladia provider instance:
-
apiKey string
API key that is being sent using the
Authorization
header. It defaults to theDEEPGRAM_API_KEY
environment variable. -
headers Record<string,string>
Custom headers to include in the requests.
-
fetch (input: RequestInfo, init?: RequestInit) => Promise<Response>
Custom fetch implementation. Defaults to the global
fetch
function. You can use it as a middleware to intercept requests, or to provide a custom fetch implementation for e.g. testing.
Transcription Models
You can create models that call the Gladia transcription API
using the .transcription()
factory method.
const model = gladia.transcription();
You can also pass additional provider-specific options using the providerOptions
argument. For example, supplying the summarize
option will enable summaries for sections of content.
import { experimental_transcribe as transcribe } from 'ai';import { gladia } from '@ai-sdk/gladia';import { readFile } from 'fs/promises';
const result = await transcribe({ model: gladia.transcription(), audio: await readFile('audio.mp3'), providerOptions: { gladia: { summarize: true } },});
Gladia does not have various models, so you can omit the standard model
id
parameter.
The following provider options are available:
-
contextPrompt string
Context to feed the transcription model with for possible better accuracy. Optional.
-
customVocabulary boolean | any[]
Custom vocabulary to improve transcription accuracy. Optional.
-
customVocabularyConfig object
Configuration for custom vocabulary. Optional.
- vocabulary Array<string | { value: string, intensity?: number, pronunciations?: string[], language?: string }>
- defaultIntensity number
-
detectLanguage boolean
Whether to automatically detect the language. Optional.
-
enableCodeSwitching boolean
Enable code switching for multilingual audio. Optional.
-
codeSwitchingConfig object
Configuration for code switching. Optional.
- languages string[]
-
language string
Specify the language of the audio. Optional.
-
callback boolean
Enable callback when transcription is complete. Optional.
-
callbackConfig object
Configuration for callback. Optional.
- url string
- method 'POST' | 'PUT'
-
subtitles boolean
Generate subtitles from the transcription. Optional.
-
subtitlesConfig object
Configuration for subtitles. Optional.
- formats Array<'srt' | 'vtt'>
- minimumDuration number
- maximumDuration number
- maximumCharactersPerRow number
- maximumRowsPerCaption number
- style 'default' | 'compliance'
-
diarization boolean
Enable speaker diarization. Defaults to
true
. Optional. -
diarizationConfig object
Configuration for diarization. Optional.
- numberOfSpeakers number
- minSpeakers number
- maxSpeakers number
- enhanced boolean
-
translation boolean
Enable translation of the transcription. Optional.
-
translationConfig object
Configuration for translation. Optional.
- targetLanguages string[]
- model 'base' | 'enhanced'
- matchOriginalUtterances boolean
-
summarization boolean
Enable summarization of the transcription. Optional.
-
summarizationConfig object
Configuration for summarization. Optional.
- type 'general' | 'bullet_points' | 'concise'
-
moderation boolean
Enable content moderation. Optional.
-
namedEntityRecognition boolean
Enable named entity recognition. Optional.
-
chapterization boolean
Enable chapterization of the transcription. Optional.
-
nameConsistency boolean
Enable name consistency in the transcription. Optional.
-
customSpelling boolean
Enable custom spelling. Optional.
-
customSpellingConfig object
Configuration for custom spelling. Optional.
- spellingDictionary Record<string, string[]>
-
structuredDataExtraction boolean
Enable structured data extraction. Optional.
-
structuredDataExtractionConfig object
Configuration for structured data extraction. Optional.
- classes string[]
-
sentimentAnalysis boolean
Enable sentiment analysis. Optional.
-
audioToLlm boolean
Enable audio to LLM processing. Optional.
-
audioToLlmConfig object
Configuration for audio to LLM. Optional.
- prompts string[]
-
customMetadata Record<string, any>
Custom metadata to include with the request. Optional.
-
sentences boolean
Enable sentence detection. Optional.
-
displayMode boolean
Enable display mode. Optional.
-
punctuationEnhanced boolean
Enable enhanced punctuation. Optional.
Model Capabilities
Model | Transcription | Duration | Segments | Language |
---|---|---|---|---|
Default |