Language Model Middleware
Language model middleware is an experimental feature.
Language model middleware is a way to enhance the behavior of language models by intercepting and modifying the calls to the language model.
It can be used to add features like guardrails, RAG, caching, and logging in a language-model-agnostic way. Such middleware can be developed and distributed independently of the language models that it is applied to.
Using Language Model Middleware
You can use language model middleware with the `wrapLanguageModel` function. It takes a language model and a language model middleware and returns a new language model that incorporates the middleware.
```ts
import { experimental_wrapLanguageModel as wrapLanguageModel } from 'ai';

const wrappedLanguageModel = wrapLanguageModel({
  model: yourModel,
  middleware: yourLanguageModelMiddleware,
});
```
The wrapped language model can be used just like any other language model, e.g. in `streamText`:

```ts
const result = streamText({
  model: wrappedLanguageModel,
  prompt: 'What cities are in the United States?',
});
```
Implementing Language Model Middleware
Implementing language model middleware is advanced functionality and requires a solid understanding of the language model specification.
You can implement any of the following three functions to modify the behavior of the language model:
- `transformParams`: Transforms the parameters before they are passed to the language model, for both `doGenerate` and `doStream`.
- `wrapGenerate`: Wraps the `doGenerate` method of the language model. You can modify the parameters, call the language model, and modify the result.
- `wrapStream`: Wraps the `doStream` method of the language model. You can modify the parameters, call the language model, and modify the result.
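For orientation, here is a minimal sketch of a middleware that implements all three hooks without changing any behavior (the `passthroughMiddleware` name is just illustrative):

```ts
import type { Experimental_LanguageModelV1Middleware as LanguageModelV1Middleware } from 'ai';

// A no-op middleware: each hook simply forwards to the underlying model.
export const passthroughMiddleware: LanguageModelV1Middleware = {
  transformParams: async ({ params }) => params,
  wrapGenerate: async ({ doGenerate }) => doGenerate(),
  wrapStream: async ({ doStream }) => doStream(),
};
```

Any hook you omit falls back to the default behavior, so a real middleware only needs to implement the functions it actually uses.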
Here are some examples of how to implement language model middleware:
Examples
These examples are not meant to be used in production. They are just to show how you can use middleware to enhance the behavior of language models.
Logging
This example shows how to log the parameters and generated text of a language model call.
```ts
import type {
  Experimental_LanguageModelV1Middleware as LanguageModelV1Middleware,
  LanguageModelV1StreamPart,
} from 'ai';

export const yourLogMiddleware: LanguageModelV1Middleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    console.log('doGenerate called');
    console.log(`params: ${JSON.stringify(params, null, 2)}`);

    const result = await doGenerate();

    console.log('doGenerate finished');
    console.log(`generated text: ${result.text}`);

    return result;
  },

  wrapStream: async ({ doStream, params }) => {
    console.log('doStream called');
    console.log(`params: ${JSON.stringify(params, null, 2)}`);

    const { stream, ...rest } = await doStream();

    let generatedText = '';

    const transformStream = new TransformStream<
      LanguageModelV1StreamPart,
      LanguageModelV1StreamPart
    >({
      transform(chunk, controller) {
        if (chunk.type === 'text-delta') {
          generatedText += chunk.textDelta;
        }

        controller.enqueue(chunk);
      },

      flush() {
        console.log('doStream finished');
        console.log(`generated text: ${generatedText}`);
      },
    });

    return {
      stream: stream.pipeThrough(transformStream),
      ...rest,
    };
  },
};
```
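To see the logging in action, you could wrap any model with this middleware and call `streamText` as shown earlier; a sketch that assumes `yourLogMiddleware` is in scope and uses the OpenAI provider from `@ai-sdk/openai` (any provider works):

```ts
import {
  experimental_wrapLanguageModel as wrapLanguageModel,
  streamText,
} from 'ai';
import { openai } from '@ai-sdk/openai';

const result = streamText({
  model: wrapLanguageModel({
    model: openai('gpt-4o'),
    middleware: yourLogMiddleware,
  }),
  prompt: 'What cities are in the United States?',
});
```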
Caching
This example shows how to build a simple cache for the generated text of a language model call.
```ts
import type { Experimental_LanguageModelV1Middleware as LanguageModelV1Middleware } from 'ai';

const cache = new Map<string, any>();

export const yourCacheMiddleware: LanguageModelV1Middleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    const cacheKey = JSON.stringify(params);

    if (cache.has(cacheKey)) {
      return cache.get(cacheKey);
    }

    const result = await doGenerate();

    cache.set(cacheKey, result);

    return result;
  },

  // here you would implement the caching logic for streaming
};
```
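The streaming case is left open above. One possible approach, sketched here, is to record the stream parts on a cache miss and replay them from a `ReadableStream` on a cache hit (the `rawCall` value on the hit path is a stand-in, since no real provider call happens):

```ts
import type {
  Experimental_LanguageModelV1Middleware as LanguageModelV1Middleware,
  LanguageModelV1StreamPart,
} from 'ai';

const streamCache = new Map<string, LanguageModelV1StreamPart[]>();

export const yourStreamCacheMiddleware: LanguageModelV1Middleware = {
  wrapStream: async ({ doStream, params }) => {
    const cacheKey = JSON.stringify(params);

    // cache hit: replay the recorded stream parts
    const cachedParts = streamCache.get(cacheKey);
    if (cachedParts != null) {
      return {
        stream: new ReadableStream<LanguageModelV1StreamPart>({
          start(controller) {
            for (const part of cachedParts) {
              controller.enqueue(part);
            }
            controller.close();
          },
        }),
        rawCall: { rawPrompt: null, rawSettings: {} },
      };
    }

    // cache miss: record the stream parts as they pass through
    const { stream, ...rest } = await doStream();
    const recordedParts: LanguageModelV1StreamPart[] = [];

    const recorder = new TransformStream<
      LanguageModelV1StreamPart,
      LanguageModelV1StreamPart
    >({
      transform(chunk, controller) {
        recordedParts.push(chunk);
        controller.enqueue(chunk);
      },
      flush() {
        streamCache.set(cacheKey, recordedParts);
      },
    });

    return { stream: stream.pipeThrough(recorder), ...rest };
  },
};
```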
Retrieval Augmented Generation (RAG)
This example shows how to use RAG as middleware.
Helper functions like `getLastUserMessageText` and `findSources` are not part of the AI SDK. They are just used in this example to illustrate the concept of RAG.
```ts
import type { Experimental_LanguageModelV1Middleware as LanguageModelV1Middleware } from 'ai';

export const yourRagMiddleware: LanguageModelV1Middleware = {
  transformParams: async ({ params }) => {
    const lastUserMessageText = getLastUserMessageText({
      prompt: params.prompt,
    });

    if (lastUserMessageText == null) {
      return params; // do not use RAG (send unmodified parameters)
    }

    const instruction =
      'Use the following information to answer the question:\n' +
      findSources({ text: lastUserMessageText })
        .map(chunk => JSON.stringify(chunk))
        .join('\n');

    return addToLastUserMessage({ params, text: instruction });
  },
};
```
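For illustration, a hypothetical `addToLastUserMessage` helper could look like the sketch below. It assumes the `LanguageModelV1CallOptions` type is available from the `ai` package and that user message content is an array of parts:

```ts
import type { LanguageModelV1CallOptions } from 'ai';

// Hypothetical helper: appends a text part to the last user message.
function addToLastUserMessage({
  params,
  text,
}: {
  params: LanguageModelV1CallOptions;
  text: string;
}): LanguageModelV1CallOptions {
  const { prompt, ...rest } = params;

  const lastMessage = prompt[prompt.length - 1];

  if (lastMessage?.role !== 'user') {
    return params;
  }

  return {
    ...rest,
    prompt: [
      ...prompt.slice(0, -1),
      {
        ...lastMessage,
        content: [...lastMessage.content, { type: 'text', text }],
      },
    ],
  };
}
```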
Guardrails
Guardrails are a way to ensure that the generated text of a language model call is safe and appropriate. This example shows how to use guardrails as middleware.
```ts
import type { Experimental_LanguageModelV1Middleware as LanguageModelV1Middleware } from 'ai';

export const yourGuardrailMiddleware: LanguageModelV1Middleware = {
  wrapGenerate: async ({ doGenerate }) => {
    const { text, ...rest } = await doGenerate();

    // filtering approach, e.g. for PII or other sensitive information:
    const cleanedText = text?.replace(/badword/g, '<REDACTED>');

    return { text: cleanedText, ...rest };
  },

  // here you would implement the guardrail logic for streaming
  // Note: streaming guardrails are difficult to implement, because
  // you do not know the full content of the stream until it's finished.
};
```
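One possible (lossy) workaround for streaming, sketched below, is to buffer all text deltas and apply the filter when the stream finishes. Note that this gives up incremental delivery of text, which is exactly the trade-off the note above refers to:

```ts
import type {
  Experimental_LanguageModelV1Middleware as LanguageModelV1Middleware,
  LanguageModelV1StreamPart,
} from 'ai';

export const yourBufferedGuardrailMiddleware: LanguageModelV1Middleware = {
  wrapStream: async ({ doStream }) => {
    const { stream, ...rest } = await doStream();

    let bufferedText = '';

    const transformStream = new TransformStream<
      LanguageModelV1StreamPart,
      LanguageModelV1StreamPart
    >({
      transform(chunk, controller) {
        if (chunk.type === 'text-delta') {
          // hold back the text until the stream is complete
          bufferedText += chunk.textDelta;
        } else if (chunk.type === 'finish') {
          // filter the full text and emit it before the finish part
          controller.enqueue({
            type: 'text-delta',
            textDelta: bufferedText.replace(/badword/g, '<REDACTED>'),
          });
          controller.enqueue(chunk);
        } else {
          controller.enqueue(chunk);
        }
      },
    });

    return { stream: stream.pipeThrough(transformStream), ...rest };
  },
};
```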