

HuggingFaceStream(iter: AsyncGenerator<any>, cb?: AIStreamCallbacks): ReadableStream

The HuggingFaceStream function is a utility that transforms the output from a variety of text generation models hosted on Hugging Face into a ReadableStream. It consumes the AsyncGenerator returned by the hf.textGenerationStream method of the Hugging Face Inference SDK. This lets you handle AI responses in real time by reading from the stream as tokens arrive.
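The conversion can be pictured with the following minimal sketch. It is not the library's source: the helper name `generatorToTextStream` and the `AssumedChunk` type are hypothetical, and the sketch assumes each yielded chunk exposes its text at `token.text`, as the Hugging Face SDK's streaming output does. It uses a pull-based ReadableStream, so tokens are requested from the generator only when the consumer is ready for more, and it encodes the text as UTF-8 bytes.

```typescript
// Minimal sketch, not the library's source: turn an async generator of
// Hugging Face-style chunks into a ReadableStream of UTF-8 encoded text.
// ASSUMPTION: each chunk carries its text at `token.text`.
interface AssumedChunk {
  token?: { text?: string };
}

function generatorToTextStream(
  iter: AsyncGenerator<AssumedChunk>,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    // pull() runs when the consumer wants more data, so tokens are requested
    // from the generator lazily rather than buffered up front.
    async pull(controller) {
      // Keep reading until there is text to enqueue or the generator ends.
      for (;;) {
        const result = await iter.next();
        if (result.done) {
          controller.close();
          return;
        }
        const text = result.value.token?.text ?? '';
        if (text.length > 0) {
          controller.enqueue(encoder.encode(text));
          return;
        }
      }
    },
    // If the consumer cancels, stop the underlying generator as well.
    async cancel() {
      await iter.return(undefined);
    },
  });
}
```

The loop inside `pull()` is deliberate: under the Streams specification, a `pull()` call that returns without enqueuing anything is not automatically repeated, so simply skipping an empty token could leave the reader waiting.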

While HuggingFaceStream is compatible with most Hugging Face text generation models, the rapidly evolving landscape of models means that some new or niche models may not be supported. If you encounter a model that isn't supported, we encourage you to open an issue.

To ensure that AI responses consist purely of text, without delimiters that could cause problems when rendering in chat or completion modes, we standardize and remove special end-of-stream tokens. If your use case requires different handling of responses, you can fork and modify this stream to meet your specific needs.

Currently, </s> and <|endoftext|> are recognized as end-of-stream tokens.
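A sketch of this filtering, assuming each special token arrives as a complete, separate token rather than split or embedded in other text. The helper names `isEndOfStreamToken` and `withoutEndTokens` are hypothetical. Only the two tokens listed above are checked; a model that emits a different terminator would need its own entry.

```typescript
// Hypothetical sketch of end-of-stream token filtering, not the library's
// source. ASSUMPTION: special tokens arrive as whole, separate tokens.
const END_OF_STREAM_TOKENS: ReadonlySet<string> = new Set([
  '</s>',
  '<|endoftext|>',
]);

function isEndOfStreamToken(text: string): boolean {
  return END_OF_STREAM_TOKENS.has(text);
}

// Yield every token except the recognized end-of-stream markers.
async function* withoutEndTokens(
  tokens: AsyncIterable<string>,
): AsyncGenerator<string> {
  for await (const text of tokens) {
    if (!isEndOfStreamToken(text)) {
      yield text;
    }
  }
}
```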

HuggingFaceStream is compatible with the following models, as specified through the model parameter in the Hugging Face Inference SDK:


iter: AsyncGenerator<any>

This parameter should be an AsyncGenerator, as returned by the hf.textGenerationStream method in the Hugging Face Inference SDK.

cb?: AIStreamCallbacks

This optional parameter is an object containing callback functions that handle the start of the AI response, each token, and its completion. If it is omitted, the stream passes the generated text through without invoking any callbacks.
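To illustrate how such callbacks can observe a stream without altering it, here is a hypothetical sketch built on a TransformStream. The `StreamCallbacksSketch` interface and `withCallbacks` helper are illustrative stand-ins, not part of the SDK; the names `onStart`, `onToken`, and `onCompletion` are assumed to mirror the AI SDK's callback object, so check the `ai` package's own `AIStreamCallbacks` type before relying on them.

```typescript
// Hypothetical mirror of the callback shape described above; the real
// AIStreamCallbacks type is exported by the `ai` package.
interface StreamCallbacksSketch {
  onStart?: () => Promise<void> | void;
  onToken?: (token: string) => Promise<void> | void;
  onCompletion?: (completion: string) => Promise<void> | void;
}

// Wrap a stream of text chunks so the callbacks fire as data flows through.
// Chunks are forwarded unchanged, and an empty callbacks object is allowed.
function withCallbacks(
  callbacks: StreamCallbacksSketch = {},
): TransformStream<string, string> {
  let completion = '';
  return new TransformStream<string, string>({
    async start() {
      await callbacks.onStart?.();
    },
    async transform(chunk, controller) {
      controller.enqueue(chunk);
      completion += chunk;
      await callbacks.onToken?.(chunk);
    },
    // flush() runs once the input closes, so the full text is available.
    async flush() {
      await callbacks.onCompletion?.(completion);
    },
  });
}
```

With the SDK itself, callbacks go in the second argument, for example `HuggingFaceStream(iter, { onCompletion: async (completion) => { /* e.g. persist the text */ } })`.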


The HuggingFaceStream function can be combined with the Hugging Face Inference SDK to turn a text generation stream into a readable stream, so that AI output can be consumed in real time as it is generated.

Here's a step-by-step example of how to implement HuggingFaceStream:

import { HfInference } from '@huggingface/inference';
import { HuggingFaceStream, StreamingTextResponse } from 'ai';

// Run this route handler on the Edge runtime
export const runtime = 'edge';

// Create a Hugging Face Inference client with your API key
const Hf = new HfInference(process.env.HUGGINGFACE_API_KEY);

export async function POST(req: Request) {
  // Extract the prompt from the request body
  const { prompt } = await req.json();

  // Initialize a text generation stream using the Hugging Face Inference SDK.
  // textGenerationStream returns an AsyncGenerator directly, so no await is needed.
  const iter = Hf.textGenerationStream({
    model: 'google/flan-t5-xxl',
    inputs: prompt,
    parameters: {
      max_new_tokens: 200,
      temperature: 0.5,
      repetition_penalty: 1,
      return_full_text: false,
    },
  });

  // Convert the async generator into a readable stream
  const stream = HuggingFaceStream(iter);

  // Return a StreamingTextResponse, enabling the client to consume the response
  return new StreamingTextResponse(stream);
}

In this example, the HuggingFaceStream function transforms the text generation stream from the Hugging Face Inference SDK into a ReadableStream. Clients can then consume AI output in real time as it is generated, instead of waiting for the complete response.
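On the client, the streamed body can be read incrementally. This sketch assumes a fetch `Response` whose body is a stream of UTF-8 bytes; `readTextStream` and its `onText` callback are hypothetical names. Decoding with `{ stream: true }` keeps multi-byte characters intact when they are split across chunks.

```typescript
// Read a streamed text response chunk by chunk (sketch). `onText` is a
// hypothetical callback that receives each decoded piece as it arrives.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onText: (text: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = '';
  try {
    for (;;) {
      const result = await reader.read();
      if (result.done) break;
      // stream: true buffers an incomplete multi-byte character for the next chunk.
      const text = decoder.decode(result.value, { stream: true });
      full += text;
      onText(text);
    }
    // Flush any bytes still buffered in the decoder.
    const tail = decoder.decode();
    if (tail) {
      full += tail;
      onText(tail);
    }
  } finally {
    reader.releaseLock();
  }
  return full;
}
```

In a browser you might call it as `await readTextStream(res.body!, (t) => console.log(t))` after `const res = await fetch('/api/completion', { method: 'POST', body: JSON.stringify({ prompt }) })`, where `/api/completion` is a placeholder for wherever the route above is mounted.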

© 2023 Vercel Inc.