Hugging Face
The AI SDK provides a set of utilities to make it easy to use Hugging Face's APIs. In this guide, we'll walk through how to use the utilities to create a chat bot and a text completion app. Hugging Face offers two API services:
- Free Inference API: Use 100k+ models out of the box. Ideal for quick exploration of models.
- Inference Endpoints: Production-ready service with autoscaling, dedicated infrastructure, and flexibility.
Guide: Chat Bot
Create a Next.js app
Create a Next.js application and install `ai` and `@huggingface/inference`:
```bash
pnpm dlx create-next-app my-ai-app
cd my-ai-app
pnpm add ai @huggingface/inference
```
Add your Hugging Face API Key to .env
Create a `.env` file in your project root and add your Hugging Face token (you can generate one at https://huggingface.co/settings/tokens):

```
HUGGINGFACE_API_KEY=xxxxxxxxx
```
Create a Route Handler
Create a Next.js Route Handler that uses the `OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5` model to generate a response to a series of messages via the Hugging Face API, and returns the response as a streaming text response.

For this example, we'll create a route handler at `app/api/chat/route.ts` that accepts a `POST` request with a `messages` array:
```ts
import { HfInference } from '@huggingface/inference';
import { HuggingFaceStream, StreamingTextResponse } from 'ai';
import { experimental_buildOpenAssistantPrompt } from 'ai/prompts';

// Create a new HuggingFace Inference instance
const Hf = new HfInference(process.env.HUGGINGFACE_API_KEY);

export async function POST(req: Request) {
  // Extract the `messages` from the body of the request
  const { messages } = await req.json();

  const response = Hf.textGenerationStream({
    model: 'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5',
    inputs: experimental_buildOpenAssistantPrompt(messages),
    parameters: {
      max_new_tokens: 200,
      // @ts-ignore (this is a valid parameter specifically in OpenAssistant models)
      typical_p: 0.2,
      repetition_penalty: 1,
      truncate: 1000,
      return_full_text: false,
    },
  });

  // Convert the response into a friendly text-stream
  const stream = HuggingFaceStream(response);

  // Respond with the stream
  return new StreamingTextResponse(stream);
}
```
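For reference, `experimental_buildOpenAssistantPrompt` converts the `messages` array into the special-token prompt format that OpenAssistant models expect. Here is a minimal sketch of the output shape (the function and `ChatMessage` type are illustrative, not the SDK's actual implementation):

```ts
// Illustrative only: a sketch of the prompt shape that
// experimental_buildOpenAssistantPrompt produces for OpenAssistant models.
type ChatMessage = { role: 'user' | 'assistant'; content: string };

function buildOpenAssistantPromptSketch(messages: ChatMessage[]): string {
  return (
    messages
      .map(({ role, content }) =>
        role === 'user'
          ? `<|prompter|>${content}<|endoftext|>`
          : `<|assistant|>${content}<|endoftext|>`,
      )
      .join('') +
    // End with the assistant token so the model continues as the assistant
    '<|assistant|>'
  );
}
```

This matches the raw prompt used in the Text Completion example below, where a single user prompt is wrapped as `<|prompter|>...<|endoftext|><|assistant|>`.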
The AI SDK provides two utility helpers to make the above seamless. First, we pass the streaming response we receive from `Hf.textGenerationStream` to `HuggingFaceStream`. This method decodes/extracts the text tokens in the response and then re-encodes them properly for simple consumption. We can then pass that new stream directly to `StreamingTextResponse`. This is another utility class that extends the normal Node/Edge Runtime `Response` class with the default headers you probably want (hint: `'Content-Type': 'text/plain; charset=utf-8'` is already set for you).
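For intuition, `StreamingTextResponse` is roughly a thin wrapper over the standard `Response`; a minimal sketch (not the SDK's exact implementation):

```ts
// Illustrative only: approximately what StreamingTextResponse does —
// send the stream as the response body with a plain-text content type.
function streamingTextResponseSketch(stream: ReadableStream): Response {
  return new Response(stream, {
    status: 200,
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}
```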
Wire up the UI
Create a Client component with a form that we'll use to gather the prompt from the user and then stream back the completion.
By default, the `useChat` hook will use the `POST` Route Handler we created above (it defaults to `/api/chat`). You can override this by passing an `api` prop to `useChat({ api: '...' })`.
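For example, if your handler lived at a custom path (hypothetical path shown):

```tsx
// Hypothetical: only needed when the route handler is not at /api/chat
const { messages, input, handleInputChange, handleSubmit } = useChat({
  api: '/api/my-chat',
});
```

The full client component looks like this: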
```tsx
'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div className="mx-auto w-full max-w-md py-24 flex flex-col stretch">
      {messages.map(m => (
        <div key={m.id}>
          {m.role === 'user' ? 'User: ' : 'AI: '}
          {m.content}
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <label>
          Say something...
          <input
            className="fixed w-full max-w-md bottom-0 border border-gray-300 rounded mb-8 shadow-xl p-2"
            value={input}
            onChange={handleInputChange}
          />
        </label>
        <button type="submit">Send</button>
      </form>
    </div>
  );
}
```
Guide: Text Completion
Use the Completion API
Similar to the Chat Bot example above, we'll create a Next.js Route Handler that generates a text completion via the same Hugging Face API and streams it back to the client. For this example, we'll create the route handler at `app/api/completion/route.ts`; it accepts a `POST` request with a `prompt` string:
```ts
import { HfInference } from '@huggingface/inference';
import { HuggingFaceStream, StreamingTextResponse } from 'ai';

// Create a new Hugging Face Inference instance
const Hf = new HfInference(process.env.HUGGINGFACE_API_KEY);

export async function POST(req: Request) {
  // Extract the `prompt` from the body of the request
  const { prompt } = await req.json();

  const response = Hf.textGenerationStream({
    model: 'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5',
    inputs: `<|prompter|>${prompt}<|endoftext|><|assistant|>`,
    parameters: {
      max_new_tokens: 200,
      // @ts-ignore (this is a valid parameter specifically in OpenAssistant models)
      typical_p: 0.2,
      repetition_penalty: 1,
      truncate: 1000,
      return_full_text: false,
    },
  });

  // Convert the response into a friendly text-stream
  const stream = HuggingFaceStream(response);

  // Respond with the stream
  return new StreamingTextResponse(stream);
}
```
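Before wiring up the UI, you can sanity-check the route with a plain `fetch`; a minimal sketch (the prompt string is just an example):

```ts
// Hypothetical smoke test: POST a prompt and read back the plain-text stream.
async function testCompletionRoute() {
  const res = await fetch('/api/completion', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: 'Say hello!' }),
  });

  // The body is a plain text stream; read it incrementally.
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let text = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
  }
  console.log(text);
}
```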
Wire up the UI
We can use the `useCompletion` hook to make it easy to wire up the UI. By default, the `useCompletion` hook will use the `POST` Route Handler we created above (it defaults to `/api/completion`). You can override this by passing an `api` prop to `useCompletion({ api: '...' })`.
```tsx
'use client';

import { useCompletion } from 'ai/react';

export default function Completion() {
  const {
    completion,
    input,
    stop,
    isLoading,
    handleInputChange,
    handleSubmit,
  } = useCompletion();

  return (
    <div className="mx-auto w-full max-w-md py-24 flex flex-col stretch">
      <form onSubmit={handleSubmit}>
        <label>
          Say something...
          <input
            className="fixed w-full max-w-md bottom-0 border border-gray-300 rounded mb-8 shadow-xl p-2"
            value={input}
            onChange={handleInputChange}
          />
        </label>
        <output>Completion result: {completion}</output>
        <button type="button" onClick={stop}>
          Stop
        </button>
        <button disabled={isLoading} type="submit">
          Send
        </button>
      </form>
    </div>
  );
}
```
Guide: Using Production-ready Inference Endpoints
Inference Endpoints offer a secure solution to deploy models from the Hub on dedicated and autoscaling infrastructure.
`@huggingface/inference` also works with Inference Endpoints, making it very easy to switch between the two API services. The only change needed is how you create your inference instance:
```ts
import { HfInferenceEndpoint } from '@huggingface/inference';

// Create a new Hugging Face Inference Endpoint instance
const endpointUrl = 'https://YOUR_ENDPOINT.endpoints.huggingface.cloud/gpt2';
const Hf = new HfInferenceEndpoint(
  endpointUrl,
  process.env.HUGGINGFACE_API_KEY,
);

// Rest of the code stays the same
```
Guide: Save to Database After Completion
It’s common to want to save the result of a completion to a database after streaming it back to the user. The `HuggingFaceStream` adapter accepts a couple of optional callbacks that can be used to do this:
```ts
export async function POST(req: Request) {
  // Extract the `prompt` from the body of the request (as in the handler above)
  const { prompt } = await req.json();

  // ... call Hf.textGenerationStream as before to get `response` ...

  // Convert the response into a friendly text-stream
  const stream = HuggingFaceStream(response, {
    onStart: async () => {
      // This callback is called when the stream starts
      // You can use this to save the prompt to your database
      await savePromptToDatabase(prompt);
    },
    onToken: async (token: string) => {
      // This callback is called for each token in the stream
      // You can use this to debug the stream or save the tokens to your database
      console.log(token);
    },
    onCompletion: async (completion: string) => {
      // This callback is called when the stream completes
      // You can use this to save the final completion to your database
      await saveCompletionToDatabase(completion);
    },
  });

  // Respond with the stream
  return new StreamingTextResponse(stream);
}
```