Replicate
The Vercel AI SDK supports streaming responses for certain Replicate text models (including Llama 2). You can see the supported models on Replicate's website.
Guide: Llama 2 Chatbot
Create a Next.js app
Create a Next.js application and install ai and replicate:
pnpm dlx create-next-app my-ai-app
cd my-ai-app
pnpm install ai replicate
Add your Replicate API Key to .env
Create a .env file in your project root and add your Replicate API key:
REPLICATE_API_KEY=xxxxxxxxx
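Note: the replicate client reads REPLICATE_API_TOKEN from the environment by default; since this guide names the variable REPLICATE_API_KEY instead, the examples below pass the key explicitly via the auth option.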
Create a Route Handler
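Assuming the App Router, create the handler at app/api/chat/route.ts so that it serves the /api/chat path the useChat hook calls by default: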
import { ReplicateStream, StreamingTextResponse } from 'ai';
import Replicate from 'replicate';
import { experimental_buildLlama2Prompt } from 'ai/prompts';

// Create a Replicate API client (that's edge friendly!)
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_KEY || '',
});

// IMPORTANT! Set the runtime to edge
export const runtime = 'edge';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const response = await replicate.predictions.create({
    // You must enable streaming.
    stream: true,
    // The model must support streaming. See https://replicate.com/docs/streaming
    // This is the model ID for Llama 2 70b Chat
    version: '2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1',
    // Format the message list into the format expected by Llama 2
    // @see https://github.com/vercel/ai/blob/99cf16edf0a09405d15d3867f997c96a8da869c6/packages/core/prompts/huggingface.ts#L53C1-L78C2
    input: {
      prompt: experimental_buildLlama2Prompt(messages),
    },
  });

  // Convert the response into a friendly text-stream
  const stream = await ReplicateStream(response);
  // Respond with the stream
  return new StreamingTextResponse(stream);
}
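For reference, experimental_buildLlama2Prompt concatenates the chat messages into the special prompt format Llama 2 was trained on (see the linked source above). A rough, illustrative sketch of what it produces for a single user message:

// Illustrative output only; system messages are additionally wrapped
// in <<SYS>> ... <</SYS>> tags per the Llama 2 prompt format
experimental_buildLlama2Prompt([{ role: 'user', content: 'Hi there!' }]);
// => '<s>[INST] Hi there! [/INST]'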
Wire up the UI
Create a Client Component with a form that we'll use to gather the prompt from the user and then stream back the completion.
By default, the useChat hook will use the POST Route Handler we created above (it defaults to /api/chat). You can override this by passing an api prop to useChat({ api: '...' }).
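For example, a minimal sketch pointing the hook at a hypothetical custom endpoint:

// '/api/llama-chat' is a hypothetical path; use wherever your Route Handler lives
const { messages, input, handleInputChange, handleSubmit } = useChat({
  api: '/api/llama-chat',
});

The full Client Component: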
'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      {messages.map(m => (
        <div key={m.id} className="whitespace-pre-wrap">
          {m.role === 'user' ? 'User: ' : 'AI: '}
          {m.content}
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input
          className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"
          value={input}
          placeholder="Say something..."
          onChange={handleInputChange}
        />
      </form>
    </div>
  );
}
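Run pnpm dev and open http://localhost:3000 (the Next.js default) to start chatting with the model.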
Guide: Text Completion
Create and stream a completion
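This mirrors the chat route above, except the raw prompt from the request body is passed to the model as-is. Assuming the App Router, place the handler at app/api/completion/route.ts so it serves the /api/completion path that useCompletion calls by default: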
import { ReplicateStream, StreamingTextResponse } from 'ai';
import Replicate from 'replicate';

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_KEY || '',
});

// IMPORTANT! Set the runtime to edge
export const runtime = 'edge';

export async function POST(req: Request) {
  // Get the prompt from the request body
  const { prompt } = await req.json();

  const response = await replicate.predictions.create({
    // You must enable streaming.
    stream: true,
    // The model must support streaming. See https://replicate.com/docs/streaming
    // This is the model ID for Llama 2 70b Chat
    version: '2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1',
    // Unlike the chat example, the prompt is passed through to the model as-is
    input: {
      prompt,
    },
  });

  // Convert the response into a friendly text-stream
  const stream = await ReplicateStream(response);
  // Respond with the stream
  return new StreamingTextResponse(stream);
}
Wire up the UI
We can use the useCompletion hook to make it easy to wire up the UI. By default, the useCompletion hook will use the POST Route Handler we created above (it defaults to /api/completion). You can override this by passing an api prop to useCompletion({ api: '...' }).
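For example, a minimal sketch, assuming you also want the onFinish callback (called with the prompt and the final completion once the stream ends):

const { completion, input, handleInputChange, handleSubmit } = useCompletion({
  api: '/api/completion',
  // Log the finished run; signature assumed to be (prompt, completion)
  onFinish: (prompt, completion) => console.log(prompt, completion),
});

The full Client Component: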
'use client';

import { useCompletion } from 'ai/react';

export default function Chat() {
  const { completion, input, handleInputChange, handleSubmit, error } =
    useCompletion();

  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      <h4 className="text-xl font-bold text-gray-900 md:text-xl pb-4">
        useCompletion Example
      </h4>
      {error && (
        <div className="fixed top-0 left-0 w-full p-4 text-center bg-red-500 text-white">
          {error.message}
        </div>
      )}
      {completion}

      <form onSubmit={handleSubmit}>
        <input
          className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"
          value={input}
          placeholder="Say something..."
          onChange={handleInputChange}
        />
      </form>
    </div>
  );
}
Guide: Save to Database After Completion
It's common to want to save the result of a completion to a database after streaming it back to the user. The ReplicateStream adapter accepts a couple of optional callbacks that can be used to do this.
export async function POST(req: Request) {
  // ... (parse `prompt` from the request body and create the
  // prediction as in the completion example above)

  // Convert the response into a friendly text-stream
  const stream = await ReplicateStream(response, {
    onStart: async () => {
      // This callback is called when the stream starts
      // You can use this to save the prompt to your database
      await savePromptToDatabase(prompt);
    },
    onToken: async (token: string) => {
      // This callback is called for each token in the stream
      // You can use this to debug the stream or save the tokens to your database
      console.log(token);
    },
    onCompletion: async (completion: string) => {
      // This callback is called when the stream completes
      // You can use this to save the final completion to your database
      await saveCompletionToDatabase(completion);
    },
  });

  // Respond with the stream
  return new StreamingTextResponse(stream);
}
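Note that savePromptToDatabase and saveCompletionToDatabase are placeholders for your own persistence logic; the callbacks run on the server as the stream is processed, so this works even though the response has already started streaming to the client.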