Replicate

The Vercel AI SDK supports streaming responses for certain Replicate text models (including Llama 2). You can see which models support streaming at https://replicate.com/docs/streaming.

Guide: Llama 2 Chatbot

Create a Next.js app

Create a Next.js application and install ai and replicate.

pnpm dlx create-next-app my-ai-app
cd my-ai-app
pnpm install ai replicate

Add your Replicate API Key to .env

Create a .env file in your project root and add your Replicate API key (available in your Replicate account settings):

REPLICATE_API_KEY=xxxxxxxxx

Create a Route Handler

The useChat hook we'll wire up below defaults to POSTing to /api/chat, so create this Route Handler at app/api/chat/route.ts:

import { ReplicateStream, StreamingTextResponse } from 'ai'
import Replicate from 'replicate'
import { experimental_buildLlama2Prompt } from 'ai/prompts'

// Create a Replicate API client
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_KEY || ''
})

export async function POST(req: Request) {
  const { messages } = await req.json()

  const response = await replicate.predictions.create({
    // You must enable streaming.
    stream: true,
    // The model must support streaming. See https://replicate.com/docs/streaming
    // This is the model ID for Llama 2 70b Chat
    version: '2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1',
    // Format the message list into the format expected by Llama 2
    // @see https://github.com/vercel/ai/blob/99cf16edf0a09405d15d3867f997c96a8da869c6/packages/core/prompts/huggingface.ts#L53C1-L78C2
    input: {
      prompt: experimental_buildLlama2Prompt(messages)
    }
  })

  // Convert the response into a friendly text-stream
  const stream = await ReplicateStream(response)

  // Respond with the stream
  return new StreamingTextResponse(stream)
}
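For reference, experimental_buildLlama2Prompt flattens the chat history into Llama 2's single-string instruction format. A rough sketch of what it produces (the exact whitespace is determined by the helper itself):

import { experimental_buildLlama2Prompt } from 'ai/prompts'

// A short history with a system message and one user message...
const prompt = experimental_buildLlama2Prompt([
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is Replicate?' }
])

// ...becomes roughly:
// <s>[INST] <<SYS>>
// You are a helpful assistant.
// <</SYS>>
//
// What is Replicate? [/INST]
console.log(prompt)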

Wire up the UI

Create a Client Component with a form that gathers the prompt from the user and streams back the completion. By default, the useChat hook will use the POST Route Handler we created above (it defaults to /api/chat). You can override this by passing an api option to useChat({ api: '...' }); a sketch follows the component below.

'use client'

import { useChat } from 'ai/react'

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat()

  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      {messages.map(m => (
        <div key={m.id} className="whitespace-pre-wrap">
          {m.role === 'user' ? 'User: ' : 'AI: '}
          {m.content}
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input
          className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"
          value={input}
          placeholder="Say something..."
          onChange={handleInputChange}
        />
      </form>
    </div>
  )
}
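If you mount the Route Handler somewhere other than /api/chat, the override looks like this (a sketch; the path here is illustrative):

const { messages, input, handleInputChange, handleSubmit } = useChat({
  // Point the hook at a custom endpoint instead of the default /api/chat
  api: '/api/llama-chat'
})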

Guide: Text Completion

Create and stream a completion

The useCompletion hook we'll wire up below defaults to POSTing to /api/completion, so create this Route Handler at app/api/completion/route.ts:

import { ReplicateStream, StreamingTextResponse } from 'ai'
import Replicate from 'replicate'

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_KEY || ''
})

export async function POST(req: Request) {
  // Get the prompt from the request body
  const { prompt } = await req.json()

  const response = await replicate.predictions.create({
    // You must enable streaming.
    stream: true,
    // The model must support streaming. See https://replicate.com/docs/streaming
    // This is the model ID for Llama 2 70b Chat
    version: '2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1',
    // Pass the prompt through as-is; no chat formatting is needed here
    input: {
      prompt
    }
  })

  // Convert the response into a friendly text-stream
  const stream = await ReplicateStream(response)

  // Respond with the stream
  return new StreamingTextResponse(stream)
}
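Before building the UI, you can exercise the endpoint directly. A minimal smoke-test sketch, assuming the default next dev server on port 3000 (run it with a TypeScript runner such as tsx):

// POST a prompt to the completion endpoint and print the streamed text.
const res = await fetch('http://localhost:3000/api/completion', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'Write a haiku about streaming' })
})

// Read the response body chunk by chunk as tokens arrive.
const reader = res.body!.getReader()
const decoder = new TextDecoder()
while (true) {
  const { done, value } = await reader.read()
  if (done) break
  process.stdout.write(decoder.decode(value))
}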

Wire up the UI

We can use the useCompletion hook to make it easy to wire up the UI. By default, the useCompletion hook will use the POST Route Handler we created above (it defaults to /api/completion). You can override this by passing an api option to useCompletion({ api: '...' }); a sketch follows the component below.

'use client'

import { useCompletion } from 'ai/react'

export default function Chat() {
  const { completion, input, handleInputChange, handleSubmit, error } =
    useCompletion()

  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      <h4 className="text-xl font-bold text-gray-900 md:text-xl pb-4">
        useCompletion Example
      </h4>
      {error && (
        <div className="fixed top-0 left-0 w-full p-4 text-center bg-red-500 text-white">
          {error.message}
        </div>
      )}
      {completion}
      <form onSubmit={handleSubmit}>
        <input
          className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"
          value={input}
          placeholder="Say something..."
          onChange={handleInputChange}
        />
      </form>
    </div>
  )
}
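As with useChat, the endpoint can be overridden if your handler lives elsewhere (the path here is illustrative):

const { completion, input, handleInputChange, handleSubmit, error } =
  useCompletion({
    // Point the hook at a custom endpoint instead of the default /api/completion
    api: '/api/my-completion'
  })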

Guide: Save to Database After Completion

It's common to want to save the result of a completion to a database after streaming it back to the user. The ReplicateStream adapter accepts optional callbacks, including onStart, onToken, and onCompletion, that can be used to do this.

export async function POST(req: Request) {
  // ...

  // Convert the response into a friendly text-stream
  const stream = await ReplicateStream(response, {
    onStart: async () => {
      // This callback is called when the stream starts
      // You can use this to save the prompt to your database
      await savePromptToDatabase(prompt)
    },
    onToken: async (token: string) => {
      // This callback is called for each token in the stream
      // You can use this to debug the stream or save the tokens to your database
      console.log(token)
    },
    onCompletion: async (completion: string) => {
      // This callback is called when the stream completes
      // You can use this to save the final completion to your database
      await saveCompletionToDatabase(completion)
    }
  })

  // Respond with the stream
  return new StreamingTextResponse(stream)
}
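savePromptToDatabase and saveCompletionToDatabase are placeholders for your own persistence layer. A minimal sketch of what they might look like, assuming Vercel KV as the store (any database client works the same way; the key names are illustrative):

import { kv } from '@vercel/kv'

// Persist the user's prompt under a timestamped key so entries don't collide.
export async function savePromptToDatabase(prompt: string) {
  await kv.set(`prompt:${Date.now()}`, prompt)
}

// Persist the final completion once the stream has finished.
export async function saveCompletionToDatabase(completion: string) {
  await kv.set(`completion:${Date.now()}`, completion)
}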