
Replicate

The Vercel AI SDK supports streaming responses for certain Replicate text models (including Llama 2). You can find the list of supported models on the Replicate website.

Guide: Llama 2 Chatbot

Create a Next.js app

Create a Next.js application and install ai and replicate.

pnpm dlx create-next-app my-ai-app
cd my-ai-app
pnpm install ai replicate
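
If you prefer npm or yarn, the equivalent commands are:

npx create-next-app my-ai-app
cd my-ai-app
npm install ai replicate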

Add your Replicate API Key to .env

Create a .env file in your project root and add your Replicate API Key:

.env
REPLICATE_API_KEY=xxxxxxxxx

Create a Route Handler

app/api/chat/route.ts
import { ReplicateStream, StreamingTextResponse } from 'ai';
import Replicate from 'replicate';
import { experimental_buildLlama2Prompt } from 'ai/prompts';
 
// Create a Replicate API client (that's edge friendly!)
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_KEY || '',
});
 
// IMPORTANT! Set the runtime to edge
export const runtime = 'edge';
 
export async function POST(req: Request) {
  const { messages } = await req.json();
 
  const response = await replicate.predictions.create({
    // You must enable streaming.
    stream: true,
    // The model must support streaming. See https://replicate.com/docs/streaming
    // This is the model ID for Llama 2 70b Chat
    version: '2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1',
    // Format the message list into the format expected by Llama 2
    // @see https://github.com/vercel/ai/blob/99cf16edf0a09405d15d3867f997c96a8da869c6/packages/core/prompts/huggingface.ts#L53C1-L78C2
    input: {
      prompt: experimental_buildLlama2Prompt(messages),
    },
  });
 
  // Convert the response into a friendly text-stream
  const stream = await ReplicateStream(response);
  // Respond with the stream
  return new StreamingTextResponse(stream);
}
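
To make the formatting step concrete, here is roughly what experimental_buildLlama2Prompt produces for a short conversation. The exact output depends on your version of the ai package, so treat this as an illustration rather than a spec:

import { experimental_buildLlama2Prompt } from 'ai/prompts';

// Illustrative only -- the exact tokens come from the ai package
const prompt = experimental_buildLlama2Prompt([
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is streaming?' },]);

// Roughly: <s>[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\nWhat is streaming? [/INST]
console.log(prompt);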

Wire up the UI

Create a Client component with a form that we'll use to gather the prompt from the user and then stream back the completion. By default, the useChat hook will use the POST Route Handler we created above (it defaults to /api/chat). You can override this by passing an api option to useChat({ api: '...' }).

app/page.tsx
'use client';
 
import { useChat } from 'ai/react';
 
export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();
 
  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      {messages.map(m => (
        <div key={m.id} className="whitespace-pre-wrap">
          {m.role === 'user' ? 'User: ' : 'AI: '}
          {m.content}
        </div>
      ))}
 
      <form onSubmit={handleSubmit}>
        <input
          className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"
          value={input}
          placeholder="Say something..."
          onChange={handleInputChange}
        />
      </form>
    </div>
  );
}
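
If your Route Handler lives at a different path, the override looks like this ('/api/llama-chat' is a hypothetical example path):

// Override the default /api/chat endpoint; point this at wherever
// your handler actually lives
const { messages, input, handleInputChange, handleSubmit } = useChat({
  api: '/api/llama-chat',
});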

Guide: Text Completion

Create and stream a completion

app/api/completion/route.ts
import { ReplicateStream, StreamingTextResponse } from 'ai';
import Replicate from 'replicate';
 
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_KEY || '',
});
 
export async function POST(req: Request) {
  // Get the prompt from the request body
  const { prompt } = await req.json();
 
  const response = await replicate.predictions.create({
    // You must enable streaming.
    stream: true,
    // The model must support streaming. See https://replicate.com/docs/streaming
    // This is the model ID for Llama 2 70b Chat
    version: '2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1',
    // The request body already contains a plain prompt string,
    // so no extra chat formatting is needed here
    input: {
      prompt,
    },
  });
 
  // Convert the response into a friendly text-stream
  const stream = await ReplicateStream(response);
  // Respond with the stream
  return new StreamingTextResponse(stream);
}
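
You can exercise this route without any UI. A minimal sketch, assuming the dev server is running locally on port 3000 (run it as an ES module under Node 18+, which ships with fetch):

// POST a prompt and print the streamed text as it arrives
const res = await fetch('http://localhost:3000/api/completion', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'Write a haiku about rivers.' }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value, { stream: true }));
}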

Wire up the UI

We can use the useCompletion hook to make it easy to wire up the UI. By default, the useCompletion hook will use the POST Route Handler we created above (it defaults to /api/completion). You can override this by passing an api option to useCompletion({ api: '...' }).

app/page.tsx
'use client';
 
import { useCompletion } from 'ai/react';
 
export default function Completion() {
  const { completion, input, handleInputChange, handleSubmit, error } =
    useCompletion();
 
  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      <h4 className="text-xl font-bold text-gray-900 md:text-xl pb-4">
        useCompletion Example
      </h4>
      {error && (
        <div className="fixed top-0 left-0 w-full p-4 text-center bg-red-500 text-white">
          {error.message}
        </div>
      )}
      {completion}
      <form onSubmit={handleSubmit}>
        <input
          className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"
          value={input}
          placeholder="Say something..."
          onChange={handleInputChange}
        />
      </form>
    </div>
  );
}
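
The hook also returns stop and isLoading, which make it easy to show progress and let the user cancel a long generation:

// stop aborts the in-flight request; isLoading is true while streaming
const { completion, input, handleInputChange, handleSubmit, stop, isLoading } =
  useCompletion();

// e.g. render a cancel button while a completion is streaming:
// {isLoading && <button type="button" onClick={stop}>Stop</button>}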

Guide: Save to Database After Completion

It's common to want to save the result of a completion to a database after streaming it back to the user. The ReplicateStream adapter accepts optional callbacks (onStart, onToken, and onCompletion) that can be used to do this.

app/api/completion/route.ts
export async function POST(req: Request) {
  // ...
 
  // Convert the response into a friendly text-stream
  const stream = await ReplicateStream(response, {
    onStart: async () => {
      // This callback is called when the stream starts
      // You can use this to save the prompt to your database
      await savePromptToDatabase(prompt);
    },
    onToken: async (token: string) => {
      // This callback is called for each token in the stream
      // You can use this to debug the stream or save the tokens to your database
      console.log(token);
    },
    onCompletion: async (completion: string) => {
      // This callback is called when the stream completes
      // You can use this to save the final completion to your database
      await saveCompletionToDatabase(completion);
    },
  });
 
  // Respond with the stream
  return new StreamingTextResponse(stream);
}
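
savePromptToDatabase and saveCompletionToDatabase above are placeholders for your own persistence layer. A minimal sketch, assuming Vercel KV (@vercel/kv); the helpers and the key scheme are made up for illustration:

import { kv } from '@vercel/kv';

// Hypothetical helpers backing the callbacks above
async function savePromptToDatabase(prompt: string) {
  await kv.set(`prompt:${crypto.randomUUID()}`, prompt);
}

async function saveCompletionToDatabase(completion: string) {
  await kv.set(`completion:${crypto.randomUUID()}`, completion);
}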
