Multi-Modal Chatbot
In this guide, you will build a multi-modal AI chatbot with a streaming user interface.
Multi-modal refers to the ability of the chatbot to understand and generate responses in multiple formats, such as text, images, and videos. In this example, we will focus on sending images and generating text-based responses.
Prerequisites
To follow this quickstart, you'll need:
- Node.js 18+ and pnpm installed on your local development machine.
- An OpenAI API key.
If you haven't obtained your OpenAI API key, you can do so by signing up on the OpenAI website.
Create Your Application
Start by creating a new Next.js application. This command will create a new directory named `multi-modal-chatbot` and set up a basic Next.js application inside it.
Be sure to select yes when prompted to use the App Router. If you are looking for the Next.js Pages Router quickstart guide, you can find it here.
pnpm create next-app@latest multi-modal-chatbot
Navigate to the newly created directory:
cd multi-modal-chatbot
Install dependencies
Install `ai` and `@ai-sdk/openai`, the Vercel AI package and the AI SDK's OpenAI provider, respectively.
The AI SDK is designed to be a unified interface to interact with any large language model. This means that you can change model and providers with just one line of code! Learn more about available providers and building custom providers in the providers section.
pnpm add ai @ai-sdk/openai
Make sure you are using `ai` version 3.2.27 or higher.
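To illustrate the unified-interface point above: assuming you had also installed a second provider package such as `@ai-sdk/anthropic` (illustrative; any provider works the same way), swapping models is a one-line change. A sketch, with an illustrative model ID:

```typescript
import { anthropic } from '@ai-sdk/anthropic';
import { streamText } from 'ai';

// Identical call shape to the OpenAI version -- only the model line changes.
// The model ID below is illustrative; check the provider's docs for current IDs.
const result = streamText({
  model: anthropic('claude-3-5-sonnet-20240620'),
  messages: [{ role: 'user', content: 'Hello!' }],
});
```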
Configure OpenAI API key
Create a `.env.local` file in your project root and add your OpenAI API Key. This key is used to authenticate your application with the OpenAI service.
touch .env.local
Edit the `.env.local` file:
OPENAI_API_KEY=xxxxxxxxx
Replace `xxxxxxxxx` with your actual OpenAI API key.

The AI SDK's OpenAI Provider will default to using the `OPENAI_API_KEY` environment variable.
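If your key lives under a different variable name, the provider instance can be configured explicitly instead of relying on the default. A sketch, where `MY_OPENAI_KEY` is an illustrative name:

```typescript
import { createOpenAI } from '@ai-sdk/openai';

// Explicit configuration instead of relying on the OPENAI_API_KEY default.
// `MY_OPENAI_KEY` is an illustrative environment variable name.
const openai = createOpenAI({
  apiKey: process.env.MY_OPENAI_KEY,
});
```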
Implementation Plan
To build a multi-modal chatbot, you will need to:
- Create a Route Handler to handle incoming chat messages and generate responses.
- Wire up the UI to display chat messages, provide a user input, and handle submitting new messages.
- Add the ability to upload images and attach them alongside the chat messages.
Create a Route Handler
Create a route handler, `app/api/chat/route.ts`, and add the following code:
```tsx
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

// Allow streaming responses up to 30 seconds
export const maxDuration = 30;

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4-turbo'),
    messages,
  });

  return result.toDataStreamResponse();
}
```
Let's take a look at what is happening in this code:
1. Define an asynchronous `POST` request handler and extract `messages` from the body of the request. The `messages` variable contains a history of the conversation between you and the chatbot and provides the chatbot with the necessary context to make the next generation.
2. Call `streamText`, which is imported from the `ai` package. This function accepts a configuration object that contains a `model` provider (imported from `@ai-sdk/openai`) and `messages` (defined in step 1). You can pass additional settings to further customize the model's behavior.
3. The `streamText` function returns a `StreamTextResult`. This result object contains the `toDataStreamResponse` function, which converts the result to a streamed response object.
4. Finally, return the result to the client to stream the response.
This Route Handler creates a POST request endpoint at `/api/chat`.
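To make the endpoint's contract concrete, this sketch builds the kind of JSON body a client sends to `POST /api/chat` and checks its shape. It is plain TypeScript with no network call, and the message contents are illustrative:

```typescript
// Sketch of the JSON body a client sends to POST /api/chat.
// The message contents here are illustrative.
type ChatMessage = { role: 'user' | 'assistant'; content: string };

const body: { messages: ChatMessage[] } = {
  messages: [
    { role: 'user', content: 'What is in this image?' },
    { role: 'assistant', content: 'It looks like a cat.' },
    { role: 'user', content: 'What color is it?' },
  ],
};

// On the server, `await req.json()` yields this object again.
const payload = JSON.stringify(body);
const parsed = JSON.parse(payload);
console.log(parsed.messages.length); // 3
```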
Wire up the UI
Now that you have a Route Handler that can query a large language model (LLM), it's time to set up your frontend. AI SDK UI abstracts the complexity of a chat interface into one hook, `useChat`.
Update your root page (`app/page.tsx`) with the following code to show a list of chat messages and provide a user message input:
```tsx
'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();
  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      {messages.map(m => (
        <div key={m.id} className="whitespace-pre-wrap">
          {m.role === 'user' ? 'User: ' : 'AI: '}
          {m.content}
        </div>
      ))}

      <form
        onSubmit={handleSubmit}
        className="fixed bottom-0 w-full max-w-md mb-8 border border-gray-300 rounded shadow-xl"
      >
        <input
          className="w-full p-2"
          value={input}
          placeholder="Say something..."
          onChange={handleInputChange}
        />
      </form>
    </div>
  );
}
```
Make sure you add the `"use client"` directive to the top of your file. This allows you to add interactivity with JavaScript.
This page utilizes the `useChat` hook, which will, by default, use the `POST` API route you created earlier (`/api/chat`). The hook provides multiple utility functions and state variables for handling user input and form submission:
- `messages` - the current chat messages (an array of objects with `id`, `role`, and `content` properties).
- `input` - the current value of the user's input field.
- `handleInputChange` and `handleSubmit` - functions to handle user interactions (typing into the input field and submitting the form, respectively).
- `isLoading` - a boolean that indicates whether the API request is in progress.
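As a concrete illustration of that state, here is a plain-TypeScript sketch of a `messages` array and the role-based prefixing the component performs when rendering it (the `id` values and contents are made up):

```typescript
// Illustrative shape of the `messages` state that `useChat` manages.
type Message = { id: string; role: 'user' | 'assistant'; content: string };

const messages: Message[] = [
  { id: 'msg-1', role: 'user', content: 'Hello!' },
  { id: 'msg-2', role: 'assistant', content: 'Hi, how can I help?' },
];

// The component maps over this array, prefixing each entry by role.
const rendered = messages.map(
  m => `${m.role === 'user' ? 'User: ' : 'AI: '}${m.content}`,
);
console.log(rendered[0]); // User: Hello!
console.log(rendered[1]); // AI: Hi, how can I help?
```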
Add Image Upload
To make your chatbot multi-modal, let's add the ability to upload and send images to the model. There are two ways to send attachments alongside a message with the `useChat` hook: by providing a `FileList` object or a list of URLs to the `handleSubmit` function. In this guide, you will be using the `FileList` approach as it does not require any additional setup.
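For reference, the URL-based alternative passes a list of attachment descriptors instead of a `FileList`. This sketch shows that shape as plain data; the file name and URL are illustrative:

```typescript
// Sketch of the URL-based alternative. Instead of a FileList, you would pass
// descriptors like these as `experimental_attachments` to `handleSubmit`, e.g.
//   handleSubmit(event, { experimental_attachments: attachments });
// The file name and URL below are illustrative.
type Attachment = { name?: string; contentType?: string; url: string };

const attachments: Attachment[] = [
  { name: 'cat.png', contentType: 'image/png', url: 'https://example.com/cat.png' },
];

// The same image/* filter used in the UI keeps only image attachments.
const images = attachments.filter(a => a.contentType?.startsWith('image/'));
console.log(images.length); // 1
```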
Update your root page (`app/page.tsx`) with the following code:
```tsx
'use client';

import { useChat } from 'ai/react';
import { useRef, useState } from 'react';
import Image from 'next/image';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  const [files, setFiles] = useState<FileList | undefined>(undefined);
  const fileInputRef = useRef<HTMLInputElement>(null);

  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      {messages.map(m => (
        <div key={m.id} className="whitespace-pre-wrap">
          {m.role === 'user' ? 'User: ' : 'AI: '}
          {m.content}
          <div>
            {m?.experimental_attachments
              ?.filter(attachment =>
                attachment?.contentType?.startsWith('image/'),
              )
              .map((attachment, index) => (
                <Image
                  key={`${m.id}-${index}`}
                  src={attachment.url}
                  width={500}
                  height={500}
                  alt={attachment.name ?? `attachment-${index}`}
                />
              ))}
          </div>
        </div>
      ))}

      <form
        className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl space-y-2"
        onSubmit={event => {
          handleSubmit(event, {
            experimental_attachments: files,
          });

          setFiles(undefined);

          if (fileInputRef.current) {
            fileInputRef.current.value = '';
          }
        }}
      >
        <input
          type="file"
          className=""
          onChange={event => {
            if (event.target.files) {
              setFiles(event.target.files);
            }
          }}
          multiple
          ref={fileInputRef}
        />
        <input
          className="w-full p-2"
          value={input}
          placeholder="Say something..."
          onChange={handleInputChange}
        />
      </form>
    </div>
  );
}
```
In this code, you:
- Create state to hold the files and a ref to the file input field.
- Display the "uploaded" files in the UI.
- Update the `onSubmit` function to call the `handleSubmit` function manually, passing the files as an option using the `experimental_attachments` key.
- Add a file input field to the form, including an `onChange` handler to update the files state.
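If you wanted to reject non-image uploads on the client, a small check could run in the `onChange` handler before calling `setFiles`. A sketch, assuming a made-up helper name; plain objects stand in for browser `File` entries so the logic is self-contained:

```typescript
// Returns true when every selected file reports an image MIME type.
// `allImages` is an illustrative helper name; plain objects stand in
// for browser `File` entries here.
type SelectedFile = { name: string; type: string };

function allImages(files: SelectedFile[]): boolean {
  return files.every(f => f.type.startsWith('image/'));
}

console.log(allImages([{ name: 'cat.png', type: 'image/png' }])); // true
console.log(allImages([{ name: 'notes.pdf', type: 'application/pdf' }])); // false
```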
Running Your Application
With that, you have built everything you need for your multi-modal chatbot! To start your application, use the command:
pnpm run dev
Head to your browser and open http://localhost:3000. You should see an input field and a button to upload an image.
Upload a file and ask the model to describe what it sees. Watch as the model's response is streamed back to you!
Where to Next?
You've built a multi-modal AI chatbot using the AI SDK! Experiment and extend the functionality of this application further by exploring tool calling or introducing more granular control over AI and UI states.
If you are looking to leverage the broader capabilities of LLMs, Vercel AI SDK Core provides a comprehensive set of lower-level tools and APIs that will help you unlock a wider range of AI functionalities beyond the chatbot paradigm.