Generate Image with Chat Prompt

When building a chatbot, you may want to allow the user to generate an image. This can be done by creating a tool that generates an image using the experimental_generateImage function from the AI SDK.

Server

Let's create an endpoint at /api/chat that generates the assistant's response based on the conversation history. You will also define a tool called generateImage that will generate an image based on the assistant's response.

app/api/chat/route.ts

import { openai } from '@ai-sdk/openai';
import { experimental_generateImage, Message, streamText, tool } from 'ai';
import { z } from 'zod';

export async function POST(request: Request) {
  const { messages }: { messages: Message[] } = await request.json();

  // filter through messages and remove base64 image data to avoid sending to the model
  const formattedMessages = messages.map(m => {
    if (m.role === 'assistant' && m.toolInvocations) {
      m.toolInvocations.forEach(ti => {
        if (ti.toolName === 'generateImage' && ti.state === 'result') {
          ti.result.image = `redacted-for-length`;
        }
      });
    }
    return m;
  });

  const result = streamText({
    model: openai('gpt-4o'),
    messages: formattedMessages,
    tools: {
      generateImage: tool({
        description: 'Generate an image',
        parameters: z.object({
          prompt: z.string().describe('The prompt to generate the image from'),
        }),
        execute: async ({ prompt }) => {
          const { image } = await experimental_generateImage({
            model: openai.image('dall-e-3'),
            prompt,
          });
          // in production, save this image to blob storage and return a URL
          return { image: image.base64, prompt };
        },
      }),
    },
  });
  return result.toDataStreamResponse();
}

In production, you should save the generated image to a blob storage and return a URL instead of the base64 image data. If you don't, the base64 image data will be sent to the model which may cause the generation to fail.

Client

Let's create a simple chat interface with useChat. You will call the /api/chat endpoint to generate the assistant's response. If the assistant's response contains a generateImage tool invocation, you will display the tool result (the image in base64 format and the prompt) using the Next Image component.

app/page.tsx

'use client';

import { useChat } from '@ai-sdk/react';
import Image from 'next/image';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      <div className="space-y-4">
        {messages.map(m => (
          <div key={m.id} className="whitespace-pre-wrap">
            <div key={m.id}>
              <div className="font-bold">{m.role}</div>
              {m.toolInvocations ? (
                m.toolInvocations.map(ti =>
                  ti.toolName === 'generateImage' ? (
                    ti.state === 'result' ? (
                      <Image
                        key={ti.toolCallId}
                        src={`data:image/png;base64,${ti.result.image}`}
                        alt={ti.result.prompt}
                        height={400}
                        width={400}
                      />
                    ) : (
                      <div key={ti.toolCallId} className="animate-pulse">
                        Generating image...
                      </div>
                    )
                  ) : null,
                )
              ) : (
                <p>{m.content}</p>
              )}
            </div>
          </div>
        ))}
      </div>

      <form onSubmit={handleSubmit}>
        <input
          className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"
          value={input}
          placeholder="Say something..."
          onChange={handleInputChange}
        />
      </form>
    </div>
  );
}