Get started with Computer Use
With the release of Computer Use in Claude 3.5 Sonnet, you can now direct AI models to interact with computers like humans do - moving cursors, clicking buttons, and typing text. This capability enables automation of complex tasks while leveraging Claude's advanced reasoning abilities.
The AI SDK is a powerful TypeScript toolkit for building AI applications with large language models (LLMs) like Anthropic's Claude alongside popular frameworks like React, Next.js, Vue, Svelte, Node.js, and more. In this guide, you will learn how to integrate Computer Use into your AI SDK applications.
Computer Use is currently in beta with some limitations. The feature may be error-prone at times. Anthropic recommends starting with low-risk tasks and implementing appropriate safety measures.
Computer Use
Anthropic recently released a new version of the Claude 3.5 Sonnet model which is capable of 'Computer Use'. This allows the model to interact with computer interfaces through basic actions like:
- Moving the cursor
- Clicking buttons
- Typing text
- Taking screenshots
- Reading screen content
How It Works
Computer Use enables the model to read and interact with on-screen content through a series of coordinated steps. Here's how the process works:
- Start with a prompt and tools: Add Anthropic-defined Computer Use tools to your request and provide a task (prompt) for the model. For example: "save an image to your downloads folder."
- Select the right tool: The model evaluates which computer tools can help accomplish the task. It then sends a formatted tool_call to use the appropriate tool.
- Execute the action and return results: The AI SDK processes Claude's request by running the selected tool. The results can then be sent back to Claude through a tool_result message.
- Complete the task through iterations: Claude analyzes each result to determine if more actions are needed. It continues requesting tool use and processing results until it completes your task or requires additional input.
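The cycle described above can be sketched as a plain loop. This is a minimal illustration, not the AI SDK's internals: the ToolCall, ModelTurn, and ToolResult shapes, the runAgentLoop function, and the scripted stand-in model are all hypothetical names invented for this example.

```typescript
// Hypothetical shapes for illustration; the real AI SDK and Anthropic API types differ.
type ToolCall = { toolName: string; args: Record<string, unknown> };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type ToolResult = { toolName: string; result: string };

type Model = (prompt: string, results: ToolResult[]) => Promise<ModelTurn>;
type ToolSet = Record<string, (args: Record<string, unknown>) => Promise<string>>;

// Runs the prompt -> tool call -> tool result cycle until the model stops
// requesting tools or the step budget is used up.
async function runAgentLoop(
  model: Model,
  tools: ToolSet,
  prompt: string,
  maxSteps: number,
): Promise<{ text: string; steps: number }> {
  const results: ToolResult[] = [];
  let text = '';
  let steps = 0;
  while (steps < maxSteps) {
    const turn = await model(prompt, results);
    steps += 1;
    text = turn.text;
    // No tool calls means the task is done or the model needs more input.
    if (turn.toolCalls.length === 0) break;
    for (const call of turn.toolCalls) {
      const tool = tools[call.toolName];
      const result = tool ? await tool(call.args) : `Unknown tool: ${call.toolName}`;
      results.push({ toolName: call.toolName, result });
    }
  }
  return { text, steps };
}

// A scripted stand-in for the model: requests one screenshot, then finishes.
const scriptedModel: Model = async (_prompt, results) =>
  results.length === 0
    ? { text: '', toolCalls: [{ toolName: 'computer', args: { action: 'screenshot' } }] }
    : { text: `Done after ${results.length} tool result(s)`, toolCalls: [] };

// A stand-in tool that only reports what it was asked to do.
const demoTools: ToolSet = {
  computer: async (args) => `executed ${String(args.action)}`,
};
```

Calling runAgentLoop(scriptedModel, demoTools, 'take a screenshot', 5) takes two model turns here: one that requests the tool and one that reports completion.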
Available Tools
There are three main tools available in the Computer Use API:
- Computer Tool: Enables basic computer control like mouse movement, clicking, and keyboard input
- Text Editor Tool: Provides functionality for viewing and editing text files
- Bash Tool: Allows execution of bash commands
Implementation Considerations
Computer Use tools in the AI SDK are predefined interfaces that require your own implementation of the execution layer. While the SDK provides the type definitions and structure for these tools, you need to:
- Set up a controlled environment for Computer Use execution
- Implement core functionality like mouse control and keyboard input
- Handle screenshot capture and processing
- Set up rules and limits for how Claude can interact with your system
The recommended approach is to start with Anthropic's reference implementation, which provides:
- A containerized environment configured for safe Computer Use
- Ready-to-use (Python) implementations of Computer Use tools
- An agent loop for API interaction and tool execution
- A web interface for monitoring and control
This reference implementation serves as a foundation to understand the requirements before building your own custom solution.
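If you do build your own execution layer, the "rules and limits" item above might look something like the following sketch. Everything here is an assumption made for illustration: the GuardedAction subset, the ComputerBackend interface, the Limits shape, and the executeGuarded function are hypothetical and are not part of the AI SDK or Anthropic's reference implementation.

```typescript
// Hypothetical action subset for illustration; not the tool's full action list.
type GuardedAction = 'mouse_move' | 'left_click' | 'type';

// Hypothetical backend: in practice this would wrap the input APIs of your VM or container.
interface ComputerBackend {
  moveMouse(x: number, y: number): void;
  click(x: number, y: number): void;
  typeText(text: string): void;
}

interface Limits {
  displayWidthPx: number;
  displayHeightPx: number;
  allowedActions: ReadonlySet<GuardedAction>;
}

// Checks an action against the limits before forwarding it to the backend.
function executeGuarded(
  backend: ComputerBackend,
  limits: Limits,
  action: GuardedAction,
  coordinate?: [number, number],
  text?: string,
): string {
  if (!limits.allowedActions.has(action)) {
    return `Rejected: action "${action}" is not allowed`;
  }
  if (action === 'type') {
    if (text === undefined) return 'Rejected: "type" requires text';
    backend.typeText(text);
    return `typed ${text.length} character(s)`;
  }
  // Remaining actions (mouse_move, left_click) need an on-screen coordinate.
  if (!coordinate) return `Rejected: "${action}" requires a coordinate`;
  const [x, y] = coordinate;
  const inBounds =
    x >= 0 && y >= 0 && x < limits.displayWidthPx && y < limits.displayHeightPx;
  if (!inBounds) return `Rejected: (${x}, ${y}) is outside the display`;
  if (action === 'mouse_move') backend.moveMouse(x, y);
  else backend.click(x, y);
  return `${action} at (${x}, ${y})`;
}

// A backend that only records calls, useful for dry runs.
const actionLog: string[] = [];
const recordingBackend: ComputerBackend = {
  moveMouse: (x, y) => { actionLog.push(`move ${x},${y}`); },
  click: (x, y) => { actionLog.push(`click ${x},${y}`); },
  typeText: (t) => { actionLog.push(`type ${t}`); },
};

const demoLimits: Limits = {
  displayWidthPx: 1920,
  displayHeightPx: 1080,
  allowedActions: new Set<GuardedAction>(['mouse_move', 'left_click']),
};
```

Returning a rejection message instead of throwing is one possible design: the message can be passed back to the model as the tool result so it can choose a different action.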
Getting Started with the AI SDK
If you have never used the AI SDK before, start by following the Getting Started guide.
First, ensure you have the AI SDK and Anthropic AI SDK provider installed:
pnpm add ai @ai-sdk/anthropic
You can add Computer Use to your AI SDK applications using provider-defined tools. These tools accept various input parameters (like display height and width in the case of the computer tool) and then require that you define an execute function.
Here's how you could set up the Computer Tool with the AI SDK:
import { anthropic } from '@ai-sdk/anthropic';
import { getScreenshot, executeComputerAction } from '@/utils/computer-use';

const computerTool = anthropic.tools.computer_20241022({
  displayWidthPx: 1920,
  displayHeightPx: 1080,
  execute: async ({ action, coordinate, text }) => {
    switch (action) {
      case 'screenshot': {
        return {
          type: 'image',
          data: getScreenshot(),
        };
      }
      default: {
        return executeComputerAction(action, coordinate, text);
      }
    }
  },
  experimental_toToolResultContent(result) {
    return typeof result === 'string'
      ? [{ type: 'text', text: result }]
      : [{ type: 'image', data: result.data, mimeType: 'image/png' }];
  },
});
The computerTool handles two main actions: taking screenshots via getScreenshot() and executing computer actions like mouse movements and clicks through executeComputerAction(). Remember, you have to implement this execution logic (e.g., the getScreenshot and executeComputerAction functions) to handle the actual computer interactions. The execute function should handle all low-level interactions with the operating system.
Finally, to send tool results back to the model, use the experimental_toToolResultContent() function to convert text and image responses into a format the model can process. The AI SDK includes experimental support for these multi-modal tool results when using Anthropic's models.
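To make the result conversion concrete, here is a small standalone sketch of the two result shapes used above. It assumes the image data is passed as a base64-encoded string; the ComputerToolResult and ContentPart types and the encodeScreenshot and toToolResultContent functions are hypothetical names for this example, not AI SDK exports.

```typescript
// Hypothetical types mirroring the shapes used in the tool definition above.
type ComputerToolResult = string | { type: 'image'; data: string };
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image'; data: string; mimeType: string };

// Wraps raw PNG bytes as an image result (assumption: data is base64-encoded).
function encodeScreenshot(pngBytes: Uint8Array): ComputerToolResult {
  let binary = '';
  for (let i = 0; i < pngBytes.length; i++) {
    binary += String.fromCharCode(pngBytes[i]);
  }
  return { type: 'image', data: btoa(binary) };
}

// Converts a tool result into content parts: strings become text, images stay images.
function toToolResultContent(result: ComputerToolResult): ContentPart[] {
  return typeof result === 'string'
    ? [{ type: 'text', text: result }]
    : [{ type: 'image', data: result.data, mimeType: 'image/png' }];
}
```

A getScreenshot implementation could capture the screen however your environment allows and then return the encoded data from a helper like encodeScreenshot.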
Computer Use requires appropriate safety measures like using virtual machines, limiting access to sensitive data, and implementing human oversight for critical actions.
Using Computer Tools with Text Generation
Once your tool is defined, you can use it with both the generateText and streamText functions.
For one-shot text generation, use generateText:
import { generateText } from 'ai';

const result = await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: 'Move the cursor to the center of the screen and take a screenshot',
  tools: { computer: computerTool },
});

console.log(result.text);
For streaming responses, use streamText to receive updates in real time:
import { streamText } from 'ai';

const result = streamText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: 'Open the browser and navigate to vercel.com',
  tools: { computer: computerTool },
});

for await (const chunk of result.textStream) {
  console.log(chunk);
}
Configure Multi-Step (Agentic) Generations
To allow the model to perform multiple steps without user intervention, specify a maxSteps value. This will automatically send any tool results back to the model to trigger a subsequent generation:
const stream = streamText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: 'Open the browser and navigate to vercel.com',
  tools: { computer: computerTool },
  maxSteps: 10, // experiment with this value based on your use case
});
Combine Multiple Tools
You can combine multiple tools in a single request to enable more complex workflows. The AI SDK supports all three of Claude's Computer Use tools:
import { execSync } from 'node:child_process';
import { generateText } from 'ai';

const computerTool = anthropic.tools.computer_20241022({ ... });

const bashTool = anthropic.tools.bash_20241022({
  // Runs the command synchronously and returns its stdout
  execute: async ({ command, restart }) => execSync(command).toString(),
});

const textEditorTool = anthropic.tools.textEditor_20241022({
  execute: async ({
    command,
    path,
    file_text,
    insert_line,
    new_str,
    old_str,
    view_range,
  }) => {
    // Handle file operations based on command.
    // executeTextEditorFunction is your own implementation.
    return executeTextEditorFunction({
      command,
      path,
      fileText: file_text,
      insertLine: insert_line,
      newStr: new_str,
      oldStr: old_str,
      viewRange: view_range,
    });
  },
});

const response = await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt:
    "Create a new file called example.txt, write 'Hello World' to it, and run 'cat example.txt' in the terminal",
  tools: {
    computer: computerTool,
    textEditor: textEditorTool,
    bash: bashTool,
  },
});
Always implement appropriate security measures and obtain user consent before enabling Computer Use in production applications.
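As one example of such a measure, a bash tool could check each command before running it. The sketch below is a deliberately strict pre-check and an assumption of this guide, not an AI SDK feature: checkCommand and its allowlist are hypothetical, and rejecting shell metacharacters is a coarse filter that complements, rather than replaces, running inside a sandbox.

```typescript
// Characters that allow chaining, substitution, or redirection in a shell.
const SHELL_METACHARACTERS = /[;&|`$<>\n]/;

// Returns whether a command may run, with a reason suitable for a tool result.
function checkCommand(
  command: string,
  allowedPrograms: ReadonlySet<string>,
): { ok: boolean; reason: string } {
  const trimmed = command.trim();
  if (trimmed.length === 0) {
    return { ok: false, reason: 'empty command' };
  }
  if (SHELL_METACHARACTERS.test(trimmed)) {
    return { ok: false, reason: 'shell metacharacters are not allowed' };
  }
  // The first whitespace-separated token is treated as the program name.
  const program = trimmed.split(/\s+/)[0] ?? '';
  if (!allowedPrograms.has(program)) {
    return { ok: false, reason: `program "${program}" is not on the allowlist` };
  }
  return { ok: true, reason: 'allowed' };
}

// Hypothetical allowlist for a read-only session.
const readOnlyPrograms = new Set(['cat', 'ls', 'pwd']);
```

Inside an execute function, a rejected command's reason could be returned as the tool result so the model can adjust its plan instead of failing silently.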
Best Practices for Computer Use
To get the best results when using Computer Use:
- Specify simple, well-defined tasks with explicit instructions for each step
- Prompt Claude to verify outcomes through screenshots
- Use keyboard shortcuts when UI elements are difficult to manipulate
- Include example screenshots for repeatable tasks
- Provide explicit tips in system prompts for known tasks
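Several of these practices can be folded into a system prompt. The helper below is only a sketch: buildSystemPrompt is a hypothetical function, and the prompt wording is an illustration rather than text recommended by Anthropic.

```typescript
// Builds a system prompt that encodes the practices above plus task-specific tips.
function buildSystemPrompt(taskTips: readonly string[]): string {
  const lines = [
    'Work through the task one small step at a time.',
    'After each step, take a screenshot and check that the step had the intended effect before continuing.',
    'If a UI element is hard to click, prefer a keyboard shortcut.',
  ];
  if (taskTips.length > 0) {
    lines.push('Tips for this task:');
    for (const tip of taskTips) {
      lines.push(`- ${tip}`);
    }
  }
  return lines.join('\n');
}
```

The resulting string could be passed as the system option of generateText or streamText alongside the tools.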
Security Measures
Remember, Computer Use is a beta feature. Please be aware that it poses unique risks that are distinct from standard API features or chat interfaces. These risks are heightened when using Computer Use to interact with the internet. To minimize risks, consider taking precautions such as:
- Use a dedicated virtual machine or container with minimal privileges to prevent direct system attacks or accidents.
- Avoid giving the model access to sensitive data, such as account login information, to prevent information theft.
- Limit internet access to an allowlist of domains to reduce exposure to malicious content.
- Ask a human to confirm decisions that may result in meaningful real-world consequences as well as any tasks requiring affirmative consent, such as accepting cookies, executing financial transactions, or agreeing to terms of service.
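Two of these precautions lend themselves to small checks in your own code. The following sketch assumes you route navigation and task descriptions through your own helpers: isUrlAllowed and requiresHumanConfirmation are hypothetical functions, and the keyword list is a crude placeholder that a real deployment would replace with its own policy.

```typescript
// Returns true only for hosts that match an allowlisted domain or one of its subdomains.
function isUrlAllowed(rawUrl: string, allowedDomains: readonly string[]): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // unparseable input is never allowed
  }
  return allowedDomains.some(
    (domain) => host === domain || host.endsWith(`.${domain}`),
  );
}

// Hypothetical keywords marking actions that need explicit human sign-off.
const CONFIRMATION_KEYWORDS = [
  'accept cookies',
  'terms of service',
  'payment',
  'purchase',
];

// Flags task descriptions that mention a consent-like or financial action.
function requiresHumanConfirmation(taskDescription: string): boolean {
  const lower = taskDescription.toLowerCase();
  return CONFIRMATION_KEYWORDS.some((keyword) => lower.indexOf(keyword) !== -1);
}
```

Matching on the parsed hostname rather than the raw string avoids treating a look-alike such as evilvercel.com as if it were vercel.com.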