AI software company Anthropic has announced a new tool that can take control of the user’s mouse cursor and perform basic tasks on their computer.
Announced alongside an upgraded Claude 3.5 Sonnet and a new Claude 3.5 Haiku model, the tool is straightforwardly called “Computer Use.” It’s available exclusively with the company’s mid-range Claude 3.5 Sonnet model right now, via the API. Users can give it multi-step instructions (Anthropic claims it can run for tens or even hundreds of steps) and it accomplishes tasks on the user’s computer by “looking at a screen, moving a cursor, clicking buttons, and typing text.”
Here’s how Anthropic says it works:
When a developer tasks Claude with using a piece of computer software and gives it the necessary access, Claude looks at screenshots of what’s visible to the user, then counts how many pixels vertically or horizontally it needs to move a cursor in order to click in the correct place. Training Claude to count pixels accurately was critical. Without this skill, the model finds it difficult to give mouse commands—similar to how models often struggle with simple-seeming questions like “how many A’s in the word ‘banana’?”.
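The pixel-counting step Anthropic describes can be sketched in a few lines. This is an illustrative toy model of the idea, not Anthropic’s implementation; the function and type names are invented for the example.

```python
# Illustrative sketch of the "count pixels" step: given the cursor's
# current position and the on-screen target the model has identified
# in a screenshot, compute the relative move needed to click it.
# Hypothetical names; not Anthropic's actual implementation.

from dataclasses import dataclass

@dataclass
class CursorMove:
    dx: int  # pixels to move horizontally (positive = right)
    dy: int  # pixels to move vertically (positive = down)

def pixels_to_target(cursor: tuple[int, int], target: tuple[int, int]) -> CursorMove:
    """Count how many pixels, horizontally and vertically, separate
    the cursor from the point the model wants to click."""
    return CursorMove(dx=target[0] - cursor[0], dy=target[1] - cursor[1])
```

For instance, `pixels_to_target((100, 200), (340, 260))` yields `CursorMove(dx=240, dy=60)`. A miscount at this step sends the click to the wrong element, which is why Anthropic says training the model to count pixels accurately was critical.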
It has limitations, of course. It operates by taking rapid successive screengrabs rather than working with a live video stream, so it can miss short-lived notifications or other changes. It’s still incapable of doing some common actions, like drag and drop.
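The screenshot-sampling limitation is easy to see with a toy model: anything on screen for less time than the gap between two grabs can disappear without ever being captured. The one-second interval and the timings below are made up for illustration, not Anthropic’s actual capture rate.

```python
# Toy model of screenshot polling: an on-screen event is only "seen"
# if at least one screengrab lands while it is visible.
# Timings are illustrative.

def frames_that_see(poll_times: list[float], start: float, end: float) -> list[float]:
    """Return the poll timestamps at which an event visible during
    [start, end) would appear in a screenshot."""
    return [t for t in poll_times if start <= t < end]

polls = [0.0, 1.0, 2.0, 3.0]  # one grab per second

# A notification shown for half a second between grabs is never captured:
frames_that_see(polls, 1.2, 1.7)  # -> []

# A dialog that stays up for two seconds is caught twice:
frames_that_see(polls, 0.5, 2.5)  # -> [1.0, 2.0]
```

A live video stream has no such blind spots, which is the gap the article points to.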
Anthropic also acknowledges the tool can be “cumbersome and error-prone” at times. A blog post about developing it gave one example of a way it went wrong in testing: It abandoned a coding task before completing it and began instead “to peruse photos of Yellowstone National Park”—perhaps one of the most human-like things an AI bot has done. (I kid.)
The tool is now in public beta, but it has been in the hands of partner organizations for a while, with employees of companies like Amazon, Canva, Asana, and Notion testing it in limited ways.