Google Debuts Gemini 2.5 Computer Use: AI That Clicks, Types, and Scrolls Like Humans
Google DeepMind has taken a significant leap towards realizing practical, autonomous web agents with the public preview release of Gemini 2.5 Computer Use. This new model, built upon the foundation of Gemini 2.5 Pro, is explicitly designed to understand and interact with web user interfaces (UIs) in a fundamentally human-like way: clicking buttons, typing text, and scrolling pages. It represents a concrete step beyond simple information retrieval, moving into the realm of AI actively performing tasks within digital environments.
How It Works: Reasoning and Acting in the Browser
Unlike traditional AI assistants that might retrieve information about a webpage, Gemini 2.5 Computer Use operates inside it. Here's the process:
- User Prompt: A user provides a natural language instruction, such as "Open Wikipedia, search for 'Atlantis,' and summarize the history of the myth in Western thought."
- UI Analysis: The model fetches the target URL and analyzes screenshots of the web page to understand its structure and interactive elements.
- Autonomous Execution: Working in an iterative loop, the model plans and executes the necessary steps (navigating to the site, locating the search bar, typing the query, finding the relevant section, and extracting the text).
- Reasoning Transparency: Crucially, the model outlines its reasoning and actions step-by-step in a visible text box for the user, providing insight into its decision-making process.
- Confirmation for Sensitive Actions: For tasks involving potential risks (like purchases), the model is designed to request explicit user confirmation before proceeding.
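The steps above amount to an observe, decide, act loop. The Python sketch below illustrates that loop under stated assumptions and is not Google's implementation or the Gemini API: `Action`, `FakeBrowser`, `scripted_model`, and `run_agent` are hypothetical names, the "model" simply replays a fixed plan, and the "browser" only records what it was asked to do.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str                 # "click", "type", "scroll", "done", or "skipped"
    target: str = ""          # UI element the action applies to
    text: str = ""            # text to type, if any
    reasoning: str = ""       # explanation surfaced to the user
    sensitive: bool = False   # True if the step needs user confirmation


class FakeBrowser:
    """Hypothetical stand-in for a real browser driver."""

    def __init__(self) -> None:
        self.log: List[tuple] = []

    def screenshot(self) -> str:
        # A real driver would return image bytes; a string is enough here.
        return f"screen after {len(self.log)} actions"

    def apply(self, action: Action) -> None:
        self.log.append((action.kind, action.target, action.text))


def scripted_model(script: List[Action]) -> Callable:
    """Hypothetical stand-in for the model: replays a fixed plan."""
    steps = iter(script)

    def propose(goal: str, screenshot: str, history: List[Action]) -> Action:
        return next(steps, Action("done", reasoning="plan exhausted"))

    return propose


def run_agent(goal, browser, propose, confirm, max_steps=20) -> List[Action]:
    history: List[Action] = []
    for _ in range(max_steps):
        # Observe the page, then ask the model for the next step.
        action = propose(goal, browser.screenshot(), history)
        # Reasoning transparency: show each step before acting on it.
        print(f"[{action.kind}] {action.reasoning}")
        if action.kind == "done":
            break
        # Sensitive steps run only with explicit user approval.
        if action.sensitive and not confirm(action):
            history.append(Action("skipped", action.target, reasoning="user declined"))
            continue
        browser.apply(action)
        history.append(action)
    return history


plan = [
    Action("click", "search bar", reasoning="Focus the search field"),
    Action("type", "search bar", "Atlantis", reasoning="Enter the query"),
    Action("click", "Buy now", reasoning="Purchase step", sensitive=True),
]
browser = FakeBrowser()
# The confirm callback declines everything, so the purchase is skipped.
history = run_agent("demo goal", browser, scripted_model(plan), confirm=lambda a: False)
```

In this sketch the history of recent actions is passed back to the model on every iteration, and a declined sensitive step is recorded rather than silently dropped, so the next decision can take it into account.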
Performance and Positioning
Google claims Gemini 2.5 Computer Use outperforms competing web-browsing models from OpenAI and Anthropic in both accuracy and latency, based on benchmarks like Online-Mind2Web, a framework specifically designed to evaluate web agents. Demo videos (sped up) showcase the model updating a CRM record and reorganizing notes on Google's discontinued Jamboard platform.
"The model's ability to maintain context of its recent actions within a specific UI allows it to perform sequences of tasks more effectively," the announcement noted, highlighting the advantage of its iterative approach. It's primarily targeted at web browsers but also shows "strong promise" for mobile interactions.
Availability and Trying It Out
Developers and interested users can access Gemini 2.5 Computer Use now:
- Via the Gemini API in Google AI Studio.
- Through Vertex AI.
- Through a demo version hosted by Browserbase.
Safety and Acknowledged Limitations: Proceed with Caution
Recognizing the inherent risks of agents acting autonomously online, Google has incorporated safety controls:
- Developers can configure the model to prevent specific actions like bypassing CAPTCHAs, compromising data security, or attempting to control sensitive systems like medical devices.
- The model can be instructed to require user confirmation for predefined sensitive operations.
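The two controls above can be pictured as a small policy check that runs before any action is executed. The Python sketch below is illustrative only and does not reflect the actual configuration surface of the Gemini API or Vertex AI: `SafetyPolicy`, `Verdict`, and the category names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"   # pause and ask the user first
    BLOCK = "block"       # never execute


@dataclass(frozen=True)
class SafetyPolicy:
    blocked: frozenset
    needs_confirmation: frozenset

    def check(self, category: str) -> Verdict:
        # Blocking takes precedence over confirmation.
        if category in self.blocked:
            return Verdict.BLOCK
        if category in self.needs_confirmation:
            return Verdict.CONFIRM
        return Verdict.ALLOW


# Hypothetical category names, chosen to mirror the examples in the text.
policy = SafetyPolicy(
    blocked=frozenset({"captcha_bypass", "medical_device_control"}),
    needs_confirmation=frozenset({"purchase"}),
)
```

Checking blocked categories first means an action listed in both sets is refused outright instead of being offered to the user for approval.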
Perhaps more importantly, Google explicitly stated in the model's system card that it inherits the well-documented limitations of large foundation models, including:
- Hallucinations: Generating incorrect or nonsensical information.
- Limited Causal Understanding: Struggling with complex cause-and-effect reasoning.
- Difficulty with Complex Logic and Counterfactuals: Challenges in intricate deduction or reasoning about hypothetical scenarios.
This transparency aligns with growing industry awareness, highlighted by recent Anthropic research showing models can misinterpret harmless information as unethical or illegal.
The Evolving Landscape of Web Agents
Gemini 2.5 Computer Use enters a rapidly developing field. OpenAI and Anthropic have released similar web-interaction capabilities, and Google itself previously experimented with Project Mariner, a Chrome extension for task automation. This convergence signals a clear industry direction: moving beyond AI as a conversational partner towards AI as an active, task-completing agent within our digital workspaces. While the promise of seamless automation is compelling, Google's candid admission of the model's limitations serves as a crucial reminder that robust, reliable, and safe autonomous web interaction remains a complex challenge demanding careful deployment and continuous oversight. The era of AI navigating the web for us is dawning, but it arrives with both significant potential and necessary caution.
Source: Based on reporting by Webb Wright for ZDNet (October 9, 2025).