Isa Fulford, research lead for OpenAI's new ChatGPT agent, recently delegated a personal chore to the AI: ordering a large batch of cupcakes. "I was very specific about what I wanted, and it was a lot of cupcakes," Fulford recounts. "That one took almost an hour—but it was easier than me doing it myself, because I didn't want to do it." This anecdote encapsulates the promise and limitations of OpenAI's latest innovation—a ChatGPT agent designed to automate digital tasks through a virtual browser, now capable of creating downloadable PowerPoint presentations and Excel spreadsheets.

What the ChatGPT Agent Can Do

Dubbed an "agent" in AI parlance, this tool navigates third-party websites and software independently, executing multi-step tasks based on user instructions. It merges capabilities from OpenAI's earlier projects—Operator (visual web browsing) and deep research (text-based processing)—into a unified system. Key features include:
- File generation: Producing editable PowerPoint slides and Excel sheets, potentially reducing reliance on Microsoft Office.
- Task automation: Filling online forms, using programming terminals, and interfacing with APIs for services like Google Drive and SharePoint.
- Dual browsing modes: Switching between visual (click-based) and text-based navigation for efficiency.

Yash Kumar, product lead for the agent, emphasizes its enterprise focus: "We’ve tried to build a product with a whole lot of enterprise use cases." In demos, the agent handled diverse scenarios, from planning a date night (about 5 minutes) to analyzing Nvidia's earnings and creating a financial slide deck (about 25 minutes). Kumar notes that most tasks take 10–15 minutes, and that users can run several tasks in parallel.

Rollout, Limits, and Safety

The agent launches initially for ChatGPT Pro, Plus, and Team subscribers, with Pro users getting up to 400 prompts per month. Enterprise and Education access follows later this summer. Free users remain excluded for now. Despite its potential, the tool comes with friction and safeguards:
- Processing delays: Tasks like Fulford's cupcake order highlight how complex requests can stretch to an hour, requiring patience.
- Safety protocols: High-risk actions (e.g., on social media or financial sites) trigger a "watch mode" in which users must monitor the agent in real time. Memory integration, which could personalize tasks, is delayed due to concerns over prompt injection attacks. "It’s not that we don’t think it’s safe," Kumar clarifies. "We’re just taking an extra precaution."
- Replay feature: Users can review recordings of the agent's actions, offering transparency into its decision-making process—a glimpse into how AI might redefine web interaction.

Why This Matters for Tech and Business

This launch is part of OpenAI's monetization push, as the company contends with the high cost of running AI models and fierce competition for talent. For developers and enterprises, it signals a shift toward AI agents handling tedious digital labor, potentially disrupting Microsoft's stronghold on productivity software. Yet Microsoft's reliance on OpenAI's models, and the ongoing negotiations between the two partners, hint at future tensions in the enterprise AI space.

As Fulford and Kumar hint, successful agent adoption hinges on user trust and control. If these tools gain traction, they could transform not just workflows but our fundamental relationship with the internet—turning passive browsing into delegated, replayable journeys. For now, though, the era of AI agents remains an experiment in balancing ambition with practicality.

Source: WIRED