HarmonyOS 7 Beta Puts Agents at the OS Layer, but the Hard Evidence Is Still Missing

Huawei is pitching HarmonyOS 7 as an agent-first operating system, with Xiaoyi moving from assistant UI toward system-level task execution. The interesting part is not the slogan, it is the proposed control plane for skills, screens, apps, devices, and large models.

Huawei used HDC 2026 to introduce the HarmonyOS 7 Developer Beta, and the company is framing it as a shift from an app-centered mobile OS to an agent-centered task platform. The headline pieces are HarmonyOS Intelligence, the new HarmonyOS Agent Framework 2.0, deeper Xiaoyi integration, system capabilities exposed as callable Skills, and GUI control that can read screen state and operate apps even when those apps have not published explicit APIs.

That is a meaningful architectural claim, but it is not yet a measured ML breakthrough. The announcement, as reported by Pandaily, gives ecosystem numbers, 66 million HarmonyOS 6 devices, more than 11 million registered developers, over 400,000 apps and services, and more than 17,000 overseas apps and services. It does not provide model names, parameter counts, latency numbers, task success rates, benchmark results, safety evaluations, or developer documentation detailed enough to judge how reliable the agent stack is in practice. For developers, the useful reading starts at Huawei's HarmonyOS developer portal, not the stage language.

What's Claimed

Huawei's claim is that HarmonyOS 7 makes the operating system itself agent-friendly. In the older assistant model, Xiaoyi could answer questions, launch apps, and trigger some predefined actions. In the HarmonyOS 7 pitch, Xiaoyi becomes a system-level coordinator that can decompose a user request, select Skills, call tools, inspect GUI state, and execute multi-step tasks across devices and services.

The most concrete new term is HarmonyOS Agent Framework 2.0. Huawei describes it as infrastructure for task decomposition, tool calling, and autonomous execution. Those words matter because they map directly to the agent pattern that has become common in LLM systems: take a high-level instruction, split it into smaller operations, choose tools, observe results, revise the plan, and continue until a stopping condition is reached. On phones and PCs, the tools are not just search APIs or code interpreters. They include contacts, calendars, files, payment flows, device settings, app screens, notifications, camera input, and cross-device state.

Huawei also says system capabilities have been Skill-ified. In plain engineering terms, a Skill should be a typed, permissioned capability that an agent can call instead of guessing how to operate a UI. A weather Skill, a ride-hailing Skill, a photo-editing Skill, or a device-control Skill can expose inputs, outputs, constraints, and confirmation requirements. That is cleaner than making an LLM tap through a UI, because the OS can validate arguments, enforce permissions, and log what happened.

The more aggressive claim is GUI control. Agent Framework 2.0 reportedly opens graphical interface operation for the first time, allowing the agent to read on-screen information and simulate clicks or use accessibility interfaces. This is the bridge for legacy apps. If an app has not published a Skill or API, the system can still try to operate it by observing the interface, much like browser agents operate web pages. That can make demos look impressive. It also moves the system into a failure mode ML practitioners know well: visual parsing, ambiguous UI state, hidden side effects, pop-up interruptions, and brittle workflows.

Huawei's phrase for the product concept is Intention as Service. Strip away the packaging and the goal is straightforward: turn a user request such as "book the same restaurant as last Friday for two people and send the address to my partner" into a sequence of retrieval, reasoning, app operation, confirmation, and messaging. The practical value depends less on whether the phrase is catchy and more on whether the system can recover when the restaurant app changes its UI, a payment step appears, a contact name is ambiguous, or the user asks for something that requires consent.

What's Actually New

The OS-level placement is the important part. Most current LLM agents live in apps, browsers, IDEs, or cloud services. They can call tools that a developer registers, but they sit above the operating system. HarmonyOS 7's proposal is closer to making the OS a broker for agent actions. If implemented well, that gives the system better context and better enforcement than an assistant bolted onto individual apps.

A real agent platform needs three layers. The first is a model layer, usually a mix of on-device models and cloud models. Huawei has not named the specific HarmonyOS 7 models in this announcement, but it says the system supports stable operation of on-device and cloud-side large models. The second is a capability layer, where Skills expose structured actions. The third is an execution layer, where the OS controls permissions, UI automation, device routing, and user confirmations. HarmonyOS 7 appears to be an attempt to make those layers native rather than optional.

That is different from a chatbot with plugins. A plugin can answer a travel query or create a calendar event. An OS agent can potentially coordinate the state of the current screen, nearby devices, app permissions, user identity, and hardware sensors. For Huawei, that matters because HarmonyOS is not only a phone OS. The company wants the same agent layer to span phones, tablets, watches, PCs, cars, TVs, and smart-home devices. Cross-device execution is where OS ownership becomes more useful than a standalone assistant app.

There is also a developer ecosystem angle. Huawei's app stack already pushes developers toward HarmonyOS-native development through ArkTS, ArkUI, DevEco Studio, and HarmonyOS Ability Package apps. The open-source sibling, OpenHarmony, gives developers a view into parts of the broader system direction, while Huawei's commercial HarmonyOS remains the consumer platform. If Skills become a first-class packaging target, developers may need to think about app features less as screens and more as callable operations with clear contracts.

That would be a real shift in app design. A food delivery app, for example, would not only expose pages for searching restaurants and checking out. It would expose operations such as search restaurants, reorder previous meal, apply dietary constraint, estimate arrival time, request confirmation, and place order. The UI remains useful for humans, but the agent gets a structured route through the same business logic. The apps that do this well will be easier for Xiaoyi to operate. The apps that do not may be left to screen scraping and click simulation, which is usually less reliable.

The competitive context is obvious. Google is pushing Gemini deeper into Android and its app ecosystem. Apple is building Apple Intelligence around on-device context, private cloud processing, and developer-facing mechanisms such as App Intents through Apple's developer resources. Huawei's version is constrained by its own ecosystem, especially outside China, but it also has one advantage: it can design the OS, assistant, device graph, and native app model as one stack.

The developer beta label matters. This is not a consumer proof point yet. Developer betas are where APIs, permission models, logs, simulators, and documentation either mature or expose gaps. The most useful questions for HarmonyOS 7 are not whether Xiaoyi can complete a staged demo. They are whether third-party developers can register Skills cleanly, test them locally, define failure behavior, protect sensitive actions, and measure agent performance across app versions.

Limitations

The largest missing piece is evaluation. Huawei reported ecosystem scale, but no agent benchmark results. There are no public numbers for task completion rate, average steps per task, latency, rollback rate, hallucinated action rate, tool selection accuracy, GUI grounding accuracy, or user intervention frequency. Those are the metrics that would tell us whether HarmonyOS 7 is a dependable agent runtime or an ambitious assistant interface.

Model details are also absent. The announcement does not identify the specific large models used by Xiaoyi in HarmonyOS 7, their context limits, their multimodal capabilities, or how the system decides between local and cloud inference. For an ML practitioner, that matters. A small on-device model can be fast and private, but may struggle with long-horizon planning and messy UI interpretation. A cloud model can handle harder reasoning, but adds latency, connectivity dependence, cost, and data-governance questions. A serious agent platform usually needs routing policies, not a single model choice.

GUI control is useful, but it should be treated as a fallback, not the main abstraction. Screen-reading agents are fragile because interfaces are built for people, not machines. A button label may change. A list may reorder. A modal may obscure the target. A localization setting may alter text. An app may lazy-load content. Accessibility APIs help, but they do not automatically solve intent, authorization, or business logic. Structured Skills are the better path when developers can provide them.

Security and consent are the second hard problem. Once an agent can operate apps, it can also make mistakes with consequences. Sending a message, deleting a file, confirming a payment, changing a privacy setting, or booking a service are not equivalent to summarizing a document. HarmonyOS 7 will need clear permission boundaries, action previews, confirmation prompts, audit trails, and policy controls for enterprises. A system-level agent without strong guardrails becomes a high-trust automation surface with many ways to fail.

There is also the prompt injection problem. If Xiaoyi can read screen content, then malicious or careless content on a page could try to influence the agent. This is not theoretical. Any agent that mixes user instructions, app content, web content, and tool calls needs instruction hierarchy and content isolation. The OS must distinguish between what the user asked, what an app displayed, what a web page says, and what a tool result returned. Without that separation, GUI agents can be tricked into following instructions embedded in untrusted content.

The ecosystem numbers need context too. Huawei's 66 million HarmonyOS 6 devices and 400,000 apps and services are large, but the agent value depends on how many high-frequency apps expose meaningful Skills. A long tail of apps does not help much if banking, travel, food delivery, messaging, productivity, and enterprise workflows lack structured agent interfaces. The reported 17,000 overseas apps and services suggests progress, but HarmonyOS remains much stronger in China than in many global markets.

The practical applications are still plausible. On phones, agents can compress repetitive tasks such as booking, search, scheduling, file organization, settings changes, and message drafting. On PCs, they can manage document workflows and app-to-app operations. In cars, they can coordinate navigation, communication, climate, and entertainment while reducing manual interaction. In smart-home contexts, they can route intent across appliances and sensors. These are useful domains because the OS has context and the tasks are often procedural.

The harder question is reliability under normal user messiness. Agent demos usually assume clean accounts, supported apps, good connectivity, and simple preferences. Real users have duplicate contacts, expired sessions, partial permissions, inconsistent app states, regional service differences, and habits the model has never seen. HarmonyOS 7's agent architecture will be judged by how often it asks for the right clarification and how safely it stops when it cannot complete a task.

For now, HarmonyOS 7 Developer Beta is best read as an architectural milestone, not proof of mature autonomous computing. Huawei is pushing the same broad direction as Google and Apple, but with a stronger claim that the agent should be part of the operating system's control plane. That could be technically significant if Skills are well specified, GUI automation is constrained, and the model layer is evaluated honestly. Until Huawei publishes model details, benchmark results, and developer-facing failure semantics, the sober interpretation is simple: the architecture is interesting, the product claim is plausible, and the evidence is incomplete.