DeepSeek’s flagship model has been ported to Huawei’s Ascend AI chips, allowing inference at scale without foreign hardware. The integration shows progress in China’s domestic AI stack, but performance gaps, tooling maturity, and ecosystem lock‑in remain open questions.
DeepSeek V4 on Huawei Ascend: What’s New, What It Actually Means, and What Still Limits the Stack

The claim
Press releases from DeepSeek and Huawei announce that DeepSeek V4 now runs natively on the Ascend 910B/910C family, completing what the companies call a “full adaptation” of the model to the Ascend ecosystem. The headline suggests that Chinese enterprises can now run large‑scale inference workloads without touching NVIDIA GPUs or other foreign silicon.
What’s actually new
- Model‑to‑chip port – DeepSeek V4, a 175‑billion‑parameter transformer, was originally trained on a mixed GPU/TPU cluster. The latest release includes a set of Ascend‑specific kernels and a conversion of the model weights into the MindSpore format, which executes on Huawei’s CANN (Compute Architecture for Neural Networks) runtime. This is more than a simple ONNX export: it required custom fused operators for attention, rotary embeddings, and the hierarchical‑mix‑feed‑forward block that DeepSeek introduced in V3.
- Benchmark numbers – In Huawei’s own tests, the Ascend 910B achieved 68 % of the throughput of an NVIDIA H100 at the same batch size (8 tokens per request) while drawing roughly 55 % of the power. For batch‑size‑32 inference, the latency gap was 1.2 ms (Ascend: 9.8 ms; H100: 8.6 ms). These figures are comparable to those reported earlier this year for an OpenAI‑compatible Llama‑2‑70B deployment on Ascend, suggesting the kernel optimizations are now mature enough for a 175B model.
- Software stack maturity – The integration ships with a MindSpore‑compatible inference library that mirrors the API of DeepSeek’s original Python client. It also includes profiling tooling (CANN Profiler) for measuring per‑operator latency and memory usage, alongside reference models in the Ascend Model Zoo. The library supports dynamic batching, a feature many Chinese cloud providers have been demanding for multi‑tenant SaaS workloads.
- Ecosystem signals – At the Kunpeng Ascend Developer Conference 2026, Huawei announced a roadmap that extends Ascend support to training of models up to 300 B parameters, with a planned “Ascend‑AI‑Hub” marketplace for third‑party models. DeepSeek V4’s successful inference port is being used as a reference case for that roadmap.
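Dynamic batching, mentioned among the library features above, is a general serving technique: the server holds incoming requests for a short window and runs them through the model together, trading a few milliseconds of queueing for better accelerator utilization. The following is a minimal plain‑Python sketch of the idea only. It is not the API of the Ascend inference library, and `collect_batch`, `serve_once`, the 32‑request cap and the 50 ms wait budget are hypothetical choices made for illustration.

```python
import queue
import time
from typing import Callable, List


def collect_batch(pending: "queue.Queue[str]", max_batch: int, max_wait_s: float) -> List[str]:
    """Gather up to max_batch requests, waiting at most max_wait_s after the first.

    Hypothetical helper for illustration; not part of any Ascend or MindSpore API.
    """
    # Block until at least one request exists, so an idle server does not spin.
    batch = [pending.get()]
    deadline = time.monotonic() + max_wait_s
    # Keep filling the batch until it is full or the wait budget runs out.
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


def serve_once(
    pending: "queue.Queue[str]",
    run_model: Callable[[List[str]], List[str]],
    max_batch: int = 32,
    max_wait_s: float = 0.05,
) -> List[str]:
    """Run one batched forward pass; run_model stands in for the real model call."""
    return run_model(collect_batch(pending, max_batch, max_wait_s))


def echo_model(prompts: List[str]) -> List[str]:
    """Stand-in 'model' that just upper-cases each prompt."""
    return [p.upper() for p in prompts]


if __name__ == "__main__":
    requests: "queue.Queue[str]" = queue.Queue()
    for i in range(40):
        requests.put(f"prompt-{i}")
    print(len(serve_once(requests, echo_model)))  # 32: batch filled to the cap
    print(len(serve_once(requests, echo_model)))  # 8: queue ran dry, wait budget expired
```

The two knobs pull in opposite directions: a larger `max_batch` raises throughput, while a shorter `max_wait_s` bounds the extra latency a lightly loaded multi‑tenant server adds to each request.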
Why it matters (or doesn’t)
- Supply‑chain independence – For companies that must keep data and compute within China’s regulatory perimeter, the ability to run a state‑of‑the‑art LLM without importing NVIDIA GPUs is a concrete advantage. It also reduces exposure to the export‑control restrictions that have affected other domestic chip projects.
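- Cost profile – Power consumption is a major component of inference TCO. The reported 45 % power reduction is per device, however: because the Ascend 910B also delivers only 68 % of the H100’s throughput, energy per token falls by only about 19 %. The estimate of roughly $0.08 per 1 M tokens on Ascend versus $0.14 on an H100‑based cluster, assuming comparable cloud pricing, therefore implies savings beyond power alone. If it holds, that could make large‑scale chatbot deployments economically viable for midsize firms.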
- Benchmark parity is still limited – The throughput numbers are promising but still fall short of the 80‑90 % range that NVIDIA claims for its latest Hopper GPUs on the same model. For latency‑critical applications (e.g., real‑time translation), the extra millisecond can be noticeable.
- Tooling friction – While MindSpore has improved, the developer experience is still less polished than PyTorch or TensorFlow. Debugging kernel failures often requires diving into CANN logs, and the community around Ascend‑specific extensions is relatively small. This raises the barrier for startups that lack in‑house AI engineers.
- Model updates – DeepSeek V4 is a static release. Future improvements (e.g., sparsity, retrieval‑augmented generation) will need to be re‑engineered for Ascend, which could slow the iteration cycle compared with the PyTorch‑based open‑source ecosystem.
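Energy per token is device power divided by throughput, so a per‑device power saving and a per‑token saving are not the same number. A quick check in Python, using the ratios reported in Huawei’s tests (68 % of H100 throughput at 55 % of the power) and the quoted per‑token prices; the helper name `relative_energy_per_token` is our own:

```python
def relative_energy_per_token(rel_throughput: float, rel_power: float) -> float:
    """Energy per token of device A relative to device B.

    Energy per token = power / throughput, so when both inputs are expressed
    as A-to-B ratios, the energy ratio is rel_power / rel_throughput.
    """
    if rel_throughput <= 0:
        raise ValueError("relative throughput must be positive")
    return rel_power / rel_throughput


# Ratios as reported in Huawei's tests: Ascend 910B vs. NVIDIA H100.
energy_ratio = relative_energy_per_token(rel_throughput=0.68, rel_power=0.55)
print(f"Energy per token: {energy_ratio:.2f}x H100 ({1 - energy_ratio:.0%} lower)")
# prints "Energy per token: 0.81x H100 (19% lower)"

# Quoted serving cost per 1 M tokens (USD): Ascend vs. H100-based cluster.
cost_ratio = 0.08 / 0.14
print(f"Quoted cost per 1M tokens: {cost_ratio:.2f}x H100 ({1 - cost_ratio:.0%} lower)")
# prints "Quoted cost per 1M tokens: 0.57x H100 (43% lower)"
```

Working in ratios to the H100 avoids assuming any absolute wattage, electricity tariff, or instance price that the sources do not give.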
Limitations and open questions
| Area | Current state | Open issue |
|---|---|---|
| Training | Ascend supports up to 300 B parameters in theory, but no public training runs of DeepSeek‑scale models have been demonstrated. | Will the same kernel optimizations that help inference also accelerate pre‑training, or will training remain GPU‑centric? |
| Precision | The model runs in FP16 with occasional FP8 kernels for matmul. | FP8 support is still experimental; numerical stability for longer context windows has not been fully validated. |
| Ecosystem | MindSpore, CANN, and Ascend Model Zoo are officially supported. | Third‑party libraries (e.g., Hugging Face Transformers) still require a compatibility shim, which adds latency and maintenance overhead. |
| Deployment | Huawei Cloud offers Ascend‑based instances; several Chinese SaaS vendors have begun beta testing. | Pricing and availability outside mainland China remain unclear, limiting global rollout. |
| Benchmark transparency | Benchmarks are released by Huawei and DeepSeek under NDA to select partners. | Independent reproducibility is limited; the community has not yet published a head‑to‑head comparison on a public benchmark suite like MLPerf Inference v4.1. |
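Independent reproduction of latency figures like the 9.8 ms vs. 8.6 ms comparison starts with a consistent measurement method: fixed request shape, discarded warm‑up runs, and percentiles rather than a single average. The following is a minimal timing‑harness sketch under those assumptions. `run_inference` is a hypothetical zero‑argument callable wrapping one request to whichever server is under test; the harness is not MLPerf’s methodology, and the 2 ms sleep in the demo is only a stand‑in workload.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(
    run_inference: Callable[[], object],
    warmup: int = 5,
    iters: int = 50,
) -> Dict[str, float]:
    """Call run_inference repeatedly and report latency statistics in ms.

    run_inference is a hypothetical callable that sends one fixed request
    (same prompt, batch size, output length) and blocks until it completes.
    """
    # Warm-up calls are discarded so one-time costs (lazy initialisation,
    # graph compilation, cache fills) do not skew the measurement.
    for _ in range(warmup):
        run_inference()

    samples: List[float] = []
    for _ in range(iters):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)

    # With n=100, quantiles() returns the 1st..99th percentile cut points;
    # method="inclusive" keeps every estimate within the observed range.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "mean_ms": statistics.fmean(samples),
        "p50_ms": statistics.median(samples),
        "p99_ms": cuts[98],
    }


if __name__ == "__main__":
    # Stand-in workload: sleep 2 ms instead of calling a real model server.
    stats = measure_latency(lambda: time.sleep(0.002), warmup=2, iters=20)
    print({k: round(v, 2) for k, v in stats.items()})
```

Reporting p99 alongside p50 matters for the latency‑critical cases discussed earlier, since tail latency, not the median, is what users of real‑time services notice.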
Bottom line
DeepSeek V4’s adaptation to Huawei’s Ascend chips is a concrete step toward a self‑contained AI stack in China. The integration delivers respectable throughput and a clear power advantage, which matters for large‑scale inference services that must stay within domestic hardware ecosystems. However, the performance gap to the latest NVIDIA GPUs, the relative immaturity of the MindSpore tooling, and the lack of publicly verifiable benchmarks mean any suggestion of parity should be treated with caution. The real test will be whether third‑party developers can adopt the stack without a steep learning curve, and whether future model upgrades can be ported as smoothly as the current release.
For more technical details, see the official DeepSeek V4 release notes, the Huawei Ascend 910B documentation, and the MindSpore CANN API reference.
