Huawei Claims Successful Post-Training of 1.6T-Parameter Model Using Domestic Ascend 910C Chips

A Huawei-led research team says it has completed full-parameter post-training of DeepSeek's V4-Pro model on at least 1,000 Ascend 910C accelerators, a potential breakthrough for China's AI chip capabilities under export controls.

A research consortium led by Huawei Technologies claims to have completed full-parameter post-training of DeepSeek's V4-Pro, a 1.6-trillion-parameter AI model, on a cluster of at least 1,000 Huawei Ascend 910C AI accelerators. The collaboration included the Shenzhen Loop Area Institute, the Shenzhen campus of Harbin Institute of Technology, and the Shenzhen Research Institute of Big Data, and the Shenzhen municipal government has acknowledged the work.

This achievement, if verified, would mark a significant milestone for China's domestic AI chips. Training is the part of AI model development that Chinese firms have struggled to move off Nvidia hardware amid US export controls. Chinese accelerators have shown competitive performance at inference, where a trained model processes prompts, but they have historically lagged in training, where model weights are repeatedly updated across massive datasets.

Technical Specifications and Capabilities

The Ascend 910C is Huawei's current flagship AI accelerator, a dual-die processor that in earlier testing by DeepSeek achieved approximately 60% of the inference performance of Nvidia's H100 GPU. That potentially makes it the most powerful domestic AI accelerator currently available in China.

DeepSeek-V4-Pro, released in April 2024, was designed from the outset to run on the Ascend architecture, a strategic shift from previous models that relied primarily on Nvidia hardware for training. The model was pre-trained on a text corpus of more than 32 trillion tokens to establish its core capabilities.

Pre-Training vs. Post-Training: Understanding the Technical Distinction

The announcement specifically highlights "full-parameter post-training," which refers to the tuning phase that follows the much larger pre-training process. In AI model development:

Pre-training: Involves working through enormous text corpora to establish a model's fundamental capabilities, representing the computationally intensive initial phase
Post-training: Focuses on refining behavior through instruction-following, safety alignment, and task-specific data, adjusting the model's parameters for specific applications

The distinction is crucial. Post-training is significant, but for large language models it accounts for less than 10% of the total computational workload, with pre-training making up the rest. Completing post-training on Ascend silicon demonstrates capability but does not prove the chips can handle the heavier pre-training workload from scratch.
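To make that proportion concrete, the following is a rough sketch based on the common rule of thumb that training costs about 6 FLOPs per parameter per token. Only the 32-trillion-token pre-training figure comes from the article. The active-parameter count and the post-training token count are hypothetical placeholders, and the rule itself is a simplification that ignores the extra generation cost of reinforcement-learning-style post-training.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute using the 6 * N * D rule of thumb."""
    return 6.0 * params * tokens


def post_training_share(params: float, pre_tokens: float, post_tokens: float) -> float:
    """Fraction of total training compute spent in the post-training phase."""
    pre = training_flops(params, pre_tokens)
    post = training_flops(params, post_tokens)
    return post / (pre + post)


# Pre-training token count quoted in the article.
PRE_TOKENS = 32e12
# Hypothetical values chosen only for illustration (not disclosed figures).
ACTIVE_PARAMS = 50e9
POST_TOKENS = 1e12

share = post_training_share(ACTIVE_PARAMS, PRE_TOKENS, POST_TOKENS)
print(f"Post-training share of compute: {share:.1%}")
```

Under this approximation the parameter count cancels out, so the share depends only on the ratio of post-training tokens to total tokens. With the placeholder values above it comes to roughly 3%.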

Previous Challenges and Setbacks

The claim follows earlier difficulties in training large models on Chinese hardware. In August 2025, reports indicated that DeepSeek could not complete a single successful training run for its R2 model on Ascend chips, even with Huawei engineers providing direct support. The failures were attributed to:

Unstable chip performance
Slow chip-to-chip interconnects
Gaps in Huawei's CANN software stack, its substitute for Nvidia's CUDA

These challenges forced DeepSeek to rely on Nvidia GPUs for training while using Ascend accelerators primarily for inference tasks. The current announcement suggests potential progress in addressing these limitations.

Analysis of the Claim

The announcement from the Shenzhen research group lacks critical technical details that would allow independent verification:

No performance benchmarks comparing results to equivalent training on Nvidia hardware
No timeline for how long the post-training process took
No efficiency metrics indicating how effectively the 1,000-chip cluster was utilized
No information about software stack improvements or interconnect performance
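One efficiency metric that such a disclosure could include is model FLOPs utilization (MFU): the share of a cluster's theoretical peak compute that goes into useful training arithmetic. The sketch below shows how it could be estimated if the missing figures were published. Every input is a hypothetical placeholder, including the per-chip peak throughput, which is not a published Ascend 910C specification, and the 6 FLOPs per parameter per token estimate is a rule of thumb rather than a measured value.

```python
def model_flops_utilization(
    params: float,
    tokens: float,
    wall_seconds: float,
    num_chips: int,
    peak_flops_per_chip: float,
) -> float:
    """Estimate MFU as useful training FLOPs divided by peak available FLOPs."""
    useful = 6.0 * params * tokens  # 6 * N * D rule of thumb
    available = wall_seconds * num_chips * peak_flops_per_chip
    return useful / available


# All values below are hypothetical and used only to illustrate the calculation.
mfu = model_flops_utilization(
    params=50e9,                 # assumed active parameters per token
    tokens=1e12,                 # assumed post-training tokens
    wall_seconds=14 * 24 * 3600, # assumed two-week run
    num_chips=1000,              # cluster size reported in the announcement
    peak_flops_per_chip=8e14,    # assumed peak throughput, not a vendor figure
)
print(f"Estimated MFU: {mfu:.1%}")
```

Without the run time and token count, outside observers cannot compute this number, which is why the missing timeline and efficiency figures matter for verification.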

These omissions align with a pattern of unverified claims from Chinese state-affiliated entities regarding AI capabilities. DeepSeek itself has not officially commented on the reported achievement.

Broader Implications for the AI Chip Market

Should this claim be substantiated, it would represent several significant developments:

Reduced dependency on US AI hardware: China's AI industry could further reduce reliance on Nvidia GPUs for training workloads
Software ecosystem maturation: Improved CANN stack performance would address a key bottleneck
Interconnect progress: Enhanced chip-to-chip communication would be essential for larger-scale deployments
Competitive landscape: Increased pressure on Nvidia and other Western AI chip vendors

The development comes amid intensifying competition in the AI accelerator market, with companies like Nvidia, AMD, Intel, and numerous startups developing increasingly powerful chips. China's ability to field competitive domestic solutions has significant implications for global AI development and geopolitical technology dynamics.

More information on DeepSeek's models is available in the company's official documentation. Specifications for the Ascend 910C are listed on Huawei's AI computing product page.