Chinese startup Ziyouliangji develops AI music platform Hitto to lower barriers to music creation, targeting both consumer and commercial markets with specialized technology for Chinese-language music.
As competition among large AI models shifts from parameter races to real-world applications, vertically focused AI startups are beginning to carve out their niches. Among them is Ziyouliangji Information Technology, founded in 2023, which is tackling one of the most complex creative domains: music. The company's core offering, Hitto (YinChao), is an ambitious attempt to democratize music creation through artificial intelligence.
Unlike many AI companies chasing general-purpose large models, Ziyouliangji has deliberately chosen the high-barrier music sector. This strategic focus reflects a growing trend in China's AI landscape—specialized applications that address specific market needs rather than competing directly with tech giants in general AI capabilities.
The inspiration behind Hitto comes from personal experience. Jiang Tao, the company's CTO, shared how eight years ago he attempted to create a song for his wife as a wedding anniversary gift but was deterred by the complicated and expensive traditional music production process. This experience planted the seeds for what would eventually become Ziyouliangji's mission.
"Music democratization is a concept we repeatedly emphasize," Jiang noted, highlighting how traditional music creation has long been limited to those with formal training or significant financial resources.
In 2024, as end-to-end music generation models matured, Jiang assembled a cross-disciplinary team combining algorithm expertise with musical backgrounds to develop a proprietary music foundation model. This technical foundation has proven crucial, as music generation presents unique challenges compared to text or image generation.
"Music requires handling ultra-long context, melodic structure, and emotional expression," the technical team explains. "Chinese songs add another layer of complexity with distinctive linguistic features such as lexical tones and neutral-tone (unstressed) syllables. This is likely one reason overseas AI music models have struggled to fully adapt to the Chinese-language market."
Technically, Hitto's strength lies in its fully self-developed pipeline. The team adopted a hybrid AR+NAR (autoregressive + non-autoregressive) architecture, which lets the model keep the overall song structure coherent while still rendering the refined local details that make music compelling.
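The article does not say how Hitto's AR and NAR stages are wired together, so the following is only a toy sketch of the general two-stage pattern, not Ziyouliangji's model. Everything in it is a hypothetical stand-in: `TRANSITIONS`, `BASE_PITCH`, and `PATTERN` are invented for illustration, and a first-order Markov chain plays the role of the learned autoregressive model. The AR stage picks each song section conditioned on the one before it, which is what keeps the global structure coherent; the NAR stage then fills in every section's notes independently, so those positions could be computed in parallel.

```python
import random

# Hypothetical coarse vocabulary of song sections; not Hitto's actual tokens.
# Each entry lists which sections may follow the key section.
TRANSITIONS = {
    "<start>": ["intro"],
    "intro": ["verse"],
    "verse": ["verse", "chorus"],
    "chorus": ["verse", "bridge", "outro"],
    "bridge": ["chorus"],
    # "outro" has no entry, so it ends the song.
}

# Hypothetical detail rules for the non-autoregressive stage (MIDI note numbers).
BASE_PITCH = {"intro": 60, "verse": 62, "chorus": 67, "bridge": 65, "outro": 60}
PATTERN = [0, 2, 4, 2]


def generate_structure(max_len: int, rng: random.Random) -> list[str]:
    """AR stage: sample sections one at a time, each conditioned on the previous one."""
    sections: list[str] = []
    prev = "<start>"
    while len(sections) < max_len:
        options = TRANSITIONS.get(prev)
        if not options:  # terminal section ("outro") reached
            break
        prev = rng.choice(options)
        sections.append(prev)
    return sections


def render_details(sections: list[str]) -> list[list[int]]:
    """NAR stage: fill in every section independently of the others.

    No output position depends on another generated position, so this step
    could run in parallel (one worker per section) instead of sequentially.
    """
    return [[BASE_PITCH[s] + step for step in PATTERN] for s in sections]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a reproducible toy run
    plan = generate_structure(8, rng)
    notes = render_details(plan)
    for section, bar in zip(plan, notes):
        print(f"{section:>7}: {bar}")
```

In a real system both stages would be learned neural networks operating on audio or symbolic tokens; the hand-written rules here only make the division of labor visible: sequential decisions where long-range structure matters, parallel generation where only local detail is needed.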
The platform's multimodal capabilities allow it to understand text, images, audio, and even video inputs in a unified representation space. Users can generate complete songs simply by entering a sentence, uploading a photo, or describing an emotion—a significant departure from traditional music production workflows.
In the latest Hitto V3.0 release, the team has made substantial improvements to AI vocal performance quality. The model can now produce subtle singing techniques such as humming, vocal runs, and breathy vocals, while adjusting emotional delivery according to lyrical content. This attention to vocal nuance addresses one of the common criticisms of AI-generated music—its tendency to sound technically proficient but emotionally flat.
The company has also tackled a persistent issue in AI music: songs that sound smooth but lack memorable hooks. By optimizing melody and arrangement generation, Hitto now produces catchier, more emotionally engaging compositions that connect better with listeners.
Ziyouliangji's user base currently consists mainly of ordinary consumers, and many everyday creative uses have emerged on the platform: truck drivers turning poems written on cigarette boxes into songs, families using photo-based song generation to document their children's growth, and users channeling heartbreak into music. For many of them, music is no longer a professional skill but a new form of emotional expression.
Beyond consumer entertainment, AI music is finding commercial applications. The lyrics, composition, and vocals of "AI For Good," the English theme song of the 2025 World Artificial Intelligence Conference, were partially generated with the Hitto model, a sign that the platform can reach beyond casual use into professional and institutional settings.
The company has also begun collaborating with organizations in education, healthcare, and mental wellness to explore AI music's therapeutic applications. These partnerships suggest a recognition that music's emotional resonance can be harnessed for purposes beyond entertainment.
As the AI music industry matures, competition is shifting from whether music can be generated to whether it can genuinely resonate with listeners. Ziyouliangji aims not merely to build a music generation tool but to provide ordinary people with meaningful ways to express themselves through music.
In an era where AI capabilities are increasingly commoditized, this emphasis on emotion and creative freedom represents a distinctive direction among Chinese AI startups. Rather than competing on raw technical power, Ziyouliangji is focusing on solving real problems for specific audiences—a strategy that may prove more sustainable in the long run.
The company's participation in BEYOND Expo 2026 suggests growing industry recognition of its approach. As it continues to refine its technology and expand its applications, the question remains whether AI-generated music can truly capture the emotional depth that makes music meaningful to human listeners. For now, Ziyouliangji is betting that its specialized approach to Chinese-language music creation will strike the right balance between technical capability and artistic expression.
For more information about Hitto and its capabilities, visit the official platform page. The company's progress can also be followed through updates from BEYOND Expo, where it showcased its latest developments.