Chinese AI Firms Race to Develop Text-to-Video Tools Amid Rising Competition

In a rapidly evolving technological landscape, Chinese artificial intelligence (AI) companies are vying for dominance in the burgeoning text-to-video market. Start-ups such as Zhipu AI and industry giants like ByteDance have made significant strides in launching their video-generation tools in recent weeks. However, they face stiff competition from local rivals and the challenge of distinguishing themselves from one another.

The competition in text-to-video generation was sparked in February this year, when OpenAI unveiled its pioneering model, Sora, which has since set a benchmark for the industry. Although Chinese firms currently lag a few months behind in developing similar capabilities, analysts suggest they are well positioned to close the gap rapidly thanks to China's substantial investments in AI technology.

According to Lu Yanxia, research director for emerging technology at IDC China, interest in text-to-video models has surged as multiple firms have released tools. ByteDance recently introduced its own tool, Jimeng, which is now available on local Android app stores. It differentiates itself by letting users input both text and image prompts to create clips of up to 12 seconds, currently the longest on the market.

Other entrants, such as Kuaishou Technology, have released models that generate clips up to 10 seconds long, while Zhipu AI's Qing and Shengshu AI's Vidu produce shorter videos of four to six seconds. Shengshu AI has drawn attention for its speed: it can produce a four-second video in under 30 seconds, a significant advantage over competitors.

Despite the aggressive push into this market, an employee at one of the firms said that the technology among Chinese AI companies tends to be homogeneous, with little variation in the core models. Differentiation is therefore likely to hinge on the range of services offered and the specific industries targeted.

All four major players have adopted a freemium model, letting users try their services at no cost, albeit with longer wait times during peak usage. Users can choose among paid plans for improved service, including faster processing and higher-definition output.

IDC’s Lu expects that these text-to-video models will initially find adoption in the internet sector, particularly in live streaming and video gaming, with subsequent applications in areas like smart cities and manufacturing. "This will be the main competitive field for generative AI technologies," Lu stated.

As Chinese firms intensify their efforts to establish a foothold in the text-to-video market, the race for innovation and consumer preference continues, promising significant developments in the AI domain.
