HappyHorse anonymously tops AI video blind test; Alibaba Taotian and Sand.ai both suspected


According to 1M AI News monitoring, an anonymous model named HappyHorse-1.0 topped the Video Arena leaderboard on the AI video evaluation platform Artificial Analysis last week, taking first place in both the text-to-video and image-to-video tracks (no-audio category) and pushing ByteDance’s Seedance 2.0 down to second. In the audio category, Seedance 2.0 still leads by a narrow margin. There has been no press conference, no technical blog, and no company attribution, and to date no one has publicly claimed the model.

The Video Arena ranking is based on an Elo blind-testing system: users vote between two generated videos without knowing which models produced them. HappyHorse has been on the leaderboard for a shorter time; its sample size of about 3,500 comparisons is less than half of Seedance 2.0’s, and its confidence interval is accordingly wider (±12–13 points). Even so, its lead in the no-audio tracks (about 76 points for text-to-video and about 48 points for image-to-video) is well beyond the margin of error.
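The Elo mechanism behind such a leaderboard can be sketched in a few lines. This is a generic illustration of the standard Elo update, not Artificial Analysis’s actual implementation; the K-factor of 32 is an assumption for the example.

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """One Elo update after a blind pairwise vote.

    score_a: 1.0 if A wins the vote, 0.0 if B wins, 0.5 for a tie.
    Returns the updated ratings (r_a_new, r_b_new).
    """
    # Expected score of A under the standard logistic Elo model
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# Two equally rated models: the winner of a single vote gains k/2 points.
a, b = elo_update(1200.0, 1200.0, 1.0)
print(round(a), round(b))  # → 1216 1184
```

With few votes, each result moves the rating more relative to the total evidence, which is why HappyHorse’s smaller sample translates into the wider confidence interval the leaderboard reports.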

Based on the website’s language ordering (Chinese and Cantonese come before English) and the “HappyHorse” name’s nod to the 2026 Year of the Horse internet meme, industry observers judge that the model comes from a China-based team. Two theories dominate:

  1. Multiple industry social-media accounts say the model comes from Alibaba Taotian Group’s Future Life Lab, led by Zhang Di. Zhang Di previously served as Vice President of Technology at Kuaishou; starting in 2024 he led the development of Kuaishou’s Kling AI, released the Kling 2.0 Master Edition in April 2025, and returned to Alibaba in November of that year.
  2. On X, user Vigo Zhao conducted a point-by-point comparison and found that HappyHorse matches daVinci-MagiHuman, a model open-sourced in March this year by the AI video startup Sand.ai, on multiple benchmark metrics; the two websites’ structures are also highly similar. Sand.ai was founded by Cao Yue, first author of the Swin Transformer paper, and is called “the DeepSeek of AI video” in the industry.

HappyHorse’s official website states that the model has 15 billion parameters, a 40-layer self-attention Transformer, and a Transfusion architecture (a single model that jointly handles autoregressive text prediction and diffusion-based video-and-audio generation). It runs 8-step inference, outputs 1080p video with synchronized audio, and supports lip-sync in seven languages: Chinese, English, Japanese, Korean, German, French, and Cantonese. The model is fully open-source and allows commercial use.
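The published parameter count and layer count are roughly self-consistent. As a back-of-envelope check (an assumption-laden sketch using the conventional dense-Transformer estimate of roughly 12 × layers × d_model² parameters, not an official figure), the implied hidden size is:

```python
import math

params = 15e9   # 15 billion parameters, per the official site
layers = 40     # 40 self-attention layers, per the official site

# Standard rough estimate for a dense Transformer: params ≈ 12 * L * d^2,
# so d ≈ sqrt(params / (12 * L)).
d_model = math.sqrt(params / (12 * layers))
print(round(d_model))  # → 5590, i.e. a hidden size on the order of ~5,600
```

A hidden size in that range is typical for models of this scale, so the spec sheet at least hangs together arithmetically.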
