Alibaba releases its new-generation base model Qwen3.5, taking the top spot among the world's most powerful open-source large models


On Lunar New Year's Eve, February 16, Alibaba open-sourced its new-generation large model, Qwen3.5-Plus. Its performance is comparable to Gemini 3 Pro, putting it at the top of the world's most powerful open-source models.

According to reports, Qwen3.5 overhauls the underlying model architecture. The released Qwen3.5-Plus version has 397 billion total parameters, of which only 17 billion are active. It outperforms Qwen3-Max, a model with more than a trillion parameters, while cutting deployment VRAM usage by 60% and significantly improving inference efficiency, with maximum inference throughput up to 19 times higher. The API price for Qwen3.5-Plus starts at 0.8 yuan per million tokens, only 1/18 that of Gemini 3 Pro.
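For a sense of scale, the figures above can be turned into quick back-of-the-envelope arithmetic. The Python sketch below uses only the numbers quoted in this article; the helper names are illustrative, the 0.8 yuan rate is the lowest quoted price and may not apply to every request, and the implied reference price is simply derived from the article's 1/18 ratio rather than from any official price list. With these inputs, the active share works out to roughly 4.3% of total parameters.

```python
# Figures quoted in the article (not independently verified).
TOTAL_PARAMS_B = 397           # total parameters, in billions
ACTIVE_PARAMS_B = 17           # parameters active per token, in billions
PRICE_YUAN_PER_M_TOKENS = 0.8  # lowest quoted API price, yuan per million tokens
PRICE_RATIO = 18               # article: price is "1/18 of Gemini 3 Pro"


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that are active for each token."""
    return active_b / total_b


def cost_yuan(tokens: int, price_per_million: float = PRICE_YUAN_PER_M_TOKENS) -> float:
    """Estimated cost of processing `tokens` tokens at a flat per-million rate."""
    return tokens / 1_000_000 * price_per_million


def implied_reference_price(price: float, ratio: float) -> float:
    """Price implied for the comparison model if `price` is 1/ratio of it."""
    return price * ratio


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"Active parameter share: {share:.1%}")
    print(f"Cost of 250M tokens: {cost_yuan(250_000_000):.2f} yuan")
    implied = implied_reference_price(PRICE_YUAN_PER_M_TOKENS, PRICE_RATIO)
    print(f"Implied reference price: {implied:.1f} yuan per million tokens")
```

The flat-rate assumption keeps the estimate simple; real pricing may differ by input versus output tokens or by context length, which this sketch does not model.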

Unlike previous generations of Qwen large language models, Qwen3.5 makes the generational leap from a pure text model to a natively multimodal model. Qwen3 was pretrained on text tokens only, while Qwen3.5 is pretrained on a mixture of visual and text tokens, with substantially more Chinese, English, multilingual, STEM, and reasoning data. Having "opened its eyes," the model learns denser world knowledge and reasoning logic, matching the top-tier performance of the trillion-parameter Qwen3-Max base model with less than 40% of the parameters. It performs strongly across benchmarks covering reasoning, programming, and agentic tasks. For example, Qwen3.5 scores 87.8 on the MMLU-Pro knowledge reasoning test, surpassing GPT-5.2; scores 88.4 on GPQA, a demanding set of doctoral-level questions, higher than Claude 4.5; sets a new record of 76.5 on the instruction-following benchmark IFBench; and outperforms Gemini 3 Pro and GPT-5.2 on general agent evaluations such as BFCL-V4 and Browsecomp.

Native multimodal training also brings a leap in Qwen3.5's visual capabilities. Qwen3.5 achieves top results across numerous authoritative evaluations, including multimodal reasoning (MathVision), general visual question answering (RealWorldQA), text recognition and document understanding (CC_OCR), spatial intelligence (RefCOCO-avg), and video understanding (MLVU). In subject problem-solving, task planning, and physical spatial reasoning, Qwen3.5 outperforms the specialized Qwen3-VL model, with significantly stronger spatial localization and visual reasoning and more detailed, precise analysis. For video understanding, Qwen3.5 accepts videos up to 2 hours long as direct input (1 million tokens of context), making it suitable for analyzing and summarizing long videos. Qwen3.5 also natively integrates visual understanding with coding: combined with image search and image generation tools, it can turn a hand-drawn sketch directly into usable front-end code and can locate and fix a UI issue from a single screenshot, making visual programming a genuine productivity tool.

Qwen3.5's native multimodal training ran on Alibaba Cloud's AI infrastructure. Thanks to a series of engineering innovations, training throughput on mixed text, image, and video data nearly matches that of a pure text base model, greatly lowering the barrier to native multimodal training. In addition, a carefully designed FP8 and FP32 precision strategy keeps training stable when scaled to hundreds of trillions of tokens, while reducing memory usage by about 50% and increasing training speed by 10%, further cutting training costs and improving efficiency.

Qwen3.5 also marks a breakthrough in moving from agent frameworks to agent applications. It can operate smartphones and computers autonomously and complete everyday tasks efficiently. On mobile devices it supports more mainstream apps and commands; on PCs it handles more complex multi-step operations, such as cross-application data management and workflow automation, significantly improving operational efficiency. The Qwen team has also built a scalable asynchronous reinforcement learning framework for agents, which speeds up end-to-end training by 3 to 5 times and supports pluggable agents at a scale of up to millions.

According to reports, the Qwen3.5-Plus model has been integrated into the Qwen app and its PC version. Developers can download the new model from the ModelScope community and HuggingFace, or access API services directly through Alibaba Cloud Bailian. Alibaba will continue to open-source Qwen3.5 series models in different sizes and with different capabilities, and the more powerful flagship model, Qwen3.5-Max, will also be released soon.
