Alibaba releases Qwen 3.5, with performance comparable to Gemini 3 at only 1/18 the token price.
As the Year of the Snake draws to a close, Alibaba's more powerful Qwen 3.5-Plus model makes its debut.
On Lunar New Year’s Eve, February 16th, Alibaba open-sourced its new-generation large model, Qwen 3.5-Plus. Qwen 3.5 is pre-trained on a mixture of textual and visual data, a breakthrough in native multimodality. It performs strongly across comprehensive benchmarks in reasoning, programming, agentic intelligence, and more, and has taken top scores in multiple authoritative visual-understanding evaluations.
The core breakthrough of Qwen 3.5 lies in systematically resolving the “efficiency-accuracy” paradox at the architecture level. A hybrid attention mechanism lets the model focus dynamically across long texts, eliminating the waste of full-attention computation, while the ultra-sparse MoE architecture activates only about 5% of the parameters, 17 billion out of 397 billion, while still drawing on the knowledge stored across the full 397-billion-parameter reserve, drastically reducing inference costs.
Alongside the efficiency gains, native multi-token prediction moves the model from generating “word by word” to “planning several steps ahead,” nearly doubling response speed. Stability optimizations such as attention gating, whose underlying research by the Tongyi team won a NeurIPS best paper award, provide systemic guarantees for these aggressive innovations and keep super-large-scale training running stably. Together, the four technologies serve a single goal: using less compute to awaken stronger intelligence.
Qwen 3.5-Plus is already available in the Qwen app and its PC version. Developers can download the new model from the ModelScope community and Hugging Face, or access API services directly through Alibaba Cloud Bailian.
Performance comparable to Gemini 3 Pro, at a fraction of the cost
According to Alibaba, the newly open-sourced Qwen 3.5-Plus matches the performance of Gemini 3 Pro, making it the world’s strongest open-source model. Qwen 3.5 comprehensively overhauls the underlying model architecture: the released Qwen 3.5-Plus has 397 billion total parameters with only 17 billion activated, yet outperforms the 1-trillion-parameter Qwen 3-Max while cutting deployment memory by 60% and lifting maximum throughput by up to 19x, a major gain in inference efficiency.
On price, the Qwen 3.5-Plus API costs as little as 0.8 yuan per million tokens, just 1/18 the price of Gemini 3 Pro.
Four major technological breakthroughs: from architectural innovation to system stability
The core technological breakthroughs of Qwen 3.5 span four dimensions. First is the hybrid attention mechanism, which lets the model read closely and skim at the same time. Traditional large models must compute full attention over every token when processing long texts, and that cost balloons as the text grows, a key bottleneck for long-context capability. Qwen 3.5 instead allocates attention dynamically, focusing deeply on important information while skimming less critical parts, improving efficiency and accuracy at once.
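Alibaba has not published the exact attention design, so the toy NumPy sketch below assumes one common hybrid pattern, a local sliding window plus a few global tokens, purely to illustrate why such a mask is cheaper than full attention; every name and size in it is invented.

```python
import numpy as np

def hybrid_attention_mask(seq_len, window=4, n_global=2):
    """Toy hybrid-attention mask (illustrative, not Qwen 3.5's design):
    each token attends to a local window plus a few global tokens,
    rather than to all seq_len positions as in full attention."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True   # local window: the "close reading"
    mask[:, :n_global] = True   # global tokens: the "skim/overview"
    return mask

m = hybrid_attention_mask(16)
print(f"attended entries: {m.sum()} of {m.size}")  # fewer than 16*16
```

At this toy size the savings look modest, but the masked cost grows roughly linearly with sequence length while full attention grows quadratically, which is where the long-context gains come from.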
Second is the ultra-sparse MoE architecture. Conventional dense models activate all parameters during inference, so higher parameter counts mean higher computational costs. MoE instead activates only the most relevant “expert” subnetworks for each input. Qwen 3.5 pushes this to the extreme: of 397 billion total parameters, only 17 billion are active at a time, so less than 5% of the compute taps the full knowledge base, significantly lowering inference costs.
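A minimal top-k routing sketch makes the cost structure concrete; the dimensions and routing function here are illustrative assumptions, not Qwen 3.5's actual implementation. The router scores every expert, but only the selected few are ever executed, so compute scales with the number of active experts rather than the total.

```python
import numpy as np

def moe_forward(x, experts_w, router_w, top_k=2):
    """Minimal top-k MoE layer (illustrative only): route the input to
    the top_k highest-scoring experts and mix their outputs, so the
    FLOPs depend on top_k, not on the total number of experts."""
    scores = x @ router_w                        # score all experts
    top = np.argsort(scores)[-top_k:]            # pick the best top_k
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax
    return sum(g * (x @ experts_w[e]) for g, e in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 64
x = rng.standard_normal(d)
experts_w = rng.standard_normal((n_experts, d, d))
router_w = rng.standard_normal((d, n_experts))
y = moe_forward(x, experts_w, router_w)
print(y.shape, f"active: 2/{n_experts} experts = {2/n_experts:.1%}")
```

The 2-of-64 ratio here plays the same role as Qwen 3.5-Plus's 17B-of-397B: all parameters store knowledge, but only a sliver of them is computed per token.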
Third is native multi-token prediction. Traditional models generate tokens strictly one at a time, which caps inference speed. Qwen 3.5 learns during training to jointly predict multiple subsequent positions, nearly doubling inference speed. This “multi-step planning” ability benefits high-frequency scenarios such as long-text generation, code completion, and multi-turn conversation, bringing responses close to instant.
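A toy sketch shows the idea (shapes and the drafting scheme are hypothetical; Qwen's actual MTP design has not been detailed): several output heads read the same hidden state, each drafting one of the next k tokens, which a normal forward pass can then verify or reject.

```python
import numpy as np

def multi_token_draft(h, heads):
    """Toy multi-token prediction (illustrative): instead of one head
    predicting only position t+1, k heads jointly predict t+1..t+k
    from the same hidden state, drafting several tokens per step."""
    return [int(np.argmax(h @ W)) for W in heads]  # one token id per head

rng = np.random.default_rng(0)
hidden, vocab, k = 16, 100, 4
h = rng.standard_normal(hidden)            # hidden state at position t
heads = rng.standard_normal((k, hidden, vocab))
print(multi_token_draft(h, heads))         # k drafted token ids
```

When most drafts are accepted, the effective tokens generated per step approach k, which is consistent with the near-2x speedup described above.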
Finally, system-level training-stability optimizations ensure these architectural innovations run reliably at super-large scale. For example, the attention gating mechanism, whose research by the Tongyi team won a NeurIPS 2025 Best Paper award, adds an “intelligent switch” at the attention layer’s output, like a faucet controlling information flow: it keeps useful information from being drowned out and stops irrelevant signals from being amplified, improving output accuracy and long-context generalization. Deeper improvements such as normalization strategies and expert-routing initialization address stability issues at different stages of training, together ensuring robust large-scale runs.
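In spirit, the “faucet” is a learned sigmoid gate applied to the attention block's output. The sketch below illustrates that idea only; the variable names and shapes are assumptions, not the Tongyi team's published code.

```python
import numpy as np

def gated_attention_output(attn_out, x, gate_w):
    """Toy output gate on an attention block (illustrative): a sigmoid
    'faucet' in (0, 1) scales each channel of the attention output,
    damping irrelevant signal while letting useful information through."""
    gate = 1.0 / (1.0 + np.exp(-(x @ gate_w)))   # per-channel sigmoid gate
    return gate * attn_out

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)           # layer input, used to compute the gate
attn_out = rng.standard_normal(d)    # raw attention output
gate_w = rng.standard_normal((d, d))
print(gated_attention_output(attn_out, x, gate_w))
```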
A new human-computer interaction paradigm: from “response” to “operation”
Unlike traditional chatbots, Qwen 3.5 no longer merely responds. Its visual intelligence capabilities allow it to “view” screens on mobile and PC, accurately understand interface elements’ positions and functions, and autonomously perform operations. In official demos, users only need to give natural language commands, and the model can complete tasks across apps on mobile or handle data organization, multi-step automation, and complex workflows on PC, elevating human-machine collaboration to a new level.
This capability stems from its advanced visual-understanding technology. Qwen 3.5 can precisely locate on-screen elements, recognize buttons, text boxes, and icons along with their functions, and then simulate clicks, swipes, and text input. Through visual encoding and semantic parsing, the AI gains the “eyes” and “hands” to interact with the digital world, and users can choose local or cloud deployment to balance computational efficiency against data control.
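The grounding-to-action step can be pictured with a toy example; the element labels and pixel coordinates below are invented, and a real agent would obtain them from the model's visual grounding rather than a hard-coded dictionary.

```python
def click_target(elements, label):
    """Toy action step (illustrative): given detected UI elements as
    {label: (x1, y1, x2, y2)} bounding boxes, return the center point
    that an automation layer would tap or click."""
    x1, y1, x2, y2 = elements[label]
    return ((x1 + x2) / 2, (y1 + y2) / 2)

# Hypothetical detections for a mail-app screen
screen = {"To:": (60, 40, 300, 56), "Send": (400, 80, 424, 96)}
print(click_target(screen, "Send"))  # (412.0, 88.0) -> dispatch as a click
```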
Cross-application collaboration is another breakthrough. In demos, the model can extract information from emails, read spreadsheet data, and send messages via communication apps—breaking down data silos between traditional applications. It automates multi-step workflows by acting as a “user agent,” efficiently coordinating various apps. This evolution from single tools to versatile digital assistants opens new horizons for human-AI collaboration.
From sketch to code in 6 minutes 48 seconds: how strong is Qwen 3.5’s “mind-reading”?
Even more impressive is Qwen 3.5’s visual programming ability. In a demo video, a user sketches a webpage layout by hand, and within 6 minutes 48 seconds the model converts it into structured, directly runnable webpage code, even auto-matching high-quality images. This “from sketch to product” capability reflects a deep understanding of visual information: recognizing that circles may be buttons and lines mark layout divisions, inferring design intent such as “this is a navigation bar” or “this is a content area,” and generating the corresponding HTML, CSS, and JavaScript logic.
Technically, this ability stems from Qwen 3.5’s native multimodal architecture. Unlike earlier approaches that simply bolted a visual encoder onto a language model, Qwen 3.5 deeply fuses text and visual data during pre-training, so it understands pixel-level positional information and semantic abstractions at the same time. Its context window extends to 1 million tokens, enough to directly process a two-hour video: it can watch an entire movie and organize the plot, characters, and visual style into documents or code. This cross-modal “panoramic” memory far exceeds what a human can hold in a single pass.
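For developers who want to reproduce a sketch-to-code flow, a hedged example of calling the model through an OpenAI-compatible endpoint might look like the following. The base_url reflects Alibaba Cloud's documented compatible mode, but the model id "qwen3.5-plus" and the input file name are assumptions for illustration.

```python
import base64
from openai import OpenAI

# Assumed endpoint and model id for illustration; check the official docs
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

with open("sketch.png", "rb") as f:    # hand-drawn layout, assumed file
    img_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="qwen3.5-plus",  # hypothetical model id
    messages=[{"role": "user", "content": [
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
        {"type": "text",
         "text": "Convert this sketch into a runnable HTML/CSS/JS page."},
    ]}],
)
print(resp.choices[0].message.content)  # generated webpage source
```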
Risk warning and disclaimer
Markets carry risk; invest with caution. This article does not constitute personal investment advice and does not take into account individual users’ specific investment goals, financial situation, or needs. Users should consider whether any opinions, views, or conclusions herein fit their particular circumstances. Invest at your own risk.