Alibaba releases Qwen 3.5, with performance comparable to Gemini 3 at only 1/18 the token price.
As the Year of the Snake draws to a close, Alibaba's more powerful Qwen 3.5-Plus model makes its debut.
On Lunar New Year’s Eve, February 16th, Alibaba open-sourced its new-generation large model, Qwen 3.5-Plus. Qwen 3.5 is pre-trained on a mixture of textual and visual data, a breakthrough in native multimodality. It performs strongly across comprehensive benchmarks in reasoning, programming, agentic intelligence, and more, and has taken top scores in multiple authoritative visual-understanding evaluations.
The core breakthrough of Qwen 3.5 lies in systematically resolving the “efficiency-accuracy” paradox at the architecture level. A hybrid attention mechanism lets the model focus dynamically across long texts, eliminating the waste of full-attention computation, while the ultra-sparse MoE architecture activates only about 5% of the parameters, 17 billion out of 397 billion, while still drawing on the knowledge stored across the full 397-billion-parameter reserve, drastically reducing inference costs.
Alongside the efficiency gains, native multi-token prediction moves the model from generating “word by word” to “planning several steps ahead,” nearly doubling response speed. Stability optimizations such as attention gating, whose underlying research by the Tongyi team won a NeurIPS best paper award, provide systemic guarantees for these aggressive innovations and keep super-large-scale training running stably. Together, the four technologies serve a single goal: using less compute to awaken stronger intelligence.
Qwen 3.5-Plus is already available in the Qwen app and its PC version. Developers can download the new model from the ModelScope community and Hugging Face, or access API services directly through Alibaba Cloud Bailian.
Performance comparable to Gemini 3 Pro, at a fraction of the cost
According to Alibaba, the newly open-sourced Qwen 3.5-Plus matches the performance of Gemini 3 Pro, making it the world’s strongest open-source model. Qwen 3.5 comprehensively overhauls the underlying model architecture: the released Qwen 3.5-Plus has 397 billion total parameters with only 17 billion activated, yet outperforms the 1-trillion-parameter Qwen 3-Max while cutting deployment memory by 60% and lifting maximum throughput by up to 19x, a major gain in inference efficiency.
On price, the Qwen 3.5-Plus API costs as little as 0.8 yuan per million tokens, just 1/18 the price of Gemini 3 Pro.
Four major technological breakthroughs: from architectural innovation to system stability
The core technological breakthroughs of Qwen 3.5 span four dimensions. First is the hybrid attention mechanism, which lets the model read closely and skim at the same time. Traditional large models must compute full attention over every token when processing long texts, and that cost balloons as the text grows, a key bottleneck for long-context capability. Qwen 3.5 instead allocates attention dynamically, focusing deeply on important information while skimming less critical parts, improving efficiency and accuracy at once.
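Alibaba has not published the exact attention design, so the toy NumPy sketch below assumes one common hybrid pattern, a local sliding window plus a few global tokens, purely to illustrate why such a mask is cheaper than full attention; every name and size in it is invented.

```python
import numpy as np

def hybrid_attention_mask(seq_len, window=4, n_global=2):
    """Toy hybrid-attention mask (illustrative, not Qwen 3.5's design):
    each token attends to a local window plus a few global tokens,
    rather than to all seq_len positions as in full attention."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True   # local window: the "close reading"
    mask[:, :n_global] = True   # global tokens: the "skim/overview"
    return mask

m = hybrid_attention_mask(16)
print(f"attended entries: {m.sum()} of {m.size}")  # fewer than 16*16
```

At this toy size the savings look modest, but the masked cost grows roughly linearly with sequence length while full attention grows quadratically, which is where the long-context gains come from.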
Second is the ultra-sparse MoE architecture. Conventional dense models activate all parameters during inference, so higher parameter counts mean higher computational costs. MoE instead activates only the most relevant “expert” subnetworks for each input. Qwen 3.5 pushes this to the extreme: of 397 billion total parameters, only 17 billion are active at a time, so less than 5% of the compute taps the full knowledge base, significantly lowering inference costs.
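A minimal top-k routing sketch makes the cost structure concrete; the dimensions and routing function here are illustrative assumptions, not Qwen 3.5's actual implementation. The router scores every expert, but only the selected few are ever executed, so compute scales with the number of active experts rather than the total.

```python
import numpy as np

def moe_forward(x, experts_w, router_w, top_k=2):
    """Minimal top-k MoE layer (illustrative only): route the input to
    the top_k highest-scoring experts and mix their outputs, so the
    FLOPs depend on top_k, not on the total number of experts."""
    scores = x @ router_w                        # score all experts
    top = np.argsort(scores)[-top_k:]            # pick the best top_k
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax
    return sum(g * (x @ experts_w[e]) for g, e in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 64
x = rng.standard_normal(d)
experts_w = rng.standard_normal((n_experts, d, d))
router_w = rng.standard_normal((d, n_experts))
y = moe_forward(x, experts_w, router_w)
print(y.shape, f"active: 2/{n_experts} experts = {2/n_experts:.1%}")
```

The 2-of-64 ratio here plays the same role as Qwen 3.5-Plus's 17B-of-397B: all parameters store knowledge, but only a sliver of them is computed per token.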
Third is native multi-token prediction. Traditional models generate tokens strictly one at a time, which caps inference speed. Qwen 3.5 learns during training to jointly predict multiple subsequent positions, nearly doubling inference speed. This “multi-step planning” ability benefits high-frequency scenarios such as long-text generation, code completion, and multi-turn conversation, bringing responses close to instant.
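A toy sketch shows the idea (shapes and the drafting scheme are hypothetical; Qwen's actual MTP design has not been detailed): several output heads read the same hidden state, each drafting one of the next k tokens, which a normal forward pass can then verify or reject.

```python
import numpy as np

def multi_token_draft(h, heads):
    """Toy multi-token prediction (illustrative): instead of one head
    predicting only position t+1, k heads jointly predict t+1..t+k
    from the same hidden state, drafting several tokens per step."""
    return [int(np.argmax(h @ W)) for W in heads]  # one token id per head

rng = np.random.default_rng(0)
hidden, vocab, k = 16, 100, 4
h = rng.standard_normal(hidden)            # hidden state at position t
heads = rng.standard_normal((k, hidden, vocab))
print(multi_token_draft(h, heads))         # k drafted token ids
```

When most drafts are accepted, the effective tokens generated per step approach k, which is consistent with the near-2x speedup described above.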
Finally, system-level training-stability optimizations ensure these architectural innovations run reliably at super-large scale. For example, the attention gating mechanism, whose research by the Tongyi team won a NeurIPS 2025 Best Paper award, adds an “intelligent switch” at the attention layer’s output, like a faucet controlling information flow: it keeps useful information from being drowned out and stops irrelevant signals from being amplified, improving output accuracy and long-context generalization. Deeper improvements such as normalization strategies and expert-routing initialization address stability issues at different stages of training, together ensuring robust large-scale runs.
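In spirit, the “faucet” is a learned sigmoid gate applied to the attention block's output. The sketch below illustrates that idea only; the variable names and shapes are assumptions, not the Tongyi team's published code.

```python
import numpy as np

def gated_attention_output(attn_out, x, gate_w):
    """Toy output gate on an attention block (illustrative): a sigmoid
    'faucet' in (0, 1) scales each channel of the attention output,
    damping irrelevant signal while letting useful information through."""
    gate = 1.0 / (1.0 + np.exp(-(x @ gate_w)))   # per-channel sigmoid gate
    return gate * attn_out

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)           # layer input, used to compute the gate
attn_out = rng.standard_normal(d)    # raw attention output
gate_w = rng.standard_normal((d, d))
print(gated_attention_output(attn_out, x, gate_w))
```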
A new human-computer interaction paradigm: from “response” to “operation”
Unlike traditional chatbots, Qwen 3.5 no longer merely responds. Its visual intelligence capabilities allow it to “view” screens on mobile and PC, accurately understand interface elements’ positions and functions, and autonomously perform operations. In official demos, users only need to give natural language commands, and the model can complete tasks across apps on mobile or handle data organization, multi-step automation, and complex workflows on PC, elevating human-machine collaboration to a new level.
This capability stems from its advanced visual-understanding technology. Qwen 3.5 can precisely locate on-screen elements, recognize buttons, text boxes, and icons along with their functions, and then simulate clicks, swipes, and text input. Through visual encoding and semantic parsing, the AI gains the “eyes” and “hands” to interact with the digital world, and users can choose local or cloud deployment to balance computational efficiency against data control.
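The grounding-to-action step can be pictured with a toy example; the element labels and pixel coordinates below are invented, and a real agent would obtain them from the model's visual grounding rather than a hard-coded dictionary.

```python
def click_target(elements, label):
    """Toy action step (illustrative): given detected UI elements as
    {label: (x1, y1, x2, y2)} bounding boxes, return the center point
    that an automation layer would tap or click."""
    x1, y1, x2, y2 = elements[label]
    return ((x1 + x2) / 2, (y1 + y2) / 2)

# Hypothetical detections for a mail-app screen
screen = {"To:": (60, 40, 300, 56), "Send": (400, 80, 424, 96)}
print(click_target(screen, "Send"))  # (412.0, 88.0) -> dispatch as a click
```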
Cross-application collaboration is another breakthrough. In demos, the model can extract information from emails, read spreadsheet data, and send messages via communication apps—breaking down data silos between traditional applications. It automates multi-step workflows by acting as a “user agent,” efficiently coordinating various apps. This evolution from single tools to versatile digital assistants opens new horizons for human-AI collaboration.
From sketch to code in 6 minutes 48 seconds: how strong is Qwen 3.5’s “mind-reading”?
Even more impressive is Qwen 3.5’s visual programming ability. In a demo video, a user sketches a webpage layout by hand, and within 6 minutes 48 seconds the model converts it into structured, directly runnable webpage code, even auto-matching high-quality images. This “from sketch to product” capability reflects a deep understanding of visual information: recognizing that circles may be buttons and lines mark layout divisions, inferring design intent such as “this is a navigation bar” or “this is a content area,” and generating the corresponding HTML, CSS, and JavaScript logic.
Technically, this ability stems from Qwen 3.5’s native multimodal architecture. Unlike earlier approaches that simply bolted a visual encoder onto a language model, Qwen 3.5 deeply fuses text and visual data during pre-training, so it understands pixel-level positional information and semantic abstractions at the same time. Its context window extends to 1 million tokens, enough to directly process a two-hour video: it can watch an entire movie and organize the plot, characters, and visual style into documents or code. This cross-modal “panoramic” memory far exceeds what a human can hold in a single pass.
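For developers who want to reproduce a sketch-to-code flow, a hedged example of calling the model through an OpenAI-compatible endpoint might look like the following. The base_url reflects Alibaba Cloud's documented compatible mode, but the model id "qwen3.5-plus" and the input file name are assumptions for illustration.

```python
import base64
from openai import OpenAI

# Assumed endpoint and model id for illustration; check the official docs
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

with open("sketch.png", "rb") as f:    # hand-drawn layout, assumed file
    img_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="qwen3.5-plus",  # hypothetical model id
    messages=[{"role": "user", "content": [
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
        {"type": "text",
         "text": "Convert this sketch into a runnable HTML/CSS/JS page."},
    ]}],
)
print(resp.choices[0].message.content)  # generated webpage source
```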
Risk warning and disclaimer
Markets carry risk; invest with caution. This article does not constitute personal investment advice and does not take into account individual users’ specific investment goals, financial situation, or needs. Users should consider whether any opinions, views, or conclusions herein fit their particular circumstances. Invest at your own risk.