Recently, in discussions of Huawei's τ scaling (time scaling), I noticed that the conversation has stayed on the surface without touching its essence. This is probably because many friends don't have an EE background and are unaware of the classic meaning of τ in circuits. The first time constant taught in circuit theory is τ = RC: the resistance of a wire multiplied by its capacitance gives the order of magnitude of the time a signal takes to travel through that wire. The longer the wire, the larger its resistance and capacitance, and the slower the signal. Within this framework, the geometric scaling of the past six decades can be reinterpreted as one way of implementing time scaling. Transistors are made smaller to shorten switching delays, and circuits are packed more tightly to shorten metal interconnects and reduce signal propagation delay. Geometric scaling is only the means; compressing delay is the goal. Huawei's thesis is that when geometric scaling is no longer feasible, delay can still be compressed by other means.
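To make the wire-length argument concrete, here is a minimal sketch that assumes a uniform wire. The per-millimeter resistance `R_PER_MM` and capacitance `C_PER_MM` are hypothetical illustrative values and do not come from the paper. Resistance and capacitance both grow linearly with length, so the lumped τ = RC grows with the square of the length, and halving an unbuffered wire cuts its delay by roughly four. Long wires in real chips are usually broken up with repeaters, which makes delay closer to linear in length.

```python
def wire_tau(length_mm: float, r_per_mm: float, c_per_mm: float) -> float:
    """Lumped RC time constant, in seconds, of a uniform wire.

    Resistance and capacitance both grow linearly with length,
    so their product grows with the square of the length.
    """
    resistance = r_per_mm * length_mm   # ohms
    capacitance = c_per_mm * length_mm  # farads
    return resistance * capacitance


# Hypothetical per-millimeter values, chosen only for illustration.
R_PER_MM = 1_000.0   # ohms per mm
C_PER_MM = 200e-15   # farads per mm (0.2 pF/mm)

if __name__ == "__main__":
    for length in (2.0, 1.0, 0.5):
        tau_ps = wire_tau(length, R_PER_MM, C_PER_MM) * 1e12
        print(f"{length:>4} mm wire: tau = RC = {tau_ps:.0f} ps")
```

With these assumed values the loop prints 800, 200 and 50 ps. Each halving of the length divides τ by four.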
Coincidentally, He Tingbo's τ scaling paper was recently updated to v2, growing from 16 to 23 pages. I compared the two versions. The data and conclusions are unchanged, and the added content mainly responds to several industry criticisms of v1. Three points are worth discussing.
The most important addition is test evidence for the previously unsupported claim of a "41% energy efficiency improvement." In v1 this figure had no baseline or test conditions, which made it the easiest point to challenge. V2 adds a complete comparison table. The baseline is the 2025 Kirin 9030 Pro. Both chips use the same mature process node. The key difference is that the baseline uses a conventional planar design, while the Kirin 2026 folds its critical paths across two stacked wafers. Folding shortens interconnects and therefore reduces interconnect delay. The extra timing margin on the critical paths translates directly into a higher maximum clock frequency: 3.1 GHz at a 1.1 V supply, 13% higher than the baseline. The "41% energy efficiency improvement" comes from a deliberately chosen operating point. For an iso-performance comparison with the baseline, the voltage is lowered to 0.9 V and the frequency to 2.5 GHz. At 25°C, measured power is 0.59 times that of the baseline. This is also consistent with theory. Dynamic power is roughly proportional to the square of the supply voltage, so an 18% voltage reduction alone cuts power by about one-third ((0.9/1.1)² ≈ 0.67). Add a roughly 9% frequency reduction and the lower interconnect capacitance from folding, and the result lands right around 0.59. So the precise meaning of the 41% energy efficiency improvement is a 41% reduction in power (1 − 0.59) at equal performance. In essence, the timing margin gained from logic folding is traded for lower power, and that is where the efficiency gain comes from. V2 also includes data showing that, even with two stacked layers, power density is 5.6% lower than the baseline.
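The back-of-the-envelope check above can be reproduced with the standard dynamic-power model, P ∝ C·V²·f. This sketch rests on stated assumptions. The baseline is taken to run at 1.1 V and at 3.1 GHz / 1.13 ≈ 2.74 GHz; that frequency is inferred from the 13% figure and is not quoted from the paper. The capacitance factor is simply what remains once voltage and frequency are accounted for. It is therefore implied by the measured 0.59, not independently measured.

```python
# Dynamic power model: P is proportional to C * V^2 * f.
# Assumption (not stated in the paper): the baseline runs at 1.1 V and at the
# frequency implied by "3.1 GHz is 13% higher than the baseline".
V_BASE, V_NEW = 1.1, 0.9        # supply voltage, volts
F_NEW_MAX = 3.1                 # folded chip's max frequency at 1.1 V, GHz
F_BASE = F_NEW_MAX / 1.13       # inferred baseline frequency, ~2.74 GHz
F_NEW = 2.5                     # iso-performance operating point, GHz
MEASURED_POWER_RATIO = 0.59     # measured at 25 C, as reported in the paper


def power_ratio(v_ratio: float, f_ratio: float, c_ratio: float = 1.0) -> float:
    """Relative dynamic power for given voltage, frequency and capacitance ratios."""
    return c_ratio * v_ratio ** 2 * f_ratio


v_factor = (V_NEW / V_BASE) ** 2                  # ~0.669: voltage alone cuts ~1/3
f_factor = F_NEW / F_BASE                         # ~0.911: about 9% lower clock
vf_only = power_ratio(V_NEW / V_BASE, f_factor)   # ~0.610 before any capacitance change
implied_c = MEASURED_POWER_RATIO / vf_only        # ~0.967: residual left for shorter wires

if __name__ == "__main__":
    print(f"voltage factor    {v_factor:.3f}")
    print(f"frequency factor  {f_factor:.3f}")
    print(f"V and f together  {vf_only:.3f}")
    print(f"implied C factor  {implied_c:.3f}")
    print(f"power reduction   {1 - MEASURED_POWER_RATIO:.0%}")
```

Under these assumptions, voltage and frequency together account for about 0.61. The remaining few percent is consistent with, but does not prove, the reduced interconnect capacitance from folding.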
The second addition addresses the question peers ask most often. 3D stacking has been around for a while (AMD's 3D V-Cache and Intel's Foveros are already in mass production), so what is new about LogicFolding? To understand the paper's answer, one first needs to know how signals pass between two stacked dies. They travel through bonding points between the layers, which act like elevators connecting the upper and lower floors. In previously mass-produced 3D stacking, the bonding pitch ranged from 9 micrometers to several tens of micrometers. That allows roughly 10,000 connections per square millimeter at best (about 12,000 at a 9 μm pitch), which is enough to connect a cache to a bus. The traditional design approach was therefore to move an entire functional block to the upper layer, as AMD does by stacking a whole cache on top of a processor, with each layer designed separately and connected through interfaces. Inside a chip, however, a single square millimeter holds hundreds of millions of transistors, and placing adjacent logic gates on different layers requires far higher connection density. The Kirin 2026 achieves a bonding pitch of 1.5 micrometers, enabling 440k connections per square millimeter. That is close to the density of the top metal layer inside a chip. Routing a signal across layers now costs nearly the same as routing it through the chip's own metal layers. At this density, the two silicon wafers effectively merge into a single circuit, and EDA tools can decide, gate by gate, which layer each logic gate goes on. Algorithms can optimize globally across both layers, giving orders of magnitude more design freedom than before. The paper also explains why Huawei did not take a more aggressive route: fabricating a second layer of devices directly on top of the first. That route offers the densest inter-layer connections. However, building the second layer requires high temperatures that damage the already-fabricated first layer, so it is not yet feasible for mass production.
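The jump from roughly 10,000 to 440k connections per square millimeter follows directly from the pitch. This minimal sketch assumes bonding pads on a regular square grid, so each pad occupies a pitch × pitch cell and density is 1/pitch². The 6 μm and 9 μm entries are the TSMC SoIC and Intel Foveros Direct mass-production pitches cited in this post.

```python
def bonds_per_mm2(pitch_um: float) -> float:
    """Connections per square millimeter for a square grid of bond pads.

    Assumes each pad occupies a pitch x pitch cell, so density = 1 / pitch^2.
    """
    pitch_mm = pitch_um / 1000.0
    return 1.0 / (pitch_mm ** 2)


PITCHES_UM = {
    "Intel Foveros Direct": 9.0,
    "TSMC SoIC": 6.0,
    "Kirin 2026 LogicFolding": 1.5,
}

if __name__ == "__main__":
    for name, pitch in PITCHES_UM.items():
        print(f"{name:<24} {pitch:>4} um pitch -> {bonds_per_mm2(pitch):>9,.0f} per mm^2")
```

Because density scales with the inverse square of pitch, going from 9 μm to 1.5 μm (6× finer) gives 36× the connections: about 12,000 versus about 444,000 per square millimeter. That jump is what moves partitioning from the block level to the gate level.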
The third addition concerns thermal management. Vertical stacking significantly increases power density per unit area, and the upper die partially blocks the heat dissipation path of the lower die. This is the first question anyone raises about 3D stacking, and v1 did not discuss it in depth. V2 explicitly acknowledges that thermal management remains a key challenge for the LogicFolding architecture. The countermeasure is thermally aware partitioning and floorplanning. High-power circuits are excluded from folding at the design stage, and high-power modules are structurally kept from being vertically adjacent so that hotspots do not overlap. The paper does not say whether engineers apply this strategy manually or whether it is already built into the automated flow of the internal EDA tools. It does, however, explicitly name multiphysics tool chains as the most important investment for the next decade. Together with the measured power density, 5.6% below the baseline at the iso-performance operating point, this is a direct response to the thermal question. The approach, however, is essentially one of avoidance. As the number of stacked layers grows to three or four, thermal constraints will keep shrinking the set of circuits that can be folded, and the paper does not discuss this limit.
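The two rules described above can be expressed as a simple design-rule check. The first rule says high-power circuits are not folded; the second says two high-power modules are never stacked on top of each other. This is a hypothetical sketch. The block names, the grid-tile footprint representation and the `HIGH_POWER_W_PER_MM2` threshold are invented for illustration. The paper does not say how, or whether, its flow automates this check.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical threshold separating "high-power" blocks; not from the paper.
HIGH_POWER_W_PER_MM2 = 1.0


@dataclass(frozen=True)
class Block:
    name: str
    tier: int                             # 0 = bottom wafer, 1 = top wafer
    tiles: FrozenSet[Tuple[int, int]]     # floorplan tiles the block covers
    power_density: float                  # W/mm^2
    folded: bool = False                  # True if its gates are split across tiers


def is_high_power(block: Block) -> bool:
    return block.power_density >= HIGH_POWER_W_PER_MM2


def check_thermal_rules(blocks: List[Block]) -> List[str]:
    """Return human-readable violations of the two thermal-aware rules."""
    violations = []
    # Rule 1: high-power circuits are excluded from folding.
    for b in blocks:
        if b.folded and is_high_power(b):
            violations.append(f"{b.name}: high-power block must not be folded")
    # Rule 2: high-power blocks on different tiers must not overlap vertically.
    hot = [b for b in blocks if is_high_power(b)]
    for i, a in enumerate(hot):
        for b in hot[i + 1:]:
            if a.tier != b.tier and a.tiles & b.tiles:
                violations.append(f"{a.name} / {b.name}: stacked hotspots")
    return violations


if __name__ == "__main__":
    floorplan = [
        Block("alu_cluster", 0, frozenset({(0, 0), (0, 1)}), 1.5),
        Block("fpu", 1, frozenset({(0, 1), (1, 1)}), 1.2),
        Block("l2_ctrl", 1, frozenset({(0, 0)}), 0.3, folded=True),
    ]
    for violation in check_thermal_rules(floorplan):
        print(violation)
```

In the example floorplan, the two high-power blocks share tile (0, 1) on different tiers and are flagged. The low-power folded block passes. The post's concern shows up in this framing: with more tiers, the pairwise overlap constraint rules out more and more placements.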
V2 also adds a cross-sectional micrograph of the bonding interface between the two silicon wafers and states explicitly that wafer-to-wafer hybrid bonding is used. This specification is worth comparing with the rest of the industry. Wafer-to-wafer hybrid bonding at a 1.5 μm pitch has no precedent in mass-produced logic chips. TSMC's SoIC is currently in mass production at a 6 μm pitch, and Intel's Foveros Direct at 9 μm, so this is quite impressive.
After comparing the two versions, I still have two questions. The first is about equipment: who supplies bonding tools capable of this specification? The paper says only that it is the result of years of process development across a multi-supplier ecosystem. The second is about EDA. Existing commercial EDA tools cannot design two silicon wafers as a single chip. The paper acknowledges this and says only that the methodological details will be published "within a few months." Yet in the frequency table, the 2027-generation Kirin at 3.39 GHz is already labeled as a physical chip. That suggests the tool chain is already running smoothly inside Huawei and has been used for at least two product generations. My personal guess is that Huawei developed this EDA tool in-house. Friends who know more about this are welcome to share their thoughts.