Coincidentally, He Tingbo's τ scaling paper was recently updated to v2, growing from 16 to 23 pages. I compared the two versions: the data and conclusions are unchanged, and the added content mainly responds to several industry criticisms of v1. Three additions are worth discussing.

The most important addition is test evidence for the "41% energy efficiency improvement," which v1 merely asserted. In v1 the figure came with no baseline and no test conditions, which made it the easiest point to question. V2 includes a complete comparison table. The baseline is the Kirin 9030 Pro from 2025. Both chips use the same mature process node; the key difference is that the baseline uses a traditional planar design, while the Kirin 2026 folds critical paths onto two stacked wafers. Folding shortens interconnects and reduces interconnect delay, and the extra timing margin on critical paths translates directly into a higher maximum clock frequency: 3.1GHz at a 1.1V supply, 13% higher than the baseline. The "41%" figure comes from a deliberately chosen operating point: voltage lowered to 0.9V and frequency lowered to 2.5GHz, for an iso-performance comparison with the baseline. Measured power at 25°C is 0.59 times that of the baseline. The number also holds up in theory. Dynamic power is approximately proportional to the square of the voltage, so the 18% supply voltage reduction alone cuts power by about one-third through the square term. Add the 9% frequency reduction and the lower interconnect capacitance from folding, and the result lands right around 0.59. The precise meaning of the 41% energy efficiency improvement is therefore a 41% reduction in power at equal performance. In essence, the timing margin gained from logic folding is traded for lower power. V2 also includes data showing that after dual-layer stacking, power density is actually 5.6% lower than the baseline.
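The arithmetic behind the 0.59 figure can be checked with a rough sketch. It uses only the standard first-order model (dynamic power proportional to C·V²·f) and ignores leakage. Two inputs are my assumptions, not figures from the post: that the baseline is also measured at 1.1V, and that the baseline clock is 3.1GHz / 1.13 (inferred from "13% higher"). The capacitance ratio is not given anywhere; the sketch only backs out what value would be needed to reach 0.59.

```python
# Figures quoted in the post.
V_FOLDED = 0.9           # volts, iso-performance operating point
F_FOLDED_GHZ = 2.5       # GHz, iso-performance operating point
F_MAX_FOLDED_GHZ = 3.1   # GHz at 1.1 V
MEASURED_RATIO = 0.59    # measured power relative to the baseline

# Assumptions (not stated in the post).
V_BASELINE = 1.1                          # assumed baseline supply voltage
F_BASELINE_GHZ = F_MAX_FOLDED_GHZ / 1.13  # inferred from "13% higher"


def dynamic_power_ratio(v_ratio, f_ratio, c_ratio=1.0):
    """Relative dynamic power under the first-order model P ~ C * V^2 * f."""
    return c_ratio * v_ratio ** 2 * f_ratio


v_ratio = V_FOLDED / V_BASELINE
f_ratio = F_FOLDED_GHZ / F_BASELINE_GHZ

voltage_only = dynamic_power_ratio(v_ratio, 1.0)
voltage_and_freq = dynamic_power_ratio(v_ratio, f_ratio)
# Capacitance ratio that would be needed to match the measured 0.59.
implied_c_ratio = MEASURED_RATIO / voltage_and_freq

print(f"voltage term only:        {voltage_only:.3f}")
print(f"voltage and frequency:    {voltage_and_freq:.3f}")
print(f"implied capacitance term: {implied_c_ratio:.3f}")
```

Under these assumptions, voltage and frequency together account for a ratio of roughly 0.61, so only a few percent of switched-capacitance reduction is needed to close the gap to the measured 0.59.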

The second addition addresses the most common question from peers: 3D stacking has existed for a while, and AMD's 3D V-Cache and Intel's Foveros are already in mass production, so what is new about LogicFolding? To understand the paper's answer, one first needs to know how signals travel between the two layers: through bonding points, which work like elevators connecting the upper and lower floors. In previous mass-produced 3D stacking, the bonding pitch ranged from 9 micrometers to several tens of micrometers, which at best allows a little more than 10,000 connections per square millimeter, enough for a cache to connect to a bus. The traditional approach was therefore to move an entire functional block to the upper layer, as AMD does when it stacks a whole cache on top of a processor, with each layer designed separately and connected through interfaces. Inside a chip, however, one square millimeter contains hundreds of millions of transistors. Placing adjacent logic gates on different layers requires a much higher connection density. The Kirin 2026 achieves a bonding pitch of 1.5 micrometers, enabling about 440k connections per square millimeter, which is close to the density of the top metal layer inside a chip. Routing a signal across layers now costs nearly the same as routing it within a chip's metal layers. At this level the two silicon wafers effectively merge into a single circuit, and EDA tools can decide at logic-gate granularity which layer each gate goes on. Algorithms can then optimize globally, giving a design freedom orders of magnitude greater than before. The paper also explains why they did not take the more aggressive route of fabricating a second layer of devices directly on top of an existing one. That route offers the densest inter-layer connections, but fabricating the second layer requires high temperatures that damage the already fabricated first layer, so it is not yet feasible for mass production.
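The connection densities quoted above follow directly from the pitch. A minimal sketch, assuming bonding points sit on a uniform square grid (the post does not describe the actual pad layout):

```python
def connections_per_mm2(pitch_um):
    """Bonding points per square millimetre on a square grid with the given pitch."""
    points_per_mm = 1000.0 / pitch_um  # 1 mm = 1000 micrometres
    return points_per_mm ** 2


for pitch_um in (9.0, 1.5):
    density = connections_per_mm2(pitch_um)
    print(f"{pitch_um} um pitch: about {density:,.0f} connections per mm^2")

# Density scales with the inverse square of the pitch.
density_gain = connections_per_mm2(1.5) / connections_per_mm2(9.0)
print(f"density gain from 9 um to 1.5 um: {density_gain:.0f}x")
```

A 9 micrometer pitch gives roughly 12,000 connections per square millimeter and a 1.5 micrometer pitch roughly 444,000, matching the "more than 10,000" and "440k" figures. Because density goes with the inverse square of the pitch, shrinking the pitch 6 times yields about 36 times as many connections.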

The third addition is thermal management. Vertical stacking tends to increase heat density per unit area significantly, and the heat dissipation path of the lower die is partially blocked by the upper die. This is the first question any 3D-stacked design inevitably faces, and v1 did not discuss it in depth. V2 explicitly acknowledges that thermal management remains a key challenge for the LogicFolding architecture. The countermeasure is thermally aware partitioning and floorplanning: high-power circuits are excluded from folding during the design phase, and structurally, high-power modules are kept from being vertically adjacent so that hotspots do not overlap. The paper does not say whether engineers apply this strategy manually or whether it has been built into the automatic flow of internal EDA tools, but it explicitly lists multiphysics tool chains as the most important investment for the next decade. Together with the measured power density, 5.6% lower than the baseline at the iso-performance operating point, this gives the thermal issue a direct response. However, the approach is essentially one of avoidance. As the number of stacked layers grows to three or four, thermal constraints will keep shrinking the set of circuits that can be folded, and the paper does not discuss this boundary.
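To make the "no vertically adjacent high-power modules" rule concrete, here is a toy check. It is purely illustrative: the `Module` type, the rectangle model, the power threshold, and the example floorplan are all hypothetical, and nothing here reflects how the paper's tools actually work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    layer: int      # 0 = lower layer, 1 = upper layer
    x: float        # lower-left corner, mm
    y: float
    width: float    # mm
    height: float   # mm
    power_w: float  # estimated power, watts


def footprints_overlap(a, b):
    """True if the two rectangles overlap when viewed from above."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def hotspot_conflicts(modules, power_threshold_w):
    """Pairs of high-power modules stacked on top of each other."""
    hot = [m for m in modules if m.power_w >= power_threshold_w]
    lower = [m for m in hot if m.layer == 0]
    upper = [m for m in hot if m.layer == 1]
    return [(a.name, b.name)
            for a in lower for b in upper
            if footprints_overlap(a, b)]


# Hypothetical floorplan: names, sizes, and powers are made up.
floorplan = [
    Module("cpu_core", 0, 0.0, 0.0, 2.0, 2.0, 3.0),
    Module("npu",      1, 1.0, 1.0, 2.0, 2.0, 2.5),
    Module("gpu",      0, 4.0, 0.0, 2.0, 2.0, 3.5),
    Module("cache",    1, 4.0, 0.0, 2.0, 2.0, 0.4),
]

conflicts = hotspot_conflicts(floorplan, power_threshold_w=2.0)
print(conflicts)
```

In this example the low-power cache sitting above the GPU is allowed, while the NPU partially covering the CPU core is flagged. A real flow would presumably work from thermal simulation rather than a fixed power threshold.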

Additionally, v2 includes a microscope cross-section of the bonding interface between the two silicon wafers and explicitly states that wafer-on-wafer hybrid bonding is used. The specification is worth comparing with the rest of the industry: wafer-to-wafer hybrid bonding at a 1.5μm pitch has no precedent in mass-produced logic chips. TSMC's current SoIC mass-production pitch is 6μm and Intel's Foveros Direct is 9μm, so 1.5μm is quite impressive.

After comparing the two versions of the paper, I still have two questions. The first is about equipment: who supplies bonding equipment capable of this specification? The paper says only that it is the result of years of process development across a multi-supplier ecosystem. The second is about EDA tools: existing commercial EDA tools cannot design two silicon wafers as a single chip. The paper acknowledges this but says only that the methodological details will be published "within a few months." Yet in the frequency table, the 2027-generation Kirin at 3.39GHz is already labeled as a physical chip, which indicates that this tool chain has been running smoothly inside Huawei and has been through at least two product generations. My personal guess is that the EDA tool is self-developed by Huawei. I welcome readers who know more about this to share their thoughts.
