DeepSeek's new paper: How a manifold-constrained hyper-connection architecture addresses the training challenges of deep networks
【ChainWen】DeepSeek's recently published paper has drawn attention in the tech community. The team proposes a new architecture called Manifold-Constrained Hyper-Connections (mHC), with a goal that is quite straightforward: addressing two pain points of existing Hyper-Connection (HC) techniques, namely unstable training and limited scalability.
The root of the problem is that HC techniques break the identity-mapping property of residual connections. DeepSeek's solution is to constrain HC's residual-connection space to a specific manifold, thereby restoring the identity-mapping property. It may sound abstract, but in essence it is a smarter mathematical mapping that makes deep-network training more stable and easier to scale.
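To make that less abstract, below is a minimal, illustrative sketch (PyTorch) of the general pattern the article describes: several parallel residual streams mixed by learnable matrices (the hyper-connection idea), with the mixing matrix constrained so the block can always collapse back to a plain identity mapping. The specific constraint used here (a row-wise softmax that keeps the matrix row-stochastic), the class name, and the tensor shapes are assumptions made purely for illustration; the actual manifold and formulation of mHC are defined in the paper itself.

```python
# Illustrative toy sketch only -- NOT the paper's implementation.
# Assumption: the "manifold constraint" is modeled here as keeping the stream-mixing
# matrix row-stochastic, so the block can always represent an exact identity mapping.
import torch
import torch.nn as nn


class ToyManifoldConstrainedHC(nn.Module):
    """Toy block: n parallel residual streams mixed by a constrained matrix around a sub-layer f."""

    def __init__(self, dim: int, n_streams: int = 4):
        super().__init__()
        self.n_streams = n_streams
        self.f = nn.Linear(dim, dim)  # stand-in for an attention/MLP sub-layer
        # Unconstrained logits; initialized so the constrained mixing matrix starts near identity.
        self.mix_logits = nn.Parameter(torch.eye(n_streams) * 4.0)
        # How the sub-layer output is written back into each stream.
        self.out_weights = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, dim)
        # Constrain the mixing matrix to row-stochastic form (softmax per row), so every
        # new stream is a convex combination of old streams and the block can still
        # represent an exact identity mapping when the sub-layer contributes nothing.
        mix = torch.softmax(self.mix_logits, dim=-1)        # (n, n)
        mixed = torch.einsum("ij,jbd->ibd", mix, streams)   # mix the parallel streams
        h = self.f(mixed.mean(dim=0))                       # sub-layer sees the averaged stream
        return mixed + self.out_weights.view(-1, 1, 1) * h  # per-stream residual update


if __name__ == "__main__":
    block = ToyManifoldConstrainedHC(dim=16, n_streams=4)
    x = torch.randn(4, 2, 16)   # (streams, batch, dim)
    print(block(x).shape)       # torch.Size([4, 2, 16])
```

The point of such a constraint is that, however the mixing weights drift during training, the update cannot destroy the signal already carried by the streams, which is the identity-mapping property the article says HC loses and mHC restores.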
The paper also pairs the architecture with infrastructure-level optimizations to keep it efficient in practice. The reported experiments show clear performance gains and good scalability, meaning the training process stays controllable as networks get deeper.
DeepSeek positions mHC as a flexible and practical extension of HC. The work not only deepens the industry's understanding of topological architecture design but also points to a promising direction for the evolution of large models. The paper is a collaboration among Zhenda Xie, Yixuan Wei, Huanqi Cao, and Wenfeng Liang.
In the long term, breakthroughs in such foundational architectures will have a profound impact on the stability and scalability of large models.
More mathematical black magic? Basically it's just there to keep network training from falling apart.
I really don't understand the manifold constraints, but as long as the experimental data looks good, that's enough.
Deep networks can finally be trained stably? There goes another wave of people about to be left behind.
Then again, if scalability really gets solved, rolling out applications will be a lot faster down the line.
If this paper actually holds up, it shows there are still plenty of holes to fill in the foundational layer of AI.
Wait, how efficient is it when it actually runs? Don't tell me it only looks good on paper again.
The new architecture sounds awesome, but whether it actually works needs market validation. Anyway, I’ll just watch and laugh.
This logic is just like my crypto trading—perfect theory, but reality crashes hard, haha.
DeepSeek is paving the way for large model training; once the deep network is stable, the chances of releasing monster-level models increase.
Honestly, if this basic research is done well, the benefits will mainly go to big corporations. We retail investors can only eat the leftovers.
If it can truly stabilize deep training, then the experimental data deserves a careful look. Let's not have another case of a paper that looks great but falls flat in practice.
Restoring the identity mapping property... we can wait for feedback from the production environment before praising it.
Deep learning papers are becoming more and more competitive. If there is a real breakthrough in scalability, it will indeed be good news for the training costs of large models.
I need to take a close look at this mathematical mapping approach. It feels like I’ll need to connect theory with practice for a while.
On-chain data hasn't shown any movement yet. We retail investors should keep watching for now to avoid becoming bagholders. To be fair, though, DeepSeek really is in the thick of the action; early adopters who went all-in might be laughing already.
Manifold constraints sound very advanced, but how far is this architectural innovation from real-world application? Are any major institutions already doing arbitrage in this area?
Honestly, pure technical breakthroughs are often overhyped. I'm actually betting on market reaction, not just the paper itself. Once miner fees catch up, it will be time for me to run.
When will the latest scalability data be released? Is there a detailed comparison with benchmark solutions? That’s what I truly care about.
---
DeepSeek has come up with a new approach, seems like they're patching the old HC technology.
---
All they've been talking about is making training more stable. How much faster it can actually run is still uncertain.
---
I didn't quite understand the part about the identity mapping. Feels like the authors just make simple things complicated.
---
Superior scalability? How many percentage points faster than existing solutions? Is there a benchmark?
---
Another "revolutionary" architecture. Let's wait and see if it can be used in real-world scenarios.
---
The term "manifold constraints" sounds very fancy. I wonder what the actual running costs are.
---
Algorithm optimization always comes down to the same thing: great in theory, but in practice it depends on the GPU.
---
It looks like they've put effort into it, but it feels like the paper is full of fluff. Where are the details?
---
So the deep-network training stability problem is solved. What about GPU memory usage? Solutions like this usually run into trouble there, right?
Manifold constraints? Basically, it's to prevent network training from crashing. Anyway, I didn't quite understand it, haha.
Deeper networks are more stable. Does this help with mining optimization?
Mathematical mapping, mapping, mapping—can it directly improve gas fee calculation efficiency?
DeepSeek is also working on model architecture again. This pace is really hard to keep up with.
I just want to know if it can finally run without crashing; everything else is just talk.