China's DeepSeek unveils ‘mHC’, a next-generation AI architecture surpassing residual connections
Source: TokenPost. Original title: DeepSeek in China Unveils Next-Generation AI Architecture ‘mHC’ Surpassing Residual Connections. Original link: https://www.tokenpost.kr/news/ai/320188

Chinese AI research lab DeepSeek has announced a new architecture that significantly improves the training performance of next-generation AI. Named ‘mHC’ (Manifold-Constrained Hyper-Connections), the technique goes beyond the residual connection, a component long considered essential in large language models (LLMs) and visual recognition models, and is credited with improving hardware efficiency as well as training accuracy.
mHC is an improved version of the existing ‘Hyper-Connections’ technique. Hyper-connections have been recognized for transmitting information more efficiently between the layers of a deep learning model, but in practical deployments various technical constraints limited their adoption. DeepSeek overcame this limitation by integrating the mathematical concept of a manifold: a space that locally resembles flat Euclidean space, with examples ranging from a simple circle to complex structures in more than three dimensions. DeepSeek explains that mHC uses this manifold-based structure to keep the gradients (the error signals propagated backward during training) stable and consistent, which it describes as playing a key role.
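To make the idea concrete, here is a minimal PyTorch sketch of a hyper-connection block, assuming the commonly described formulation in which several parallel residual streams are mixed by learnable weights. The softmax projection standing in for the manifold constraint is a hypothetical illustration, not DeepSeek's actual method, and all class and parameter names here are invented for this sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperConnectionBlock(nn.Module):
    """Sketch of a hyper-connection block: the model carries n parallel
    residual streams instead of one. Learnable weights mix the streams,
    choose what the sub-layer reads, and decide how its output is written
    back. Illustrative only; not DeepSeek's mHC implementation."""
    def __init__(self, dim: int, n_streams: int = 4):
        super().__init__()
        # Stand-in for a real transformer sub-layer (attention or MLP).
        self.layer = nn.Linear(dim, dim)
        # Learnable stream-mixing matrix and read/write weights.
        self.mix = nn.Parameter(torch.eye(n_streams))
        self.read = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))
        self.write = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, dim)
        # Hypothetical constraint: project the mixing matrix onto
        # row-stochastic matrices via softmax so stream magnitudes stay
        # bounded. This stands in for the manifold projection the
        # article describes, whose exact form is not given here.
        mix = F.softmax(self.mix, dim=-1)
        streams = torch.einsum('ij,jbd->ibd', mix, streams)
        # Read a weighted combination, apply the layer, write the result
        # back into every stream.
        h = torch.einsum('i,ibd->bd', self.read, streams)
        out = self.layer(h)
        return streams + self.write.view(-1, 1, 1) * out.unsqueeze(0)
```

In this sketch, the streams would be initialized by replicating the token embeddings n_streams times and collapsed (for example, averaged) before the output head; with n_streams = 1, the block reduces to an ordinary residual connection.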
To validate the architecture, DeepSeek trained three LLMs with 3 billion, 9 billion, and 27 billion parameters on the mHC structure and compared them with models of the same specifications built on plain hyper-connections. The company claims the mHC models consistently came out ahead across eight benchmark tests. Notably, they required less memory, enabling more efficient training, and the hardware overhead added during training was reported to be only about 6.27%.
DeepSeek researchers emphasized that by deepening our understanding of the relationship between manifold-based topological structure and optimization algorithms, mHC can overcome the limitations of current AI models and open new pathways for next-generation infrastructure design.
This announcement is noteworthy as it comes amid a global reevaluation of AI training architectures. The residual connection, introduced in deep learning research in 2015, has been widely used in LLMs and image classification models. The structure adds a layer's input directly to its output through a skip connection, giving error signals a direct path back through deep networks during training and compensating for the information degradation that otherwise accumulates as signals pass through many layers.
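In code, a residual connection is simply a skip path. A minimal PyTorch sketch follows; the block contents are illustrative, and only the x + F(x) pattern is the point:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the input is added back to the transformed
    output, so gradients have a direct path to earlier layers."""
    def __init__(self, dim: int):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection: output = x + F(x).
        return x + self.transform(x)
```

Stacking many such blocks is what made very deep networks trainable; without the skip path, gradients tend to shrink as they pass back through each transformation.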
However, as AI models have grown larger and more complex, the limitations of residual connections have become apparent, prompting various efforts to improve on them. DeepSeek's mHC is one of the most recent technologies to emerge from this work, and analysts believe it could directly contribute to improving the training efficiency of foundation models across the AI industry.