The context length of large models has grown 100-fold, and long text technology has become the new standard.

The context length of large models is growing at an astonishing rate, from 4,000 tokens to 400,000 tokens. Long text capability seems to have become the new "standard configuration" for large model vendors.

Abroad, OpenAI has upgraded GPT-4's context length to 32,000 tokens through multiple updates. Anthropic has also expanded the context length of its model Claude to 100,000 tokens. LongLLaMA has pushed this number to 256,000 tokens.

Domestic players are not to be outdone. The startup Dark Side of the Moon (Moonshot AI) has reportedly launched Kimi Chat, which supports input of approximately 400,000 tokens. LongLoRA, a technique developed jointly by CUHK and MIT, can extend the text length of a 70B model to 32,000 tokens.

Currently, many top large model companies, including OpenAI, Anthropic, Meta, and Dark Side of the Moon, are treating the expansion of context length as a key upgrade direction. These companies are all darlings of the capital market: OpenAI has received nearly $12 billion in investment; Anthropic is valued at around $30 billion; and Dark Side of the Moon completed two rounds of financing totaling nearly 2 billion yuan within six months of its founding.

What does it actually mean when large model companies attach such importance to long text technology and expand context length by a factor of 100?

On the surface, it means the amount of text a model can handle has increased dramatically: with 400,000 tokens, Kimi Chat can already read an entire long novel. The deeper significance is that long text technology is driving the adoption of large models in professional fields such as finance, law, and scientific research.

However, longer is not necessarily better. Research indicates that supporting longer context inputs does not directly translate into better model performance; what matters is how effectively the model uses the contextual content.
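One way researchers probe this is a "needle in a haystack" test: plant a known fact at different depths of a long filler text and check whether the model can retrieve it. Below is a minimal sketch of such a probe; the `ask(context, question)` callable is a hypothetical stand-in for a real model API, not any specific vendor's interface.

```python
from typing import Callable, Dict

def needle_test(ask: Callable[[str, str], str],
                filler: str,
                needle: str,
                question: str,
                depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> Dict[float, bool]:
    """Plant a known fact at varying depths of a long context and
    check whether the model retrieves it when asked.

    `ask(context, question)` is a hypothetical stand-in for a call
    to a long-context model; replace it with a real API call.
    """
    results = {}
    for depth in depths:
        cut = int(len(filler) * depth)
        context = filler[:cut] + "\n" + needle + "\n" + filler[cut:]
        answer = ask(context, question)
        # Crude success check: does the answer contain the planted fact?
        results[depth] = needle.strip() in answer
    return results
```

A model that merely accepts a long input may still fail this probe at certain depths, which is precisely the gap between a nominal window size and effective context use.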

Currently, the industry's exploration of text length has yet to reach the "critical point". 400,000 tokens may just be the beginning.

Yang Zhilin, founder of Dark Side of the Moon, has stated that long text technology can resolve some of the problems large models faced early on, enhance certain capabilities, and serve as a key technology for pushing industry applications forward. It marks the transition of large model development from LLM to Long LLM.

Breakthroughs in long text technology have brought a series of new features, such as key information extraction and summary analysis for ultra-long texts, complex code generation, and personalized role-playing dialogues. These features are driving chatbots towards specialization, personalization, and depth.

However, long text technology also faces an "impossible triangle" dilemma: text length, attention, and computing power are hard to balance simultaneously. The main challenge comes from the self-attention mechanism in the Transformer architecture, whose computational load grows quadratically with context length.
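To see where the quadratic cost comes from, here is a minimal NumPy sketch of single-head scaled dot-product attention; it illustrates the mechanism itself, not any vendor's implementation. The (L, L) score matrix is the term that explodes as context grows.

```python
import numpy as np

def self_attention(q, k, v):
    """Single-head scaled dot-product attention.

    q, k, v have shape (L, d) for a sequence of L tokens. The score
    matrix is (L, L), so compute and memory grow quadratically in L:
    going from 4,000 to 400,000 tokens (100x) makes the score matrix
    10,000x larger.
    """
    d = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(d)               # (L, L) -- the quadratic term
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (L, d)

L, d = 1_024, 64
q = k = v = np.random.randn(L, d).astype(np.float32)
out = self_attention(q, k, v)  # the (L, L) scores dominate the cost
```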

Currently, there are three main approaches: using external tools to assist processing, optimizing the self-attention computation, and optimizing the model itself. Each has its own trade-offs, and the key lies in finding the best balance among text length, attention, and computing power.
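As one illustration of the first route (external tools assisting processing), a common pattern is to split a document into chunks that fit the model's window, summarize each chunk, then summarize the summaries. The sketch below assumes a placeholder `summarize` callable rather than any real API.

```python
from typing import Callable

def summarize_long_text(text: str,
                        summarize: Callable[[str], str],
                        chunk_chars: int = 8_000) -> str:
    """Map-reduce summarization for text longer than the model window.

    `summarize` is a hypothetical stand-in for a single LLM call; it
    is assumed to return output much shorter than its input, otherwise
    the recursion below would not terminate.
    """
    # Map: split into window-sized chunks and summarize each one.
    chunks = [text[i:i + chunk_chars]
              for i in range(0, len(text), chunk_chars)]
    partial = [summarize(chunk) for chunk in chunks]

    # Reduce: merge the partial summaries; recurse if still too long.
    merged = "\n".join(partial)
    if len(merged) > chunk_chars:
        return summarize_long_text(merged, summarize, chunk_chars)
    return summarize(merged)
```

The trade-off is visible here: chunking sidesteps the window limit but can lose cross-chunk dependencies, which is exactly why native long context remains attractive.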

Although long text technology still faces many challenges, it is undoubtedly an important step in the industrialization of large models. In the future, with continuous breakthroughs in technology, we can expect to see more innovative applications based on long text technology.
