Long-text capability is becoming the new standard for large models: what does a 100-fold increase in context length really mean?
The context length of large models is skyrocketing at an astonishing rate, rising from 4,000 tokens to 400,000 tokens. Long text capability seems to have become the new "standard configuration" for large model vendors.
Overseas, OpenAI has raised GPT-4's context length to 32,000 tokens through successive updates; Anthropic has expanded its Claude model's context to 100,000 tokens; and LongLLaMA has pushed the figure to 256,000 tokens.
Domestic players are not to be outdone either. The startup Moonshot AI (Dark Side of the Moon) has reportedly launched Kimi Chat, which supports inputs of roughly 400,000 tokens, and LongLoRA, developed jointly by CUHK and MIT, can extend the text length of a 70B model to 32,000 tokens.
Currently, many top large model companies, including OpenAI, Anthropic, Meta, and Moonshot AI, are making longer context a key upgrade. These companies are all darlings of the capital market: OpenAI has received nearly $12 billion in investment, Anthropic is valued at around $30 billion, and Moonshot AI completed two rounds of financing totaling nearly 2 billion yuan within six months of its founding.
So what does it actually mean that large model companies attach such importance to long-text technology, and that context length has grown 100-fold?
On the surface, it means the model can take in far more text at once: Kimi Chat, with roughly 400,000 tokens, can already read an entire long novel in one pass. The deeper significance is that long-text technology is pushing large models into professional fields such as finance, law, and scientific research.
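As a rough back-of-envelope check, 400,000 tokens really is on the order of a full book or more; the words-per-token ratio and the novel length below are common approximations, not exact figures for any particular tokenizer.

```python
# Back-of-envelope check (the ratios are rough rules of thumb, not exact
# tokenizer figures): how much English text fits into a 400,000-token window?
TOKENS = 400_000
WORDS_PER_TOKEN = 0.75      # a common rough estimate for English text
NOVEL_WORDS = 100_000       # order of magnitude for a typical full-length novel

words = int(TOKENS * WORDS_PER_TOKEN)
print(f"about {words:,} words, roughly {words / NOVEL_WORDS:.0f} novels' worth of text")
```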
However, longer is not automatically better. Research shows that supporting longer context inputs does not translate one-for-one into better performance; what matters is how effectively the model actually uses the content in its context window.
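One way to probe context utilization is a "needle in a haystack" style test: hide a single known fact at different depths inside long filler text and check whether the model can recall it. The sketch below is an assumed test design, and `query_model` is a hypothetical stand-in for whichever chat API is being evaluated.

```python
# Sketch of a "needle in a haystack" probe for context utilization.
# `query_model` is a hypothetical callable that sends a prompt to a model
# and returns its text reply; it is not a real library API.
def build_prompt(needle: str, filler: str, total_paras: int = 200, depth: float = 0.5) -> str:
    """Bury one known fact at a given relative depth inside filler paragraphs."""
    paras = [filler] * total_paras
    paras.insert(int(depth * total_paras), needle)
    return "\n\n".join(paras) + "\n\nQuestion: What is the secret passphrase?"

def recall_at_depths(query_model, depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> dict:
    """For each depth, record whether the model's answer recovered the needle."""
    needle = "The secret passphrase is 'aurora-42'."
    filler = "This paragraph is unrelated background text about the weather."
    return {
        d: "aurora-42" in query_model(build_prompt(needle, filler, depth=d))
        for d in depths
    }
```

If recall drops when the needle sits in the middle of the window, the model has a long nominal context but is not using all of it equally well.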
Currently, the industry's exploration of text length has yet to reach the "critical point". 400,000 tokens may just be the beginning.
Moonshot AI founder Yang Zhilin has said that long-text technology can resolve some of the problems large models faced early on, enhance certain capabilities, and serve as a key technology for pushing large models into industry applications; it marks the shift of large model development from LLM to Long LLM.
Breakthroughs in long text technology have brought a series of new features, such as key information extraction and summary analysis for ultra-long texts, complex code generation, and personalized role-playing dialogues. These features are driving chatbots towards specialization, personalization, and depth.
However, long text technologies also face the "impossible triangle" dilemma: it is difficult to balance text length, attention, and computational power. The main challenge comes from the self-attention mechanism in the Transformer architecture, whose computational load grows quadratically with the length of the context.
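A toy implementation of scaled dot-product attention makes the quadratic cost concrete: the score matrix has shape (n, n), so doubling the context length quadruples its size. This is a minimal single-head NumPy sketch with toy dimensions; real implementations are batched, multi-headed, and heavily fused.

```python
# Toy single-head scaled dot-product attention, to show the O(n^2) term.
import numpy as np

def self_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                    # (n, n): the quadratic term
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # (n, d) output

d = 64
for n in (1_000, 2_000, 4_000):
    q = k = v = np.random.randn(n, d).astype(np.float32)
    _ = self_attention(q, k, v)
    print(f"n={n:>5}: fp32 score matrix alone ≈ {n * n * 4 / 1e6:.0f} MB")
```

At 400,000 tokens the same fp32 score matrix would take roughly 640 GB per head per layer, which is why naive attention cannot simply be scaled up to very long contexts.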
Currently, there are mainly three solutions: using external tools to assist processing, optimizing the self-attention mechanism computation, and optimizing the model itself. Each solution has its own advantages and disadvantages, and the key lies in finding the best balance between text length, attention, and computing power.
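As an illustration of the first route (external tools assisting processing), a long document can be chunked and only the most relevant chunks passed to a short-context model. The chunk size, the word-overlap scoring, and the hypothetical `ask_llm` call below are illustrative assumptions, not any vendor's actual pipeline.

```python
# Illustrative sketch of the "external tools" route: retrieve only the
# chunks relevant to the question instead of feeding the whole document.
from collections import Counter

def chunk(text: str, size: int = 500) -> list[str]:
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def overlap_score(question: str, passage: str) -> int:
    """Toy relevance score by word overlap; real systems use embeddings."""
    q, p = Counter(question.lower().split()), Counter(passage.lower().split())
    return sum(min(q[w], p[w]) for w in q)

def build_short_prompt(question: str, document: str, top_k: int = 3) -> str:
    chunks = chunk(document)
    best = sorted(chunks, key=lambda c: overlap_score(question, c), reverse=True)[:top_k]
    # Downstream, a hypothetical ask_llm(prompt) would answer from these chunks only.
    return "\n\n".join(best) + "\n\nQuestion: " + question
```

The trade-off is visible in the sketch itself: the model never sees the whole document, so anything the retriever misses is lost, which is exactly the tension between text length, attention, and computing power described above.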
Although long-text technology still faces many challenges, it is undoubtedly an important step in the industrialization of large models. In the future, with continuous breakthroughs in technology, we can expect to see more innovative applications based on long-text technology.