Can DeepSeek stay hot?

Author: Yu Yan, Pengpai News Reporter

· A headhunter who recruits high-end technical talent in the large-model field told The Paper that DeepSeek's hiring logic is not much different from that of other large-model companies. The core label for candidates is "young and high-potential": born around 1998, preferably with no more than five years of work experience, "smart, from a science or engineering background, young, and not much experience."

· In the eyes of industry insiders, DeepSeek is lucky compared with other Chinese large-model startups: it has no financing pressure, does not need to prove itself to investors, and does not have to balance model iteration against product optimization. But as a commercial company that has made a huge investment, it will sooner or later face the same pressures and challenges that other model companies face today.

Which company was the hottest in China's large-model field in 2024? Hangzhou DeepSeek Artificial Intelligence Research Co., Ltd. (hereinafter DeepSeek) is certainly a strong contender. If DeepSeek first entered the public eye as the instigator of last year's mid-year price war among large models, it fully ignited public discussion after releasing the open-source model DeepSeek-V3 at the end of the year and the reasoning model DeepSeek-R1 at the start of this year. People are amazed by its training cost (DeepSeek-V3 is reported to have cost only US$5.576 million to train) and applaud its decision to open-source its models and publish technical reports. The release of DeepSeek-R1 has excited scientists, developers, and users, many of whom regard it as a serious rival to reasoning models such as OpenAI's o1.

How has this low-key company achieved high-performance large models at such a low training cost? What did it do right to become so popular? And what challenges will it face if it wants to keep riding the wave in the "model circle"?

Algorithmic innovation sharply cuts the cost of computing power

"DeepSeek entered early and has accumulated a lot, with its own characteristics in algorithms." An executive of a well-known large-scale model startup in China said when referring to DeepSeek that he believes the core advantage of DeepSeek's popularity lies in the innovation in algorithms. "Chinese companies, due to the lack of computing power, pay more attention to cost savings in computing power costs compared to OpenAI."

According to information released by DeepSeek, DeepSeek-R1 applied reinforcement learning at scale in the post-training stage, greatly improving the model's reasoning ability with very little annotated data and matching the official version of OpenAI o1 on tasks such as mathematics, code, and natural-language reasoning.
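For readers curious about what "reinforcement learning with very little annotated data" can look like in practice, the sketch below illustrates a group-relative, rule-rewarded approach of the kind described in DeepSeek's public materials. It is a minimal illustration only; the DummyPolicy, generate, and reinforce names are placeholders invented for the example, not DeepSeek's actual code.

```python
# Illustrative sketch only (not DeepSeek's implementation): sample a group of
# answers per prompt, score them with a cheap rule-based reward instead of
# human labels, and turn within-group differences into advantages that weight
# a policy-gradient-style update.
import random
from statistics import mean, pstdev

def rule_based_reward(answer: str, reference: str) -> float:
    # Verifiable reward: 1.0 if the model's final answer matches the reference.
    return 1.0 if answer.strip() == reference.strip() else 0.0

def group_advantages(rewards: list[float]) -> list[float]:
    # Normalize rewards within the sampled group (zero mean, roughly unit variance).
    mu, sigma = mean(rewards), pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

def rl_post_training_step(policy, prompt: str, reference: str, group_size: int = 8) -> float:
    answers = [policy.generate(prompt) for _ in range(group_size)]   # 1) sample a group
    rewards = [rule_based_reward(a, reference) for a in answers]     # 2) rule-based scoring
    for answer, adv in zip(answers, group_advantages(rewards)):      # 3) advantage-weighted update
        policy.reinforce(prompt, answer, advantage=adv)
    return mean(rewards)

class DummyPolicy:
    """Stand-in for a language model; a real system would update weights in reinforce()."""
    def generate(self, prompt: str) -> str:
        return random.choice(["42", "41", "42 "])  # toy sampled answers
    def reinforce(self, prompt: str, answer: str, advantage: float) -> None:
        pass  # placeholder for the gradient step

if __name__ == "__main__":
    avg = rl_post_training_step(DummyPolicy(), "What is 6*7?", reference="42")
    print(f"mean rule-based reward for this group: {avg:.2f}")
```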


(Image: DeepSeek-R1 API pricing)

DeepSeek founder Liang Wenfeng has repeatedly emphasized that DeepSeek is committed to exploring a differentiated technical path rather than replicating OpenAI's models, which means it must find more efficient ways to train its models.

"They have used a series of engineering techniques to optimize the model architecture, such as innovative model-mixing methods; the essential purpose is to cut costs through engineering so that the model can be run profitably," a veteran technology-industry professional told The Paper.

According to information disclosed by DeepSeek, it has made significant progress on MLA (Multi-head Latent Attention) and the self-developed DeepSeekMoE (Mixture-of-Experts) architecture. These two designs reduce the computing resources needed for training, make DeepSeek's models more cost-effective, and improve training efficiency. Data from the research institution Epoch AI likewise show that DeepSeek's latest model is highly efficient.
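As a rough illustration of the Mixture-of-Experts idea (it does not attempt to capture MLA, and is not DeepSeek's implementation), the sketch below shows how a router activates only a few experts per token, so the compute spent per token stays small even when the total parameter count is large. All names and sizes are invented for the example.

```python
# Simplified Mixture-of-Experts illustration: a router scores every expert for
# each token, but only the top-k experts actually run.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

router_w = rng.normal(size=(d_model, n_experts))                    # router weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(token: np.ndarray) -> np.ndarray:
    logits = token @ router_w                                        # score each expert
    probs = np.exp(logits - logits.max()); probs /= probs.sum()      # softmax over experts
    chosen = np.argsort(probs)[-top_k:]                              # keep only the top-k experts
    # Weighted sum of the chosen experts' outputs; the other experts do no work.
    return sum(probs[i] * (token @ experts[i]) for i in chosen)

out = moe_layer(rng.normal(size=d_model))
print(out.shape)  # (64,) -- same width, but only 2 of the 8 experts were evaluated
```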

On the data side, unlike OpenAI's approach of feeding the model massive amounts of raw data, DeepSeek uses algorithms to summarize and classify data and delivers it to the large model only after selective processing, which improves training efficiency and lowers cost. DeepSeek-V3 thus strikes a balance between high performance and low cost, opening up new possibilities for large-model development.
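This "filter before you feed" idea can be pictured with a toy pipeline like the one below. It is a hypothetical sketch, not DeepSeek's pipeline; the quality heuristic stands in for whatever learned classifiers or summarization models a real system would use.

```python
# Hypothetical data-curation sketch: remove exact duplicates, score documents
# with a simple quality heuristic, and keep only the best for training.
import hashlib

def quality_score(doc: str) -> float:
    # Toy heuristic standing in for a learned quality/topic classifier.
    words = doc.split()
    if not words:
        return 0.0
    unique_ratio = len(set(words)) / len(words)        # penalize heavy repetition
    length_bonus = min(len(words) / 200.0, 1.0)        # prefer substantive documents
    return unique_ratio * length_bonus

def curate(corpus: list[str], keep_ratio: float = 0.5) -> list[str]:
    seen, deduped = set(), []
    for doc in corpus:
        digest = hashlib.md5(doc.encode()).hexdigest()  # exact-duplicate removal
        if digest not in seen:
            seen.add(digest)
            deduped.append(doc)
    ranked = sorted(deduped, key=quality_score, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_ratio))]

print(len(curate(["a b c d " * 60, "spam spam spam", "a b c d " * 60])))  # 1: dedup, then keep top half
```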

“Perhaps in the future, there won't be a need for super large-scale GPU clusters anymore,” said Andrej Karpathy, founding member of OpenAI, after the release of DeepSeek's high-performance and cost-effective model.

Liu Zhiyuan, a tenured associate professor in the Department of Computer Science at Tsinghua University, told The Paper that DeepSeek's success demonstrates China's competitive advantage: achieving more with less through the extremely efficient use of limited resources. The release of R1 shows that the gap between China's AI capabilities and those of the United States has narrowed significantly. The Economist likewise wrote in its latest issue that DeepSeek is changing the technology industry with its low-cost training and innovative model design.

Demis Hassabis, CEO and co-founder of Google DeepMind, said that while it is not yet entirely clear how much DeepSeek relied on Western systems for training data and open-source models, the team's achievement is undeniably impressive. He acknowledged China's very strong engineering and scaling capabilities, while also noting that the West still leads and needs to consider how to keep its frontier models ahead.

Years of quiet accumulation, then a breakthrough

DeepSeek's ability to deliver these innovations is not an overnight success but the result of years of "incubation" and long-term planning. Liang Wenfeng is also the founder of the top quantitative hedge fund High-Flyer (Huanfang Quantitative), and DeepSeek is widely seen as making full use of the capital, data, and GPUs that High-Flyer accumulated.

Liang Wenfeng earned his bachelor's and master's degrees in the Department of Information and Electronic Engineering at Zhejiang University. Since 2008 he has led teams exploring fully automated quantitative trading with machine learning and related technologies. High-Flyer was founded in 2015 and launched its first AI model the following year, when its first trading position generated by deep learning went live. In 2018, AI was established as the firm's main direction. In 2020, High-Flyer invested more than 100 million yuan to launch the AI supercomputer "Firefly-1", which occupies an area about the size of a basketball court and is said to deliver computing power equivalent to 40,000 personal computers. In 2021, High-Flyer invested 1 billion yuan to build "Firefly-2", equipped with 10,000 A100 GPUs. At the time, no more than five companies in China owned over 10,000 GPUs, and apart from High-Flyer, the other four were all internet giants.

In July 2023, DeepSeek was officially established and entered the field of general artificial intelligence. It has never raised external financing to date.

"There are relatively abundant cards, no financing pressure, only model but not product in the past few years, which makes DeepSeek appear more simple and focused compared to other large model companies in China, and can make breakthroughs in engineering technology and algorithms." The above-mentioned senior executive of a large domestic model company said.

In addition, as the large-model industry has gradually closed up, with OpenAI jokingly nicknamed "CloseAI", DeepSeek's practice of open-sourcing its models and publishing technical reports has won developers' praise and quickly made its technology brand stand out in both the domestic and international large-model markets.

Some researchers told The Paper that DeepSeek's openness is impressive, and that the open-sourcing of V3 and R1 has raised the bar for open-source models on the market.

Proving the power of young people

"The success of DeekSeek also shows the power of young people, and in essence, the development of this generation of artificial intelligence needs young minds." A person from a model company said to The Paper.

Before this, Jack Clark, former policy director at OpenAI and co-founder of Anthropic, remarked that DeepSeek had hired a group of "inscrutable wizards". In response, Liang Wenfeng said in a media interview that there were no inscrutable wizards, only fresh graduates from top domestic universities, PhD students who had not yet graduated and were interning, and young people just a few years out of school.

From the media reports available so far, the most striking characteristics of the DeepSeek team are elite schooling and youth: even team leaders are mostly under 35. In a team of fewer than 140 people, the engineers and researchers come largely from top domestic universities such as Tsinghua University, Peking University, Sun Yat-sen University, and Beijing University of Posts and Telecommunications, and most have not been working for long.

A headhunter who recruits high-end technical talent in the large-model field told The Paper that DeepSeek's hiring logic is not much different from that of other large-model companies. The core label for candidates is "young and high-potential": born around 1998, with no more than five years of work experience, "smart, from a STEM background, young, and not much experience."

However, the same headhunter noted that large-model startups are, in the end, startups: it is not that they do not want to recruit top AI talent from overseas, but that few such people are willing to come back.

According to a DeepSeek employee who declined to be named, the company has a flat management structure and a good atmosphere for free communication. Liang Wenfeng's whereabouts are often uncertain, and most of the time, everyone communicates with him online.

The employee had previously worked on large-model R&D at a major domestic tech company, but felt like a cog in a machine there, unable to create real value, and ultimately chose to join DeepSeek. In his view, DeepSeek is currently more focused on underlying model technology.

DeepSeek's way of working is completely bottom-up, with roles emerging naturally and no cap on how GPUs and people are reallocated. "Everyone brings their own ideas, and no one needs to be pushed. When someone runs into a problem during exploration, they simply pull others in to discuss it," Liang Wenfeng said in an earlier interview.

"It is premature to think that Chinese AI has surpassed the United States"

US business outlet Business Insider wrote that the newly released R1 shows China can match some of the industry's top AI models and keep pace with the cutting edge of Silicon Valley, and that such advanced open-source AI may also challenge companies hoping to make large profits by selling their technology.

Still, it may be premature to proclaim that "Chinese AI has surpassed the United States." Liu Zhiyuan has publicly warned against public opinion swinging from extreme pessimism to extreme optimism, as if China had already comprehensively surpassed the US and taken a commanding lead; that, he said, is "far from the truth." In his view, the new wave of AGI technology is still evolving rapidly and its future path is unclear. China is still in the catching-up stage: no longer out of reach, but at best reasonably promising. "It is relatively easy to follow fast in others' footsteps on a road already explored; the bigger challenge is how to blaze new trails through the fog ahead."

"Now it's too chaotic, everyone is too anxious, and they didn't realize that DeepSeek finally came out." People close to DeepSeek lamented to PingWest Technology that the pace of industry changes is too fast, and it is impossible to predict what can be done next, only to see the changes in the next Q3 quarter."


Although Liang Wenfeng previously said that DeepSeek would only do models and not products, as a commercial company it is almost impossible to do only models. On January 15, the official DeepSeek app was released. People close to DeepSeek told The Paper that commercialization is now on DeepSeek's agenda.

In the eyes of industry insiders, compared with other Chinese large-model startups DeepSeek is fortunate: it has no financing pressure, does not need to prove itself to investors, and has not had to juggle model iteration with product optimization. But as a commercial company that has made a huge investment, it will sooner or later face the same pressures and challenges other model companies face. "Going viral this time amounts to a successful marketing campaign for DeepSeek on the eve of commercialization, but once real commercialization begins it will be tested by the market, and whether it can keep riding the wave is hard to say," said the aforementioned person from a large-model company.

What is certain is that DeepSeek will face more pressure and challenges ahead. The race toward general-purpose models has only just begun, and the winner will be decided by sustained investment and continued technological iteration. Still, industry insiders believe that "for the domestic model industry, it is a good thing to have companies with genuine technical strength, like DeepSeek, join in."
