Tsinghua-affiliated ChatGLM3 demonstrated live! Multimodal capability approaches GPT-4V, and a domestic Code Interpreter is here
Original source: New Zhiyuan
The self-developed third-generation base model ChatGLM3 launched today!
This is another optimization of the ChatGLM base model by the Zhipu AI team since the launch of the second-generation model in June.
In addition, at the 2023 China Computer Conference (CNCC) on October 27, Zhipu AI also open-sourced ChatGLM3-6B (32k), multimodal CogVLM-17B, and agent AgentLM.
After the release of the ChatGLM3 series, Zhipu became the only company in China benchmarking against OpenAI's full model product line.
The models are fully self-developed, adapted to domestic chips, and come with stronger performance and a more open source ecosystem.
As one of the first companies to commit fully to large-model research, Zhipu AI has also been among the first to deliver results.
Moreover, Zhipu AI has completed more than 2.5 billion yuan in financing this year; its roster of investors, including Meituan, Ant, Alibaba, and Tencent, reflects the industry's strong confidence in the company.
Technical upgrades aimed at GPT-4V
At present, the multimodal vision model GPT-4V has shown strong image recognition capabilities.
At the same time, benchmarking against GPT-4V, Zhipu AI has iteratively upgraded several of ChatGLM3's other capabilities. Among them, the multimodal understanding model CogVLM attempts image understanding and refreshes SOTA on 10+ international standard image-text evaluation datasets; CogVLM-17B is currently open-sourced.
Code Interpreter can generate and execute code according to user needs, automatically completing complex tasks such as data analysis and file processing.
The search-enhanced WebGLM can automatically find relevant information on the internet based on the question, and provide links to related literature or articles as references when answering.
In addition, the semantic and logical capabilities of ChatGLM3 have also been greatly enhanced.
6B version open-sourced directly
It is worth mentioning that once ChatGLM3 was released, Zhipu AI directly open-sourced the 6B parameter model to the community.
The evaluation results show that, compared with ChatGLM 2 and with domestic models of the same size, ChatGLM3-6B ranked first on 9 of 44 Chinese and English public dataset tests.
MMLU improved by 36%, C-Eval by 33%, GSM8K by 179%, and BBH by 126%.
Its open-source 32k version, ChatGLM3-6B-32K, performs best in LongBench.
In addition, the latest "efficient dynamic inference + GPU memory optimization" technology makes the inference framework more efficient under the same hardware and model conditions.
Compared with the best current open-source implementations, vLLM from UC Berkeley and the latest version of Hugging Face TGI, inference speed is increased 2-3x and inference cost is halved, to only 0.5 fen per thousand tokens, the lowest cost available.
Self-developed AgentTuning activates agent capabilities
What's even more surprising is that ChatGLM3 also brings a new agent ability.
Zhipu AI hopes that large models can communicate better with external tools through APIs, and even interact with the world through agents.
By integrating the self-developed AgentTuning technique, the model's agent capabilities are activated; in intelligent planning and execution in particular, it improves by 1000% over ChatGLM 2.
On the latest AgentBench, ChatGLM3-turbo is close to GPT-3.5.
At the same time, AgentLM has also been opened to the open-source community; the Zhipu AI team's hope is for open-source models to reach, or even exceed, the agent capabilities of closed-source models.
This means agents will give domestic large models native support for complex scenarios such as tool calling, code execution, games, database operations, knowledge graph search and reasoning, and operating systems.
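To make the tool-calling idea concrete, here is a minimal, hypothetical Python sketch of an agent loop: the model (stubbed out below) emits a structured tool call, a dispatcher runs the named tool, and the observation is fed back. The message format, function names, and tools are illustrative assumptions, not ChatGLM3's actual interface.

```python
import json

# Hypothetical tool registry; a real agent would expose real APIs here.
def get_weather(city: str) -> str:
    """Stub tool: a real implementation would call a weather API."""
    return json.dumps({"city": city, "temp_c": 21})

TOOLS = {"get_weather": get_weather}

def fake_model(messages: list[dict]) -> dict:
    """Stand-in for the LLM: always requests the get_weather tool for this demo."""
    return {"tool_call": {"name": "get_weather", "arguments": {"city": "Beijing"}}}

def run_agent(user_query: str) -> str:
    messages = [{"role": "user", "content": user_query}]
    reply = fake_model(messages)
    while "tool_call" in reply:                              # the model wants a tool
        call = reply["tool_call"]
        observation = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "observation", "content": observation})
        reply = {"content": f"Based on the tool result: {observation}"}  # stub answer
    return reply["content"]

print(run_agent("What's the weather in Beijing?"))
```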
1.5B/3B versions released simultaneously, runnable on mobile phones
Want to run ChatGLM on your phone? OK!
This time, ChatGLM3 also launched end-side models that can be deployed on mobile phones, in two parameter sizes: 1.5B and 3B.
They support a range of phones from Vivo, Xiaomi, and Samsung, as well as in-vehicle platforms, and even support CPU inference on mobile platforms, at speeds of up to 20 tokens/s.
In terms of accuracy, the 1.5B and 3B models perform close to ChatGLM2-6B on public benchmark evaluations, so go ahead and try them!
A new generation of "Zhipu Qingyan" is fully launched
Just as ChatGPT has the powerful GPT-4 model behind it, the Zhipu AI team's generative AI assistant "Zhipu Qingyan" is likewise powered by ChatGLM3.
Right after the team's live-streamed demo, the feature went straight online, with sincerity as its selling point!
Test address:
Code Interpreter
As one of ChatGPT's most popular plugins, Advanced Data Analysis (formerly Code Interpreter) analyzes problems with a more mathematical mindset based on natural-language input, and generates the appropriate code at the same time.
Now, with the support of the newly upgraded ChatGLM3, "Zhipu Qingyan" has become the first large-model product in China with Advanced Data Analysis capabilities, supporting scenarios such as image processing, mathematical computation, and data analysis.
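Conceptually, this kind of feature is a loop in which the model writes code, a sandbox executes it, and the result is returned. The sketch below is a deliberately naive, hypothetical version of that loop (the model call is a stub, and exec is not a real sandbox); it only illustrates the pattern, not Zhipu Qingyan's actual implementation.

```python
import contextlib
import io

def model_write_code(task: str) -> str:
    """Stub for the LLM: returns Python code that solves the requested analysis."""
    return "import statistics\nprint(statistics.mean([3, 5, 7, 9]))"

def run_code(code: str) -> str:
    """Capture stdout of exec(). A real system would run this in an isolated sandbox."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})  # never do this with untrusted code outside a sandbox
    return buf.getvalue()

task = "Compute the mean of 3, 5, 7, 9"
code = model_write_code(task)
print("Model-generated code:\n" + code)
print("Execution result:", run_code(code).strip())
```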
The romance of science and engineering men may only be understood by "Zhipu Qingyan".
Although CEO Zhang Peng's live attempt to draw a "red heart" flopped the first time, a retry produced the result within seconds.
With the addition of WebGLM's capabilities, "Zhipu Qingyan" now also has search enhancement: it can summarize answers to questions based on the latest information on the internet, and attach reference links.
For example, the iPhone 15 has recently seen a wave of price cuts; how big is the drop, exactly?
The answer given by "Zhipu Qingyan" is not bad!
The CogVLM model improves Zhipu Qingyan's Chinese image-text understanding, giving it image comprehension close to GPT-4V.
It can answer various types of visual questions, perform complex object detection and labeling, and even complete automatic data annotation.
As an example, let CogVLM identify how many people are in the picture.
GLM vs GPT: Benchmarking OpenAI's full product line!
From the chat application ChatGPT and the code-generation plugin Code Interpreter, to DALL·E 3, and on to the visual multimodal model GPT-4V, OpenAI currently has a complete product lineup.
Looking back at China, the only company that can achieve the most comprehensive product coverage is Zhipu AI.
The red-hot ChatGPT itself needs no further introduction.
At the beginning of this year, the Zhipu AI team also released ChatGLM, a dialogue model at the hundred-billion-parameter scale.
Drawing on ChatGPT's design ideas, the developers injected code pre-training into the hundred-billion-parameter base model GLM-130B.
In fact, as early as 2022, Zhipu AI opened GLM-130B to the research community and industry, and this research was also accepted by ACL 2022 and ICLR 2023.
Both ChatGLM-6B and ChatGLM-130B were trained on Chinese and English corpora of 1T tokens, using supervised fine-tuning (SFT), feedback bootstrapping, and reinforcement learning from human feedback (RLHF).
On March 14, Zhipu AI open-sourced ChatGLM-6B to the community, and it took first place in third-party evaluations of Chinese natural language, Chinese dialogue, Chinese Q&A, and reasoning tasks.
At the same time, hundreds of projects or applications based on ChatGLM-6B were born.
To further promote the development of the open-source large-model community, Zhipu AI released ChatGLM2 in June, upgrading the hundred-billion-scale base dialogue model and open-sourcing it in 6B, 12B, 32B, 66B, and 130B sizes, with improved capabilities and richer scenarios.
It is worth mentioning that in just a few months, ChatGLM-6B and ChatGLM2-6B have been widely used.
At present, a total of 50,000+ stars have been collected on GitHub. In addition, there are 10,000,000+ downloads on Hugging Face, ranking first in the four-week trend.
Search Enhancements: WebGPT vs. WebGLM
To address the "hallucination" problem of large models, the common approach is to combine knowledge from search engines and have the large model perform retrieval augmentation.
As early as 2021, OpenAI fine-tuned WebGPT, a GPT-3-based model that can aggregate search results.
WebGPT models human search behavior, searches in web pages to find relevant answers, and gives citation sources, so that the output results can be traced.
Most importantly, it has achieved excellent results in open domain long Q&A.
Guided by this idea, WebGLM, the "internet-connected" version of ChatGLM, was born: a 10-billion-parameter model fine-tuned from ChatGLM that focuses on web search.
For example, when you want to know why the sky is blue, WebGLM searches the web, gives the answer immediately, and includes reference links to make its response more credible.
The LLM-augmented retriever works in two stages: coarse-grained web retrieval (search, acquisition, extraction) and fine-grained LLM-distilled retrieval.
Across the whole retrieval process, most of the time is spent fetching web pages, so WebGLM uses parallel asynchronous fetching to improve efficiency, as in the sketch below.
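A minimal illustration of that parallel-fetching idea, assuming asyncio and aiohttp; the URLs and function names are placeholders, not WebGLM's actual implementation.

```python
import asyncio
import aiohttp

async def fetch(session: aiohttp.ClientSession, url: str) -> str:
    """Download one candidate page; failures simply yield an empty document."""
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
            return await resp.text()
    except (aiohttp.ClientError, asyncio.TimeoutError):
        return ""

async def fetch_all(urls: list[str]) -> list[str]:
    """Fetch all candidate pages concurrently instead of one by one."""
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

# Placeholder search results; a real retriever would get these from a search API.
candidate_urls = ["https://example.com/a", "https://example.com/b"]
pages = asyncio.run(fetch_all(candidate_urls))
print([len(p) for p in pages])
```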
The bootstrap generator is the core and is responsible for generating high-quality answers to questions from the reference pages obtained from the retriever.
It uses the contextual inference capabilities of large models to generate high-quality QA datasets, and designs correction and selection strategies to filter out high-quality subsets for training.
Experimental results show that WebGLM can deliver more accurate results and complete Q&A tasks efficiently; with only 10 billion parameters, it even approaches the performance of the 175-billion-parameter WebGPT.
Image and text understanding: GPT-4V vs. CogVLM
In September of this year, OpenAI officially lifted the ban on GPT-4's amazing multimodal capabilities.
GPT-4V, which is supported by this, has a strong ability to understand images and is able to process arbitrarily mixed multimodal inputs.
For example, it can tell that the dish in the picture is mapo tofu, and it can even list the ingredients needed to make it.
Different from common shallow fusion methods, CogVLM incorporates a trainable vision expert module into the attention mechanism and feedforward neural network layer.
This design achieves a deep alignment between image and text features, effectively compensating for the differences between the pre-trained language model and the image encoder.
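To illustrate what a vision expert module amounts to, here is a toy PyTorch sketch: image tokens are routed through their own trainable QKV projections inside the attention layer, while text tokens keep the pre-trained (frozen) language-model projections. Dimensions, names, and the overall layout are illustrative assumptions, not CogVLM's actual code.

```python
import torch
import torch.nn as nn

class VisionExpertQKV(nn.Module):
    """Route image tokens through trainable 'expert' projections,
    text tokens through the original (frozen) language-model projections."""
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.text_qkv = nn.Linear(d_model, 3 * d_model)    # pre-trained LM weights (frozen)
        self.image_qkv = nn.Linear(d_model, 3 * d_model)   # trainable vision expert
        for p in self.text_qkv.parameters():
            p.requires_grad = False

    def forward(self, hidden: torch.Tensor, is_image: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model); is_image: (batch, seq) boolean mask
        qkv_text = self.text_qkv(hidden)
        qkv_image = self.image_qkv(hidden)
        return torch.where(is_image.unsqueeze(-1), qkv_image, qkv_text)

# Toy input: 4 image tokens followed by 6 text tokens.
hidden = torch.randn(1, 10, 64)
is_image = torch.tensor([[True] * 4 + [False] * 6])
qkv = VisionExpertQKV()(hidden, is_image)
print(qkv.shape)  # torch.Size([1, 10, 192])
```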
At present, CogVLM-17B holds the top comprehensive score on authoritative multimodal academic leaderboards, with SOTA or second-place results on 14 datasets.
It achieves the best (SOTA) performance across 10 authoritative cross-modal benchmarks, including NoCaps, Flicker30k captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz-VQA, and TDIUC.
Previous multimodal models usually align image features directly into the input space of text features, and the image encoder is usually small; in this setup the image is effectively a "vassal" of the text, so the results are naturally limited.
CogVLM, on the other hand, prioritizes visual understanding in the multimodal model: it uses a 5B-parameter vision encoder and a 6B-parameter vision expert module, a total of 11B parameters for modeling image features, even more than the 7B used for text.
In some tests, CogVLM even outperformed GPT-4V.
CogVLM can accurately identify all 4 houses in the picture, while GPT-4V identifies only 3.
The next question tests images that contain text.
Text-to-image: DALL·E vs. CogView
On OpenAI's side, the most powerful text-to-image model is DALL·E 3.
The overall idea of CogView is autoregressive training over a spliced sequence of text features and image token features; at generation time, only the text tokens are fed in, and the model keeps generating image tokens.
Specifically, the text "The avatar of a cute kitten" is first converted into tokens, here using a SentencePiece model.
Then an image of a cat is fed in, and the image part is converted into tokens by a discrete autoencoder.
Then the text and image tokens are concatenated and fed into a GPT-style Transformer model, which learns to generate images, as in the sketch below.
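To make the "concatenate text tokens and image tokens, then predict the next token" idea concrete, here is a toy, self-contained sketch. The tiny decoder stands in for CogView's transformer; vocabulary sizes, the 4x4 token grid, and all ids are made-up illustrations, and the discrete autoencoder that maps codebook ids back to pixels is omitted.

```python
import torch
import torch.nn as nn

VOCAB = 6000        # assumed: shared space of text ids and image codebook ids
N_IMG_TOKENS = 16   # e.g. a 4x4 grid of discrete image tokens

class TinyDecoder(nn.Module):
    """Minimal causal stand-in for CogView's GPT-style transformer."""
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        return self.head(self.blocks(self.embed(ids), mask=mask))

model = TinyDecoder()

# Training: text tokens (from SentencePiece) and image tokens (from a discrete
# autoencoder) are concatenated into one sequence for next-token prediction.
text_ids = torch.tensor([[11, 42, 7, 301]])                 # "avatar of a cute kitten"
image_ids = torch.randint(5000, VOCAB, (1, N_IMG_TOKENS))   # codebook indices
sequence = torch.cat([text_ids, image_ids], dim=1)
inputs, targets = sequence[:, :-1], sequence[:, 1:]
loss = nn.functional.cross_entropy(model(inputs).transpose(1, 2), targets)

# Generation: feed only the text tokens and sample image tokens one at a time.
generated = text_ids
for _ in range(N_IMG_TOKENS):
    next_id = model(generated)[:, -1].argmax(dim=-1, keepdim=True)
    generated = torch.cat([generated, next_id], dim=1)
print(generated.shape)  # (1, 4 + 16): the last 16 ids decode back to an image
```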
Compared with DALL·E and common GAN-based schemes, CogView's results were a substantial improvement.
In 2022, the researchers upgraded the text-to-image model again with CogView2, whose results compare directly with DALL·E 2.
Compared with CogView, the CogView2 architecture adopts a hierarchical Transformer and a parallel autoregressive mode for image generation.
In the paper, the researchers pre-trained a 6 billion parameter Transformer model, the Cross-Modal General Language Model (CogLM), and fine-tuned it to achieve fast super-resolution.
In November of the same year, the team built a text-to-video generation model, CogVideo, based on the CogView2 model.
The model architecture is divided into two modules: the first, based on CogView2, generates several frames of images from text; the second interpolates between those frames with a bidirectional attention model to produce a complete video at a higher frame rate.
Code: Codex vs. CodeGeeX
In the field of code generation, OpenAI released the new and upgraded Codex as early as August 2021; it is proficient in more than 10 programming languages, including Python, Java, Go, Perl, PHP, Ruby, Swift, TypeScript, and even Shell.
Users can simply give a prompt in natural language and have Codex write the code automatically.
Codex is trained on top of GPT-3, with training data containing billions of lines of source code; in addition, Codex can handle context more than 3 times longer than GPT-3.
In July 2023, Zhipu released the stronger, faster, and lighter CodeGeeX2-6B, which supports more than 100 languages and whose weights are fully open for academic research.
CodeGeeX2 is based on the new ChatGLM2 architecture and is optimized for a variety of programming-related tasks, such as code auto-completion, code generation, code translation, cross-file code completion, and more.
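As an openly released Hugging Face model, CodeGeeX2-6B can be tried with the transformers library. The snippet below follows the typical usage pattern for THUDM's open models; the model id, prompt format, and generation settings are assumptions based on common practice rather than documentation quoted in this article.

```python
# Assumed usage pattern; check the official CodeGeeX2 repo for the exact API.
from transformers import AutoTokenizer, AutoModel

model_id = "THUDM/codegeex2-6b"   # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
model = model.half().cuda().eval()  # requires a CUDA GPU

# CodeGeeX2 is a completion model: prompt with a language tag and a comment,
# then let it fill in the implementation.
prompt = "# language: Python\n# write a bubble sort function\ndef bubble_sort(arr):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```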
Thanks to the ChatGLM2 upgrade, CodeGeeX2 not only supports Chinese and English input better, along with a maximum sequence length of 8192, but also greatly improves on a range of performance metrics: Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321%.
On the HumanEval benchmark, CodeGeeX2 comprehensively surpassed the 15-billion-parameter StarCoder model, as well as OpenAI's Code-Cushman-001 model (the model behind GitHub Copilot).
In addition, CodeGeeX2 infers faster than the first-generation CodeGeeX-13B; after quantization it needs only 6 GB of GPU memory to run, and it supports lightweight local deployment.
At present, the CodeGeeX plug-in can be downloaded and experienced in mainstream IDEs such as VS Code, IntelliJ IDEA, PyCharm, GoLand, WebStorm, and Android Studio.
Domestic large models, fully self-developed
At the conference, Zhipu AI CEO Zhang Peng opened with his own view: the first year of large models was not the year ChatGPT triggered the LLM boom, but 2020, the year GPT-3 was born.
At that time, Zhipu AI, then just one year old, began going all in on large models with the strength of the whole company.
As one of the first companies to enter large-model research, Zhipu AI has built up solid enterprise-service capabilities; as one of the first movers in open source, it saw ChatGLM-6B top the Hugging Face trending list within four weeks of launch and collect 50,000+ stars on GitHub.
In 2023, with the large-model industry war in full swing, Zhipu AI once again stands in the spotlight, seizing the first-mover advantage with the newly upgraded ChatGLM3.
Resources: