How was Cohere, the breath of fresh air among large models, created?


Source: Shidao

![](https://img-cdn.gateio.im/resized-social/moments-bab2147faf-b8fc9c2c8b-dd1a6f-6d2ef1) Image source: Generated by Unbounded AI

**Summary of content for this issue:**

> 1. A 20-year-old took part in the landmark paper that ushered in the era of generative AI
> 2. A Chinese entrepreneur who likes to play with technology and a young prodigy jointly founded Cohere
> 3. What risks will AI bring, and what are its biggest opportunities in the future?

The global competition over foundation models is still under way. OpenAI, which has raised tens of billions of dollars, is undoubtedly one of the frontrunners. Its killer application, ChatGPT, has hundreds of millions of active users, and competing with it head-on is clearly difficult.

One AI unicorn, valued at US$2 billion, has found a differentiated route in the foundation model race and become a breath of fresh air in the melee of large-model entrepreneurship.

This unicorn is Cohere. It was co-founded by Aidan Gomez, the youngest author of the groundbreaking paper "Attention is All You Need", together with Ivan Zhang and Nick Frosst, both fellow University of Toronto alumni.

Cohere recently raised US$270 million in Series C financing, bringing its cumulative funding to more than US$430 million and its valuation to more than US$2.1 billion. Its investors include corporate giants such as Salesforce, NVIDIA, and Oracle, top investment institutions such as Tiger Global and Index Ventures, and well-known AI experts such as Geoffrey Hinton (one of the three giants of deep learning and a Turing Award winner) and Fei-Fei Li. Its partners include Amazon and McKinsey, among others.

Why is Cohere a breath of fresh air in the melee of large-model entrepreneurship?

In terms of products, it focuses on serving enterprise customers.
Built on its powerful large model Command, it provides enterprise-grade text processing, knowledge Q&A, and other functions, and the model can be fine-tuned and customized. It has also launched Coral, an enterprise knowledge assistant.

In terms of security, to dispel the doubts of enterprise customers, its products can be deployed across multiple clouds or on-premises and offer a high degree of data privacy.

In terms of financing strategy, it prefers to take money from large companies in its own industry chain and draw on the strength of these giants to grow, without becoming tied to any one of them (compare the relationship between OpenAI and Microsoft).

As a well-known AI unicorn, Cohere has already had its products and competitive advantages studied thoroughly. Here we take the entrepreneur's perspective instead, drawing on several interviews with Cohere's two founders, Aidan Gomez and Ivan Zhang, to trace Cohere's journey from 0 to 1 and to collect the two founders' insights on business and AI.

*Note: The material in this article comes from conversations that Cohere investor and Madrona partner Jon Turow, Weights & Biases founder Lukas Biewald, and well-known writer Stephen Marche held with Cohere's two co-founders, Aidan Gomez and Ivan Zhang.*

## A 20-year-old took part in the landmark paper that ushered in the era of generative AI

Aidan Gomez is the youngest author of "Attention is All You Need", the seminal paper in the field of large language models. At the time, he had gone from the University of Toronto to Google Brain as an intern. He was an undergraduate of about 19 or 20, and it was his first experience of the American technology world.

![](https://img-cdn.gateio.im/resized-social/moments-bab2147faf-7abf74ca64-dd1a6f-6d2ef1) Aidan Gomez at the University of Toronto

His internship mentor at Google was Lukasz Kaiser, one of the main authors of "Attention is All You Need."
At that time, the two of them were building Tensor2Tensor, a software platform for training large neural networks, and were also training an AI model. The idea was to train one huge model that could learn to do many things from data. Training it required data sets in multiple modalities, including images, text, and even video.

At that time, Aidan and Noam Shazeer (also an author of the Transformer paper) were "students". Noam was also studying large language models, but the algorithm he worked with was the RNN (recurrent neural network). Noam's goal was to find a new architecture that was simpler, more refined, and more scalable than the RNN.

So Lukasz, Aidan, and Noam hit it off and planned to do this research together. They then discovered that Niki Parmar, Jakob Uszkoreit, and Ashish Vaswani from Google Brain's translation group had similar ideas. The two groups merged and worked together, and after extensive research "Attention is All You Need" was born.

The paper was submitted in the early hours of the morning, when only two people, Aidan and Ashish, were left in the office. After submitting the manuscript, they were swept up in excitement. Ashish already foresaw that the paper might have a huge impact, but the young Aidan was submitting a major paper for the first time and did not yet grasp its importance. As he said in an interview with New Yorker writer Stephen Marche: "I don't think anyone foresaw what it would become."

What really shocked him was the practical impact of the Transformer model, which he saw after he finished his internship and returned to the University of Toronto.

"At the time, I was doing summer research at the University of Toronto, and I received an email from Lukasz with the subject line 'Look at this.' The email contained a story about a Japanese punk rock band. It described how they formed the group, how they released an album, and then how, regrettably, they broke up.
At the end of the email, Lukasz wrote: 'The only word I entered was transformer, and the model automatically generated the story.'

"After reading this piece of machine-generated text, I thought it would start a product revolution, because for the first time a non-human system was using language in a way as compelling as we humans do," Aidan told Stephen Marche.

![](https://img-cdn.gateio.im/resized-social/moments-bab2147faf-449ff78bb4-dd1a6f-6d2ef1) Large model evolutionary tree

Once "Attention is All You Need" and the Transformer appeared, the AI community quickly adopted the new model, and it became a new technical standard. It set off a craze among AI researchers, and powerful Transformer-based models such as BERT and GPT kept emerging. At the end of 2022, ChatGPT officially kicked off the generative AI boom.

## A Chinese entrepreneur who likes to play with technology and a young prodigy jointly founded Cohere

Ivan Zhang, co-founder of Cohere, is an atypical AI researcher but a typical entrepreneur. He and Aidan both attended the University of Toronto, and Ivan later dropped out and started a business with Aidan. "I'm a creator. I don't like sitting in a classroom and simply absorbing a lot of information. I need to do things myself and learn while 'playing with technology.' That is the best way for me to learn." This is how he described himself to Cohere investor Jon Turow.

![](https://img-cdn.gateio.im/resized-social/moments-bab2147faf-ca8db78d47-dd1a6f-6d2ef1) Ivan Zhang, co-founder of Cohere

### **From researcher to entrepreneur, from ToC to ToB**

In 2017, after dropping out of the University of Toronto, Ivan worked as a back-end engineer at a startup. It was then that he met Aidan Gomez. Aidan wanted to set up an independent AI research group to pursue interest-led research and test his own ideas, so the two started FOR.ai together. The organization is still running today under the name Cohere For AI.
It is made up of research scientists in the AI field and mainly conducts basic AI research.

In 2019, Ivan proposed to Aidan: "Why don't we do something new together?" So they spun out of FOR.ai and started a more formal business. By this stage, they already had entrepreneurial experience, understood how to run an organization properly, and had met many founders in the AI field.

In the early days of Cohere, their first idea was to build an AI infrastructure platform: developers would upload AI models, and the platform would compress them to make them more efficient. But the generative AI craze had not yet arrived, and the market was still too small.

As an author of the paper, Aidan watched the Transformer model flourish in the AI community. He saw it solve all kinds of text-processing problems and saw developers keep improving the architecture. Around that time, OpenAI released GPT-2, and the parameter count of a Transformer model exceeded 1 billion. This made Aidan realize even more clearly the importance of model scale and the real potential of the architecture.

As a result, the founders pivoted Cohere from a model compression platform to foundation models and services.

"After trying GPT-2, we found it very cool, but we were not sure what services could be built on foundation models like GPT. Cohere's first project was a text autocompletion tool in the form of a Chrome browser extension. Users only had to type some text into a text box, and it would automatically continue it. We initially planned to make money from advertising. (Note: This is a ToC, or consumer-facing, business model.) But we clearly underestimated the difficulty of building a consumer product. The product experience was not good, and it did not gain many users.
We realized that we had no competitive advantage in this direction.

"So we decided to strip away the front-end interface and offer only the back-end model capabilities, moving from ToC to ToB and providing enterprise-grade API services. At that time, 99% of NLP use cases required word embeddings and model fine-tuning, so within a few months we built an API platform that offered text generation, embedding, and model fine-tuning," Ivan said, sharing with Jon Turow the thinking behind Cohere's pivot.

As for why Cohere turned to ToB and what lies at the core of the company's mission, Aidan Gomez made a clear statement: "We just want large AI models to be used by more people. At that time, developers and enterprises wanted to take advantage of large AI models, but they faced many obstacles, whether in technology or in computing power. The reason we exist is to remove those obstacles, so that developers who are not familiar with AI, as well as ordinary enterprises, can easily use AI capabilities.

"Conversational interaction, which is unique to generative AI, is the best experience for end users. Take me as an example: when I want to open a bank account, a bank whose mobile app can interact with me 24 hours a day and solve problems efficiently will be much more attractive to me.

"Cohere is here to do just that, helping all types of businesses and organizations harness the power of generative AI to strengthen their competitive advantage.

![](https://img-cdn.gateio.im/resized-social/moments-bab2147faf-89e46b6e94-dd1a6f-6d2ef1) Cohere allows enterprise customers to fine-tune models with their own data

"When enterprises adopt AI capabilities, they also have two concerns: model hosting and data privacy. We support multi-cloud hosting. Enterprises can choose the cloud service that suits them or deploy on a local server. We also attach great importance to data privacy.
When enterprises use their own data for model fine-tuning, whether deployed in the cloud or on a local server, we do not see their data. This is one of our core features."

### **Eclectic talent strategy shapes Cohere's high creativity**

Cohere's ability to pivot quickly and find the right position early on is inseparable from the approach to talent and entrepreneurship that Ivan and Aidan have built up since FOR.ai. Ivan shared their hiring philosophy and company culture with Jon Turow: "Our way of recruiting is different. When we started FOR.ai, we established a principle: we look for people who come from different backgrounds but who are very interested in AI and want to make a huge impact.

"You don't need a perfect background at Meta AI, DeepMind, or Google, but you must have very strong interest and enthusiasm for the field you focus on. And you must be able not only to write papers but also to build things. We brought this recruiting approach to Cohere and built a very strong early-stage team.

"In terms of company culture, we like to explore technology a lot, to 'play with technology', and then make breakthroughs. Although we all write papers, we are not 'nerds'. We have a very clear idea of what we should do, and we spend a lot of time on engineering practice rather than only on exploring algorithms. This allows us to build products that bring real value to people."

Now that OpenAI holds center stage in generative AI, will ChatGPT, with its hundreds of millions of active users, help OpenAI monopolize the field? Do other companies still have opportunities? Aidan Gomez has his own opinion: "I by no means think there will be a monopoly in the large model field. I think every company has its own style, direction, and advantages, and will find its own position in the market.
Consumers and enterprise customers will choose the best partner, the most trustworthy company, and the platform that can best help them succeed.

"For foundation model companies like Cohere, the end state we face is likely to be not winner-take-all but a diversified market structure. We will rely on our own advantages to win our own games. We will use various methods to help customers so that they can use the best AI capabilities. Our focus is on helping specific customers get the most value from AI models through a range of methods, including prompting and fine-tuning."

## What risks does AI bring, and what are its biggest opportunities in the future?

The explosion of generative AI has been welcomed, but it has also raised many concerns. At the public level, people worry that AI will develop too fast and become too powerful, "stealing" human jobs. At the practical level, many people worry about the safety and controllability of AI models.

Aidan Gomez and Ivan Zhang have both shared their views on this topic.

### **AI may "pollute" social media**

Aidan Gomez takes a more societal view. He said that the "pollution" of social media by AI-generated content deserves attention: "Instead of worrying about non-human intelligence replacing humans, which may not happen for many years, we should pay attention to the real risks we face today.

"For example, it is entirely possible for AI to generate millions of bots that slip seamlessly into our social media and public conversations and then push a certain point of view, whether helpful or harmful. On public issues that significantly affect society, this may have unforeseen consequences.

"So we must weigh this risk, and it is best to have specific policies to mitigate it. For example, people have the right to know whether the media or marketing content they are reading was created by humans or synthesized by machines."

Ivan Zhang's view is more practical. He believes that AI faces two major challenges: "On the challenges facing AI, the first thing we hear from customers is how to evaluate the capabilities of generative AI models. Accurately comparing the capabilities of two AI models is not easy, and for text generation the comparison is likely to be subjective. This creates certain obstacles to the commercial adoption of generative AI.

"Another challenge is data privacy. When you use large open-source or closed-source models commercially, you sometimes use sensitive data, which in turn creates compliance issues. For example, when using AI to help you write a sensitive email, would you worry that the sensitive data you enter into the model will be abused? Of course, this concern is an opportunity for us, and we are working with Oracle to address this issue."

### **Embodied intelligence is a big opportunity for AI in the future**

Aidan Gomez and Ivan Zhang are both AI experts and entrepreneurs, so their views on new directions and opportunities for AI are also worth noting.

Above all, on separate occasions they both mentioned the same technology: embodied intelligence, that is, putting the capabilities of generative AI into physical machines.

Aidan told Lukas Biewald: "I think it's really cool to apply generative AI to robotics and embodiment, and there is very strong demand in this direction. We have all imagined what robots with high intelligence and agile bodies would be like, and they would certainly bring huge change. But there is still a long way to go in this direction, and I hope I can have an impact here and try to do something related."

Ivan also believes that embodied intelligence is a big opportunity for the next stage of AI: "I think the biggest opportunity is the 'action model' that can affect physical entities.
Combining AI with engineering and physical products will be very exciting, and many companies will certainly be interested. However, for this technology to be realized, the accuracy of the models needs to improve further."

In addition, Aidan offered a longer-term vision of how AI's intelligence will develop and be applied: "Right now, building AI models relies on humans. To make AI more intelligent, we train it on all kinds of high-level human knowledge. It is like asking a very smart person to teach a not-so-smart AI. In the future, if the AI model becomes very smart and has learned all human knowledge, it will reach a critical point: humans will have nothing left to teach it.

"What interests me most is what happens if AI breaks through this critical point. If a group of AIs that have learned all existing human knowledge talk, explore, and learn together, will they generate new knowledge?

"Maybe when that time comes, we humans will learn new knowledge from AI, and AI will take us swimming in a new ocean of knowledge."