Fudan NLP team releases an 80-page survey of large language model agents: the current state and future of AI agents in one article
Source: Heart of the Machine
Recently, the Fudan University Natural Language Processing group (FudanNLP) released a survey paper on LLM-based Agents. The full text runs 86 pages and cites more than 600 references. Starting from the history of AI agents, the authors comprehensively survey the current state of intelligent agents based on large language models, covering the background, components, and application scenarios of LLM-based Agents, as well as the much-discussed **agent society**. They also discuss forward-looking open questions around agents, which are of great value to future developments in the field.
**Team members will also add a one-sentence summary to each relevant paper; readers are welcome to star the repository.**
Research Background
For a long time, researchers have pursued Artificial General Intelligence (AGI) that equals or even surpasses human-level intelligence. As early as the 1950s, Alan Turing extended the concept of “intelligence” to artificial entities and proposed the famous Turing test. These artificial entities are often called agents. The concept of “agent” originates from philosophy, where it describes an entity with desires, beliefs, intentions, and the ability to act. In artificial intelligence, the term has taken on a new meaning: an intelligent entity characterized by autonomy, reactivity, pro-activeness, and social ability.
*Note: there is no consensus on the Chinese translation of the term “Agent”; some scholars render it as “agent,” “actor,” or “intelligent agent.” Both “agent” and “intelligent agent” in this article refer to Agent.*
Since then, the design of agents has been a focus of the artificial intelligence community. However, past work mainly concentrated on enhancing specific abilities of agents, such as symbolic reasoning or mastery of particular tasks (chess, Go, etc.). These studies focused on algorithm design and training strategies while neglecting the model’s inherent general capabilities, such as knowledge memorization, long-term planning, effective generalization, and efficient interaction. It turns out that **enhancing the inherent capabilities of the model is a key factor in driving the further development of intelligent agents.**
The emergence of large language models (LLMs) brings hope for the further development of intelligent agents. If the development route from NLP to AGI is divided into five levels: corpus, Internet, perception, embodiment, and social attributes, then the current large-scale language model has reached the second level, with Internet-scale text input and output. On this basis, if LLM-based Agents are given perception space and action space, they will reach the third and fourth levels. Furthermore, when multiple agents interact and cooperate to solve more complex tasks, or reflect social behaviors in the real world, they have the potential to reach the fifth level - agent society.
The birth of an Agent
What would an intelligent agent powered by a large model look like? Inspired by Darwin’s principle of “survival of the fittest,” the authors propose a general framework for large-model-based intelligent agents. To survive in society, a person must learn to adapt to the environment, and therefore needs cognitive abilities and the capacity to perceive and respond to changes in the outside world. Likewise, the agent framework consists of three parts: **the control end (Brain), the perception end (Perception), and the action end (Action).**
The authors use an example to illustrate the workflow of an LLM-based Agent: when a human asks whether it will rain, the perception end (Perception) converts the instruction into a representation that LLMs can understand. The control end (Brain) then reasons and plans actions based on the current weather and online weather forecasts. Finally, the action end (Action) responds and hands an umbrella to the human.
By repeating the above process, the intelligent agent can continuously obtain feedback and interact with the environment.
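The perceive-think-act loop described above can be sketched in a few lines. All function names and the umbrella logic below are illustrative stand-ins, not APIs from the surveyed paper; a real agent would replace each stub with an LLM call or a sensor/actuator interface.

```python
# Minimal sketch of the perceive -> think -> act loop of an LLM-based Agent.
# The components are hypothetical stubs, not the paper's implementation.

def perceive(raw_input: str) -> dict:
    """Perception end: convert a raw instruction into a structured representation."""
    return {"instruction": raw_input.strip().lower()}

def think(observation: dict, context: dict) -> str:
    """Control end (Brain): reason over the observation plus external context."""
    if "rain" in observation["instruction"]:
        return "hand over umbrella" if context.get("forecast") == "rain" else "reassure user"
    return "no-op"

def act(decision: str) -> str:
    """Action end: turn the decision into an environment-facing response."""
    return f"Agent action: {decision}"

def agent_step(raw_input: str, context: dict) -> str:
    """One full cycle; repeating this loop lets the agent keep interacting."""
    return act(think(perceive(raw_input), context))

print(agent_step("Will it rain today?", {"forecast": "rain"}))
```

Repeating `agent_step` with updated context is exactly the feedback loop the text describes.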
Control end: Brain
As the core component of the intelligent agent, the authors introduce its capabilities from five aspects:
**Natural language interaction:** Language is the medium of communication and carries rich information. Thanks to the powerful natural language understanding and generation capabilities of LLMs, intelligent agents can interact with the outside world through multiple rounds of natural language to achieve their goals. Specifically, this can be divided into two aspects:
Knowledge: LLMs trained on large-scale corpora can store massive amounts of knowledge. Besides linguistic knowledge, common-sense knowledge and domain-specific knowledge are important components of LLM-based Agents.
Although LLMs still suffer from problems such as outdated knowledge and hallucination, existing research can alleviate these to some extent through knowledge editing or by calling external knowledge bases.
Memory: In the framework of this article, the memory module (Memory) stores the agent’s past observations, thoughts, and action sequences. Through specific memory mechanisms, agents can effectively reflect on and apply previous strategies, allowing them to draw on past experiences to adapt to unfamiliar environments.
There are three methods commonly used to improve memory ability:
In addition, the memory retrieval method is also important. Only by retrieving the appropriate content can the agent access the most relevant and accurate information.
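The memory-plus-retrieval idea above can be illustrated with a toy memory module. The keyword-overlap-plus-recency scoring below is an invented stand-in; real systems typically use embedding similarity, but the structure (store records, retrieve the most relevant) is the same.

```python
# Toy memory module: stores past observations/thoughts/actions and
# retrieves the most relevant ones. Scoring by keyword overlap with a
# recency tie-break is an illustrative assumption, not the paper's method.
from dataclasses import dataclass, field

@dataclass
class Memory:
    records: list = field(default_factory=list)  # (step, text) pairs

    def store(self, text: str) -> None:
        self.records.append((len(self.records), text))

    def retrieve(self, query: str, k: int = 2) -> list:
        q = set(query.lower().split())
        def score(item):
            step, text = item
            overlap = len(q & set(text.lower().split()))
            return (overlap, step)  # prefer relevant first, then recent
        return [text for _, text in sorted(self.records, key=score, reverse=True)[:k]]

mem = Memory()
mem.store("observed rain in the morning")
mem.store("user asked about lunch options")
mem.store("umbrella was handed to the user")
print(mem.retrieve("will it rain"))
```

Only the retrieved top-k records would be fed back into the LLM’s context, which is what makes retrieval quality matter as much as storage.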
Reasoning & Planning: Reasoning is crucial for intelligent agents performing complex tasks such as decision-making and analysis. For LLMs specifically, it takes the form of a family of prompting methods exemplified by Chain-of-Thought (CoT). Planning is a common strategy when facing large challenges: it helps agents organize their thinking, set goals, and identify the steps to reach them. In concrete implementations, planning can include two steps:
**Transferability & Generalization:** LLMs endowed with world knowledge give intelligent agents powerful transfer and generalization capabilities. A good agent is not a static knowledge base; it also has dynamic learning abilities:
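The Chain-of-Thought and planning ideas above operate purely at the prompt level, which can be shown without any model call. The question text, trigger phrase, and prompt layout below follow common CoT practice and are assumptions for illustration.

```python
# Sketch of CoT and planning as prompt construction: CoT asks the model to
# emit intermediate reasoning before answering; planning decomposes a goal
# into ordered sub-steps. No real LLM is invoked here.

def build_cot_prompt(question: str) -> str:
    """Zero-shot CoT: append a step-by-step trigger to elicit reasoning."""
    return f"Q: {question}\nA: Let's think step by step."

def build_plan_prompt(goal: str, known_steps: list) -> str:
    """Planning prompt: list the decomposed sub-steps, then ask for the next action."""
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(known_steps))
    return f"Goal: {goal}\nPlan:\n{steps}\nNext action:"

print(build_cot_prompt("If it rains, should the agent fetch an umbrella?"))
print(build_plan_prompt("answer the weather question",
                        ["check forecast", "decide on umbrella"]))
```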
Perception end: Perception
Humans perceive the world in a multi-modal way, so researchers have the same expectations for LLM-based Agents. Multimodal perception can deepen the agent’s understanding of the work environment and significantly improve its versatility.
Text input: As the most basic capability of LLMs, this is not elaborated here.
**Visual input:** LLMs themselves lack visual perception and can only understand discrete text. Visual input, however, usually contains a wealth of information about the world, including object properties, spatial relationships, scene layout, and so on. Common approaches are:
Auditory input: Hearing is also an important part of human perception. Since LLMs have excellent tool calling capabilities, an intuitive idea is that the agent can use LLMs as a control hub, calling existing tool sets or expert models in a cascade manner to perceive audio information. In addition, audio can also be visually represented through a spectrogram. Spectrograms can be used as flat images to display 2D information. Therefore, some visual processing methods can be transferred to the speech field.
Other Inputs: There is much more to information in the real world than just text, sight, and hearing. The authors hope that in the future, intelligent agents will be equipped with richer perception modules, such as touch, smell and other organs, to obtain richer attributes of target objects. At the same time, agents can also clearly feel the temperature, humidity, and brightness of the surrounding environment and take more environment-aware actions.
In addition, the agent can also be introduced to the perception of the broader overall environment: using mature perception modules such as lidar, GPS, and inertial measurement units.
Action end: Action
After the brain makes analysis and decisions, the agent also needs to take actions to adapt or change the environment:
Text output: As the most basic capability of LLMs, this is not elaborated here.
**Tool use:** Although LLMs have excellent knowledge reserves and professional capabilities, they can face challenges such as robustness issues and hallucinations on specific problems. Tools, as extensions of a user’s capabilities, can help with professionalism, factuality, and interpretability. For example, a calculator can solve math problems and a search engine can retrieve real-time information.
In addition, tools can also expand the action space of intelligent agents. For example, multi-modal actions can be obtained by calling expert models such as speech generation and image generation. Therefore, how to make agents become excellent tool users, that is, learn how to use tools effectively, is a very important and promising direction.
Currently, the main approaches to tool learning are learning from demonstrations and learning from feedback. Meta-learning, curriculum learning, and similar techniques can also give agents the ability to generalize across tools. Going one step further, intelligent agents can learn to make tools themselves, increasing their autonomy and independence.
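The calculator-and-search example above can be sketched as a small tool dispatcher. The routing heuristic and the two stub tools are invented for illustration; in a real agent the LLM itself decides which tool to call and with what arguments.

```python
# Toy tool-use dispatcher: route math-looking tasks to a calculator and
# everything else to a (stubbed) search engine. Tool set and routing rule
# are illustrative assumptions.

def calculator(expr: str) -> str:
    """Evaluate a restricted arithmetic expression (demo only)."""
    allowed = set("0123456789+-*/(). ")
    assert set(expr) <= allowed, "unsupported expression"
    return str(eval(expr))  # safe here only because of the character whitelist

def search(query: str) -> str:
    """Stand-in for a real search-engine call."""
    return f"[search results for: {query}]"

TOOLS = {"calculator": calculator, "search": search}

def dispatch(task: str) -> str:
    """Naive routing: tasks containing arithmetic operators go to the calculator."""
    tool = "calculator" if any(op in task for op in "+-*/") else "search"
    return TOOLS[tool](task)

print(dispatch("12 * (3 + 4)"))        # calculator path
print(dispatch("weather in Shanghai")) # search path
```

Swapping the hard-coded routing rule for an LLM-chosen tool name is essentially what the “learning to use tools” literature studies.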
**Embodied action:** Embodiment refers to an agent’s ability to understand and transform the environment, and to update its own state, during interaction with the environment. Embodied action is regarded as the bridge between virtual intelligence and physical reality.
Traditional reinforcement-learning agents are limited in sample efficiency, generalization, and complex reasoning, whereas LLM-based Agents bring in the rich intrinsic knowledge of large models, enabling embodied agents to actively perceive and influence the physical environment the way humans do. Depending on the agent’s degree of autonomy in a task, or the complexity of the action, there are the following atomic actions:
By combining these atomic actions, agents can complete more complex tasks, for example embodied QA tasks such as “Is the watermelon in the kitchen bigger than the bowl?” To solve this, the agent needs to navigate to the kitchen and derive the answer after observing the sizes of both.
Limited by the high cost of physical world hardware and the lack of embodied data sets, current research on embodied actions is still mainly focused on virtual sandbox environments such as the game platform “Minecraft”. Therefore, on the one hand, the authors look forward to a task paradigm and evaluation standard that is closer to reality. On the other hand, they also need more exploration on the efficient construction of relevant data sets.
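The watermelon example above amounts to composing two atomic actions, navigate and observe, and then reasoning over the observations. The grid world and object sizes below are invented for illustration; a real embodied agent would get them from a simulator or sensors.

```python
# Sketch of composing atomic embodied actions into an embodied-QA task.
# WORLD and the object sizes are assumptions for the demo.

WORLD = {"kitchen": {"watermelon": 30, "bowl": 18}}  # sizes in cm (assumed)

def navigate(room: str) -> dict:
    """Atomic action: move to a room and return its observable contents."""
    return WORLD[room]

def observe(scene: dict, obj: str) -> int:
    """Atomic action: measure one object in the current scene."""
    return scene[obj]

def embodied_qa() -> str:
    """Is the watermelon in the kitchen bigger than the bowl?"""
    scene = navigate("kitchen")
    bigger = observe(scene, "watermelon") > observe(scene, "bowl")
    return "yes" if bigger else "no"

print(embodied_qa())
```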
Agent in Practice: Diverse application scenarios
Currently, LLM-based Agents have demonstrated impressive diversity and powerful performance. Familiar application examples such as AutoGPT, MetaGPT, CAMEL and GPT Engineer are booming at an unprecedented speed.
Before introducing specific applications, the authors discuss the design principles of Agent in Practice:
Help users free themselves from daily tasks and repetitive labor, reduce human work pressure, and improve the efficiency of solving tasks;
Users no longer need to issue explicit low-level instructions; the agent can analyze, plan, and solve problems fully autonomously;
After liberating the user’s hands, try to liberate the brain: give full play to their potential in cutting-edge scientific fields and complete innovative and exploratory work.
On this basis, the application of agents can have three paradigms:
Single agent scenario
Intelligent agents that accept natural language commands and carry out everyday tasks are currently favored by users and have high practical value. The authors first elaborate on the diverse application scenarios and corresponding capabilities of the single-agent setting.
In this article, the application of a single intelligent agent is divided into the following three levels:
Multi-agent scenario
As early as 1986, Marvin Minsky made a forward-looking prediction. In The Society of Mind, he proposed a novel theory of intelligence, arguing that intelligence arises from the interaction of many smaller, function-specific agents. For example, some agents may be responsible for identifying patterns, while others may be responsible for making decisions or generating solutions.
This idea was realized concretely with the rise of distributed artificial intelligence. Multi-agent systems, as one of the main research topics in that area, focus on how agents can effectively coordinate and collaborate to solve problems. The authors divide the interaction between multiple agents into the following two forms:
Cooperative interaction: As the most widely deployed type in practical applications, cooperative agent systems can effectively improve task efficiency and jointly improve decision-making. Specifically, according to different forms of cooperation, the authors subdivide cooperative interactions into disordered cooperation and ordered cooperation.
Adversarial interaction: Intelligent agents interact in a tit-for-tat manner. Through competition, negotiation, and debate, agents abandon their original possibly erroneous beliefs and conduct meaningful reflections on their own behavior or reasoning process, which ultimately leads to an improvement in the response quality of the entire system.
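The adversarial (“debate”) interaction above can be caricatured with two stub agents that exchange answers and revise their beliefs. The confidence-based concession rule is an invented simplification; real debate systems have LLMs critique each other’s reasoning in natural language.

```python
# Toy adversarial interaction: two agents debate for a few rounds, and each
# adopts the other's belief when the other's confidence is higher. Agents
# and update rule are illustrative assumptions, not the paper's method.

def debate(agent_a: dict, agent_b: dict, rounds: int = 3) -> str:
    for _ in range(rounds):
        # Each agent sees the other's (answer, confidence) and may concede.
        if agent_b["conf"] > agent_a["conf"]:
            agent_a = {**agent_b}
        elif agent_a["conf"] > agent_b["conf"]:
            agent_b = {**agent_a}
    return agent_a["answer"]

a = {"answer": "Paris", "conf": 0.9}
b = {"answer": "Lyon", "conf": 0.4}
print(debate(a, b))  # the higher-confidence belief prevails
```

Even this caricature shows the mechanism the text describes: competition forces agents to abandon weaker beliefs, improving the system’s final answer.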
Human-computer interaction scenario
Human-Agent interaction, as the name suggests, means an intelligent agent cooperating with humans to complete tasks. On the one hand, the agent’s dynamic learning ability needs to be supported by communication; on the other hand, current agent systems are still insufficient in interpretability and may have security and legality issues, so they require human regulation and supervision.
In the paper, the authors divide Human-Agent interaction into the following two modes:
Agent Society: From Personality to Sociality
For a long time, researchers have dreamed of building an “interactive artificial society.” From the sandbox game “The Sims” to the “Metaverse,” people’s definition of a simulated society can be summarized as: an environment plus the individuals living and interacting within it.
In the article, the authors use a diagram to describe the conceptual framework of Agent society:
In this framework we can see:
Social Behavior and Personality of Agents
The article examines the performance of agents in society from the perspective of external behavior and internal personality:
Social behavior: from a social perspective, behavior can be divided into two levels, individual and collective:
Personality: including cognition, emotion, and character. Just as humans gradually develop their own traits through socialization, agents also exhibit so-called “human-like intelligence,” gradually shaping a personality through interaction with groups and environments.
Simulated social operating environment
The agent society is not only composed of independent individuals, but also includes the environment with which they interact. The environment influences how agents perceive, act, and interact. In turn, agents also change the state of the environment through their actions and decisions. For an individual agent, the environment includes other autonomous agents, humans, and available resources.
Here, the authors explore three types of environments:
Text-based environments: Since LLMs rely primarily on language as their input and output formats, text-based environments are the most natural operating platform for agents. Social phenomena and interactions are described through words, and the textual environment provides semantic and background knowledge. Agents exist in such textual worlds and rely on textual resources to perceive, reason, and act.
Virtual sandbox environment: In the computer field, a sandbox refers to a controlled and isolated environment, often used for software testing and virus analysis. The virtual sandbox environment of the agent society serves as a platform for simulating social interaction and behavioral simulation. Its main features include:
Real Physical Environment: The physical environment is the tangible environment consisting of actual objects and spaces in which agents observe and act. This environment introduces rich sensory input (visual, auditory, and spatial). Unlike virtual environments, physical spaces place more demands on agent behavior. That is, the agent must be adaptable in the physical environment and generate executable motion control.
The authors give an example of the complexity of the physical environment: imagine an agent operating a robotic arm in a factory. It must control force precisely to avoid damaging objects of different materials; in addition, it needs to navigate within the physical workspace, adjusting its movement path in time to avoid obstacles and optimize the arm’s trajectory.
These requirements increase the complexity and challenge of agents in the physical environment.
**Simulation, start!**
In the article, the authors argue that a simulated society should be open, persistent, situated, and organized. Openness allows agents to enter and leave the simulated society autonomously; persistence means the society develops along a coherent trajectory over time; situatedness emphasizes that subjects exist and operate in a specific environment; and organization ensures that the simulated society has rules and constraints like those of the physical world.
As for the significance of simulated societies, Stanford’s Generative Agents town provides a vivid example: an agent society can be used to explore the capabilities of collective intelligence (for instance, the agents jointly organized a Valentine’s Day party) and to accelerate social science research, such as observing communication phenomena by simulating social networks. Other studies explore the values behind agents by simulating ethical decision-making scenarios, or assist decision-making by simulating the impact of policies on society.
Furthermore, the authors point out that these simulations may also carry certain risks, including but not limited to: harmful social phenomena; stereotypes and prejudice; privacy and security issues; and over-dependence and addiction.
Forward-looking open questions
At the end of the paper, the authors also discuss some forward-looking open questions and offer readers food for thought:
**How can research on intelligent agents and large language models promote each other and develop together?** Large models have shown strong potential in language understanding, decision-making, and generalization, and have become a key component in agent construction. In turn, progress on agents places higher demands on large models.
**What challenges and concerns will LLM-based Agents bring?** Whether intelligent agents can truly be put into practice requires rigorous safety evaluation to avoid harm to the real world. The authors summarize further potential threats, such as illegal abuse, unemployment risks, and impacts on human well-being.
**What opportunities and challenges will scaling up bring?** In a simulated society, increasing the number of individuals can significantly improve the credibility and realism of the simulation. However, as the number of agents grows, communication and message dissemination become quite complex, and information distortion, misunderstanding, and hallucination can significantly reduce the efficiency of the whole simulation system.
**There is debate online about whether the LLM-based Agent is the right path to AGI.** Some researchers believe that large models such as GPT-4 have been trained on sufficient corpora, and that agents built on them have the potential to unlock the door to AGI. Others argue that auto-regressive language modeling does not exhibit real intelligence, because it merely produces reactive responses; a more complete modeling approach, such as a world model, could lead to AGI.
**The evolution of swarm intelligence.** Swarm intelligence is the process of aggregating the opinions of many individuals and converting them into decisions. However, will true “intelligence” emerge simply from increasing the number of agents? And how can individual agents be coordinated so that a society of agents overcomes groupthink and individual cognitive biases?
**Agent as a Service (AaaS).** Since LLM-based Agents are more complex than the large models themselves and are harder for small and medium-sized enterprises or individuals to build locally, cloud vendors could offer intelligent agents as a service, that is, Agent-as-a-Service. Like other cloud services, AaaS has the potential to provide users with high flexibility and on-demand self-service.