Karpathy's latest take on agents: big companies do not own the core technology of agents; independent developers are leading the frontier.


Andrej Karpathy's latest views on agents point straight at what he sees as the biggest mistake in AI today: people are forcing agents to work before they have thoroughly mastered the underlying large models. Karpathy (AK) also draws a counterintuitive conclusion: right now, the people at the forefront of agents are not big companies but independent developers and entrepreneurs. The video is attached at the end of this article.

As early as 2016, OpenAI fell into this trap and paid the price for a full five years.

Karpathy's core argument has three steps. Step one: drop the illusion that agents can handle everything, and understand the underlying models first. Step two: recognize the industry reality that making a demo is extremely simple, but making a product takes ten years. Self-driving has already shown this; if you skip the foundational stage, whatever you build on top will collapse. Step three: understand that agents themselves are not the product; the foundational large models are the true core. Once the foundation is solid, agents will emerge naturally.

Looking back at his time at OpenAI in 2016, Karpathy worked with Tim Shi and Jim Fan on a project called "World of Bits." The goal was to move reinforcement learning agents beyond game benchmarks such as Montezuma's Revenge and have them operate a computer with a keyboard and mouse instead.

They hoped the agents could perform genuinely useful everyday tasks, such as booking flights or ordering takeout on extremely simple web pages. In practice, the agents clicked and typed more or less at random, hoping to stumble onto intelligent behavior by sheer luck. As expected, the project failed completely.

The technology at that time simply wasn't ready; the team's only tool was reinforcement learning. The correct approach at that point should have been to completely forget about AI agents and focus all effort on building language models.

Five years later, after a detour into self-driving, Karpathy found that AI agents had once again become the hottest topic in the industry, but the entire toolchain had changed dramatically. The methods people use to attack these problems today are completely different, and those building agents most likely do not need any reinforcement learning at all. This evolution exceeded what anyone expected at the time.

Now, everyone is avidly pursuing agents because it is easy to imagine that AGI will ultimately manifest as some form of AI agent. In the future, there will likely be swarms of agents, perhaps even forming massive digital entities, organizations, or civilizations. This does sound exciting.

Faced with this fervor, Karpathy chooses to pour cold water on it. There is a category of problems where it's easy to use imagination and create cool demos, but turning them into real products is extremely difficult.

Self-driving is an extremely typical example. It's easy to imagine a car driving autonomously around the block and make a demo, but turning it into a real product takes ten years. The same goes for VR. Agents perfectly match this characteristic: imagination and demos are simple, but to make them truly work, developers must be prepared to grind for ten years.

To find new ideas, Karpathy suggests everyone draw inspiration from neuroscience again. That's what was done in the early days of deep learning, and now, developing agents can once again reference the brain's operating patterns.

A complete digital entity needs to have all the cognitive tools humans possess. In addition to language models as part of the solution, it also needs an internal assistant to plan ahead and reflect on behavior.

The brain's structure provides a useful reference blueprint. The counterpart of the hippocampus in an AI agent is a component that records memory traces and uses vector embeddings to index and retrieve them. We roughly know how to build the visual and auditory cortices for digital entities, and the role of the thalamus is also worth pondering. The thalamus is responsible for integrating all information and can be considered the seat of consciousness. When multiple digital entities compete for control and for the microphone to decide the next action, the thalamus is what would arbitrate this conflict. Karpathy also specifically recommended David Eagleman's book "Brain and Behavior," believing that neuroscience holds excellent inspiration for designing digital individuals.
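The hippocampus analogy (memory traces indexed and retrieved through vector embeddings) can be sketched roughly as follows. This is only an illustrative sketch, not anything Karpathy presented: the `MemoryStore` class, its `record` and `recall` methods, and the toy `embed` function are all hypothetical names, and a bag-of-words count vector stands in for the learned embedding model a real agent would use.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector.

    Assumption: a real agent would call a learned embedding model here.
    """
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class MemoryStore:
    """Hypothetical episodic memory: records traces, recalls by similarity."""

    def __init__(self) -> None:
        # Each trace is stored next to its embedding (the "index").
        self._traces: list[tuple[str, Counter]] = []

    def record(self, text: str) -> None:
        """Store a memory trace together with its embedding."""
        self._traces.append((text, embed(text)))

    def recall(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored traces most similar to the query."""
        q = embed(query)
        ranked = sorted(self._traces, key=lambda t: cosine(q, t[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = MemoryStore()
memory.record("user asked to book a flight to Tokyo")
memory.record("user ordered takeout from a noodle shop")
memory.record("user prefers window seats on a flight")

top = memory.recall("which flight did the user book", k=1)
print(top)
```

The linear scan in `recall` is fine for a handful of traces; a production system would more plausibly use an approximate nearest-neighbor index, which is outside the scope of this sketch.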

Finally, Karpathy described a state of the industry that runs against the usual assumptions.

Currently, those at the very forefront of AI agent capabilities are undoubtedly the independent developers and entrepreneurs who are building agents right now. Large language model labs like OpenAI or DeepMind are not at the forefront of the agent track.

OpenAI is very good at training massive Transformer language models. If a paper proposing a new Transformer training method is published today, it is highly likely that someone at OpenAI already tried the idea two and a half years ago and knows exactly why it succeeds or fails. Big companies have a commanding technical lead in this area.

The situation is completely different when a paper about a new type of agent is published. Teams at big companies will also find it novel because they haven't secretly researched this specific branch for five years. This means that giants must compete on this track with all grassroots entrepreneurs and hackers.

If you are an ordinary developer building agents today, you are at the very forefront of this transformative technology.

Source: AI Cambrian
