Conversation with a16z: LLMs Are Lossy Compression, and World Models Are the Real Direction

2025-06-05 13:42:38

World Labs is a startup founded in 2024 by renowned AI expert and Stanford University professor Fei-Fei Li, dedicated to developing the next generation of AI systems with "spatial intelligence."

Since its establishment, World Labs has completed two rounds of financing, raising a total of approximately $230 million. Major investors include a16z, Radical Ventures, NEA, NVIDIA NVentures, AMD Ventures, and Intel Capital. The company's valuation surpassed $1 billion in just three months, making it a new unicorn in the AI field.

Recently, Fei-Fei Li sat down with two a16z partners, Martin Casado and Erik Torenberg. For the first time, she publicly discussed the conceptual foundations, research directions, and grand vision behind the founding of World Labs.

Fei-Fei Li stated the core point of the conversation at the outset: "I don't need large language models to convince me; the world model is the truly important direction."

She emphasized that spatial intelligence—whether in the three-dimensional physical world we live in or the imagined digital universe—is an indispensable component of intelligence. And today, we finally have the ability to generate and reconstruct these universes.

▍Intelligence Older than Language: Spatial Perception and 3D Reconstruction

Fei-Fei Li pointed out that, compared to language, spatial perception is a much older and more instinctive ability in the process of human evolution. She shared a personal experience: several years ago, she temporarily lost her three-dimensional vision due to a corneal injury. During that time, she did not dare to drive alone; even on familiar streets, it was difficult to judge the distance to other cars.

This firsthand experience made her deeply aware of how fundamental three-dimensional perception is to human action. An AI that cannot build a three-dimensional model of the world cannot truly understand, manipulate, or reconstruct the real world.

Martin Casado added that this lack of three-dimensional intelligence is a key reason why robots and embodied intelligence systems have been slow to reach real-world deployment. He offered an everyday example: take a person into an unfamiliar room, blindfold them, describe the space in words alone, and then ask them to complete a task. It is almost impossible. Once the blindfold comes off, the brain automatically reconstructs a spatial model and the task becomes easy. This reconstruction capability is entirely missing from today's mainstream language models.

▍The Technological Tipping Point from NeRF to World Models

Asked why she chose to found World Labs now, Fei-Fei Li said the timing reflects years of accumulated academic research and a maturing industrial foundation.

She recalled that about four years ago, a breakthrough line of research called NeRF (Neural Radiance Fields) opened up new pathways for 3D visual modeling. NeRF was proposed by Ben Mildenhall, who is now one of the co-founders of World Labs.

Another co-founder, Christoph Lassner, did pioneering research on efficient three-dimensional representations, helping drive the industry's renewed interest in volumetric 3D modeling.

Add to this Justin Johnson, whose earlier work applied neural networks to image style transfer. These once-scattered lines of research are now united in a single team around one "North Star" goal: giving AI the ability to build world models.

Martin summarized this goal as a deep integration of two systems: one is the AI models, data, and architecture itself, and the other is the engineering system of graphical rendering and spatial reconstruction. Enabling experts from these two worlds to collaborate efficiently on a single platform is, in itself, an important organizational innovation in the technology industry.

▍The Language Model Is Not the End, but the Prologue

Fei-Fei Li emphasized that her belief in world models does not stem from disappointment in LLMs, but from a deeper understanding of the essence of intelligence.

She pointed out that language is a "lossy compression" of cognition: it abstracts the world, but in doing so it discards rich physical and perceptual information. The real world contains no words, grammar, or text, only physics, motion, and three-dimensional structure.

This view has also changed her understanding of what form an AI company should take. Her move from Stanford professor to entrepreneur was driven by the realization that modeling spatial intelligence requires far more than academic research: it needs industrial-scale compute, system-level architecture and orchestration, and close collaboration among top interdisciplinary talent.

And all of this can only truly be realized in a company with a very high level of organization and outstanding full-stack engineering collaboration capabilities.

▍Spatial Intelligence Applications Go Far Beyond Robots

For most people, the term "world model" remains an abstract scientific term. However, Fei-Fei Li and Martin point out that its applications go far beyond autonomous driving and robotics.

Creativity is inherently visual. Industrial design, filmmaking, architectural composition, and even game development all rely on 3D construction and control. And if AI has the ability to model the world, it can not only "understand" the 3D world, but also "generate" and "manipulate" the virtual space.

Martin described how, from just a photo of a table, the model can infer the shape and material of the parts hidden from view and thereby construct a complete spatial scene. Users can then measure the space, add or delete objects, or redesign it. This is a more intuitive and flexible form of human-computer interaction than text instructions, and it opens up a whole new dimension for design, creation, and simulation.

Fei-Fei Li further pointed out that digital space is bringing an unprecedented opportunity for change: "Human beings have so far only lived in a three-dimensional physical world. But the digital world, for the first time, will allow us to enter the 'multiverse'."

She listed several examples: some universes are built specifically for robots, some serve human creativity, and some are used for storytelling, communication, and travel experiences. These spaces, which once existed only in imagination, can now be generated, understood, used, and transformed by machines.

▍The Next Battle for Foundation Models: 3D Panoramic Modeling

Returning to the technology itself, Fei-Fei Li emphasized that World Labs is not just trying to create an AI that can "see," but rather to enable AI to understand the three-dimensional structure, dynamics, and combinatorial logic of the world. This is not just a more challenging engineering problem, but also a completely new representation philosophy.

She believes that scientific discoveries such as the double helix structure of DNA and fullerenes are the crystallization of spatial intelligence. It is impossible to derive such geometric constructs solely through language. This is also why world models can not only enhance the understanding capabilities of machines but may also open new creative pathways for human science and art.

Martin concluded that the LLM revolution proves one thing: when we find the right data structures and model representations, the capabilities of AI grow exponentially. He and Li believe that the "world model" is now at a similar tipping point.

▍The Key to Understanding and Building the World

"We are actually moving backwards on the path of evolution." When Martin raised this point, the entire conversation also reached a philosophical level.

Language is one of the most recent modules to appear in the evolution of the human brain, while spatial perception dates back to the time of early arthropods, more than 500 million years ago. An AI that merely "learns language" cannot truly be said to "understand the world." Only by constructing human-like spatial models can AI be considered to have truly stepped into the realm of "embodied intelligence."

Fei-Fei Li summed up in her usual firm tone: "I have been waiting for this day. Not because I do not believe in language models, but because I deeply understand: the real world is not made up of text."

The world model is the key for AI to truly understand and build this world.
