Fei-Fei Li redefines the world model: physical simulation is the ultimate goal of spatial intelligence
In a Substack post, Fei-Fei Li laid out for the first time a physically grounded framework and approach for world models, stressing that such a model must learn the spatiotemporal statistical structure of the world rather than be built from text alone. The framework divides a world model into three components: a renderer, a simulator, and a planner. It holds that a simulator capable of predicting physical feedback is the bridge between perception and action. Over time, the boundaries among the three are expected to dissolve into a single unified world model. Marble, an early example, outputs both rendered images and collision meshes from a single model, demonstrating this fusion of boundaries.
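To make the three-part split concrete, here is a minimal toy sketch in Python. It is not Marble's API or Li's implementation: the class names (`ToyRenderer`, `ToySimulator`, `ToyPlanner`), the one-dimensional world, and every parameter are hypothetical and chosen only for illustration. The renderer stands in for perception, the simulator predicts physical feedback (including a collision with the floor), and the planner chooses an action by querying the simulator, which is the "bridge" role the framework describes.

```python
from dataclasses import dataclass
from typing import Sequence

GRAVITY = -9.8  # m/s^2, assumed constant for this toy world


@dataclass(frozen=True)
class State:
    height: float    # metres above the floor
    velocity: float  # metres per second, positive is up


class ToyRenderer:
    """Perception side: turns a state into an observation (here, ASCII rows)."""

    def render(self, state: State, rows: int = 5) -> str:
        # Clamp the object's row into the visible range [0, rows - 1].
        row = min(rows - 1, max(0, int(state.height)))
        lines = ["o" if r == row else "." for r in reversed(range(rows))]
        return "\n".join(lines)


class ToySimulator:
    """Predicts physical feedback: how the state evolves under an action."""

    def step(self, state: State, thrust: float, dt: float = 0.1) -> State:
        velocity = state.velocity + (GRAVITY + thrust) * dt
        height = state.height + velocity * dt
        if height < 0.0:
            # Collision with the floor: the object comes to rest on the surface.
            height, velocity = 0.0, 0.0
        return State(height, velocity)


class ToyPlanner:
    """Action side: picks the action whose simulated outcome best meets a goal."""

    def __init__(self, simulator: ToySimulator, goal_height: float, horizon: int = 10):
        self.simulator = simulator
        self.goal_height = goal_height
        self.horizon = horizon

    def plan(self, state: State, candidates: Sequence[float]) -> float:
        def cost(thrust: float) -> float:
            # Roll the simulator forward with a constant thrust.
            s = state
            for _ in range(self.horizon):
                s = self.simulator.step(s, thrust)
            return abs(s.height - self.goal_height)

        return min(candidates, key=cost)


if __name__ == "__main__":
    start = State(height=2.0, velocity=0.0)
    planner = ToyPlanner(ToySimulator(), goal_height=2.0)
    print(ToyRenderer().render(start))
    print(planner.plan(start, candidates=[0.0, 9.8, 20.0]))
```

In this sketch the three parts are separate classes that share only the `State` type. A unified world model, as the framework anticipates, would instead produce the observation, the physical prediction, and the action from one learned model.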