
World Models

AI that understands physics and reality — not just words.


Definition

World models are AI systems that learn internal representations of how the physical world works — predicting the next state of an environment given actions within it, rather than predicting the next token in a text sequence. While LLMs model the statistical patterns of language, world models model causality, physics, spatial relationships, and object permanence. In late 2025 and early 2026, world models emerged as the field's most hyped new frontier: Yann LeCun left Meta to launch AMI Labs (seeking €3B valuation), Fei-Fei Li's World Labs shipped Marble, Google DeepMind released Genie 3, and Nvidia's Cosmos platform surpassed 2 million downloads.

The fundamental difference: tokens vs. states

The core distinction between LLMs and world models is what they predict:

| Property | Large Language Model | World Model |
| --- | --- | --- |
| What it predicts | The next token in a text sequence | The next state of an environment, given an action |
| Learning signal | Statistical co-occurrence of words across text | Causal dynamics: what happens when you push this object |
| Representation space | Token embeddings in high-dimensional language space | Latent representations of physical state |
| Understanding of physics | None — describes physics accurately without "feeling" it | Built-in — trained on video and sensor data of real physical interactions |
| Hallucinations | Common: predicts plausible-sounding text, not grounded truth | Rarer: grounded in physical observations, not statistical text patterns |
| Best analogy | An extremely well-read librarian who has read every physics textbook | A child who has played with blocks, water, and gravity for years |
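The two prediction interfaces in the table above can be sketched as minimal Python stubs. The class and method names are illustrative, not taken from any real library, and the method bodies are stand-ins for learned networks:

```python
# Hedged sketch: the token-prediction vs. state-prediction interfaces.
# All names are illustrative; the bodies stand in for learned networks.
from typing import List


class LanguageModel:
    """Predicts a distribution over the next token, given prior tokens."""

    def next_token_logits(self, tokens: List[int]) -> List[float]:
        # Stand-in: a real LLM runs a transformer over the token sequence here.
        vocab_size = 4
        return [0.0] * vocab_size


class WorldModel:
    """Predicts the next environment state, given the current state and an action."""

    def next_state(self, state: List[float], action: List[float]) -> List[float]:
        # Stand-in dynamics: a real world model learns these from video/sensor data.
        return [s + a for s, a in zip(state, action)]


print(WorldModel().next_state([0.0, 1.0], [0.5, -0.5]))  # [0.5, 0.5]
```

The signatures carry the whole distinction: one function maps tokens to tokens, the other maps (state, action) pairs to states.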

LeCun's critique of LLMs

Yann LeCun has argued publicly for years that LLMs will never achieve general intelligence: "They predict the next word based on statistics, not the next state of the world based on physics." When GPT-4 generates text about a ball rolling down a hill, it is not simulating physics — it is predicting which words typically follow other words. It has no internal model of gravity, friction, or momentum. World models are designed to close this gap.

The 2026 world models race

In the span of a few months bridging late 2025 and early 2026, world models went from a niche research topic to the industry's most-funded frontier:

| Player | Product / Project | Key milestone | Valuation / Investment |
| --- | --- | --- | --- |
| AMI Labs (Yann LeCun) | JEPA-based world models | LeCun left Meta (Dec 2025) to found AMI; builds on V-JEPA 2, trained on 1M+ hours of video | €3B valuation pre-product; offices in Paris, NYC, Montreal, Singapore |
| World Labs (Fei-Fei Li) | Marble | Shipped Marble (Nov 2025), which generates navigable 3D worlds from text/images/video; users can move through and interact with generated environments | $5B valuation in talks; $230M seed raised in 2024 |
| Google DeepMind | Genie 3 / Project Genie | First real-time interactive world model; generates navigable 3D worlds at 24 fps from text prompts; paired with the SIMA 2 agent for in-world training | Part of DeepMind (Alphabet) |
| Nvidia | Cosmos platform | Trained on 20M hours of real-world data; 2M+ downloads; three model families (Predict, Transfer, Reason); key infrastructure for robotics AI | $100B+ market-cap acceleration from AI adoption |
| Runway | GWM-1 World Model | First world model from a creative-AI company; released Dec 2025; targets robotics and gaming beyond its traditional media/VFX market | Est. $4B valuation |

JEPA — LeCun's architecture

AMI Labs is built on Joint Embedding Predictive Architecture (JEPA), developed at Meta. Unlike LLMs that process tokens, JEPA-based models operate in abstract latent spaces and predict how the state of the world changes in response to actions. The key insight: predict in representation space, not pixel space — this avoids the exponential complexity of modeling every visual detail, focusing instead on the semantically meaningful changes.
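The predict-in-representation-space idea can be made concrete with a toy numerical sketch. The random linear "encoder" and "predictor" below are stand-ins, not Meta's actual V-JEPA architecture; what matters is where the loss is computed:

```python
# Toy sketch of the JEPA idea: the prediction error lives in a small latent
# space, never in pixel space. All weights here are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
D_PIXELS, D_LATENT, D_ACTION = 10_000, 32, 4  # latent is far smaller than pixels

encoder = rng.normal(size=(D_PIXELS, D_LATENT)) * 0.01          # observation -> latent
predictor = rng.normal(size=(D_LATENT + D_ACTION, D_LATENT)) * 0.1


def jepa_loss(obs_now, action, obs_next):
    z_now = obs_now @ encoder        # embed the current observation
    z_next = obs_next @ encoder      # embed the *target* observation
    z_pred = np.concatenate([z_now, action]) @ predictor  # predict the next latent
    # The loss compares 32-dim latents, not 10,000 pixels: semantically
    # irrelevant visual detail never has to be modelled.
    return float(np.mean((z_pred - z_next) ** 2))


obs_a, obs_b = rng.normal(size=D_PIXELS), rng.normal(size=D_PIXELS)
print(jepa_loss(obs_a, np.zeros(D_ACTION), obs_b))
```

A pixel-space model would instead have to output all 10,000 pixels of the next frame; here the prediction target is a 32-dimensional vector.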

Why world models matter — real applications

| Application | How world models help | Who is doing it |
| --- | --- | --- |
| Robotics training | Generate unlimited simulated environments for robot training without physical hardware; simulate rare or dangerous scenarios safely | Figure AI, Agility Robotics, 1X — all using Nvidia Cosmos |
| Autonomous vehicles | Simulate edge cases (ice, accidents, unusual pedestrian behavior) that are dangerous or rare to collect in real-world driving data | Waymo, Wayve (GAIA-2 model), Uber, XPENG using Cosmos |
| Video game development | Generate reactive, physically consistent 3D game worlds from text; procedural generation with real physics | Google Project Genie demos, Iconic AI-native game engine |
| AR / VR / spatial computing | Maintain coherent 4D (3D + time) models of the user's environment for stable AR overlays; predict object movement | Apple Vision Pro content pipelines, Meta Orion research |
| Scientific simulation | Simulate protein-folding dynamics, fluid dynamics, and material properties faster than a conventional physics engine | DeepMind AlphaFold successors, Runway scientific models |
| Medical / surgical AI | Simulate surgical procedures; train surgical robots without human patients; predict treatment outcomes in 3D | AMI Labs / Nabla partnership focus area |

For students: where to start

World models are a frontier research area — most of the best work is in papers, not products. Start with: (1) DreamerV3 (Hafner et al., 2025) — the most complete open-source world model for RL tasks; (2) Nvidia Cosmos — download and experiment with the open models; (3) Genie 3 technical report from DeepMind; (4) LeCun's 2022 position paper "A Path Towards Autonomous Machine Intelligence" (available free) — the theoretical blueprint for everything AMI Labs is building.

Practice questions

  1. What is the difference between a model-free and a model-based reinforcement learning agent? (Answer: A model-free agent learns a policy (what to do) or a value function (how good each state is) directly from experience, without modelling the environment's dynamics; it is simple but sample-inefficient, needing many environment interactions. A model-based agent explicitly learns a transition model P(s' | s, a), i.e. what happens when action a is taken in state s, and can plan by simulating future trajectories without real environment interaction. It is sample-efficient but requires an accurate world model. World models aim to give RL agents model-based efficiency.)
  2. What is DreamerV3 and how does it use a world model? (Answer: DreamerV3 (Hafner 2023) learns a compact world model in latent space: a Recurrent State Space Model (RSSM) that predicts the next latent state from the current latent state and action. The policy is trained entirely within imagined rollouts from this world model, never by interacting with the real environment; real interaction is used only to update the world model itself. This lets DreamerV3 master diverse tasks, from Minecraft to robot locomotion to classic games, with orders of magnitude fewer real environment steps than model-free RL.)
  3. Why are world models important for safety in autonomous systems? (Answer: An autonomous car without a world model must learn purely from real experience, including crashes. A car with a world model can simulate thousands of dangerous scenarios internally without real risk, test "what if I miss the red light" in simulation before ever encountering it, plan by rolling out multiple potential future trajectories and choosing the safest, and predict other agents' behaviours. Real-world failures are catastrophic; a world model lets safety-critical scenarios be explored in imagination.)
  4. How does the concept of a "mental model" in cognitive science relate to AI world models? (Answer: In cognitive science, humans maintain mental models of physics, social relationships, causality, and others' mental states, and plan actions by mentally simulating their consequences; Johnson-Laird (1983) argued that mental models are the basis of reasoning and language understanding. AI world models operationalise this idea: a neural network that represents environment dynamics enables planning by simulation. The connection is deep, in that both biological and artificial agents that model their environment before acting are more adaptive and efficient than purely reactive systems.)
  5. What is a "latent space world model" and why is it more efficient than a pixel-space model? (Answer: A pixel-space world model learns to predict future video frames at full resolution, which is computationally expensive: the output is high-dimensional, and every step must generate thousands of pixels. A latent-space world model compresses each observation into a compact latent representation via a VAE or other encoder, models the dynamics over those small vectors, and decodes back to pixels only for visualisation. DreamerV3's RSSM, for example, models 32-dimensional latent states; planning and policy learning happen in this compact space, requiring 100-1000× fewer computations than pixel-space modelling.)
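The ideas running through these answers (a learned latent transition model, imagined rollouts, planning without real environment steps) can be combined in one toy sketch. The linear dynamics and hand-written reward below are stand-ins, not DreamerV3's actual RSSM:

```python
# Toy latent-space world model with planning by imagined rollouts.
# Linear dynamics are a stand-in for a learned RSSM; nothing here touches
# a real environment during planning.
import numpy as np

rng = np.random.default_rng(42)
D = 8  # compact latent state, as in the latent-space answer above

A = rng.normal(size=(D, D)) * 0.1  # "learned" transition: z' = A z + B a
B = rng.normal(size=(D, 1))
goal = np.ones(D)                  # target latent state


def step(z, action):
    """One imagined transition in latent space."""
    return A @ z + (B * action).ravel()


def imagined_return(z, actions):
    """Score an action sequence entirely in imagination: no real env steps."""
    total = 0.0
    for a in actions:
        z = step(z, a)
        total += -float(np.sum((z - goal) ** 2))  # reward: closeness to goal
    return total


z0 = np.zeros(D)
candidates = [rng.uniform(-1, 1, size=5) for _ in range(64)]  # random shooting
best = max(candidates, key=lambda acts: imagined_return(z0, acts))
print(imagined_return(z0, best))
```

This is the simplest model-based planner (random shooting over imagined rollouts); DreamerV3 instead trains an actor-critic inside the imagination, but the principle of choosing actions by simulating their consequences is the same.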
