
World Models

AI that understands physics and reality — not just words.


Definition

World models are AI systems that learn internal representations of how the physical world works — predicting the next state of an environment given actions within it, rather than predicting the next token in a text sequence. While LLMs model the statistical patterns of language, world models model causality, physics, spatial relationships, and object permanence. In late 2025 and early 2026, world models emerged as the field's most hyped new frontier: Yann LeCun left Meta to launch AMI Labs (seeking €3B valuation), Fei-Fei Li's World Labs shipped Marble, Google DeepMind released Genie 3, and Nvidia's Cosmos platform surpassed 2 million downloads.

The fundamental difference: tokens vs. states

The core distinction between LLMs and world models is what they predict:

| Property | Large Language Model | World Model |
| --- | --- | --- |
| What it predicts | The next token in a text sequence | The next state of an environment, given an action |
| Learning signal | Statistical co-occurrence of words across text | Causal dynamics: what happens when you push this object |
| Representation space | Token embeddings in high-dimensional language space | Latent representations of physical state |
| Understanding of physics | None — describes physics accurately without "feeling" it | Built-in — trained on video and sensor data of real physical interactions |
| Hallucinations | Common: predicts plausible-sounding text, not grounded truth | Rarer: grounded in physical observations, not statistical text patterns |
| Best analogy | An extremely well-read librarian who has read every physics textbook | A child who has played with blocks, water, and gravity for years |
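The two prediction interfaces in the table above can be sketched as minimal Python stubs. The class and method names are illustrative, not taken from any real library, and the method bodies are stand-ins for learned networks:

```python
# Hedged sketch: the token-prediction vs. state-prediction interfaces.
# All names are illustrative; the bodies stand in for learned networks.
from typing import List


class LanguageModel:
    """Predicts a distribution over the next token, given prior tokens."""

    def next_token_logits(self, tokens: List[int]) -> List[float]:
        # Stand-in: a real LLM runs a transformer over the token sequence here.
        vocab_size = 4
        return [0.0] * vocab_size


class WorldModel:
    """Predicts the next environment state, given the current state and an action."""

    def next_state(self, state: List[float], action: List[float]) -> List[float]:
        # Stand-in dynamics: a real world model learns these from video/sensor data.
        return [s + a for s, a in zip(state, action)]


print(WorldModel().next_state([0.0, 1.0], [0.5, -0.5]))  # [0.5, 0.5]
```

The signatures carry the whole distinction: one function maps tokens to tokens, the other maps (state, action) pairs to states.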

LeCun's critique of LLMs

Yann LeCun has argued publicly for years that LLMs will never achieve general intelligence: "They predict the next word based on statistics, not the next state of the world based on physics." When GPT-4 generates text about a ball rolling down a hill, it is not simulating physics — it is predicting which words typically follow other words. It has no internal model of gravity, friction, or momentum. World models are designed to close this gap.

The 2026 world models race

In the span of a few months bridging late 2025 and early 2026, world models went from a niche research topic to the industry's most-funded frontier:

| Player | Product / Project | Key milestone | Valuation / Investment |
| --- | --- | --- | --- |
| AMI Labs (Yann LeCun) | JEPA-based world models | LeCun left Meta (Dec 2025) to found AMI; builds on V-JEPA 2, trained on 1M+ hours of video | €3B valuation pre-product; offices in Paris, NYC, Montreal, Singapore |
| World Labs (Fei-Fei Li) | Marble | Shipped Marble (Nov 2025), which generates navigable 3D worlds from text/images/video; users can move through and interact with generated environments | $5B valuation in talks; $230M seed raised in 2024 |
| Google DeepMind | Genie 3 / Project Genie | First real-time interactive world model; generates navigable 3D worlds at 24 fps from text prompts; paired with the SIMA 2 agent for in-world training | Part of DeepMind (Alphabet) |
| Nvidia | Cosmos platform | Trained on 20M hours of real-world data; 2M+ downloads; three model families (Predict, Transfer, Reason); key infrastructure for robotics AI | $100B+ market-cap acceleration from AI adoption |
| Runway | GWM-1 World Model | First world model from a creative-AI company; released Dec 2025; targets robotics and gaming beyond its traditional media/VFX market | Est. $4B valuation |

JEPA — LeCun's architecture

AMI Labs is built on Joint Embedding Predictive Architecture (JEPA), developed at Meta. Unlike LLMs that process tokens, JEPA-based models operate in abstract latent spaces and predict how the state of the world changes in response to actions. The key insight: predict in representation space, not pixel space — this avoids the exponential complexity of modeling every visual detail, focusing instead on the semantically meaningful changes.
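The predict-in-representation-space idea can be made concrete with a toy numerical sketch. The random linear "encoder" and "predictor" below are stand-ins, not Meta's actual V-JEPA architecture; what matters is where the loss is computed:

```python
# Toy sketch of the JEPA idea: the prediction error lives in a small latent
# space, never in pixel space. All weights here are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
D_PIXELS, D_LATENT, D_ACTION = 10_000, 32, 4  # latent is far smaller than pixels

encoder = rng.normal(size=(D_PIXELS, D_LATENT)) * 0.01          # observation -> latent
predictor = rng.normal(size=(D_LATENT + D_ACTION, D_LATENT)) * 0.1


def jepa_loss(obs_now, action, obs_next):
    z_now = obs_now @ encoder        # embed the current observation
    z_next = obs_next @ encoder      # embed the *target* observation
    z_pred = np.concatenate([z_now, action]) @ predictor  # predict the next latent
    # The loss compares 32-dim latents, not 10,000 pixels: semantically
    # irrelevant visual detail never has to be modelled.
    return float(np.mean((z_pred - z_next) ** 2))


obs_a, obs_b = rng.normal(size=D_PIXELS), rng.normal(size=D_PIXELS)
print(jepa_loss(obs_a, np.zeros(D_ACTION), obs_b))
```

A pixel-space model would instead have to output all 10,000 pixels of the next frame; here the prediction target is a 32-dimensional vector.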

Why world models matter — real applications

| Application | How world models help | Who is doing it |
| --- | --- | --- |
| Robotics training | Generate unlimited simulated environments for robot training without physical hardware; simulate rare or dangerous scenarios safely | Figure AI, Agility Robotics, 1X — all using Nvidia Cosmos |
| Autonomous vehicles | Simulate edge cases (ice, accidents, unusual pedestrian behavior) that are dangerous or rare to collect in real-world driving data | Waymo, Wayve (GAIA-2 model), Uber, XPENG using Cosmos |
| Video game development | Generate reactive, physically consistent 3D game worlds from text; procedural generation with real physics | Google Project Genie demos, Iconic AI-native game engine |
| AR / VR / spatial computing | Maintain coherent 4D (3D + time) models of the user's environment for stable AR overlays; predict object movement | Apple Vision Pro content pipelines, Meta Orion research |
| Scientific simulation | Simulate protein-folding dynamics, fluid dynamics, and material properties faster than a conventional physics engine | DeepMind AlphaFold successors, Runway scientific models |
| Medical / surgical AI | Simulate surgical procedures; train surgical robots without human patients; predict treatment outcomes in 3D | AMI Labs / Nabla partnership focus area |

For students: where to start

World models are a frontier research area — most of the best work is in papers, not products. Start with: (1) DreamerV3 (Hafner et al., 2025) — the most complete open-source world model for RL tasks; (2) Nvidia Cosmos — download and experiment with the open models; (3) Genie 3 technical report from DeepMind; (4) LeCun's 2022 position paper "A Path Towards Autonomous Machine Intelligence" (available free) — the theoretical blueprint for everything AMI Labs is building.

Practice questions

  1. What is the difference between a model-free and a model-based reinforcement learning agent? (Answer: A model-free agent learns a policy (what to do) or a value function (how good each state is) directly from experience, without modelling the environment's dynamics; it is simple but sample-inefficient, needing many environment interactions. A model-based agent explicitly learns a transition model P(s' | s, a), i.e. what happens when action a is taken in state s, and can plan by simulating future trajectories without real environment interaction. It is sample-efficient but requires an accurate world model. World models aim to give RL agents model-based efficiency.)
  2. What is DreamerV3 and how does it use a world model? (Answer: DreamerV3 (Hafner 2023) learns a compact world model in latent space: a Recurrent State Space Model (RSSM) that predicts the next latent state from the current latent state and action. The policy is trained entirely within imagined rollouts from this world model, never by interacting with the real environment; real interaction is used only to update the world model itself. This lets DreamerV3 master diverse tasks, from Minecraft to robot locomotion to classic games, with orders of magnitude fewer real environment steps than model-free RL.)
  3. Why are world models important for safety in autonomous systems? (Answer: An autonomous car without a world model must learn purely from real experience, including crashes. A car with a world model can simulate thousands of dangerous scenarios internally without real risk, test "what if I miss the red light" in simulation before ever encountering it, plan by rolling out multiple potential future trajectories and choosing the safest, and predict other agents' behaviours. Real-world failures are catastrophic; a world model lets safety-critical scenarios be explored in imagination.)
  4. How does the concept of a "mental model" in cognitive science relate to AI world models? (Answer: In cognitive science, humans maintain mental models of physics, social relationships, causality, and others' mental states, and plan actions by mentally simulating their consequences; Johnson-Laird (1983) argued that mental models are the basis of reasoning and language understanding. AI world models operationalise this idea: a neural network that represents environment dynamics enables planning by simulation. The connection is deep, in that both biological and artificial agents that model their environment before acting are more adaptive and efficient than purely reactive systems.)
  5. What is a "latent space world model" and why is it more efficient than a pixel-space model? (Answer: A pixel-space world model learns to predict future video frames at full resolution, which is computationally expensive: the output is high-dimensional, and every step must generate thousands of pixels. A latent-space world model compresses each observation into a compact latent representation via a VAE or other encoder, models the dynamics over those small vectors, and decodes back to pixels only for visualisation. DreamerV3's RSSM, for example, models 32-dimensional latent states; planning and policy learning happen in this compact space, requiring 100-1000× fewer computations than pixel-space modelling.)
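The ideas running through these answers (a learned latent transition model, imagined rollouts, planning without real environment steps) can be combined in one toy sketch. The linear dynamics and hand-written reward below are stand-ins, not DreamerV3's actual RSSM:

```python
# Toy latent-space world model with planning by imagined rollouts.
# Linear dynamics are a stand-in for a learned RSSM; nothing here touches
# a real environment during planning.
import numpy as np

rng = np.random.default_rng(42)
D = 8  # compact latent state, as in the latent-space answer above

A = rng.normal(size=(D, D)) * 0.1  # "learned" transition: z' = A z + B a
B = rng.normal(size=(D, 1))
goal = np.ones(D)                  # target latent state


def step(z, action):
    """One imagined transition in latent space."""
    return A @ z + (B * action).ravel()


def imagined_return(z, actions):
    """Score an action sequence entirely in imagination: no real env steps."""
    total = 0.0
    for a in actions:
        z = step(z, a)
        total += -float(np.sum((z - goal) ** 2))  # reward: closeness to goal
    return total


z0 = np.zeros(D)
candidates = [rng.uniform(-1, 1, size=5) for _ in range(64)]  # random shooting
best = max(candidates, key=lambda acts: imagined_return(z0, acts))
print(imagined_return(z0, best))
```

This is the simplest model-based planner (random shooting over imagined rollouts); DreamerV3 instead trains an actor-critic inside the imagination, but the principle of choosing actions by simulating their consequences is the same.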
