Meta Releases V-JEPA 2 World Model for Physical AI Systems

Open-source 1.2-billion-parameter model demonstrates breakthrough in robot control and physical reasoning

In June 2025, Meta AI released V-JEPA 2, an open-source world model that represents a significant advance in teaching artificial intelligence systems to understand and predict physical interactions. The 1.2-billion-parameter model, built on Meta's Joint Embedding Predictive Architecture (JEPA), enables AI agents to reason about physical laws and control robots in previously unseen environments.

The model addresses a core limitation in current AI development: creating systems that possess physical intuition and can predict how actions affect real-world environments. While large language models excel at text processing, they typically lack understanding of physical causality and common-sense reasoning about the material world.

V-JEPA 2 learns physical understanding through self-supervised training on over one million hours of video data combined with one million images sourced from internet-scale datasets. This approach allows the system to develop intuitive knowledge about gravity, momentum, object interactions, and other fundamental physical principles without explicit programming.
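To make the training idea concrete, the PyTorch sketch below illustrates a joint-embedding objective of this general kind: most video patches are hidden, a context encoder embeds the visible ones, and a predictor is trained to regress the hidden patches' embeddings as produced by a separate target encoder. The module sizes, masking ratio, and loss choice are illustrative assumptions, not Meta's published recipe.

    import torch
    import torch.nn as nn

    # Illustrative joint-embedding objective: predict latent embeddings of
    # masked video patches, not the pixels themselves. Sizes, masking ratio,
    # and loss are assumptions, not Meta's exact training configuration.
    dim, n_patches = 256, 512

    def make_encoder(layers):
        return nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True),
            num_layers=layers)

    encoder = make_encoder(4)         # context encoder (receives gradients)
    target_encoder = make_encoder(4)  # in practice an EMA copy of `encoder`
    predictor = make_encoder(2)
    mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    patches = torch.randn(1, n_patches, dim)  # stand-in for video patch tokens
    mask = torch.rand(n_patches) < 0.75       # hide most of the clip
    n_masked = int(mask.sum())

    context = encoder(patches[:, ~mask])      # embed visible patches only
    with torch.no_grad():                     # targets come from the frozen side
        targets = target_encoder(patches)[:, mask]

    # The predictor fills in representations at the masked positions
    # (positional conditioning is omitted here for brevity).
    queries = mask_token.expand(1, n_masked, dim)
    pred = predictor(torch.cat([context, queries], dim=1))[:, -n_masked:]
    nn.functional.l1_loss(pred, targets).backward()  # regression in latent space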

The architecture comprises two primary components: an encoder that transforms video inputs into meaningful embeddings capturing essential scene information, and a predictor that uses these representations to forecast scene evolution. Unlike traditional generative models that predict individual pixels, V-JEPA 2 operates in abstract representational space, making it significantly more computationally efficient.
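The division of labour can be seen in a few lines. In the toy sketch below, the encoder compresses a clip into a small latent vector and the predictor advances that latent, so a forecast costs a few hundred numbers rather than hundreds of thousands of pixels. The classes and tensor shapes are illustrative assumptions, not the released model's API.

    import torch
    import torch.nn as nn

    # Toy version of the two-component design: forecasts happen in a compact
    # representation space, never in pixel space. Shapes are illustrative.
    class Encoder(nn.Module):
        """Maps a video clip to a latent capturing essential scene information."""
        def __init__(self, dim=256):
            super().__init__()
            self.net = nn.Sequential(nn.Flatten(1), nn.LazyLinear(dim))
        def forward(self, clip):          # clip: (batch, frames, 3, H, W)
            return self.net(clip)         # -> (batch, dim)

    class Predictor(nn.Module):
        """Forecasts how the latent scene state evolves."""
        def __init__(self, dim=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        def forward(self, z):
            return self.net(z)            # predicted future latent

    clip = torch.randn(1, 16, 3, 64, 64)  # a 16-frame toy clip
    z_next = Predictor()(Encoder()(clip))
    print(z_next.shape)  # torch.Size([1, 256]) vs. 16*3*64*64 = 196,608 pixels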

Early testing demonstrates the model's capability in robotic control tasks, where it successfully guides robots through manipulation and navigation challenges in environments not seen during training. The system shows particular strength in predicting object behaviour and planning action sequences based on visual input alone.
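One way such a model can plan, sketched below, is model-predictive control in latent space: sample candidate action sequences, roll each forward with an action-conditioned predictor, and execute the first action of the sequence whose predicted latent lands closest to the latent of a goal image. The toy encoder, dynamics network, and random-shooting search are stand-in assumptions that illustrate the idea rather than reproduce Meta's procedure.

    import torch
    import torch.nn as nn

    # Latent-space planning sketch: choose actions by imagined rollouts rather
    # than by rendering pixels. Every module and size here is a stand-in.
    dim, act_dim, horizon, n_candidates = 256, 7, 5, 64

    frame_encoder = nn.Linear(3 * 64 * 64, dim)  # toy stand-in encoder
    dynamics = nn.Linear(dim + act_dim, dim)     # toy action-conditioned predictor

    @torch.no_grad()
    def plan(current_frame, goal_frame):
        z = frame_encoder(current_frame.flatten()).expand(n_candidates, dim).clone()
        z_goal = frame_encoder(goal_frame.flatten())
        actions = torch.randn(n_candidates, horizon, act_dim)  # random shooting
        for t in range(horizon):                 # imagine each candidate rollout
            z = dynamics(torch.cat([z, actions[:, t]], dim=-1))
        cost = (z - z_goal).norm(dim=-1)         # distance to the goal latent
        return actions[cost.argmin(), 0]         # execute the best first action

    action = plan(torch.randn(3, 64, 64), torch.randn(3, 64, 64))
    print(action.shape)  # torch.Size([7]), e.g. one arm command per control step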

Meta's decision to open-source V-JEPA 2 through platforms including Hugging Face reflects the company's strategy of accelerating AI research through community collaboration. The release includes pre-trained weights, training code, and benchmark datasets, enabling researchers and developers to build upon the foundation.
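Getting started is correspondingly lightweight. The snippet below shows one plausible way to pull the weights through the Hugging Face transformers library; the checkpoint identifier and version requirements are assumptions to verify against the published model cards.

    from transformers import AutoModel

    # The repository id below is an assumption; confirm the exact checkpoint
    # names, preprocessing steps, and minimum transformers version on the
    # V-JEPA 2 model cards published on Hugging Face.
    model = AutoModel.from_pretrained("facebook/vjepa2-vitl-fpc64-256")
    model.eval()  # encode preprocessed clips into embeddings for downstream use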

Industry experts note that V-JEPA 2's approach of learning from passive video observation, rather than requiring expensive robot interaction data, could significantly reduce the cost and complexity of training physically aware AI systems. This development potentially accelerates progress in robotics, autonomous vehicles, and other applications requiring real-world understanding.

The model's release comes amid growing industry focus on embodied AI—systems designed to interact meaningfully with physical environments rather than operating purely in digital spaces.