Meta AI Releases V-JEPA 2: Open-Source Self-Supervised World Models for Understanding, Prediction, and Planning

Meta AI has introduced V-JEPA 2, an open-source, self-supervised world model designed to learn from video at internet scale and enable robust visual understanding, future-state prediction, and zero-shot planning. Building on the joint-embedding predictive architecture (JEPA), V-JEPA 2 demonstrates that self-supervised learning from passive internet video, combined with a small amount of robot interaction data, can yield a modular foundation for intelligent physical agents.
Scalable Self-Supervised Pretraining from 1M Hours of Video
V-JEPA 2 is pretrained on over 1 million hours of internet-scale video combined with 1 million images. Using a visual mask denoising objective, the model learns to reconstruct masked spatiotemporal patches in a latent representation space. This approach avoids the inefficiencies of pixel-level prediction by focusing on predictable scene dynamics while ignoring irrelevant noise.
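A minimal sketch of this kind of masked latent-prediction objective is shown below. The `encoder`, `ema_encoder`, and `predictor` modules and the exact loss are illustrative assumptions, not the released implementation:

```python
import torch
import torch.nn.functional as F

def jepa_masked_latent_loss(encoder, ema_encoder, predictor, video, mask):
    """JEPA-style masked denoising in latent space (illustrative sketch).

    video: (B, T, C, H, W) clip; mask: boolean (B, N) over patch tokens,
    True where a spatiotemporal patch is hidden from the context encoder.
    """
    with torch.no_grad():
        # Targets come from a frozen/EMA "teacher" encoder on the full clip;
        # regression happens in representation space, never in pixels.
        targets = ema_encoder(video)           # (B, N, D)

    context = encoder(video, mask=mask)        # encode visible patches only
    preds = predictor(context, mask=mask)      # predict latents at masked slots

    # Regress only the masked positions; visible patches carry no loss.
    return F.smooth_l1_loss(preds[mask], targets[mask])
```

Because the regression target lives in representation space, the model is never asked to reproduce unpredictable pixel detail.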
To scale JEPA pretraining to this regime, researchers at Meta introduced four key techniques:
- Data scaling: Constructed a 22M-sample dataset (VideoMix22M) from public sources including SSv2, Kinetics, HowTo100M, YT-Temporal-1B, and ImageNet.
- Model scaling: Expanded the encoder capacity to over 1B parameters using ViT-g.
- Training schedule: Adopted a progressive resolution strategy and extended pretraining to 252K iterations.
- Temporal extension: Trained on progressively longer and higher-resolution clips, reaching 64 frames at 384×384 resolution (a schedule sketch follows this list).
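The last two points amount to a training curriculum. One way to express it is the config sketch below, where the intermediate stages are invented for illustration; only the 252K-iteration total and the 64-frame, 384×384 endpoint come from the article:

```python
from dataclasses import dataclass

@dataclass
class PretrainStage:
    iterations: int   # optimizer steps spent in this stage
    num_frames: int   # clip length fed to the encoder
    resolution: int   # square crop size in pixels

# Hypothetical curriculum: shorter, lower-resolution clips first,
# ending at the reported 64-frame, 384x384 configuration after 252K steps.
schedule = [
    PretrainStage(iterations=90_000, num_frames=16, resolution=256),
    PretrainStage(iterations=90_000, num_frames=32, resolution=320),
    PretrainStage(iterations=72_000, num_frames=64, resolution=384),
]
assert sum(s.iterations for s in schedule) == 252_000
```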
These design choices led to an average accuracy of 88.2% across six benchmark tasks.
Understanding via Masked Representation Learning
V-JEPA 2 exhibits strong motion understanding capabilities. On the Something-Something v2 benchmark it achieves 77.3% top-1 accuracy, outperforming models such as InternVideo and VideoMAEv2. For appearance understanding, it remains competitive with state-of-the-art image encoders such as DINOv2 and PEcoreG, indicating that a single encoder representation transfers across both motion and appearance tasks.
Temporal Reasoning via Video Question Answering
To assess temporal reasoning, the V-JEPA 2 encoder is aligned with a multimodal large language model and evaluated on multiple video question-answering tasks (a minimal alignment sketch follows the results below). Despite the absence of language supervision during pretraining, the model achieves:
- 84.0% on PerceptionTest
- 76.9% on TempCompass
- 44.5% on MVP
- 36.7% on TemporalBench
- 40.3% on TOMATO
These results challenge the assumption that visual-language alignment requires joint training from the start, showing that a video encoder pretrained without language can be aligned post hoc and still generalize strongly.
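The alignment itself can be as simple as a trained projection from frozen video tokens into the language model's embedding space. The wiring below is a hedged sketch under that assumption; `video_encoder`, `llm`, and the dimensions are placeholders, not the released code:

```python
import torch
import torch.nn as nn

class VideoLLMAlign(nn.Module):
    """Sketch: bridge a frozen video encoder to a language model.

    Assumes an HF-style LLM exposing `embed_tokens` and accepting
    `inputs_embeds`; dimensions are illustrative placeholders.
    """
    def __init__(self, video_encoder, llm, vid_dim=1408, llm_dim=4096):
        super().__init__()
        self.video_encoder = video_encoder.eval()  # frozen, no language pretraining
        for p in self.video_encoder.parameters():
            p.requires_grad = False
        self.proj = nn.Linear(vid_dim, llm_dim)    # trained during alignment
        self.llm = llm

    def forward(self, video, question_ids):
        with torch.no_grad():
            vid_tokens = self.video_encoder(video)           # (B, N, vid_dim)
        vid_embeds = self.proj(vid_tokens)                   # (B, N, llm_dim)
        txt_embeds = self.llm.embed_tokens(question_ids)     # (B, L, llm_dim)
        inputs = torch.cat([vid_embeds, txt_embeds], dim=1)  # video tokens as prefix
        return self.llm(inputs_embeds=inputs)
```

Only the projection (and optionally the LLM) receives gradients; the video encoder stays exactly as pretrained.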
V-JEPA 2-AC: Learning Latent World Models for Robotic Planning
A key contribution of this release is V-JEPA 2-AC, an action-conditioned variant of the pretrained encoder. Fine-tuned on only 62 hours of unlabeled robot video from the Droid dataset, V-JEPA 2-AC learns to predict future video embeddings conditioned on robot actions and poses. The architecture is a 300M-parameter transformer with block-causal attention, trained with a combination of teacher-forcing and rollout objectives.
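A sketch of how those two objectives might combine is shown below; the `predictor` interface, the MSE loss, and the equal weighting are assumptions made for illustration:

```python
import torch
import torch.nn.functional as F

def ac_world_model_loss(predictor, target_encoder, frames, actions):
    """Action-conditioned latent prediction (illustrative sketch).

    frames: (B, T, C, H, W) robot video; actions: (B, T-1, A) commands.
    `predictor` stands in for the 300M block-causal transformer and
    `target_encoder` for the frozen V-JEPA 2 encoder.
    """
    with torch.no_grad():
        z = target_encoder(frames)                        # (B, T, D) latents

    # Teacher forcing: predict z[t+1] from ground-truth history and actions.
    tf_loss = F.mse_loss(predictor(z[:, :-1], actions), z[:, 1:])

    # Rollout: feed predictions back in autoregressively so the model
    # trains on the distribution it will face at planning time.
    seq, ro_loss = z[:, :1], 0.0
    for t in range(actions.shape[1]):
        pred_next = predictor(seq, actions[:, :t + 1])[:, -1:]
        ro_loss = ro_loss + F.mse_loss(pred_next, z[:, t + 1:t + 2])
        seq = torch.cat([seq, pred_next], dim=1)          # feed prediction back
    return tf_loss + ro_loss / actions.shape[1]
```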
This enables zero-shot robotic control via model-predictive control. The model infers action sequences by minimizing the distance between imagined future states and visual goal representations, optimized with the cross-entropy method (CEM). It achieves high success rates on tasks such as reaching, grasping, and pick-and-place with unseen robot arms in different labs, without any reward supervision or additional data collection.
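A minimal version of such a planning loop follows, with `step_model` standing in for a single-step wrapper over the world model and all hyperparameters chosen for illustration rather than taken from the paper:

```python
import torch

def cem_plan(step_model, z_now, z_goal, horizon=3, act_dim=7,
             pop=256, elites=32, iters=5):
    """Goal reaching via CEM + model-predictive control (sketch).

    step_model(z, a): maps latents (P, D) and actions (P, A) to next latents.
    z_now, z_goal: (1, D) current and goal embeddings.
    """
    mu = torch.zeros(horizon, act_dim)
    sigma = torch.ones(horizon, act_dim)
    for _ in range(iters):
        cand = mu + sigma * torch.randn(pop, horizon, act_dim)  # sample plans
        z = z_now.expand(pop, -1).clone()
        for t in range(horizon):
            z = step_model(z, cand[:, t])          # imagine next latent state
        cost = (z - z_goal).norm(dim=-1)           # distance to goal embedding
        elite = cand[cost.topk(elites, largest=False).indices]
        mu, sigma = elite.mean(0), elite.std(0)    # refit sampling distribution
    return mu[0]   # MPC: execute the first action, then replan
```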

Benchmarks: Robust Performance and Planning Efficiency
Compared with baselines such as Octo (behavior cloning) and Cosmos (latent diffusion world models), V-JEPA 2-AC:
- Executes plans in roughly 16 seconds per step (versus 4 minutes for Cosmos).
- Achieves a 100% success rate on reaching tasks.
- Outperforms the others on grasping and manipulation tasks across object types.

Notably, it operates from a single monocular RGB camera without calibration or environment-specific fine-tuning, underscoring the generalization capability of the learned world model.
Conclusion
Meta's V-JEPA 2 represents a significant advance in scalable self-supervised world modeling. By decoupling observation learning from action conditioning and leveraging large-scale passive video, V-JEPA 2 demonstrates that general-purpose visual representations can support both perception and control in the real world.
Check out the Paper and the Models on Hugging Face and the GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 99k+ ML SubReddit and subscribe to our Newsletter.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.
