IntPhys2
Hosted on ModelScope (updated 2025-12-29, indexed 2025-06-14).
Download link: https://modelscope.cn/datasets/facebook/IntPhys2
<h1 align="center">
IntPhys 2
</h1>
<h3 align="center">
<a href="https://dl.fbaipublicfiles.com/IntPhys2/IntPhys2.zip">Dataset</a> |
<a href="https://huggingface.co/datasets/facebook/IntPhys2">Hugging Face</a> |
<a href="https://arxiv.org/abs/2506.09849">Paper</a> |
<a href="https://ai.meta.com/blog/v-jepa-2-world-model-benchmarks">Blog</a>
</h3>

IntPhys 2 is a video benchmark designed to evaluate the intuitive physics understanding of deep learning models. Building on the original [IntPhys benchmark](https://intphys.cognitive-ml.fr/), IntPhys 2 focuses on four core principles related to macroscopic objects: Permanence, Immutability, Spatio-Temporal Continuity, and Solidity. These principles are inspired by research on the intuitive physical understanding that emerges during early childhood. Based on the violation-of-expectation framework, IntPhys 2 offers a suite of tests that challenge models to differentiate between possible and impossible events within controlled and diverse virtual environments. Alongside the benchmark, we provide performance evaluations of several state-of-the-art models. Our findings indicate that while these models demonstrate basic visual understanding, they struggle to grasp intuitive physics across the four principles in complex scenes: most models perform at chance level (50%), in stark contrast to humans, who achieve near-perfect accuracy. This gap between current models and human-like intuitive physics understanding highlights the need for advances in model architectures and training methodologies.
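
To make the 50% chance level concrete, here is a minimal sketch of binary possible/impossible accuracy. It assumes a label set with equal numbers of possible and impossible videos and encodes possible as 1 and impossible as 0; the function name and the toy labels are illustrative and are not taken from the official evaluation code.

```python
def accuracy(predictions, labels):
    """Fraction of videos whose predicted class (1 = possible, 0 = impossible) matches the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one labelled video is required")
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy, balanced label set (an assumption made for illustration only).
labels = [1, 0, 1, 0]

# A model that calls every video "possible" is right exactly half the time.
always_possible = [1, 1, 1, 1]
print(accuracy(always_possible, labels))  # 0.5
```

Under this balanced-label assumption, a model with no sensitivity to physical violations lands at 0.5, which is why scores near 50% indicate chance-level performance.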

## IntPhys2 benchmark splits
We release three separate splits. The first is intended for debugging only and provides a measure of the model's sensitivity to video generation artifacts (such as mp4 compression or clouds moving in the background of the scene). The second is the main evaluation set, with three sub-splits ("Easy", "Medium", "Hard"). The third is a held-out split that we release without additional metadata.
| Split | Scenes | Videos | Description | Purpose |
|--------------|--------|--------|-----------------------------------------------------------------------------------------------|----------------------|
| Debug Set | 5 | 60 | Static cameras, bright assets, 3 generations | Model calibration |
| Main Set     | 253    | 1,012  | Static and moving cameras; 3 sub-splits:<br>- Easy: Simple environments, colorful shapes<br>- Medium: Diverse backgrounds, textured shapes<br>- Hard: Realistic objects, complex backgrounds | Main evaluation set  |
| Held-Out Set | 86     | 344    | Moving cameras, mirrors the Hard sub-split, includes distractors                              | Main test set        |
## Downloading the benchmark
IntPhys2 is available on [Hugging Face](https://huggingface.co/datasets/facebook/IntPhys2) or by [direct download](https://dl.fbaipublicfiles.com/IntPhys2/IntPhys2.zip).
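
As a minimal sketch of the direct-download route, the snippet below fetches the zip archive with the Python standard library and unpacks it. The URL is the one given above; the helper name, the local archive name, and the target folder are hypothetical, and the layout of the extracted files is not assumed here.

```python
import urllib.request
import zipfile
from pathlib import Path

# Direct-download URL from the dataset card.
INTPHYS2_ZIP_URL = "https://dl.fbaipublicfiles.com/IntPhys2/IntPhys2.zip"


def download_and_extract(url: str, dest_dir: str) -> Path:
    """Download a zip archive into dest_dir (unless already present) and unpack it there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive_path = dest / "IntPhys2.zip"  # hypothetical local file name
    if not archive_path.exists():
        urllib.request.urlretrieve(url, archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
    return dest


# Example call ("data/intphys2" is a hypothetical folder; the download may be large):
# download_and_extract(INTPHYS2_ZIP_URL, "data/intphys2")
```

The existence check makes the helper safe to re-run: a second call reuses the archive already on disk and only repeats the extraction.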
## Running the evals
The evaluation code can be found on [GitHub](https://github.com/facebookresearch/IntPhys2).
## Evaluating on the Held-Out set
To prevent training data contamination, we are not releasing the metadata associated with the held-out set. Instead, we invite researchers to upload their results to the [Leaderboard](https://huggingface.co/spaces/facebook/physical_reasoning_leaderboard). The `model_answer` column in the resulting jsonl file should contain 1 if the model deems the video possible and 0 if it deems it impossible.
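
As a rough illustration of the submission format described above, the sketch below writes one JSON object per line with a `model_answer` field set to 1 or 0. Only `model_answer` and its encoding come from the text; the `video` key, the file names, and the helper name are hypothetical, so check the leaderboard page for the exact schema it expects.

```python
import json


def write_predictions(predictions, path):
    """Write (video_name, is_possible) pairs as JSON Lines, one object per video."""
    with open(path, "w", encoding="utf-8") as f:
        for video_name, is_possible in predictions:
            row = {
                "video": video_name,  # hypothetical identifier column
                "model_answer": 1 if is_possible else 0,  # 1 = possible, 0 = not possible
            }
            f.write(json.dumps(row) + "\n")


# Hypothetical file names and model decisions, for illustration only.
example = [("video_0001.mp4", True), ("video_0002.mp4", False)]
write_predictions(example, "heldout_predictions.jsonl")
```

Converting the model's decision to an explicit integer keeps the file free of `true`/`false` values, which would not match the 1/0 encoding requested above.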
## License
IntPhys 2 is licensed under the CC BY-NC 4.0 license. Third-party content pulled from other locations is subject to its own licenses, and you may have other legal obligations or restrictions that govern your use of that content.
The use of IntPhys 2 is limited to evaluation purposes, where it can be utilized to generate tags for classifying visual content, such as videos and images. All other uses, including generative AI applications that create or automate new content (e.g. audio, visual, or text-based), are prohibited.
## Citing IntPhys2
If you use IntPhys2, please cite:
```
@misc{bordes2025intphys2benchmarkingintuitive,
title={IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments},
author={Florian Bordes and Quentin Garrido and Justine T Kao and Adina Williams and Michael Rabbat and Emmanuel Dupoux},
year={2025},
eprint={2506.09849},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.09849},
}
```
Provider: maas
Created: 2025-06-12



