shi-labs/physical-ai-bench-generation
Source: Hugging Face. Updated 2025-12-10; indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/shi-labs/physical-ai-bench-generation
Resource description:
---
dataset_info:
  config_name: pbench
  features:
  - name: image_path
    dtype: string
  - name: prompt
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: domain
    dtype: string
  splits:
  - name: benchmark
    num_bytes: 237000000
    num_examples: 1044
  download_size: 226000000
  dataset_size: 237000000
configs:
- config_name: default
  data_files:
  - split: benchmark
    path: "cosmos_predict2_bench_full_info.json"
task_categories:
- visual-question-answering
- text-generation
language:
- en
license: cc-by-nc-4.0
size_categories:
- 1K<n<10K
tags:
- physical-ai
- world-models
- benchmark
- multimodal
---
# Physical AI Bench - Generation
[Paper](https://huggingface.co/papers/2512.01989) | [Code](https://github.com/SHI-Labs/physical-ai-bench)
## Dataset Description
PAI-Bench is a benchmark for quantitatively measuring the progress of world models for Physical AI.
The predict task contains 1,044 samples, each consisting of a text prompt, a conditioning image, and question-answer (QA) pairs. The samples cover six Physical AI target domains: autonomous vehicle (AV) driving, robotics, industry (smart space), physics, human, and common sense. All questions are binary, with an answer of either Yes or No.
This dataset is ready for non-commercial use.
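
Because every reference answer is Yes or No, scoring reduces to string matching per domain. The following is a minimal sketch of that comparison, not the official evaluation protocol (see the linked code repository for that). It assumes each record is a dict carrying the `answer` and `domain` fields listed in the card metadata; the `predictions` list and both function names are hypothetical.

```python
from collections import defaultdict


def normalize_answer(text):
    """Map a free-form reply to 'yes', 'no', or None if it is neither."""
    cleaned = text.strip().lower().rstrip(".!")
    return cleaned if cleaned in ("yes", "no") else None


def accuracy_by_domain(records, predictions):
    """Return {domain: fraction of predictions matching the reference answer}.

    `records` are dicts with the card's `answer` and `domain` fields.
    `predictions` is a parallel list of predicted strings (hypothetical input).
    """
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for record, predicted in zip(records, predictions):
        domain = record["domain"]
        total[domain] += 1
        # A prediction that is neither yes nor no counts as wrong.
        pred = normalize_answer(predicted)
        if pred is not None and pred == normalize_answer(record["answer"]):
            correct[domain] += 1
    return {domain: correct[domain] / total[domain] for domain in total}
```

Treating unparseable predictions as incorrect is a design choice made for this sketch; a stricter pipeline might reject them instead.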
## License/Terms of Use
The use of this dataset is governed by [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en).
## Intended Usage
This benchmark dataset is intended to demonstrate and facilitate the understanding and evaluation of world models for Physical AI. It should primarily be used for educational and demonstration purposes.
## Dataset Characterization
This dataset focuses on the following areas: Autonomous Vehicle (AV) driving, Robotics, Industry (smart space), Physics, Human, Common Sense.
### Data Collection Method
- AV: Automatic/Sensors
- Industry: Automatic/Sensors
- Physics: Automatic/Sensors
- Robotics: Automatic/Sensors
- Human: Automatic/Sensors
- Common Sense: Human
### Labeling Method
- AV: Hybrid: Human, Automated
- Industry: Hybrid: Human, Automated
- Physics: Hybrid: Human, Automated
- Robotics: Hybrid: Human, Automated
- Human: Hybrid: Human, Automated
- Common Sense: Hybrid: Human, Automated
## Folder Structure
```text
pbench/
├── condition_image/ # Conditioning images for all domains
├── vqa/ # Visual Question Answering pairs
└── cosmos_predict2_bench_full_info.json # Complete dataset metadata
```
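
Given this layout, the metadata file can be read with the standard library. The sketch below assumes that the top level of `cosmos_predict2_bench_full_info.json` is a list of records and that `image_path` is relative to the `pbench/` root; neither is confirmed by this card, so adjust if the actual file differs. The helper names `load_records` and `resolve_image` are hypothetical.

```python
import json
from pathlib import Path

# Field names taken from the `features` list in the card metadata.
EXPECTED_FIELDS = ("image_path", "prompt", "question", "answer", "domain")


def load_records(root):
    """Load the metadata JSON under `root` and check the expected fields."""
    json_path = Path(root) / "cosmos_predict2_bench_full_info.json"
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    # Assumption: the file holds a list of per-sample dicts.
    if not isinstance(records, list):
        raise TypeError("expected a top-level JSON list of records")
    for index, record in enumerate(records):
        missing = [key for key in EXPECTED_FIELDS if key not in record]
        if missing:
            raise KeyError(f"record {index} is missing fields: {missing}")
    return records


def resolve_image(root, record):
    """Assumption: `image_path` is relative to the dataset root folder."""
    return Path(root) / record["image_path"]
```

For example, `records = load_records("pbench")` followed by `resolve_image("pbench", records[0])` would yield the path of the first conditioning image under these assumptions.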
## Dataset Format
- Modality: Image (jpg) and Text
## Dataset Quantification
The dataset is stored in JSON files. The table below lists the quantity of data per domain in the Pbench dataset, covering conditioning images, text prompts, and QA pairs.
| Domain | Quantity |
| ---------------------- | ---------- |
| AV | 118 |
| Common Sense | 239 |
| Human | 299 |
| Industry | 107 |
| Physics | 107 |
| Robotics | 174 |
| **Total Storage Size** | **226 MB** |
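
The per-domain quantities in the table sum to 1,044, matching `num_examples` in the card metadata. A downloaded copy can be checked against these figures with a small sketch like the one below. It assumes one record per sample and that the `domain` field uses exactly the labels shown in the table, which this card does not confirm; remap the keys if the file uses different labels. The function names are hypothetical.

```python
from collections import Counter

# Per-domain quantities copied from the table above.
EXPECTED_COUNTS = {
    "AV": 118,
    "Common Sense": 239,
    "Human": 299,
    "Industry": 107,
    "Physics": 107,
    "Robotics": 174,
}


def count_by_domain(records):
    """Count records per value of their `domain` field."""
    return Counter(record["domain"] for record in records)


def count_mismatches(records, expected=EXPECTED_COUNTS):
    """Return {domain: (expected, actual)} for every domain whose count differs."""
    actual = count_by_domain(records)
    domains = sorted(set(expected) | set(actual))
    return {
        domain: (expected.get(domain, 0), actual.get(domain, 0))
        for domain in domains
        if expected.get(domain, 0) != actual.get(domain, 0)
    }


# The table rows add up to the sample count stated in the card.
assert sum(EXPECTED_COUNTS.values()) == 1044
```

An empty result from `count_mismatches(records)` indicates that the local copy matches the table under the stated assumptions.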
## Citation
If you use Physical AI Bench in your research, please cite:
```bibtex
@misc{zhou2025paibenchcomprehensivebenchmarkphysical,
title={PAI-Bench: A Comprehensive Benchmark For Physical AI},
author={Fengzhe Zhou and Jiannan Huang and Jialuo Li and Deva Ramanan and Humphrey Shi},
year={2025},
eprint={2512.01989},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.01989},
}
```
Provider:
shi-labs



