ABot-PhysWorld_SFT_Training_Data_v1
Source: ModelScope community. Updated 2026-05-14; indexed 2026-05-03.
Download link: https://modelscope.cn/datasets/amap_cvlab/ABot-PhysWorld_SFT_Training_Data_v1
<div align="center">
<h1>🤖 ABot-PhysWorld SFT Training Data (v1)</h1>
<p align="center">
<b>AMAP CV Lab</b>
</p>
<p align="center">
<a href="https://arxiv.org/abs/2603.23376"><img src="https://img.shields.io/static/v1?label=Paper&message=arXiv&color=red&logo=arxiv"></a>
<a href="https://github.com/amap-cvlab/ABot-PhysWorld"><img src="https://img.shields.io/badge/Code-GitHub-blue?logo=github"></a>
<a href="https://modelscope.cn/models/amap_cvlab/Abot-PhysWorld"><img src="https://img.shields.io/static/v1?label=Model&message=ModelScope&color=purple"></a>
</p>
</div>
> **Supervised Fine-Tuning (SFT) data** for [ABot-PhysWorld](https://github.com/amap-cvlab/ABot-PhysWorld) — a physically consistent, action-controllable video world model for robotic manipulation built on a 14B Diffusion Transformer.
## 📊 Dataset Overview
This is the **v1** release. The dataset contains **287,557** curated robotic manipulation video clips paired with dense textual descriptions, aggregated from five real-world sources.
| Source | Samples | Description |
|--------|---------|-------------|
| **AgiBot** | 125,259 | Agile robotic manipulation challenges |
| **OXE** | 77,110 | Open X-Embodiment cross-robot data |
| **RoboMIND** | 32,792 | Multi-robot manipulation with diverse end-effectors |
| **RoboCOIN** | 26,450 | Multi-embodiment collaborative manipulation |
| **Galaxea** | 25,946 | Open-world robotic interaction |
| **Total** | **287,557** | |
## 📁 Structure
```
.
├── ABot-PhysWorld_v1.jsonl   # Annotations: video path + text prompt
└── videos/
    ├── AgiBot/
    ├── OXE/
    ├── RoboMIND/
    ├── RoboCOIN/
    └── Galaxea/
```
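Given this layout, the per-source counts in the overview table can be cross-checked against the annotation file. The following is a minimal sketch, assuming every `video` path has the form `videos/<Source>/...` as in the tree above; the helper name `count_by_source` and the sample paths are hypothetical and not part of the dataset.

```python
import json
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable


def count_by_source(jsonl_lines: Iterable[str]) -> Counter:
    """Count records per source, read from the folder directly under `videos/`."""
    counts: Counter = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:  # tolerate blank lines
            continue
        record = json.loads(line)
        parts = PurePosixPath(record["video"]).parts
        # Assumed layout: ("videos", "<Source>", ..., "<file>.mp4")
        if len(parts) > 2 and parts[0] == "videos":
            source = parts[1]
        else:
            source = "<unknown>"
        counts[source] += 1
    return counts


# Tiny in-memory example with made-up paths (not real dataset entries).
sample = [
    '{"video": "videos/OXE/a/clip.mp4", "prompt": "p1"}',
    '{"video": "videos/OXE/b/clip.mp4", "prompt": "p2"}',
    '{"video": "videos/Galaxea/c/clip.mp4", "prompt": "p3"}',
]
print(count_by_source(sample))
```

For the real file, an open file handle can be passed directly, for example `count_by_source(open(jsonl_path, encoding="utf-8"))`, since the function only iterates over lines.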
## 📝 Data Format
Each line in `ABot-PhysWorld_v1.jsonl` is a JSON object:
```json
{
  "video": "videos/RoboMIND/.../trajectory_camera_left.mp4",
  "prompt": "The video opens with a view of a clean, well-lit industrial workspace..."
}
```
- **`video`**: Relative path to the MP4 video file
- **`prompt`**: Dense, multi-paragraph description covering initial scene setup, step-by-step action progression, final state, and camera perspective
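A record in this format can be checked before training. The sketch below is one possible validator, not part of the official tooling: the `Sample` class and `parse_record` function are hypothetical names, and the specific checks (non-empty strings, a relative path, an `.mp4` suffix) are assumptions inferred from the field descriptions above.

```python
import json
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Sample:
    video: str   # relative path to the MP4 file
    prompt: str  # dense text description


def parse_record(line: str) -> Sample:
    """Parse one JSONL line and check the two documented fields."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    for key in ("video", "prompt"):
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key!r}")
    video = PurePosixPath(record["video"])
    # Assumed checks, inferred from the field descriptions.
    if video.is_absolute():
        raise ValueError(f"expected a relative path, got {record['video']!r}")
    if video.suffix.lower() != ".mp4":
        raise ValueError(f"expected an .mp4 file, got {record['video']!r}")
    return Sample(video=record["video"], prompt=record["prompt"])
```

Raising `ValueError` on the first problem keeps the sketch short; a data-cleaning pass might prefer to collect all problems per line and report them together.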
## 🚀 Usage
```python
import json
import os

DATA_ROOT = "/path/to/this/dataset"

# Each line of the annotation file is one JSON record.
with open(os.path.join(DATA_ROOT, "ABot-PhysWorld_v1.jsonl"), encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        # `video` is relative to the dataset root.
        video_path = os.path.join(DATA_ROOT, record["video"])
        prompt = record["prompt"]
```
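When only part of the dataset has been downloaded, it may help to wrap the loop in a generator that skips records whose video file is absent. This is a hedged sketch using only the standard library; `iter_samples` and its `skip_missing` parameter are hypothetical names, not part of the released code.

```python
import json
import os
from typing import Iterator, Tuple


def iter_samples(data_root: str, skip_missing: bool = True) -> Iterator[Tuple[str, str]]:
    """Yield (video_path, prompt) pairs from the annotation file."""
    jsonl_path = os.path.join(data_root, "ABot-PhysWorld_v1.jsonl")
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():  # tolerate blank lines
                continue
            record = json.loads(line)
            video_path = os.path.join(data_root, record["video"])
            if skip_missing and not os.path.isfile(video_path):
                continue  # video file not present on disk
            yield video_path, record["prompt"]
```

A caller would iterate with `for video_path, prompt in iter_samples(DATA_ROOT): ...`. Passing `skip_missing=False` yields every annotated record regardless of what is on disk.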
## 📜 Citation
```bibtex
@article{abot-physworld2026,
  title={ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment},
  author={Yuzhi Chen and Ronghan Chen and Dongjie Huo and Yandan Yang and Dekang Qi and Haoyun Liu and Tong Lin and Shuang Zeng and Junjin Xiao and Xinyuan Chang and Feng Xiong and Xing Wei and Zhiheng Ma and Mu Xu},
  year={2026}
}
```
## License
Released for **research purposes only**. Please refer to the original source datasets for their respective licenses.
## 🙏 Acknowledgement
- [RoboMIND](https://huggingface.co/datasets/x-humanoid-robomind/RoboMIND)
- [RoboCOIN](https://huggingface.co/RoboCOIN)
- [AgiBotWorld](https://huggingface.co/datasets/agibot-world/AgiBotWorld-Beta)
- [Galaxea](https://huggingface.co/datasets/OpenGalaxea/Galaxea-Open-World-Dataset)
- [Open X-Embodiment](https://github.com/google-deepmind/open_x_embodiment)
Provider: maas
Created: 2026-03-26



