Cosmos-Reason1-RL-Dataset
Source: ModelScope community (updated 2026-01-06, indexed 2025-05-24)
Download link:
https://modelscope.cn/datasets/nv-community/Cosmos-Reason1-RL-Dataset
## Dataset Description:
Each data sample is a pair of a video and its text annotations. We summarize the data and annotations in Table 4 (SFT), Table 5 (RL), and Table 6 (Benchmark) of the Cosmos-Reason1 paper. We release the annotations for embodied reasoning tasks for BridgeDataV2, RoboVQA, Agibot, HoloAssist, and AV, and the videos for the RoboVQA and AV datasets. We additionally release the annotations and videos for the RoboFail benchmark dataset. By releasing the dataset, NVIDIA supports the development of open embodied reasoning models and provides benchmarks for evaluating progress.
This dataset is ready for commercial/non-commercial use.
## Dataset Owner(s):
NVIDIA Corporation
## Dataset Creation Date:
2025/05/17
## License/Terms of Use:
The use of this dataset is governed by [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/deed.en). Additional Information: [Apache License 2.0](https://github.com/google-deepmind/robovqa/blob/main/LICENSE); [MIT](https://github.com/real-stanford/reflect/blob/main/LICENSE).
## Intended Usage:
This dataset is intended to demonstrate and facilitate understanding and usage of the Cosmos-Reason1 models. It should primarily be used for educational and demonstration purposes.
## Dataset Characterization
The embodied reasoning datasets and benchmarks focus on the following areas: robotics (RoboVQA, BridgeDataV2, Agibot, RoboFail), ego-centric human demonstration (HoloAssist), and autonomous vehicle (AV) driving video data.
**The AV data is currently unavailable and will be uploaded soon!**
**Data Collection Method**:
* RoboVQA: Hybrid: Automatic/Sensors
* BridgeDataV2: Automatic/Sensors
* AgiBot: Automatic/Sensors
* RoboFail: Automatic/Sensors
* HoloAssist: Human
* AV: Automatic/Sensors
**Labeling Method**:
* RoboVQA: Hybrid: Human, Automated
* BridgeDataV2: Hybrid: Human, Automated
* AgiBot: Hybrid: Human, Automated
* RoboFail: Hybrid: Human, Automated
* HoloAssist: Hybrid: Human, Automated
* AV: Hybrid: Human, Automated
## Dataset Format
* Modality: Video (mp4) and Text
## Dataset Quantification
We release the embodied reasoning data and benchmarks. Each data sample is a pair of video and text. The text annotations include understanding and reasoning annotations described in the Cosmos-Reason1 paper. Each video may have multiple text annotations. The quantity of the video and text pairs is described in the table below.
| Dataset | SFT Data | RL Data | Benchmark Data |
|--------------|---------:|--------:|---------------:|
| [RoboVQA](https://robovqa.github.io/) | 1.14M | 252 | 110 |
| AV | 24.7k | 200 | 100 |
| [BridgeDataV2](https://rail-berkeley.github.io/bridgedata/) | 258k | 240 | 100 |
| [Agibot](https://github.com/OpenDriveLab/AgiBot-World) | 38.9k | 200 | 100 |
| [HoloAssist](https://holoassist.github.io/) | 273k | 200 | 100 |
| [RoboFail](https://robot-reflect.github.io/) | N/A | N/A | 100 |
| **Total Storage Size** | **300.6GB** | **2.6GB** | **1.5GB** |
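
The per-split totals are not stated in the table, but they can be estimated from its rows. The following is a minimal sketch that transcribes the counts above and sums each column. The helper names `parse_count` and `split_totals` are illustrative and not part of any released tooling. Because the SFT figures are rounded in the table, the SFT total is approximate.

```python
# Counts transcribed from the table above. Values with an "M" or "k" suffix
# are rounded in the source, so the SFT total is only approximate.
COUNTS = {
    "RoboVQA":      {"sft": "1.14M", "rl": "252", "benchmark": "110"},
    "AV":           {"sft": "24.7k", "rl": "200", "benchmark": "100"},
    "BridgeDataV2": {"sft": "258k",  "rl": "240", "benchmark": "100"},
    "Agibot":       {"sft": "38.9k", "rl": "200", "benchmark": "100"},
    "HoloAssist":   {"sft": "273k",  "rl": "200", "benchmark": "100"},
    "RoboFail":     {"sft": "N/A",   "rl": "N/A", "benchmark": "100"},
}

SUFFIXES = {"k": 1_000, "m": 1_000_000}


def parse_count(text: str) -> int:
    """Convert a table cell such as '24.7k', '1.14M', '252', or 'N/A' to an int."""
    text = text.strip().lower()
    if text == "n/a":
        return 0  # no samples released for this split
    if text[-1] in SUFFIXES:
        return round(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def split_totals(counts: dict) -> dict:
    """Sum the video-text pair counts per split across all datasets."""
    totals = {"sft": 0, "rl": 0, "benchmark": 0}
    for row in counts.values():
        for split, value in row.items():
            totals[split] += parse_count(value)
    return totals


if __name__ == "__main__":
    for split, total in split_totals(COUNTS).items():
        print(f"{split}: {total:,}")
```

Under these transcribed values, the sums come to roughly 1.73 million SFT pairs, 1,092 RL pairs, and 610 benchmark pairs.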
We release text annotations for all embodied reasoning datasets and videos for RoboVQA and AV datasets. For other datasets, users may download the source videos from the original data source and find corresponding video sources via the video names. The held-out RoboFail benchmark is released for measuring the generalization capability.
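
For datasets whose videos must be fetched from the original source, the matching by video name described above could look like the sketch below. It rests on assumptions: the annotation file layout is not documented in this card, so `video_names` is a hypothetical list of names already extracted from the annotations, and the helper names `index_videos` and `match_annotations` are illustrative. The sketch matches on the file name alone and ignores any directory prefix.

```python
import tempfile
from pathlib import Path


def index_videos(root) -> dict:
    """Map each .mp4 file name under root to its path.

    If two files share a name, the one visited last wins.
    """
    return {p.name: p for p in Path(root).rglob("*.mp4")}


def match_annotations(video_names, root):
    """Return (found, missing): found maps each name to a local path."""
    index = index_videos(root)
    found, missing = {}, []
    for name in video_names:
        key = Path(name).name  # drop any directory prefix in the annotation
        if key in index:
            found[name] = index[key]
        else:
            missing.append(name)
    return found, missing


if __name__ == "__main__":
    # Demonstration on a temporary directory with one placeholder video file.
    with tempfile.TemporaryDirectory() as tmp:
        clip = Path(tmp) / "source" / "clip_0001.mp4"
        clip.parent.mkdir(parents=True)
        clip.touch()
        found, missing = match_annotations(
            ["clip_0001.mp4", "videos/clip_0002.mp4"], tmp
        )
        print(sorted(found), missing)
```

Reporting the missing names separately makes it easy to see which source videos still need to be downloaded before training or evaluation.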
## Reference(s):
[[2503.15558] Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning](https://arxiv.org/abs/2503.15558)
## Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
Provider: maas
Created: 2025-05-20



