Cosmos-Reason1-SFT-Dataset
Source: ModelScope community. Updated 2025-12-04; indexed 2025-05-24.
Download link:
https://modelscope.cn/datasets/nv-community/Cosmos-Reason1-SFT-Dataset
Resource overview:
## Dataset Description:
Each data sample is a pair of a video and its text annotations. We summarize the data and annotations in Table 4 (SFT), Table 5 (RL), and Table 6 (Benchmark) of the Cosmos-Reason1 paper. We release the annotations for embodied reasoning tasks for BridgeDataV2, RoboVQA, AgiBot, HoloAssist, and AV, and the videos for the RoboVQA and AV datasets. We additionally release the annotations and videos for the RoboFail dataset for benchmarking. By releasing the dataset, NVIDIA supports the development of open embodied reasoning models and provides benchmarks for measuring progress.
This dataset is ready for commercial/non-commercial use.
## Dataset Owner(s):
NVIDIA Corporation
## Dataset Creation Date:
2025/05/17
## License/Terms of Use:
The use of this dataset is governed by [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/deed.en). Additional Information: [Apache License 2.0](https://github.com/google-deepmind/robovqa/blob/main/LICENSE); [MIT](https://github.com/real-stanford/reflect/blob/main/LICENSE).
## Intended Usage:
This dataset is intended to demonstrate and facilitate understanding and usage of the Cosmos-Reason1 models. It should primarily be used for educational and demonstration purposes.
## Dataset Characterization
The embodied reasoning datasets and benchmarks focus on the following areas: robotics (RoboVQA, BridgeDataV2, AgiBot, RoboFail), ego-centric human demonstration (HoloAssist), and autonomous vehicle (AV) driving video data.
**The AV data is currently unavailable and will be uploaded soon!**
**Data Collection Method**:
* RoboVQA: Hybrid: Automatic/Sensors
* BridgeDataV2: Automatic/Sensors
* AgiBot: Automatic/Sensors
* RoboFail: Automatic/Sensors
* HoloAssist: Human
* AV: Automatic/Sensors
**Labeling Method**:
* RoboVQA: Hybrid: Human, Automated
* BridgeDataV2: Hybrid: Human, Automated
* AgiBot: Hybrid: Human, Automated
* RoboFail: Hybrid: Human, Automated
* HoloAssist: Hybrid: Human, Automated
* AV: Hybrid: Human, Automated
## Dataset Format
* Modality: Video (mp4) and Text
## Dataset Quantification
We release the embodied reasoning data and benchmarks. Each data sample is a pair of video and text. The text annotations include understanding and reasoning annotations described in the Cosmos-Reason1 paper. Each video may have multiple text annotations. The quantity of the video and text pairs is described in the table below.
| Dataset | SFT Data | RL Data | Benchmark Data |
|--------------|---------:|--------:|---------------:|
| [RoboVQA](https://robovqa.github.io/) | 1.14M | 252 | 110 |
| AV | 24.7k | 200 | 100 |
| [BridgeDataV2](https://rail-berkeley.github.io/bridgedata/) | 258k | 240 | 100 |
| [AgiBot](https://github.com/OpenDriveLab/AgiBot-World) | 38.9k | 200 | 100 |
| [HoloAssist](https://holoassist.github.io/) | 273k | 200 | 100 |
| [RoboFail](https://robot-reflect.github.io/) | N/A | N/A | 100 |
| **Total Storage Size** | **300.6GB** | **2.6GB** | **1.5GB** |
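
The table gives per-dataset counts but no per-split totals. As a rough check, the figures can be tallied with the short sketch below. The numbers are transcribed from the table; the SFT counts are rounded in the source, so the SFT total is approximate. The names `COUNTS` and `split_totals` are illustrative and not part of the dataset.

```python
# Video-text pair counts transcribed from the table above.
# SFT figures are rounded in the source (e.g. 1.14 million), so the
# SFT total computed here is only approximate.
COUNTS = {
    "RoboVQA":      {"sft": 1_140_000, "rl": 252,  "benchmark": 110},
    "AV":           {"sft": 24_700,    "rl": 200,  "benchmark": 100},
    "BridgeDataV2": {"sft": 258_000,   "rl": 240,  "benchmark": 100},
    "AgiBot":       {"sft": 38_900,    "rl": 200,  "benchmark": 100},
    "HoloAssist":   {"sft": 273_000,   "rl": 200,  "benchmark": 100},
    "RoboFail":     {"sft": None,      "rl": None, "benchmark": 100},  # N/A in the table
}


def split_totals(counts):
    """Sum each split across datasets, treating N/A (None) as zero."""
    totals = {}
    for per_split in counts.values():
        for split, n in per_split.items():
            totals[split] = totals.get(split, 0) + (n or 0)
    return totals


TOTALS = split_totals(COUNTS)
print(TOTALS)  # {'sft': 1734600, 'rl': 1092, 'benchmark': 610}
```

Under these figures the release holds roughly 1.73 million SFT pairs, 1,092 RL pairs, and 610 benchmark pairs.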
We release text annotations for all embodied reasoning datasets, and videos for the RoboVQA and AV datasets. For the other datasets, users may download the source videos from the original data source and match them to the annotations by video name. The held-out RoboFail benchmark is released to measure generalization capability.
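
Matching annotations to separately downloaded videos by name could look like the sketch below. It is a minimal illustration under stated assumptions, not the dataset's documented layout: it assumes each annotation record is a dict whose `video` field holds a file name or relative path ending in `.mp4`, and that the source videos sit somewhere under a local directory. The field name `video` and the helpers `group_by_video` and `match_videos` are hypothetical; check the released annotation files for the actual schema.

```python
from collections import defaultdict
from pathlib import Path


def group_by_video(annotations, key="video"):
    """Group annotation records by video file name.

    One video may carry several text annotations, so each name maps to a list.
    The "video" key is an assumed field name, not a documented one.
    """
    grouped = defaultdict(list)
    for record in annotations:
        grouped[Path(record[key]).name].append(record)
    return dict(grouped)


def match_videos(grouped, video_root):
    """Pair each annotated video name with a local .mp4 of the same name.

    Returns (matched, missing): a dict of name -> local path, and a sorted
    list of names with no local file. If the same file name occurs in several
    subdirectories, whichever path is scanned last is kept.
    """
    local = {p.name: p for p in Path(video_root).rglob("*.mp4")}
    matched = {name: local[name] for name in grouped if name in local}
    missing = sorted(name for name in grouped if name not in local)
    return matched, missing
```

A call such as `match_videos(group_by_video(records), "path/to/videos")` then reports which annotated videos are present locally and which still need to be downloaded from the original source.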
## Reference(s):
[[2503.15558] Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning](https://arxiv.org/abs/2503.15558)
## Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
Provider:
maas
Created:
2025-05-20



