Download link:

https://modelscope.cn/datasets/behavior-1k/2025-challenge-demos

Resource overview:

This dataset was created using [LeRobot](https://github.com/huggingface/lerobot).

## Dataset Description

- **Homepage:** [More Information Needed]
- **Paper:** [More Information Needed]
- **License:** mit

## Dataset Structure

[meta/info.json](meta/info.json):

```json
{
    "codebase_version": "v2.1",
    "robot_type": "R1Pro",
    "total_episodes": 10000,
    "total_frames": 119094660,
    "total_tasks": 50,
    "total_videos": 90000,
    "chunks_size": 10000,
    "fps": 30,
    "splits": {
        "train": "0:10000"
    },
    "data_path": "data/task-{episode_chunk:04d}/episode_{episode_index:08d}.parquet",
    "video_path": "videos/task-{episode_chunk:04d}/{video_key}/episode_{episode_index:08d}.mp4",
    "metainfo_path": "meta/episodes/task-{episode_chunk:04d}/episode_{episode_index:08d}.json",
    "annotation_path": "annotations/task-{episode_chunk:04d}/episode_{episode_index:08d}.json",
    "features": {
        "observation.images.rgb.left_wrist": {
            "dtype": "video",
            "shape": [480, 480, 3],
            "names": ["height", "width", "rgb"],
            "info": {
                "video.fps": 30.0,
                "video.height": 480,
                "video.width": 480,
                "video.channels": 3,
                "video.codec": "libx265",
                "video.pix_fmt": "yuv420p",
                "video.is_depth_map": false,
                "has_audio": false
            }
        },
        "observation.images.rgb.right_wrist": {
            "dtype": "video",
            "shape": [480, 480, 3],
            "names": ["height", "width", "rgb"],
            "info": {
                "video.fps": 30.0,
                "video.height": 480,
                "video.width": 480,
                "video.channels": 3,
                "video.codec": "libx265",
                "video.pix_fmt": "yuv420p",
                "video.is_depth_map": false,
                "has_audio": false
            }
        },
        "observation.images.rgb.head": {
            "dtype": "video",
            "shape": [720, 720, 3],
            "names": ["height", "width", "rgb"],
            "info": {
                "video.fps": 30.0,
                "video.height": 720,
                "video.width": 720,
                "video.channels": 3,
                "video.codec": "libx265",
                "video.pix_fmt": "yuv420p",
                "video.is_depth_map": false,
                "has_audio": false
            }
        },
        "observation.images.depth.left_wrist": {
            "dtype": "video",
            "shape": [480, 480, 3],
            "names": ["height", "width", "depth"],
            "info": {
                "video.fps": 30.0,
                "video.height": 480,
                "video.width": 480,
                "video.channels": 3,
                "video.codec": "libx265",
                "video.pix_fmt": "yuv420p16le",
                "video.is_depth_map": true,
                "has_audio": false
            }
        },
        "observation.images.depth.right_wrist": {
            "dtype": "video",
            "shape": [480, 480, 3],
            "names": ["height", "width", "depth"],
            "info": {
                "video.fps": 30.0,
                "video.height": 480,
                "video.width": 480,
                "video.channels": 3,
                "video.codec": "libx265",
                "video.pix_fmt": "yuv420p16le",
                "video.is_depth_map": true,
                "has_audio": false
            }
        },
        "observation.images.depth.head": {
            "dtype": "video",
            "shape": [720, 720, 3],
            "names": ["height", "width", "depth"],
            "info": {
                "video.fps": 30.0,
                "video.height": 720,
                "video.width": 720,
                "video.channels": 3,
                "video.codec": "libx265",
                "video.pix_fmt": "yuv420p16le",
                "video.is_depth_map": true,
                "has_audio": false
            }
        },
        "observation.images.seg_instance_id.left_wrist": {
            "dtype": "video",
            "shape": [480, 480, 3],
            "names": ["height", "width", "rgb"],
            "info": {
                "video.fps": 30.0,
                "video.height": 480,
                "video.width": 480,
                "video.channels": 3,
                "video.codec": "libx265",
                "video.pix_fmt": "yuv420p",
                "video.is_depth_map": false,
                "has_audio": false
            }
        },
        "observation.images.seg_instance_id.right_wrist": {
            "dtype": "video",
            "shape": [480, 480, 3],
            "names": ["height", "width", "rgb"],
            "info": {
                "video.fps": 30.0,
                "video.height": 480,
                "video.width": 480,
                "video.channels": 3,
                "video.codec": "libx265",
                "video.pix_fmt": "yuv420p",
                "video.is_depth_map": false,
                "has_audio": false
            }
        },
        "observation.images.seg_instance_id.head": {
            "dtype": "video",
            "shape": [720, 720, 3],
            "names": ["height", "width", "rgb"],
            "info": {
                "video.fps": 30.0,
                "video.height": 720,
                "video.width": 720,
                "video.channels": 3,
                "video.codec": "libx265",
                "video.pix_fmt": "yuv420p",
                "video.is_depth_map": false,
                "has_audio": false
            }
        },
        "action": {
            "dtype": "float32",
            "shape": [23],
            "names": null
        },
        "timestamp": {
            "dtype": "float64",
            "shape": [1],
            "names": null
        },
        "episode_index": {
            "dtype": "int64",
            "shape": [1],
            "names": null
        },
        "index": {
            "dtype": "int64",
            "shape": [1],
            "names": null
        },
        "observation.cam_rel_poses": {
            "dtype": "float32",
            "shape": [21],
            "names": null
        },
        "observation.state": {
            "dtype": "float32",
            "shape": [256],
            "names": null
        },
        "observation.task_info": {
            "dtype": "float32",
            "shape": [null],
            "names": null
        }
    }
}
```

## Citation

**BibTeX:**

```bibtex
@article{li2024behavior,
  title={Behavior-1k: A human-centered, embodied ai benchmark with 1,000 everyday activities and realistic simulation},
  author={Li, Chengshu and Zhang, Ruohan and Wong, Josiah and Gokmen, Cem and Srivastava, Sanjana and Mart{\'i}n-Mart{\'i}n, Roberto and Wang, Chen and Levine, Gabrael and Ai, Wensi and Martinez, Benjamin and Yin, Hang and Lingelbach, Michael and Hwang, Minjune and Hiranaka, Ayano and Garlanka, Sujay and Aydin, Arman and Lee, Sharon and Sun, Jiankai and Anvari, Mona and Sharma, Manasi and Bansal, Dhruva and Hunter, Samuel and Kim, Kyu-Young and Lou, Alan and Matthews, Caleb R. and Villa-Renteria, Ivan and Tang, Jerry Huayang and Tang, Claire and Xia, Fei and Li, Yunzhu and Savarese, Silvio and Gweon, Hyowon and Liu, C. Karen and Wu, Jiajun and Fei-Fei, Li},
  journal={arXiv preprint arXiv:2403.09227},
  year={2024}
}
```
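The totals in `meta/info.json` can be cross-checked against each other. The sketch below is only an illustration: it copies four values from the metadata by hand and assumes that every episode carries one video per video feature and that all frames are sampled at the stated `fps`. Neither assumption is stated in the dataset card itself.

```python
# Values copied by hand from meta/info.json.
info = {
    "total_episodes": 10000,
    "total_frames": 119094660,
    "total_videos": 90000,
    "fps": 30,
}

# The nine features with dtype "video": three modalities times three cameras.
modalities = ["rgb", "depth", "seg_instance_id"]
cameras = ["left_wrist", "right_wrist", "head"]
video_keys = [f"observation.images.{m}.{c}" for m in modalities for c in cameras]

# Assumption: each episode has exactly one video per video feature.
videos_per_episode = info["total_videos"] // info["total_episodes"]

# Assumption: all frames are sampled at the stated fps.
mean_frames = info["total_frames"] / info["total_episodes"]
mean_seconds = mean_frames / info["fps"]
total_hours = info["total_frames"] / info["fps"] / 3600

print(f"video features: {len(video_keys)}, videos per episode: {videos_per_episode}")
print(f"mean episode length: {mean_frames:.1f} frames ({mean_seconds / 60:.1f} min)")
print(f"total recorded time: {total_hours:.0f} h")
```

Under these assumptions the video count matches the number of video features, and the frame total works out to several minutes per episode on average.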

The fields of `meta/info.json` have the following meanings:

- `codebase_version`: codebase version, `v2.1`.
- `robot_type`: robot model, `R1Pro`.
- `total_episodes`: total number of episodes, 10000.
- `total_frames`: total number of frames across all episodes, 119094660.
- `total_tasks`: number of tasks covered, 50.
- `total_videos`: total number of video files, 90000.
- `chunks_size`: data chunk size, 10000.
- `fps`: sampling frame rate, 30.
- `splits`: dataset split rule. The training split is the episode range `0:10000`; with 10000 episodes in total, it covers the whole dataset.
- `data_path`: path template for data files.
- `video_path`: path template for video files.
- `metainfo_path`: path template for per-episode meta-information files.
- `annotation_path`: path template for annotation files.
- `features`: the set of dataset features, detailed as follows:
  1. `observation.images.rgb.left_wrist`: RGB image observation from the left wrist camera. Data type video, tensor shape [480, 480, 3], with dimensions corresponding to height, width, and RGB channels. Video frame rate 30.0, resolution 480×480, 3 channels, codec libx265, pixel format yuv420p, not a depth map, no audio.
  2. `observation.images.rgb.right_wrist`: RGB image observation from the right wrist camera. Same parameters as the left wrist RGB image.
  3. `observation.images.rgb.head`: RGB image observation from the head camera. Tensor shape [720, 720, 3], resolution 720×720, other parameters as for the wrist RGB images.
  4. `observation.images.depth.left_wrist`: depth map observation from the left wrist camera. Data type video, tensor shape [480, 480, 3], with dimensions corresponding to height, width, and depth. Codec libx265, pixel format yuv420p16le, flagged as a depth map, no audio.
  5. `observation.images.depth.right_wrist`: depth map observation from the right wrist camera. Same parameters as the left wrist depth map.
  6. `observation.images.depth.head`: depth map observation from the head camera. Tensor shape [720, 720, 3], resolution 720×720, other parameters as for the wrist depth maps.
  7. `observation.images.seg_instance_id.left_wrist`: instance segmentation ID image observation from the left wrist camera. Same parameters as the left wrist RGB image.
  8. `observation.images.seg_instance_id.right_wrist`: instance segmentation ID image observation from the right wrist camera. Same parameters as the right wrist RGB image.
  9. `observation.images.seg_instance_id.head`: instance segmentation ID image observation from the head camera. Same parameters as the head RGB image.
  10. `action`: robot action data. Data type float32, tensor shape [23], no dimension names.
  11. `timestamp`: timestamp data. Data type float64, tensor shape [1], no dimension names.
  12. `episode_index`: episode index. Data type int64, tensor shape [1], no dimension names.
  13. `index`: data index. Data type int64, tensor shape [1], no dimension names.
  14. `observation.cam_rel_poses`: relative camera pose observation. Data type float32, tensor shape [21], no dimension names.
  15. `observation.state`: robot state observation. Data type float32, tensor shape [256], no dimension names.
  16. `observation.task_info`: task information observation. Data type float32, variable tensor shape (`[null]` in the metadata), no dimension names.
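The four path templates use Python format-string syntax, so they can be filled in with `str.format`. The following is a minimal sketch rather than part of any official tooling: the helper name `resolve_paths` is hypothetical, and the caller is assumed to already know the `episode_chunk` and `episode_index` of the episode it wants, since the metadata does not spell out how the two relate.

```python
# Path templates copied verbatim from meta/info.json.
TEMPLATES = {
    "data": "data/task-{episode_chunk:04d}/episode_{episode_index:08d}.parquet",
    "video": "videos/task-{episode_chunk:04d}/{video_key}/episode_{episode_index:08d}.mp4",
    "metainfo": "meta/episodes/task-{episode_chunk:04d}/episode_{episode_index:08d}.json",
    "annotation": "annotations/task-{episode_chunk:04d}/episode_{episode_index:08d}.json",
}


def resolve_paths(episode_chunk: int, episode_index: int, video_key: str) -> dict[str, str]:
    """Fill every template for one episode (hypothetical helper).

    str.format ignores keyword arguments a template does not use, so
    video_key can be passed to all four templates.
    """
    return {
        name: template.format(
            episode_chunk=episode_chunk,
            episode_index=episode_index,
            video_key=video_key,
        )
        for name, template in TEMPLATES.items()
    }


# Example values chosen for illustration only.
paths = resolve_paths(0, 42, "observation.images.rgb.head")
for name, path in paths.items():
    print(f"{name}: {path}")
```

The resulting strings are paths relative to the dataset root. The `:04d` and `:08d` specifiers zero-pad the chunk and episode numbers to four and eight digits.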

Application scenarios: