Comprehensive Dataset for Demonstration Video Understanding

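Name: Comprehensive Dataset for Demonstration Video Understanding (面向示教视频理解的综合性数据集)
Creator: Institute of Computing Technology, Chinese Academy of Sciences
License: Not specified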

Indexed by the National Basic Science Data Center (国家基础学科公共科学数据中心) on 2026-02-21

Download link:

https://nbsdc.cn/general/dataDetail?id=69949090195d2627ec69a154&type=1

Resource description:

This dataset is built for robot demonstration ("teaching") video understanding. It was filtered and processed from DROID, a large-scale dataset of robot manipulation in indoor scenes (the original DROID dataset is 8.7 TB). The total size of this dataset is 35 GB, and it consists of two main parts:

1. Multi-view keyframe images: about 49,935 independent demonstration videos were selected from DROID. For each video, up to 16 keyframes are uniformly sampled and cropped from different camera views, for tasks such as video understanding and action plan inference.
2. Text and structured annotations: each video is aligned with its human instruction, and a brief action sequence annotation with two kinds of information, "plan" and "code", is generated with reference to the keyframes. All annotations are provided in JSON format.
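The description says that up to 16 keyframes are uniformly sampled per video but does not document the exact procedure. The sketch below shows one plausible reading, assuming evenly spaced frame indices that include the first and last frame. The function name `uniform_keyframe_indices` and its parameters are hypothetical and are not part of the dataset's tooling.

```python
from typing import List


def uniform_keyframe_indices(num_frames: int, max_keyframes: int = 16) -> List[int]:
    """Return evenly spaced frame indices, at most `max_keyframes` of them.

    Assumption: "uniform sampling" means evenly spaced indices from the
    first to the last frame. The dataset's actual procedure may differ.
    """
    if num_frames < 0 or max_keyframes < 1:
        raise ValueError("num_frames must be >= 0 and max_keyframes >= 1")
    # Short videos keep every frame, which matches "up to 16 keyframes".
    if num_frames <= max_keyframes:
        return list(range(num_frames))
    if max_keyframes == 1:
        return [0]
    # Integer arithmetic avoids floating-point rounding surprises; the
    # spacing exceeds 1 here, so all indices are distinct.
    last = num_frames - 1
    return [(i * last) // (max_keyframes - 1) for i in range(max_keyframes)]


if __name__ == "__main__":
    # A 100-frame video yields 16 indices spanning frame 0 to frame 99.
    print(uniform_keyframe_indices(100))
```

Under this reading, a video with fewer than 16 frames contributes all of its frames, while longer videos contribute exactly 16.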

Provided by:

Institute of Computing Technology, Chinese Academy of Sciences

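Background overview

This dataset is a comprehensive resource for robot demonstration video understanding, filtered and processed from the large-scale DROID dataset, with a total size of 35 GB. It contains multi-view keyframe images extracted from about 49,935 demonstration videos, together with aligned text instructions and action sequence annotations, to support applications such as video understanding and action plan inference.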

The content above was collected and summarized by 遇见数据集.