
TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset

Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset
Dataset description:
---
license: mit
task_categories:
- video-classification
- object-detection
language:
- en
tags:
- robotics
- egocentric-vision
- pov
- action-segmentation
- VLM
- fine-grained-annotations
pretty_name: 'Train Them AI: Fine-Grained POV Egocentric Activity Dataset'
size_categories:
- n<1K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/annotations.jsonl
dataset_info:
  features:
  - name: video_id
    dtype: string
  - name: video_filename
    dtype: string
  - name: main_activity
    dtype: string
  - name: action_index
    dtype: int32
  - name: start_seconds
    dtype: float32
  - name: end_seconds
    dtype: float32
  - name: duration_seconds
    dtype: float32
  - name: action_text
    dtype: string
  - name: objects
    dtype: string
  splits:
  - name: train
    num_examples: 178
---

# Train Them AI: Fine-Grained Egocentric Activity Dataset (v1.0)

This dataset provides high-fidelity egocentric (POV) video sequences and meticulous manual annotations designed for training Vision-Language Models (VLM), spatial reasoning systems, and service robotics.

## Dataset Description

Unlike general-purpose datasets, **Train Them AI** focuses on **annotation density**. We provide extremely granular temporal segmentation for complex, multi-step household tasks, capturing detailed object-hand interactions and state transitions.

- **Perspective:** 100% First-Person View (POV).
- **Annotations:** Dual-layered (AI-generated base + rigorous expert manual review for logical consistency).
- **Density:** High-frequency labels with an average action duration of < 5 seconds.
- **178** total annotated actions across 4 videos
- **~18 min** of annotated footage
- **~4.2 sec** average action segment duration
- **100+** unique objects labeled

## Use Cases

- **Robotics training data:** fine-grained hand-object interaction sequences for manipulation tasks
- **VLM fine-tuning:** paired video + natural language action descriptions
- **Activity recognition:** temporal segmentation benchmarks for household tasks
- **Custom dataset orders:** contact us to commission annotated footage for your specific domain

## Videos

Raw video files are available in the [Files tab](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/tree/main) for direct viewing and annotation fact-checking.

| Video | Direct Link |
| :--- | :--- |
| Folding & Storing Pants | [folding_storing_pants.mp4](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/blob/main/folding_storing_pants.mp4) |
| Loading Washing Machine | [loading_washing_machine.mp4](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/blob/main/loading_washing_machine.mp4) |
| Cleaning Cutlery Drawer | [cleaning_cutlery_drawer.MOV](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/blob/main/cleaning_cutlery_drawer.MOV) |
| Wiping Wet Dishes | [wiping_wet_dishes.mp4](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/blob/main/wiping_wet_dishes.mp4) |

## Content

| Video | Main Activity | Key Labels |
| :--- | :--- | :--- |
| folding_storing_pants.mp4 | Textile Maintenance | Fabric manipulation, drawer organization. |
| loading_washing_machine.mp4 | Appliance Interaction | Clothing types, detergent dosing, mechanical dials. |
| cleaning_cutlery_drawer.MOV | Kitchen Organization | Tool sorting, surface wiping, liquid dispensing. |
| wiping_wet_dishes.mp4 | Post-Prep Cleanup | Utensil assembly, precision drying maneuvers. |

## Data Structure

Annotations are in `data/annotations.jsonl`, one row per action:

| Field | Type | Description |
| :--- | :--- | :--- |
| `video_id` | string | Unique identifier for the video |
| `video_filename` | string | Original video filename |
| `main_activity` | string | High-level description of the full activity |
| `action_index` | int | Sequential index of the action within the video |
| `start_seconds` | float | Action start timestamp in seconds |
| `end_seconds` | float | Action end timestamp in seconds |
| `duration_seconds` | float | Duration of the action in seconds |
| `action_text` | string | Natural language description of the sub-action |
| `objects` | string | Comma-separated list of objects interacted with |

## About Train Them AI

At **Train Them AI**, we are building the data foundations for the next generation of robotic spatial intelligence. We specialize in creating high-quality, structured datasets that bridge the gap between raw egocentric video and actionable robotic execution logic.

For technical feedback or collaboration inquiries, reach out to diego.pousa@trainthemai.com

[trainthemai.com](https://trainthemai.com)
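As an illustration of the annotation schema described under Data Structure, the following is a minimal Python sketch of how rows in `data/annotations.jsonl` might be parsed and sanity-checked. It is not part of the dataset's tooling. The helper names `parse_annotations` and `inconsistent_durations`, the derived `object_list` key, and the tolerance value are assumptions made here, and the two sample rows are invented for illustration rather than taken from the dataset.

```python
import json
from statistics import mean

# Hypothetical sample rows: the field names follow the Data Structure table,
# but every value below is invented for illustration and is NOT from the dataset.
SAMPLE_JSONL = """\
{"video_id": "demo_video", "video_filename": "demo.mp4", "main_activity": "Demo activity", "action_index": 0, "start_seconds": 0.0, "end_seconds": 3.5, "duration_seconds": 3.5, "action_text": "Pick up the towel", "objects": "towel, left hand"}
{"video_id": "demo_video", "video_filename": "demo.mp4", "main_activity": "Demo activity", "action_index": 1, "start_seconds": 3.5, "end_seconds": 8.0, "duration_seconds": 4.5, "action_text": "Wipe the plate", "objects": "towel, plate"}
"""


def parse_annotations(text):
    """Parse JSONL text into a list of dicts, one per annotated action."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        # `objects` is documented as a comma-separated string; split it into a list.
        row["object_list"] = [o.strip() for o in row["objects"].split(",") if o.strip()]
        rows.append(row)
    return rows


def inconsistent_durations(rows, tol=0.05):
    """Return (video_id, action_index) pairs where duration != end - start within tol."""
    return [
        (r["video_id"], r["action_index"])
        for r in rows
        if abs((r["end_seconds"] - r["start_seconds"]) - r["duration_seconds"]) > tol
    ]


rows = parse_annotations(SAMPLE_JSONL)
mean_duration = mean(r["duration_seconds"] for r in rows)
unique_objects = sorted({o for r in rows for o in r["object_list"]})
```

To run the same checks on the real annotations, one could pass the contents of a locally downloaded `data/annotations.jsonl` to `parse_annotations` in place of `SAMPLE_JSONL`. Splitting `objects` on commas assumes that no object name itself contains a comma, which the card does not state.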

Provided by:
TrainThemAI