TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset

Name: TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset
Creator: TrainThemAI
Published: 2026-03-26 05:14:43
License: MIT

Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset

Description:

---
license: mit
task_categories:
- video-classification
- object-detection
language:
- en
tags:
- robotics
- egocentric-vision
- pov
- action-segmentation
- VLM
- fine-grained-annotations
pretty_name: 'Train Them AI: Fine-Grained POV Egocentric Activity Dataset'
size_categories:
- n<1K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/annotations.jsonl
dataset_info:
  features:
  - name: video_id
    dtype: string
  - name: video_filename
    dtype: string
  - name: main_activity
    dtype: string
  - name: action_index
    dtype: int32
  - name: start_seconds
    dtype: float32
  - name: end_seconds
    dtype: float32
  - name: duration_seconds
    dtype: float32
  - name: action_text
    dtype: string
  - name: objects
    dtype: string
  splits:
  - name: train
    num_examples: 178
---

# Train Them AI: Fine-Grained Egocentric Activity Dataset (v1.0)

This dataset provides high-fidelity egocentric (POV) video sequences and meticulous manual annotations designed for training Vision-Language Models (VLM), spatial reasoning systems, and service robotics.

## Dataset Description

Unlike general-purpose datasets, **Train Them AI** focuses on **annotation density**. We provide extremely granular temporal segmentation for complex, multi-step household tasks, capturing detailed object-hand interactions and state transitions.

- **Perspective:** 100% First-Person View (POV).
- **Annotations:** Dual-layered (AI-generated base + rigorous expert manual review for logical consistency).
- **Density:** High-frequency labels with an average action duration of < 5 seconds.
- **178** total annotated actions across 4 videos
- **~18 min** of annotated footage
- **~4.2 sec** average action segment duration
- **100+** unique objects labeled

## Use Cases

- **Robotics training data:** fine-grained hand-object interaction sequences for manipulation tasks
- **VLM fine-tuning:** paired video + natural language action descriptions
- **Activity recognition:** temporal segmentation benchmarks for household tasks
- **Custom dataset orders:** contact us to commission annotated footage for your specific domain

## Videos

Raw video files are available in the [Files tab](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/tree/main) for direct viewing and annotation fact-checking.

| Video | Direct Link |
| :--- | :--- |
| Folding & Storing Pants | [folding_storing_pants.mp4](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/blob/main/folding_storing_pants.mp4) |
| Loading Washing Machine | [loading_washing_machine.mp4](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/blob/main/loading_washing_machine.mp4) |
| Cleaning Cutlery Drawer | [cleaning_cutlery_drawer.MOV](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/blob/main/cleaning_cutlery_drawer.MOV) |
| Wiping Wet Dishes | [wiping_wet_dishes.mp4](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/blob/main/wiping_wet_dishes.mp4) |

## Content

| Video | Main Activity | Key Labels |
| :--- | :--- | :--- |
| folding_storing_pants.mp4 | Textile Maintenance | Fabric manipulation, drawer organization. |
| loading_washing_machine.mp4 | Appliance Interaction | Clothing types, detergent dosing, mechanical dials. |
| cleaning_cutlery_drawer.MOV | Kitchen Organization | Tool sorting, surface wiping, liquid dispensing. |
| wiping_wet_dishes.mp4 | Post-Prep Cleanup | Utensil assembly, precision drying maneuvers. |

## Data Structure

Annotations are in `data/annotations.jsonl`, one row per action:

| Field | Type | Description |
| :--- | :--- | :--- |
| `video_id` | string | Unique identifier for the video |
| `video_filename` | string | Original video filename |
| `main_activity` | string | High-level description of the full activity |
| `action_index` | int | Sequential index of the action within the video |
| `start_seconds` | float | Action start timestamp in seconds |
| `end_seconds` | float | Action end timestamp in seconds |
| `duration_seconds` | float | Duration of the action in seconds |
| `action_text` | string | Natural language description of the sub-action |
| `objects` | string | Comma-separated list of objects interacted with |

## About Train Them AI

At **Train Them AI**, we are building the data foundations for the next generation of robotic spatial intelligence. We specialize in creating high-quality, structured datasets that bridge the gap between raw egocentric video and actionable robotic execution logic.

For technical feedback or collaboration inquiries, reach out to diego.pousa@trainthemai.com

[trainthemai.com](https://trainthemai.com)
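As an illustration of the row layout listed under Data Structure, the following minimal sketch parses JSONL rows into typed records, splits the comma-separated `objects` string into a list, and checks each row's timing. It is not the publisher's tooling: the names `Action`, `parse_action`, `load_actions`, and `duration_mismatches` are hypothetical, the two sample rows contain made-up values (only the field names, the filename, and the activity label come from the dataset card), and the check assumes `duration_seconds` equals `end_seconds - start_seconds` within an arbitrary tolerance.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Action:
    """One annotated action, mirroring the fields in the dataset card."""
    video_id: str
    video_filename: str
    main_activity: str
    action_index: int
    start_seconds: float
    end_seconds: float
    duration_seconds: float
    action_text: str
    objects: List[str]  # parsed from the comma-separated string


def parse_action(line: str) -> Action:
    """Parse one JSONL line into an Action."""
    row = json.loads(line)
    # Split the comma-separated object list and drop empty entries.
    objects = [o.strip() for o in row["objects"].split(",") if o.strip()]
    return Action(
        video_id=row["video_id"],
        video_filename=row["video_filename"],
        main_activity=row["main_activity"],
        action_index=int(row["action_index"]),
        start_seconds=float(row["start_seconds"]),
        end_seconds=float(row["end_seconds"]),
        duration_seconds=float(row["duration_seconds"]),
        action_text=row["action_text"],
        objects=objects,
    )


def load_actions(lines: Iterable[str]) -> List[Action]:
    """Parse every non-blank line; accepts a list of strings or an open file."""
    return [parse_action(line) for line in lines if line.strip()]


def duration_mismatches(actions: List[Action], tol: float = 0.05) -> List[Action]:
    """Return actions whose stored duration differs from end - start.

    Assumption: duration_seconds is meant to equal end_seconds - start_seconds.
    The tolerance is an arbitrary choice to absorb float32 rounding.
    """
    return [
        a for a in actions
        if abs((a.end_seconds - a.start_seconds) - a.duration_seconds) > tol
    ]


# Hypothetical sample rows: the values are illustrative, not taken from the dataset.
SAMPLE_LINES = [
    '{"video_id": "example_video", "video_filename": "folding_storing_pants.mp4", '
    '"main_activity": "Textile Maintenance", "action_index": 0, '
    '"start_seconds": 0.0, "end_seconds": 3.5, "duration_seconds": 3.5, '
    '"action_text": "Pick up the pants", "objects": "pants, bed"}',
    '{"video_id": "example_video", "video_filename": "folding_storing_pants.mp4", '
    '"main_activity": "Textile Maintenance", "action_index": 1, '
    '"start_seconds": 3.5, "end_seconds": 7.0, "duration_seconds": 3.5, '
    '"action_text": "Fold the pants in half", "objects": "pants"}',
]

if __name__ == "__main__":
    for action in load_actions(SAMPLE_LINES):
        print(action.action_index, action.action_text, action.objects)
```

To run the same parsing on the real annotations, pass an open file handle instead of the sample list, for example `load_actions(open("data/annotations.jsonl", encoding="utf-8"))`, assuming the file has been downloaded locally to that path.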

Provider:

TrainThemAI
