TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset
Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset
---
license: mit
task_categories:
- video-classification
- object-detection
language:
- en
tags:
- robotics
- egocentric-vision
- pov
- action-segmentation
- VLM
- fine-grained-annotations
pretty_name: 'Train Them AI: Fine-Grained POV Egocentric Activity Dataset'
size_categories:
- n<1K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/annotations.jsonl
dataset_info:
  features:
  - name: video_id
    dtype: string
  - name: video_filename
    dtype: string
  - name: main_activity
    dtype: string
  - name: action_index
    dtype: int32
  - name: start_seconds
    dtype: float32
  - name: end_seconds
    dtype: float32
  - name: duration_seconds
    dtype: float32
  - name: action_text
    dtype: string
  - name: objects
    dtype: string
  splits:
  - name: train
    num_examples: 178
---
# Train Them AI: Fine-Grained Egocentric Activity Dataset (v1.0)
This dataset provides high-fidelity egocentric (POV) video sequences with manually reviewed annotations, designed for training Vision-Language Models (VLMs), spatial reasoning systems, and service robots.
## Dataset Description
Unlike general-purpose datasets, **Train Them AI** focuses on **annotation density**. We provide extremely granular temporal segmentation for complex, multi-step household tasks, capturing detailed object-hand interactions and state transitions.
- **Perspective:** 100% First-Person View (POV).
- **Annotations:** Dual-layered: an AI-generated base, followed by rigorous expert manual review for logical consistency.
- **Density:** High-frequency labels, with an average action segment duration of about 4.2 seconds.
- **Size:** 178 annotated actions across 4 videos.
- **Footage:** About 18 minutes of annotated video.
- **Objects:** 100+ unique objects labeled.
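The figures above can be recomputed from the annotation rows. The following is a minimal sketch, assuming each row is a dictionary with the `video_id`, `duration_seconds`, and `objects` fields described in this card; the function name `summarize` and the sample rows are hypothetical and only illustrate the calculation.

```python
from statistics import mean


def summarize(rows):
    """Return basic density statistics for a list of annotation rows."""
    durations = [r["duration_seconds"] for r in rows]
    objects = set()
    for r in rows:
        # `objects` is a comma-separated string; normalise case and whitespace
        # so that "Towel" and "towel" count as one object.
        objects.update(o.strip().lower() for o in r["objects"].split(",") if o.strip())
    return {
        "num_actions": len(rows),
        "num_videos": len({r["video_id"] for r in rows}),
        "total_action_seconds": sum(durations),
        "mean_action_seconds": mean(durations),
        "unique_objects": len(objects),
    }


# Hypothetical rows, invented for illustration; they are not taken from the dataset.
sample_rows = [
    {"video_id": "v1", "duration_seconds": 3.0, "objects": "pants, drawer"},
    {"video_id": "v1", "duration_seconds": 5.0, "objects": "drawer"},
    {"video_id": "v2", "duration_seconds": 4.0, "objects": "Towel, plate"},
]
stats = summarize(sample_rows)
```

Running the same function over all rows of `data/annotations.jsonl` gives a quick check of the action count, mean segment duration, and unique-object count quoted in the list.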
## Use Cases
- **Robotics training data:** fine-grained hand-object interaction sequences for manipulation tasks
- **VLM fine-tuning:** paired video and natural language action descriptions
- **Activity recognition:** temporal segmentation benchmarks for household tasks
- **Custom dataset orders:** contact us to commission annotated footage for your specific domain
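For the robotics and VLM fine-tuning use cases, each annotation row can be turned into a short clip paired with its description. The sketch below only builds an `ffmpeg` command; it assumes `ffmpeg` is installed and the video has been downloaded locally. The helper name `clip_command`, the directory layout, the output naming scheme, and the sample row are all hypothetical.

```python
def clip_command(row, video_dir="videos", out_dir="clips"):
    """Build an ffmpeg command that cuts one annotated action out of its video.

    Returns (command, caption), where caption is the row's action_text.
    """
    src = f"{video_dir}/{row['video_filename']}"
    # Hypothetical naming scheme: <video_id>_<zero-padded action_index>.mp4
    out = f"{out_dir}/{row['video_id']}_{row['action_index']:03d}.mp4"
    cmd = [
        "ffmpeg", "-y",
        "-ss", f"{row['start_seconds']:.3f}",    # seek to the action start
        "-i", src,
        "-t", f"{row['duration_seconds']:.3f}",  # keep only the action's duration
        out,
    ]
    return cmd, row["action_text"]


# Hypothetical row, invented for illustration; it is not taken from the dataset.
sample_row = {
    "video_id": "pants",
    "video_filename": "folding_storing_pants.mp4",
    "action_index": 7,
    "start_seconds": 12.5,
    "end_seconds": 15.75,
    "duration_seconds": 3.25,
    "action_text": "Fold the left leg over the right leg.",
}
cmd, caption = clip_command(sample_row)
```

The command list can be passed to `subprocess.run(cmd, check=True)` once the output directory exists; the returned caption is the text half of the video-text pair.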
## Videos
Raw video files are available in the [Files tab](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/tree/main) for direct viewing and for checking the annotations against the footage.
| Video | Direct Link |
| :--- | :--- |
| Folding & Storing Pants | [folding_storing_pants.mp4](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/blob/main/folding_storing_pants.mp4) |
| Loading Washing Machine | [loading_washing_machine.mp4](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/blob/main/loading_washing_machine.mp4) |
| Cleaning Cutlery Drawer | [cleaning_cutlery_drawer.MOV](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/blob/main/cleaning_cutlery_drawer.MOV) |
| Wiping Wet Dishes | [wiping_wet_dishes.mp4](https://huggingface.co/datasets/TrainThemAI/Fine_Grained_POV_Egocentric_Activity_Dataset/blob/main/wiping_wet_dishes.mp4) |
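The links in the table point to the Hub's file viewer (`/blob/`). For scripted downloads, a raw-file URL is usually more convenient. The sketch below assumes the Hub's usual `/resolve/` URL pattern applies to this repository; the helper name `blob_to_resolve` is hypothetical.

```python
def blob_to_resolve(url):
    """Rewrite a Hub file-viewer URL (/blob/) into a raw-file URL (/resolve/)."""
    # Only the first "/blob/" segment is rewritten; the rest of the path is untouched.
    return url.replace("/blob/", "/resolve/", 1)


viewer_url = (
    "https://huggingface.co/datasets/TrainThemAI/"
    "Fine_Grained_POV_Egocentric_Activity_Dataset/blob/main/wiping_wet_dishes.mp4"
)
download_url = blob_to_resolve(viewer_url)
```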
## Content
| Video | Main Activity | Key Labels |
| :--- | :--- | :--- |
| folding_storing_pants.mp4 | Textile Maintenance | Fabric manipulation, drawer organization. |
| loading_washing_machine.mp4 | Appliance Interaction | Clothing types, detergent dosing, mechanical dials. |
| cleaning_cutlery_drawer.MOV | Kitchen Organization | Tool sorting, surface wiping, liquid dispensing. |
| wiping_wet_dishes.mp4 | Post-Prep Cleanup | Utensil assembly, precision drying maneuvers. |
## Data Structure
Annotations are stored in `data/annotations.jsonl`, with one JSON object per line and one line per action:
| Field | Type | Description |
| :--- | :--- | :--- |
| `video_id` | string | Unique identifier for the video |
| `video_filename` | string | Original video filename |
| `main_activity` | string | High-level description of the full activity |
| `action_index` | int32 | Sequential index of the action within the video |
| `start_seconds` | float32 | Action start timestamp in seconds |
| `end_seconds` | float32 | Action end timestamp in seconds |
| `duration_seconds` | float32 | Duration of the action in seconds |
| `action_text` | string | Natural language description of the sub-action |
| `objects` | string | Comma-separated list of objects interacted with |
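The schema above can be read with the standard library alone. The following is a minimal sketch, assuming each line of the file is a JSON object with exactly these fields. The helper names `parse_rows` and `check_timing`, the added `object_list` key, the 0.05-second tolerance, and the sample row are hypothetical.

```python
import json

REQUIRED_FIELDS = (
    "video_id", "video_filename", "main_activity", "action_index",
    "start_seconds", "end_seconds", "duration_seconds", "action_text", "objects",
)


def parse_rows(lines):
    """Parse JSONL lines into dicts, checking that every documented field is present."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if f not in row]
        if missing:
            raise ValueError(f"row is missing fields: {missing}")
        # Split the comma-separated `objects` string into a list (hypothetical extra key).
        row["object_list"] = [o.strip() for o in row["objects"].split(",") if o.strip()]
        rows.append(row)
    return rows


def check_timing(row, tol=0.05):
    """Return True if end > start and duration matches end - start within tol seconds."""
    span = row["end_seconds"] - row["start_seconds"]
    return span > 0 and abs(span - row["duration_seconds"]) <= tol


# Hypothetical row, invented for illustration; it is not taken from the dataset.
sample_line = json.dumps({
    "video_id": "pants", "video_filename": "folding_storing_pants.mp4",
    "main_activity": "Folding and storing pants", "action_index": 0,
    "start_seconds": 0.0, "end_seconds": 3.5, "duration_seconds": 3.5,
    "action_text": "Pick up the pants from the bed.", "objects": "pants, bed",
})
rows = parse_rows([sample_line, ""])
```

With the real file, `parse_rows(open("data/annotations.jsonl", encoding="utf-8"))` returns all rows, and `all(check_timing(r) for r in rows)` flags any segment whose duration disagrees with its timestamps.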
## About Train Them AI
At **Train Them AI**, we are building the data foundations for the next generation of robotic spatial intelligence. We specialize in creating high-quality, structured datasets that bridge the gap between raw egocentric video and actionable robotic execution logic.
For technical feedback or collaboration inquiries, contact diego.pousa@trainthemai.com.
[trainthemai.com](https://trainthemai.com)
Provider: TrainThemAI



