Egocentric-10K
Source: ModelScope (updated 2026-05-24, indexed 2025-11-22)

Download link: https://modelscope.cn/datasets/builddotai/Egocentric-10K

Egocentric-10K is the largest egocentric dataset. It is the first dataset collected exclusively in real factories.
<video width="100%" autoplay loop muted playsinline style="border-radius: 8px;">
<source src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/lvA-v9UG-Xs77rd4JImJl.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
Compared to previous in-the-wild egocentric datasets, Egocentric-10K is state-of-the-art in hand visibility and active manipulation density. The complete 30,000-frame evaluation set is available at [Egocentric-10K-Evaluation](https://huggingface.co/datasets/builddotai/Egocentric-10K-Evaluation).

## Dataset Statistics
| Attribute | Value |
|-----------|-------|
| **Total Hours** | 10,000 |
| **Total Frames** | 1.08 billion |
| **Video Clips** | 192,900 |
| **Median Clip Length** | 180.0 seconds |
| **Total Workers** | 2,138 |
| **Mean Hours per Worker** | 4.68 |
| **Storage Size** | 16.4 TB |
| **Format** | H.265/MP4 |
| **Resolution** | 1080p (1920x1080) |
| **Frame Rate** | 30 fps |
| **Field of View** | 128° horizontal, 67° vertical |
| **Camera Type** | Monocular head-mounted |
| **Audio** | No |
| **Device** | Build AI Gen 1 |
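
The figures in the table can be cross-checked against each other with simple arithmetic. The sketch below is only a sanity check, not part of the dataset tooling; it assumes that 16.4 TB means decimal terabytes and that every clip is recorded at the stated 30 fps.

```python
# Figures copied from the statistics table above.
TOTAL_HOURS = 10_000
FPS = 30
VIDEO_CLIPS = 192_900
MEAN_HOURS_PER_WORKER = 4.68
STORAGE_BYTES = 16.4e12  # assumption: decimal terabytes (1 TB = 10**12 bytes)

total_seconds = TOTAL_HOURS * 3600
total_frames = total_seconds * FPS                    # hours -> seconds -> frames
mean_clip_seconds = total_seconds / VIDEO_CLIPS       # mean (not median) clip length
implied_workers = TOTAL_HOURS / MEAN_HOURS_PER_WORKER
mean_bitrate_mbps = STORAGE_BYTES * 8 / total_seconds / 1e6

print(f"total frames:        {total_frames:,}")
print(f"mean clip length:    {mean_clip_seconds:.1f} s")
print(f"implied workers:     {implied_workers:.0f}")
print(f"mean video bitrate:  {mean_bitrate_mbps:.2f} Mbit/s")
```

The frame count comes out at 1.08 billion, matching the table. The mean clip length is slightly above the 180-second median, which suggests that a minority of clips are longer than the typical one.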
## Camera Intrinsics
Each worker folder contains an `intrinsics.json` file with calibrated camera parameters.
The intrinsics use the **OpenCV fisheye model** (Kannala-Brandt equidistant projection) with 4 distortion coefficients (k1-k4). All values are calibrated for the 1920x1080 resolution.
Example `intrinsics.json`:
```json
{
  "model": "fisheye",
  "image_width": 1920,
  "image_height": 1080,
  "fx": 1030.59,
  "fy": 1032.82,
  "cx": 966.69,
  "cy": 539.69,
  "k1": -0.1166,
  "k2": -0.0236,
  "k3": 0.0694,
  "k4": -0.0463
}
```
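
To illustrate what these parameters mean, here is a minimal pure-Python sketch of the equidistant fisheye projection as used by the OpenCV fisheye model, mapping a 3D point in the camera frame to pixel coordinates. The function name `project_fisheye` is hypothetical, zero skew is assumed, and the values are the example intrinsics above rather than those of any particular worker.

```python
import math

# Example intrinsics copied from the intrinsics.json shown above.
INTRINSICS = {
    "fx": 1030.59, "fy": 1032.82, "cx": 966.69, "cy": 539.69,
    "k1": -0.1166, "k2": -0.0236, "k3": 0.0694, "k4": -0.0463,
}

def project_fisheye(point, intr):
    """Project a 3D point (camera frame, z pointing forward) to pixels.

    Hypothetical helper; assumes zero skew between the image axes.
    """
    x, y, z = point
    r = math.hypot(x, y)              # distance from the optical axis
    if r < 1e-12:                     # a point on the axis hits the principal point
        return intr["cx"], intr["cy"]
    theta = math.atan2(r, z)          # angle between the ray and the optical axis
    t2 = theta * theta
    # Distorted angle: theta * (1 + k1*theta^2 + k2*theta^4 + k3*theta^6 + k4*theta^8)
    theta_d = theta * (1 + intr["k1"] * t2 + intr["k2"] * t2**2
                       + intr["k3"] * t2**3 + intr["k4"] * t2**4)
    scale = theta_d / r
    u = intr["fx"] * scale * x + intr["cx"]
    v = intr["fy"] * scale * y + intr["cy"]
    return u, v

center = project_fisheye((0.0, 0.0, 1.0), INTRINSICS)
right = project_fisheye((1.0, 0.0, 1.0), INTRINSICS)   # 45 degrees to the right
print(center, right)
```

A point on the optical axis lands on the principal point (`cx`, `cy`), and points further off-axis move toward the image border at a rate governed by `k1` to `k4`.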
## Dataset Structure
Egocentric-10K is structured in **[WebDataset format](https://huggingface.co/docs/hub/en/datasets-webdataset)**.
```
builddotai/Egocentric-10K/
├── factory_001/
│   └── workers/
│       ├── worker_001/
│       │   ├── intrinsics.json                      # Camera intrinsics for this worker
│       │   ├── factory001_worker001_part00.tar      # Shard 0 (≤1GB)
│       │   └── factory001_worker001_part01.tar      # Shard 1 (if needed)
│       ├── worker_002/
│       │   ├── intrinsics.json
│       │   └── factory001_worker002_part00.tar
│       └── worker_011/
│           ├── intrinsics.json
│           └── factory001_worker011_part00.tar
│
├── factory_002/
│   └── workers/
│       ├── worker_001/
│       │   ├── intrinsics.json
│       │   └── factory002_worker001_part00.tar
│       └── ...
│
├── factory_003/
│   └── workers/
│       └── ...
│
└── ... (factories 001-085)
```
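
The naming scheme in the tree above can be turned into small path helpers. This is a sketch under the assumption that factory and worker numbers are always zero-padded to three digits and shard numbers to two, as in the listing; the function names `shard_path` and `intrinsics_path` are hypothetical.

```python
def shard_path(factory: int, worker: int, part: int = 0) -> str:
    """Repo-relative path of one TAR shard (padding widths are assumed)."""
    return (
        f"factory_{factory:03d}/workers/worker_{worker:03d}/"
        f"factory{factory:03d}_worker{worker:03d}_part{part:02d}.tar"
    )

def intrinsics_path(factory: int, worker: int) -> str:
    """Repo-relative path of the intrinsics file for one worker."""
    return f"factory_{factory:03d}/workers/worker_{worker:03d}/intrinsics.json"

example_shard = shard_path(1, 11)
example_intrinsics = intrinsics_path(2, 1)
print(example_shard)
print(example_intrinsics)
```

Whether a given worker has a second shard depends on how much footage was recorded, so a path built this way is not guaranteed to exist in the repository.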
Each TAR file contains pairs of video and metadata files:
```
factory001_worker001_part00.tar
├── factory001_worker001_00001.mp4 # Video 1
├── factory001_worker001_00001.json # Metadata for video 1
├── factory001_worker001_00002.mp4 # Video 2
├── factory001_worker001_00002.json # Metadata for video 2
├── factory001_worker001_00003.mp4 # Video 3
├── factory001_worker001_00003.json # Metadata for video 3
└── ... # Additional video/metadata pairs
```
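
A downloaded shard can also be read without any dataset library. The following sketch uses only the Python standard library to pair each `.mp4` with its `.json` by shared file stem. The helper names `iter_pairs` and `make_demo_shard` are hypothetical, and the demo runs on a tiny in-memory TAR with placeholder bytes instead of a real shard.

```python
import io
import json
import tarfile

def iter_pairs(tar: tarfile.TarFile):
    """Yield (key, video_bytes, metadata_dict) for each complete mp4/json pair."""
    pending = {}
    for member in tar:
        if not member.isfile():
            continue
        key, dot, ext = member.name.rpartition(".")
        if not dot or ext not in ("mp4", "json"):
            continue
        parts = pending.setdefault(key, {})
        parts[ext] = tar.extractfile(member).read()
        if len(parts) == 2:           # both halves of the pair have been seen
            del pending[key]
            yield key, parts["mp4"], json.loads(parts["json"])

def make_demo_shard() -> io.BytesIO:
    """Build a tiny in-memory TAR that mimics the layout shown above."""
    files = {
        "factory001_worker001_00001.mp4": b"placeholder video bytes",
        "factory001_worker001_00001.json": b'{"fps": 30.0, "codec": "h265"}',
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf

with tarfile.open(fileobj=make_demo_shard(), mode="r") as demo:
    pairs = list(iter_pairs(demo))

for key, video, meta in pairs:
    print(key, len(video), meta["codec"])
```

For a real shard, `tarfile.open("factory001_worker001_part00.tar")` would replace the in-memory buffer. Each video is read fully into memory, which is acceptable for inspection but may be heavy for long clips.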
Each JSON metadata file has the following fields:
```json
{
  "factory_id": "factory_002",   // Unique identifier for the factory location
  "worker_id": "worker_002",     // Unique identifier for the worker within factory
  "video_index": 0,              // Sequential index for videos from this worker
  "duration_sec": 1200.0,        // Video duration in seconds
  "width": 1920,                 // Video width in pixels
  "height": 1080,                // Video height in pixels
  "fps": 30.0,                   // Frames per second
  "size_bytes": 599697350,       // File size in bytes
  "codec": "h265"                // Video codec
}
```
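
As a sketch of how these fields might be consumed, the metadata can be parsed into a small typed record and used to derive a frame count and an average bitrate. The class name `ClipMetadata` and its derived properties are hypothetical, and the example assumes the files on disk are plain JSON without the explanatory `//` comments shown above.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class ClipMetadata:
    """Typed view of one per-video metadata file (hypothetical helper)."""
    factory_id: str
    worker_id: str
    video_index: int
    duration_sec: float
    width: int
    height: int
    fps: float
    size_bytes: int
    codec: str

    @property
    def frame_count(self) -> int:
        # Approximate: duration times frame rate, rounded to the nearest frame.
        return round(self.duration_sec * self.fps)

    @property
    def bitrate_mbps(self) -> float:
        # Average bitrate of the whole file in megabits per second.
        return self.size_bytes * 8 / self.duration_sec / 1e6

raw = """{
  "factory_id": "factory_002", "worker_id": "worker_002", "video_index": 0,
  "duration_sec": 1200.0, "width": 1920, "height": 1080, "fps": 30.0,
  "size_bytes": 599697350, "codec": "h265"
}"""

meta = ClipMetadata(**json.loads(raw))
print(meta.frame_count, f"{meta.bitrate_mbps:.2f} Mbit/s")
```

For the example values this gives 36,000 frames and an average bitrate of roughly 4 Mbit/s.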
### Loading the Dataset
```python
from datasets import load_dataset, Features, Value

# Define features
features = Features({
    'mp4': Value('binary'),
    'json': {
        'factory_id': Value('string'),
        'worker_id': Value('string'),
        'video_index': Value('int64'),
        'duration_sec': Value('float64'),
        'width': Value('int64'),
        'height': Value('int64'),
        'fps': Value('float64'),
        'size_bytes': Value('int64'),
        'codec': Value('string')
    },
    '__key__': Value('string'),
    '__url__': Value('string')
})

# Load entire dataset
dataset = load_dataset(
    "builddotai/Egocentric-10K",
    streaming=True,
    features=features
)

# Load specific factories
dataset = load_dataset(
    "builddotai/Egocentric-10K",
    data_files=["factory_001/**/*.tar", "factory_002/**/*.tar"],
    streaming=True,
    features=features
)

# Load specific workers
dataset = load_dataset(
    "builddotai/Egocentric-10K",
    data_files=[
        "factory_001/workers/worker_001/*.tar",
        "factory_001/workers/worker_002/*.tar"
    ],
    streaming=True,
    features=features
)
```
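
Once a streaming dataset is loaded, each sample should be a dictionary shaped like the `features` definition above, with the raw video bytes under `mp4`. The sketch below writes one such sample to disk. The helper name `save_sample` is hypothetical, and the demo uses a hand-built fake sample so that it runs offline; the split name `"train"` in the trailing comment is an assumption.

```python
import tempfile
from pathlib import Path

def save_sample(sample: dict, out_dir: str) -> Path:
    """Write the video bytes of one sample to <out_dir>/<key>.mp4."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component in case the key contains directories.
    name = Path(sample["__key__"]).name
    path = out / f"{name}.mp4"
    path.write_bytes(sample["mp4"])
    return path

# Offline demo with a fake sample that mimics the feature layout.
fake_sample = {
    "mp4": b"fake-bytes",
    "json": {"factory_id": "factory_001", "worker_id": "worker_001",
             "duration_sec": 180.0},
    "__key__": "factory001_worker001_00001",
    "__url__": "unused-in-this-demo",
}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_sample(fake_sample, tmp)
    saved_name = saved.name
    written = saved.read_bytes()

print(saved_name, len(written))

# With a real streaming dataset this might look like (split name assumed):
#   for sample in dataset["train"].take(3):
#       save_sample(sample, "clips")
```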
### Loading Intrinsics
```python
from huggingface_hub import hf_hub_download
import json

# Download intrinsics for a specific worker
intrinsics_path = hf_hub_download(
    repo_id="builddotai/Egocentric-10K",
    filename="factory_001/workers/worker_001/intrinsics.json",
    repo_type="dataset"
)

# Parse the downloaded JSON file into a dict
with open(intrinsics_path) as f:
    intrinsics = json.load(f)
```
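
Most computer vision libraries expect intrinsics as a 3x3 camera matrix plus a distortion vector rather than a flat dictionary. The following sketch shows one way to do that conversion with plain Python lists. The function name `camera_matrix_and_distortion` is hypothetical, and the dictionary is the example from the Camera Intrinsics section rather than a downloaded file.

```python
def camera_matrix_and_distortion(intr: dict):
    """Return (K, D): a 3x3 camera matrix and the four fisheye coefficients."""
    if intr.get("model") != "fisheye":
        raise ValueError(f"unexpected camera model: {intr.get('model')!r}")
    K = [
        [intr["fx"], 0.0, intr["cx"]],
        [0.0, intr["fy"], intr["cy"]],
        [0.0, 0.0, 1.0],
    ]
    D = [intr[name] for name in ("k1", "k2", "k3", "k4")]
    return K, D

# Example values copied from the intrinsics.json shown earlier in this card.
example_intrinsics = {
    "model": "fisheye", "image_width": 1920, "image_height": 1080,
    "fx": 1030.59, "fy": 1032.82, "cx": 966.69, "cy": 539.69,
    "k1": -0.1166, "k2": -0.0236, "k3": 0.0694, "k4": -0.0463,
}

K, D = camera_matrix_and_distortion(example_intrinsics)
print(K)
print(D)
```

After conversion with `numpy.array`, `K` and `D` should be usable with the functions in OpenCV's `cv2.fisheye` module, for example `cv2.fisheye.undistortImage`, although that step is not shown or tested here.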
## License
Licensed under the Apache 2.0 License.
## Citation
```
@dataset{buildaiegocentric10k2025,
author = {Build AI},
title = {Egocentric-10K},
year = {2025},
publisher = {Hugging Face Datasets},
url = {https://huggingface.co/datasets/builddotai/Egocentric-10K}
}
```

Provider: maas

Created: 2025-11-11
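## Background
Egocentric-10K is currently the largest egocentric dataset. It was collected exclusively in real factory environments and comprises 10,000 hours and 1.08 billion frames of video. It performs strongly on hand visibility and active manipulation density, which makes it suitable for research in computer vision and robotics.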



