
ego-1k

ModelScope Community | Updated 2026-05-21 | Indexed 2026-05-17
Download link:
https://modelscope.cn/datasets/facebook/ego-1k
Resource description:
# [Ego-1K: A Large-Scale Multiview Video Dataset for Egocentric Vision](https://arxiv.org/abs/2603.13741)

Jae Yong Lee, Daniel Scharstein, Akash Bapat, Hao Hu, Andrew Fu, Haoru Zhao, Paul Sammut, Xiang Li, Stephen Jeapes, Anik Gupta, Lior David, Saketh Madhuvarasu, Jay Girish Joshi, and Jason Wither

[arxiv.org/abs/2603.13741](https://arxiv.org/abs/2603.13741); to appear in CVPR 2026

We present Ego-1K, a large-scale collection of time-synchronized egocentric multiview videos designed to advance neural 3D video synthesis and dynamic scene understanding. The dataset contains 956 short (6.7-9.7s) egocentric videos taken with a custom rig with 12 synchronous cameras surrounding a VR headset worn by the user, for a total of 491K frames and 5.9M images. Scene content focuses on hand motions and hand-object interactions in different settings. Our dataset enables new ways to benchmark egocentric scene reconstruction methods, and presents unique challenges for existing 3D and 4D novel view synthesis methods due to high disparities and image motion caused by close dynamic objects and rig egomotion.

![Ego-1k-figure](figures/rig.png)

## License

[FAIR Noncommercial Research License](https://huggingface.co/facebook/fair-noncommercial-research-license/blob/main/LICENSE)

## Getting Started

See [`quickstart.ipynb`](quickstart.ipynb) for a runnable walkthrough that loads metadata, streams images, and visualizes a 12-camera multiview frame.
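As a rough sanity check on the headline numbers in the abstract, the sketch below multiplies the frame count by the 12 cameras and converts it to a duration. The exact frame total (490,966) and the 60 Hz frame rate are taken from the Key Statistics table of this README; the variable names are illustrative only.

```python
# Headline numbers quoted in this README.
NUM_RECORDINGS = 956
NUM_CAMERAS = 12
TOTAL_FRAMES = 490_966   # exact frame count from the Key Statistics table
FRAME_RATE_HZ = 60       # capture rate from the Key Statistics table

# One image per camera per frame.
total_images = TOTAL_FRAMES * NUM_CAMERAS
# Frames -> seconds -> hours.
total_hours = TOTAL_FRAMES / FRAME_RATE_HZ / 3600
# Average recording length in frames.
mean_frames = TOTAL_FRAMES / NUM_RECORDINGS

print(f"total images: {total_images:,}")
print(f"total duration: {total_hours:.1f} h")
print(f"mean frames per recording: {mean_frames:.0f}")
```

The results agree with the rounded figures in the abstract (491K frames, 5.9M images).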
## Key Statistics

| Property | Value |
|----------|-------|
| Total recordings | 956 |
| Train split | 860 recordings |
| Test split | 96 recordings |
| Duration per recording | 6.7-9.7 seconds at 60 Hz |
| Frames per recording | 404-583 (mean 514, median 530) |
| Cameras | 12 (6 rectified stereo pairs) |
| Image resolution | 1280 x 1280 pixels (rectified pinhole, 120 deg HFOV) |
| Total frames | 490,966 |
| Total images | 5,891,592 |
| Total duration | 2.3 hours |
| Total shards | 10,216 |
| Total dataset size | ~18 TB (WebDataset tar shards) |
| Typical shard size | ~1.5 GB |

![distributions](figures/distributions.png)

## Dataset Structure

```text
ego-1k/
├── data/
│   ├── train-<scene_id>.parquet        # Per-scene metadata index
│   └── test-<scene_id>.parquet
└── shards/
    ├── train/
    │   └── <scene_id>/
    │       ├── <scene_id>-0000.tar     # WebDataset tar shards (~1.5 GB each)
    │       ├── <scene_id>-0001.tar
    │       └── ...
    └── test/
        └── <scene_id>/
            └── <scene_id>-0000.tar
```

### Tar Shard Contents

Each tar sample represents one frame across all 12 cameras:

```text
<scene_id>/<frame_id:06d>.200-1.png          # Raw PNG bytes (1280x1280)
<scene_id>/<frame_id:06d>.200-2.png
...
<scene_id>/<frame_id:06d>.200-12.png
<scene_id>/<frame_id:06d>.metadata.json      # Pose, rig calibration, scene info
```

The `metadata.json` per sample contains:

| Field | Type | Description |
|-------|------|-------------|
| `scene_id` | string | Recording identifier |
| `frame_id` | int | Frame index (0-indexed) |
| `timestamp_ns` | int | Frame timestamp in nanoseconds |
| `pose` | list | 4x4 device-to-world transform (key absent if unavailable) |
| `rig_calibration` | object | Per-camera intrinsics (`K`) and extrinsics (`E`) |
| `source` | string | Capture campaign: `OVD_M1` (lab, 513 recordings), `OVD_M2` (apartment, 414), `DD4` (29) |
| `lux_bins` | string | Lighting level: `51-75`, `76-100`, `101-200`, `201-400`, `401-1000`, `1001+` |
| `tags` | list | Scene diversity tags |

## Parquet Schema

Each row represents a single frame (one timestamp across all 12 cameras):

| Column | Type | Description |
|--------|------|-------------|
| `scene_id` | string | Recording identifier |
| `frame_id` | int32 | Frame index within the recording (0-indexed; number of frames varies per scene, range 404-583) |
| `timestamp_ns` | int64 | Frame timestamp in nanoseconds |
| `source` | string | Capture campaign: `OVD_M1` (lab), `OVD_M2` (apartment), `DD4` |
| `lux_bins` | string | Lighting level: `51-75`, `76-100`, `101-200`, `201-400`, `401-1000`, `1001+` |
| `tags` | string | JSON list of scene diversity tags (85 unique tags covering garments, furnishings, lighting, pose, objects) |
| `shard_name` | string | Relative path to the tar shard containing this frame's images (e.g., `shards/train/<scene_id>/<scene_id>-0002.tar`) |
| `pose` | string | JSON: 4x4 device-to-world transform matrix for this frame (null if pose unavailable) |
| `rig_calibration` | string | JSON: per-camera intrinsics (`K`: 3x3) and extrinsics (`E`: 4x4), static per scene (repeated for each frame for convenience) |

### Calibration Details

The `rig_calibration` column contains a JSON object keyed by camera name (`200-1` through `200-12`), each with:

- **`K`**: 3x3 intrinsic matrix (rectified pinhole projection, 120 deg horizontal FOV)
- **`E`**: 4x4 extrinsic matrix (camera-to-device transform)

The `pose` column contains the 4x4 device-to-world transform, which changes per frame as the headset moves.

## Usage

`load_dataset` returns frame-level **metadata only** (poses, calibration, scene info). Images are stored in WebDataset tar shards; use the `webdataset` library to stream them. See [`quickstart.ipynb`](quickstart.ipynb) for a full working example.

### WebDataset (Recommended for Training)

Stream tar shards for high-throughput sequential access, with no per-file API calls. See the notebook for the full `decode_sample` implementation. To wrap it in a PyTorch DataLoader:

```python
import torch
import webdataset as wds

# `decode_sample` comes from quickstart.ipynb; `shard_urls` is a list of tar shard URLs or paths.
dataset = wds.WebDataset(shard_urls, nodesplitter=wds.split_by_node, shardshuffle=True).map(decode_sample)
loader = torch.utils.data.DataLoader(dataset, batch_size=4, num_workers=4)

for batch in loader:
    images = batch["images"]  # (B, N_cams, 3, 1280, 1280)
    break
```

### Parquet Metadata (Random Access)

The Parquet files contain frame-level metadata only (poses, calibration, scene info); images are stored in the tar shards. Use the `shard_name` column to locate which tar file contains a given frame's images.

```python
# `example` is one metadata row, e.g. as returned by `load_dataset`.
shard_url = f"https://huggingface.co/datasets/facebook/ego-1k/resolve/main/{example['shard_name']}"
```

## Citation

```bibtex
@inproceedings{ego1k2026,
  title={{Ego-1K}: A Large-Scale Multiview Video Dataset for Egocentric Vision},
  author={Jae Yong Lee and Daniel Scharstein and Akash Bapat and Hao Hu and Andrew Fu and Haoru Zhao and Paul Sammut and Xiang Li and Stephen Jeapes and Anik Gupta and Lior David and Saketh Madhuvarasu and Jay Girish Joshi and Jason Wither},
  booktitle={CVPR},
  year={2026}
}
```
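To make the transform conventions from Calibration Details concrete: since `E` maps camera to device and `pose` maps device to world, a camera-to-world transform is the matrix product `pose · E`. The sketch below assumes a metadata row shaped like the Parquet schema (JSON strings for `pose` and `rig_calibration`, each matrix stored as a nested 4x4 list). The helper names `matmul4` and `camera_to_world` and the synthetic row are hypothetical, made up for illustration; real rows should be checked against `quickstart.ipynb`.

```python
import json


def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def camera_to_world(row, camera="200-1"):
    """Return the 4x4 camera-to-world matrix for one frame, or None if pose is missing.

    Assumes `row["pose"]` and `row["rig_calibration"]` are JSON strings,
    as described in the Parquet schema.
    """
    if row.get("pose") is None:
        return None  # pose is null when tracking was unavailable
    pose = json.loads(row["pose"])                # device -> world
    rig = json.loads(row["rig_calibration"])
    extrinsic = rig[camera]["E"]                  # camera -> device
    return matmul4(pose, extrinsic)               # camera -> world


# Synthetic row for illustration only (not real dataset values).
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
pose = [r[:] for r in identity]
pose[0][3], pose[1][3], pose[2][3] = 1.0, 2.0, 3.0   # device sits at (1, 2, 3) in world
extrinsic = [r[:] for r in identity]
extrinsic[0][3] = 0.5                                 # camera offset 0.5 along device x

row = {
    "pose": json.dumps(pose),
    "rig_calibration": json.dumps({"200-1": {"E": extrinsic}}),
}

T = camera_to_world(row)
camera_center_world = [T[0][3], T[1][3], T[2][3]]
print(camera_center_world)
```

With NumPy available, the same composition is simply `np.asarray(pose) @ np.asarray(E)`; the pure-Python version is used here only to keep the sketch dependency-free.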
Provider:
maas
Created:
2026-03-12