dynpose-100k

Name: dynpose-100k
Creator: maas
Published: 2025-12-05 22:24:00
License: No description available

ModelScope community: updated 2025-12-05, indexed 2025-04-26

Download link:

https://modelscope.cn/datasets/nv-community/dynpose-100k

Resource overview:

# DynPose-100K

**[Dynamic Camera Poses and Where to Find Them](https://research.nvidia.com/labs/dir/dynpose-100k)** \
[Chris Rockwell<sup>1,2</sup>](https://crockwell.github.io), [Joseph Tung<sup>3</sup>](https://jot-jt.github.io/), [Tsung-Yi Lin<sup>1</sup>](https://tsungyilin.info/), [Ming-Yu Liu<sup>1</sup>](https://mingyuliu.net/), [David F. Fouhey<sup>3</sup>](https://cs.nyu.edu/~fouhey/), [Chen-Hsuan Lin<sup>1</sup>](https://chenhsuanlin.bitbucket.io/) \
<sup>1</sup>NVIDIA <sup>2</sup>University of Michigan <sup>3</sup>New York University

## 🎉 Updates

- **[2025.05]** We have released the Lightspeed benchmark, a new dataset with ground-truth camera poses for validating DynPose-100K's pose annotation method. See [download instructions](#lightspeed-benchmark-download) below.
- **[2025.04]** We have made the initial release of DynPose-100K, a large-scale dataset of diverse, dynamic videos with camera annotations. See [download instructions](#dynpose-100k-download) below.

[![Project Website](https://img.shields.io/static/v1?label=Project&message=Website&color=red)](https://research.nvidia.com/labs/dir/dynpose-100k) [![arXiv](https://img.shields.io/badge/arXiv-2504.17788-b31b1b.svg)](https://arxiv.org/abs/2504.17788)

![DynPose-100K Teaser](teaser.png)

## Overview

DynPose-100K is a large-scale dataset of diverse, dynamic videos with camera annotations. We curate 100K videos containing dynamic content while ensuring cameras can be accurately estimated (including intrinsics and poses), addressing two key challenges:

1. Identifying videos suitable for camera estimation
2. Improving camera estimation algorithms for dynamic videos

| Characteristic | Value |
| --- | --- |
| **Size** | 100K videos |
| **Resolution** | 1280×720 (720p) |
| **Annotation type** | Camera poses (world-to-camera), intrinsics |
| **Format** | MP4 (videos), PKL (camera data), JPG (frames) |
| **Frame rate** | 12 fps (extracted frames) |
| **Storage** | ~200 GB (videos) + ~400 GB (frames) + 0.7 GB (annotations) |
| **License** | NVIDIA License (for DynPose-100K) |

## DynPose-100K Download

DynPose-100K contains diverse Internet videos annotated with state-of-the-art camera pose estimation. Videos were selected from 3.2M candidates through advanced filtering.

### 1. Camera annotation download (0.7 GB)

```bash
git clone https://huggingface.co/datasets/nvidia/dynpose-100k
cd dynpose-100k
unzip dynpose_100k.zip
export DYNPOSE_100K_ROOT=$(pwd)/dynpose_100k
```

### 2. Video download (~200 GB for all videos at 720p)

```bash
git clone https://github.com/snap-research/Panda-70M.git
pip install -e Panda-70M/dataset_dataloading/video2dataset
```

- For experiments we use (1280, 720) video resolution rather than the default (640, 360). To download at this resolution (optional), modify [download size](https://github.com/snap-research/Panda-70M/blob/main/dataset_dataloading/video2dataset/video2dataset/configs/panda70m.yaml#L5) to 720

```bash
video2dataset --url_list="${DYNPOSE_100K_ROOT}/metadata.csv" --output_folder="${DYNPOSE_100K_ROOT}/video" \
  --url_col="url" --caption_col="caption" --clip_col="timestamp" \
  --save_additional_columns="[matching_score,desirable_filtering,shot_boundary_detection]" \
  --config="video2dataset/video2dataset/configs/panda70m.yaml"
```

### 3. Video frame extraction (~400 GB for 12 fps over all videos at 720p)

```bash
python scripts/extract_frames.py --input_video_dir ${DYNPOSE_100K_ROOT}/video \
  --output_frame_parent ${DYNPOSE_100K_ROOT}/frames-12fps \
  --url_list ${DYNPOSE_100K_ROOT}/metadata.csv \
  --uid_mapping ${DYNPOSE_100K_ROOT}/uid_mapping.csv
```

### 4. Camera pose visualization

Create a conda environment if you haven't done so:

```bash
conda env create -f environment.yml
conda activate dynpose-100k
```

Run the below under the `dynpose-100k` environment:

```bash
python scripts/visualize_pose.py --dset dynpose_100k --dset_parent ${DYNPOSE_100K_ROOT}
```

### Dataset structure

```
dynpose_100k
  ├── cameras
  |     ├── 00011ee6-cbc1-4ec4-be6f-292bfa698fc6.pkl {uid}
  |           ├── poses {camera poses (all frames) ([N',3,4])}
  |           ├── intrinsics {camera intrinsic matrix ([3,3])}
  |           ├── frame_idxs {corresponding frame indices ([N']), values within [0,N-1]}
  |           ├── mean_reproj_error {average reprojection error from SfM ([N'])}
  |           ├── num_points {number of reprojected points ([N'])}
  |           ├── num_frames {number of video frames N (scalar)}
  |           # where N' is number of registered frames
  |     ├── 00031466-5496-46fa-a992-77772a118b17.pkl
  |           ├── poses # camera poses (all frames) ([N',3,4])
  |           └── ...
  |     └── ...
  ├── video
  |     ├── 00011ee6-cbc1-4ec4-be6f-292bfa698fc6.mp4 {uid}
  |     ├── 00031466-5496-46fa-a992-77772a118b17.mp4
  |     └── ...
  ├── frames-12fps
  |     ├── 00011ee6-cbc1-4ec4-be6f-292bfa698fc6 {uid}
  |           ├── 00001.jpg {frame id}
  |           ├── 00002.jpg
  |           └── ...
  |     ├── 00031466-5496-46fa-a992-77772a118b17
  |           ├── 00001.jpg
  |           └── ...
  |     └── ...
  ├── metadata.csv {used to download video & extract frames}
  |     ├── uid
  |     ├── 00031466-5496-46fa-a992-77772a118b17
  |     └── ...
  ├── uid_mapping.csv {used to download video & extract frames}
  |     ├── videoID,url,timestamp,caption,matching_score,desirable_filtering,shot_boundary_detection
  |     ├── --106WvnIhc,https://www.youtube.com/watch?v=--106WvnIhc,"[['0:13:34.029', '0:13:40.035']]",['A man is swimming in a pool with an inflatable mattress.'],[0.44287109375],['desirable'],"[[['0:00:00.000', '0:00:05.989']]]"
  |     └── ...
  ├── viz_list.txt {used as index for pose visualization}
  |     ├── 004cd3b5-8af4-4613-97a0-c51363d80c31 {uid}
  |     ├── 0c3e06ae-0d0e-4c41-999a-058b4ea6a831
  |     └── ...
```

## Lightspeed Benchmark Download

Lightspeed is a challenging, photorealistic benchmark for dynamic pose estimation with **ground-truth** camera poses. It is used to validate DynPose-100K's pose annotation method. Original video clips can be found here: https://www.youtube.com/watch?v=AsykNkUMoNU&t=1s

### 1. Downloading cameras, videos and frames (8.1 GB)

```bash
git clone https://huggingface.co/datasets/nvidia/dynpose-100k
cd dynpose-100k
unzip lightspeed.zip
export LIGHTSPEED_PARENT=$(pwd)/lightspeed
```

### 2. Dataset structure

```
lightspeed
  ├── poses.pkl
  |     ├── 0120_LOFT {id_setting}
  |           ├── poses {camera poses (all frames) ([N,3,4])}
  |           # where N is number of frames
  |     ├── 0180_DUST
  |           ├── poses {camera poses (all frames) ([N,3,4])}
  |     └── ...
  ├── video
  |     ├── 0120_LOFT.mp4 {id_setting}
  |     ├── 0180_DUST.mp4
  |     └── ...
  ├── frames-24fps
  |     ├── 0120_LOFT/images {id_setting}
  |           ├── 00000.png {frame id}
  |           ├── 00001.png
  |           └── ...
  |     ├── 0180_DUST/images
  |           ├── 00000.png
  |           └── ...
  |     └── ...
  ├── viz_list.txt {used as index for pose visualization}
  |     ├── 0120_LOFT.mp4 {id_setting}
  |     ├── 0180_DUST.mp4
  |     └── ...
```

### 3. Camera pose visualization

Create a conda environment if you haven't done so:

```bash
conda env create -f environment.yml
conda activate dynpose-100k
```

Run the below under the `dynpose-100k` environment:

```bash
python scripts/visualize_pose.py --dset lightspeed --dset_parent ${LIGHTSPEED_PARENT}
```

## FAQ

**Q: What coordinate system do the camera poses use?**

A: Camera poses are world-to-camera and follow OpenCV "RDF" convention (same as COLMAP): X axis points to the right, the Y axis to the bottom, and the Z axis to the front as seen from the image.

**Q: How do I map between frame indices and camera poses?**

A: The `frame_idxs` field in each camera PKL file contains the corresponding frame indices for the poses.

**Q: How can I contribute to this dataset?**

A: Please contact the authors for collaboration opportunities.

## Citation

If you find this dataset useful in your research, please cite our paper:

```bibtex
@inproceedings{rockwell2025dynpose,
  author = {Rockwell, Chris and Tung, Joseph and Lin, Tsung-Yi and Liu, Ming-Yu and Fouhey, David F. and Lin, Chen-Hsuan},
  title = {Dynamic Camera Poses and Where to Find Them},
  booktitle = {CVPR},
  year = 2025
}
```

## Acknowledgements

We thank Gabriele Leone and the NVIDIA Lightspeed Content Tech team for sharing the original 3D assets and scene data for creating the Lightspeed benchmark. We thank Yunhao Ge, Zekun Hao, Yin Cui, Xiaohui Zeng, Zhaoshuo Li, Hanzi Mao, Jiahui Huang, Justin Johnson, JJ Park and Andrew Owens for invaluable inspirations, discussions and feedback on this project.
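
The camera PKL layout and the world-to-camera convention described above can be exercised with a short sketch. It is not part of the official scripts. It assumes that each per-video PKL file unpickles to a dict keyed by the field names listed in the dataset structure (`poses`, `frame_idxs`, and so on) and that the values convert cleanly to NumPy arrays. The helper names `load_camera`, `camera_centers`, and `pose_for_frame` are hypothetical.

```python
import pickle

import numpy as np


def load_camera(pkl_path):
    """Load one per-video camera file (assumed to unpickle to a dict).

    Only unpickle files from a source you trust.
    """
    with open(pkl_path, "rb") as f:
        return pickle.load(f)


def camera_centers(w2c):
    """Camera centers in world coordinates from [N',3,4] world-to-camera poses.

    A world-to-camera pose maps x_cam = R @ x_world + t. The camera center C
    is the world point that maps to the camera origin, so C = -R^T @ t.
    """
    w2c = np.asarray(w2c, dtype=np.float64)
    R = w2c[:, :, :3]  # [N',3,3] rotations
    t = w2c[:, :, 3]   # [N',3] translations
    # Sum over j of R[n, j, i] * t[n, j] is (R^T t)_i for each pose n.
    return -np.einsum("nji,nj->ni", R, t)


def pose_for_frame(cam, frame_idx):
    """Return the [3,4] pose registered for a video frame index, or None.

    Only N' of the N video frames are registered, so a lookup can miss.
    """
    matches = np.flatnonzero(np.asarray(cam["frame_idxs"]) == frame_idx)
    if matches.size == 0:
        return None
    return np.asarray(cam["poses"])[matches[0]]


if __name__ == "__main__":
    # Synthetic stand-in for a loaded PKL: two registered frames (0 and 5).
    identity = np.hstack([np.eye(3), np.array([[1.0], [2.0], [3.0]])])
    cam = {"poses": np.stack([identity, identity]), "frame_idxs": np.array([0, 5])}
    print(camera_centers(cam["poses"]))
    print(pose_for_frame(cam, 3))
```

With real data, `load_camera` would be pointed at a file under `${DYNPOSE_100K_ROOT}/cameras`. How a value in `frame_idxs` relates to a file name under `frames-12fps` is not stated here, so the sketch deliberately stops at the pose lookup.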

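
The FAQ states that poses are world-to-camera in the OpenCV convention (X right, Y down, Z forward). Under that convention, a world point projects to pixels through the standard pinhole model, first applying the pose and then the intrinsic matrix. The following is a minimal sketch of that projection, assuming no lens distortion. The function name `project_points` and the intrinsic values in the usage part are hypothetical and not taken from the dataset.

```python
import numpy as np


def project_points(points_world, w2c, K):
    """Project [M,3] world points into one frame.

    w2c is a [3,4] world-to-camera pose and K a [3,3] intrinsic matrix.
    Returns ([M,2] pixel coordinates, [M] depths along the camera Z axis).
    Points with non-positive depth lie at or behind the camera and should
    be masked out by the caller.
    """
    points_world = np.asarray(points_world, dtype=np.float64)
    w2c = np.asarray(w2c, dtype=np.float64)
    K = np.asarray(K, dtype=np.float64)

    R, t = w2c[:, :3], w2c[:, 3]
    points_cam = points_world @ R.T + t  # x_cam = R @ x_world + t, row-wise
    depth = points_cam[:, 2]
    uvw = points_cam @ K.T               # homogeneous pixel coordinates
    pixels = uvw[:, :2] / uvw[:, 2:3]    # perspective divide
    return pixels, depth


if __name__ == "__main__":
    # Hypothetical intrinsics for a 1280x720 frame; real values come from
    # the `intrinsics` field of each camera PKL file.
    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])
    w2c = np.hstack([np.eye(3), np.zeros((3, 1))])  # camera at world origin
    pts = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]])
    pixels, depth = project_points(pts, w2c, K)
    print(pixels, depth)
```

Because Y points down in this convention, a point with a larger world Y (for an identity pose) lands at a larger pixel row, which matches image coordinates with the origin in the top-left corner.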

Provider:

maas

Created:

2025-04-25
