SynCamVideo-Dataset
Download link:
https://modelscope.cn/datasets/KwaiVGI/SynCamVideo-Dataset
[Github](https://github.com/KwaiVGI/SynCamMaster)
[Project Page](https://jianhongbai.github.io/SynCamMaster/)
[Paper](https://arxiv.org/abs/2412.07760)
## 📷 Dataset: SynCamVideo Dataset
- __[2025.04.15]__: Released a new version of the SynCamVideo Dataset with improved quality and greater diversity.
- __[2025.04.15]__: Please also check our [MultiCamVideo](https://huggingface.co/datasets/KwaiVGI/MultiCamVideo-Dataset) Dataset.
### 1. Dataset Introduction
**TL;DR:** The SynCamVideo Dataset is a multi-camera synchronized video dataset rendered using Unreal Engine 5. It includes synchronized multi-camera videos and their corresponding camera poses. The dataset is useful for tasks such as camera-controlled video generation, synchronized video production, and 3D/4D reconstruction. All cameras in the SynCamVideo Dataset are stationary. If you need footage with moving cameras, please see our [MultiCamVideo](https://huggingface.co/datasets/KwaiVGI/MultiCamVideo-Dataset) Dataset.
<div align="center">
<video controls autoplay style="width: 70%;" src="https://cdn-uploads.huggingface.co/production/uploads/6530bf50f145530101ec03a2/qEUQstpMa3-6UjbG_0ytq.mp4"></video>
</div>
The SynCamVideo Dataset is a multi-camera synchronized video dataset rendered using Unreal Engine 5. It includes synchronized multi-camera videos and their corresponding camera poses.
It consists of 3.4K different dynamic scenes, each captured by 10 cameras, for a total of 34K videos. Each dynamic scene is composed of four elements: {3D environment, character, animation, camera}. Specifically, we use an animation to drive the character and position the animated character within the 3D environment. Time-synchronized cameras are then set up to render the multi-camera video data.
<p align="center">
<img src="https://github.com/user-attachments/assets/107c9607-e99b-4493-b715-3e194fcb3933" alt="Example Image" width="70%">
</p>
**3D Environment:** We collect 37 high-quality 3D environment assets from [Fab](https://www.fab.com). To minimize the domain gap between rendered data and real-world videos, we primarily select visually realistic 3D scenes, supplemented by a few stylized or surreal 3D scenes. To ensure data diversity, the selected scenes cover a variety of indoor and outdoor settings, such as city streets, shopping malls, cafes, office rooms, and the countryside.
**Character:** We collect 66 different human 3D models as characters from [Fab](https://www.fab.com) and [Mixamo](https://www.mixamo.com).
**Animation:** We collect 93 different animations from [Fab](https://www.fab.com) and [Mixamo](https://www.mixamo.com), including common actions such as waving, dancing, and cheering. We use these animations to drive the collected characters and create diverse datasets through various combinations.
**Camera:** To enhance the diversity of the dataset, each camera is randomly sampled on a hemispherical surface centered around the character.
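The camera placement described above can be illustrated with a minimal sketch. It is not the authors' rendering code: the uniform distribution over the hemisphere, the fixed radius of 5.0, the z-up world axis, and the helper names `sample_camera_position` and `look_direction` are all assumptions made for illustration.

```python
import math
import random

def sample_camera_position(center, radius, rng):
    """Return a point on the upper hemisphere (z >= center z) around `center`.

    Assumption: uniform sampling over the hemisphere with a fixed radius;
    the dataset only states that cameras are randomly sampled.
    """
    # Normalizing a 3D Gaussian sample gives a direction uniform on the unit sphere.
    while True:
        d = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in d))
        if norm > 1e-12:
            break
    d = [x / norm for x in d]
    d[2] = abs(d[2])  # reflect into the upper hemisphere (z-up is an assumption)
    return tuple(c + radius * x for c, x in zip(center, d))

def look_direction(position, target):
    """Unit vector pointing from the camera position toward the target."""
    v = [t - p for p, t in zip(position, target)]
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

rng = random.Random(0)          # fixed seed so the sketch is reproducible
character = (0.0, 0.0, 0.0)     # hypothetical character location
cameras = [sample_camera_position(character, 5.0, rng) for _ in range(10)]
directions = [look_direction(p, character) for p in cameras]
```

Each of the 10 sampled positions sits at the chosen radius from the character, and each viewing direction points back at it, which mirrors the "hemisphere centered around the character" setup in spirit only.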
### 2. Statistics and Configurations
Dataset Statistics:
| Number of Dynamic Scenes | Cameras per Scene | Total Videos |
|:------------------------:|:-----------------:|:------------:|
| 3,400 | 10 | 34,000 |
Video Configurations:
| Resolution | Frame Number | FPS |
|:-----------:|:------------:|:------------------------:|
| 1280x1280 | 81 | 15 |
Note: You can use 'center crop' to adjust the video's aspect ratio to fit your video generation model, such as 16:9, 9:16, 4:3, or 3:4.
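As a rough illustration of the center crop mentioned in the note, the sketch below computes the largest centered crop window for a target aspect ratio. The helper names `center_crop_box` and `ffmpeg_crop_filter` are hypothetical, and rounding the crop size down to even numbers is an assumption made because many video encoders expect even dimensions.

```python
def center_crop_box(width, height, aspect_w, aspect_h):
    """Return (left, top, crop_w, crop_h) of the largest centered crop
    with aspect ratio aspect_w:aspect_h inside a width x height frame."""
    if width * aspect_h > height * aspect_w:
        # Source is wider than the target ratio: keep full height, trim width.
        crop_h = height
        crop_w = height * aspect_w // aspect_h
    else:
        # Source is taller than (or equal to) the target ratio: keep full width.
        crop_w = width
        crop_h = width * aspect_h // aspect_w
    # Round down to even sizes (an assumption; many encoders require this).
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, crop_w, crop_h

def ffmpeg_crop_filter(box):
    """Format a crop box as an ffmpeg `crop=w:h:x:y` filter string."""
    left, top, crop_w, crop_h = box
    return f"crop={crop_w}:{crop_h}:{left}:{top}"

box_16_9 = center_crop_box(1280, 1280, 16, 9)   # landscape crop
box_3_4 = center_crop_box(1280, 1280, 3, 4)     # portrait crop
```

For the 1280x1280 source, a 16:9 crop keeps the full width and a 720-pixel-high band in the middle. Note that a center crop changes the image size, so any pixel-space intrinsics you derive should use the cropped width and height for the principal point.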
Camera Configurations:
| Focal Length | Aperture | Sensor Height | Sensor Width |
|:-----------------------:|:------------------:|:-------------:|:------------:|
| 24mm | 5.0 | 23.76mm | 23.76mm |
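The camera configuration above is enough to derive an approximate pinhole intrinsic matrix. The sketch below assumes an ideal pinhole model with no lens distortion, a principal point at the image center, and a sensor that maps exactly onto the full 1280x1280 frame; none of these are stated by the dataset, and the function names are hypothetical.

```python
import math

def pinhole_intrinsics(focal_mm, sensor_w_mm, sensor_h_mm, width_px, height_px):
    """Build a 3x3 intrinsic matrix K from physical camera parameters.

    Assumes an ideal pinhole camera with the principal point at the image center.
    """
    fx = focal_mm * width_px / sensor_w_mm    # focal length in pixels (x)
    fy = focal_mm * height_px / sensor_h_mm   # focal length in pixels (y)
    cx = width_px / 2.0
    cy = height_px / 2.0
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]

def horizontal_fov_deg(focal_mm, sensor_w_mm):
    """Horizontal field of view in degrees for a pinhole camera."""
    return math.degrees(2.0 * math.atan(sensor_w_mm / (2.0 * focal_mm)))

# Values taken from the camera configuration table and the video resolution.
K = pinhole_intrinsics(24.0, 23.76, 23.76, 1280, 1280)
fov = horizontal_fov_deg(24.0, 23.76)
```

Under these assumptions the focal length comes out slightly above the image width in pixels, which corresponds to a field of view a little over 50 degrees. Because the sensor is square and the frames are square, `fx` and `fy` are equal.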
### 3. File Structure
```
SynCamVideo-Dataset
├── train
│ └── f24_aperture5
│ ├── scene1 # one dynamic scene
│ │ ├── videos
│ │ │ ├── cam01.mp4 # synchronized 81-frame videos at 1280x1280 resolution
│ │ │ ├── cam02.mp4
│ │ │ ├── ...
│ │ │ └── cam10.mp4
│ │ └── cameras
│ │ └── camera_extrinsics.json # 81-frame camera extrinsics of the 10 cameras
│ ├── ...
│ └── scene3400
└── val
└── basic
├── videos
│ ├── cam01.mp4 # example videos corresponding to the validation cameras
│ ├── cam02.mp4
│ ├── ...
│ └── cam10.mp4
└── cameras
└── camera_extrinsics.json # 10 cameras for validation
```
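Given the layout above, the files belonging to one training scene can be located with a small helper. This is a sketch only: the function name `scene_files` and the dataset root path are hypothetical, and it assumes every scene directory follows the `scene<N>` and `cam<NN>.mp4` naming shown in the tree.

```python
from pathlib import Path

def scene_files(root, scene_index, num_cameras=10):
    """Return (video_paths, extrinsics_path) for one training scene.

    Assumes the naming shown in the file tree: train/f24_aperture5/scene<N>/
    with videos/cam01.mp4 ... cam10.mp4 and cameras/camera_extrinsics.json.
    """
    scene_dir = Path(root) / "train" / "f24_aperture5" / f"scene{scene_index}"
    videos = [scene_dir / "videos" / f"cam{i:02d}.mp4"
              for i in range(1, num_cameras + 1)]
    extrinsics = scene_dir / "cameras" / "camera_extrinsics.json"
    return videos, extrinsics

# "SynCamVideo-Dataset" is the extracted top-level directory from the tree above.
videos, extrinsics = scene_files("SynCamVideo-Dataset", 1)
missing = [p for p in videos + [extrinsics] if not p.exists()]
```

Checking `missing` after extraction is a quick way to spot an incomplete download before starting a training run.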
### 4. Useful Scripts
- Data Extraction
```bash
tar -xzvf SynCamVideo-Dataset.tar.gz
```
- Camera Visualization
```bash
python vis_cam.py
```
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6530bf50f145530101ec03a2/3WCWS0Axlnu5MyOBqMoVC.png" alt="Example Image" width="40%">
</p>
## Acknowledgments
We thank Jinwen Cao, Yisong Guo, Haowen Ji, Jichao Wang, and Yi Wang from Kuaishou Technology for their invaluable help in constructing the SynCamVideo-Dataset.
## 🌟 Citation
Please cite our paper if you find our dataset helpful.
```
@misc{bai2024syncammaster,
title={SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints},
author={Jianhong Bai and Menghan Xia and Xintao Wang and Ziyang Yuan and Xiao Fu and Zuozhu Liu and Haoji Hu and Pengfei Wan and Di Zhang},
year={2024},
eprint={2412.07760},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.07760},
}
```
## Contact
[Jianhong Bai](https://jianhongbai.github.io/)
Provider: maas
Created: 2025-09-03



