CameraClone-Dataset

Name: CameraClone-Dataset
Creator: maas
Published: 2025-12-05 16:49:55
License: No description provided

ModelScope Community: updated 2025-12-05, indexed 2025-09-13

Download link:

https://modelscope.cn/datasets/KwaiVGI/CameraClone-Dataset

Resource description:

* Paper: [https://arxiv.org/abs/2506.03140](https://arxiv.org/abs/2506.03140)
* Project Page: [https://camclonemaster.github.io/](https://camclonemaster.github.io/)
* Dataset: [https://huggingface.co/datasets/KwaiVGI/CameraClone-Dataset](https://huggingface.co/datasets/KwaiVGI/CameraClone-Dataset)
* Training & Inference Code: [https://github.com/KwaiVGI/CamCloneMaster](https://github.com/KwaiVGI/CamCloneMaster)

# Camera Clone Dataset

## 1. Dataset Introduction

**TL;DR:** The Camera Clone Dataset, introduced in [CamCloneMaster](https://arxiv.org/pdf/2506.03140), is a large-scale synthetic dataset designed for camera clone learning, encompassing diverse scenes, subjects, and camera movements. It consists of triple video sets: a camera motion reference video \\(V_{cam}\\), a content reference video \\(V_{cont}\\), and a target video \\(V\\), which recaptures the scene in \\(V_{cont}\\) with the same camera movement as \\(V_{cam}\\).

<div align="center">
<video controls autoplay style="width: 70%;" src="https://huggingface.co/datasets/KwaiVGI/CameraClone-Dataset/resolve/main/dataset.mp4"></video>
</div>

The Camera Clone Dataset is rendered using Unreal Engine 5. We collect 40 3D scenes as backgrounds and place 66 characters into them as main subjects. Each character is combined with one random animation, such as running or dancing.

To construct the triple set, camera trajectories must satisfy two key requirements:

1) *Simultaneous Multi-View Capture*: Multiple cameras must film the same scene concurrently, each following a distinct trajectory.
2) *Paired Trajectories*: Shots taken at different locations must share the same camera trajectory.

Our implementation strategy addresses these needs as follows. Within any single location, 10 synchronized cameras operate simultaneously, each following one of ten unique, pre-defined trajectories to capture diverse views.
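The two requirements above can be illustrated with a minimal Python sketch. It assumes each video is identified by a hypothetical `(location, trajectory)` pair, that the camera reference shares the target's trajectory at a different location, and that the content reference shares the target's location with a different trajectory. The function name and identifiers are illustrative only, and the sketch enumerates every candidate triple; how the dataset actually samples its triples is not described here.

```python
from itertools import product


def enumerate_triples(locations, num_trajectories):
    """Return all candidate (camera_ref, content_ref, target) triples.

    Each element of a triple is a hypothetical (location, trajectory) pair.
    """
    triples = []
    for loc, traj in product(locations, range(num_trajectories)):
        target = (loc, traj)
        # Camera reference: same trajectory, filmed at a different location.
        camera_refs = [(other, traj) for other in locations if other != loc]
        # Content reference: same location (same scene), different trajectory.
        content_refs = [(loc, t) for t in range(num_trajectories) if t != traj]
        for camera_ref, content_ref in product(camera_refs, content_refs):
            triples.append((camera_ref, content_ref, target))
    return triples


# Toy example: 2 locations sharing 3 trajectories.
triples = enumerate_triples(["loc_a", "loc_b"], num_trajectories=3)
print(len(triples))
```

Exhaustive enumeration grows quickly with the number of locations and trajectories, so a real pipeline would likely subsample these candidates.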
To create paired trajectories, we group 3D locations in scenes into sets of four, ensuring that the same ten camera trajectories are replicated across all locations within each set. The camera trajectories themselves are generated automatically using predefined rules, which cover basic movements, circular arcs, and more complex camera paths.

In total, the Camera Clone Dataset comprises 391K visually authentic videos shot from 39.1K different locations in 40 scenes with 97.75K diverse camera trajectories, and 1,155K triple video sets are constructed from these videos. Each video has a resolution of 576 x 1,008 and 77 frames.

**3D Environment:** We collect 40 high-quality 3D environment assets from [Fab](https://www.fab.com). To minimize the domain gap between rendered data and real-world videos, we primarily select visually realistic 3D scenes, while choosing a few stylized or surreal 3D scenes as a supplement. To ensure data diversity, the selected scenes cover a variety of indoor and outdoor settings, such as city streets, shopping malls, cafes, office rooms, and the countryside.

**Character:** We collect 66 different human 3D models as characters from [Fab](https://www.fab.com) and [Mixamo](https://www.mixamo.com).

**Animation:** We collect 93 different animations from [Fab](https://www.fab.com) and [Mixamo](https://www.mixamo.com), including common actions such as waving, dancing, and cheering. We use these animations to drive the collected characters and create diverse data through various combinations.

**Camera Trajectories:** To prevent clipping, trajectories are constrained by a maximum movement distance \\(d_{max}\\), determined by the initial shot position in the scene. The trajectory types are:

* **Basic**: Simple pans/tilts (5°-75°), rolls (20°-340°), and translations along cardinal axes.
* **Arc**: Orbital paths, combining a primary rotation (10°-75°) with smaller, secondary rotations (5°-15°).
* **Random**: Smooth splines interpolated between 2-4 random keypoints. Half of these splines also incorporate multi-axis rotations.

## 2. Statistics and Configurations

Dataset Statistics:

| Number of Dynamic Scenes | Cameras per Scene | Total Videos | Number of Triple Sets |
|:------------------------:|:-----------------:|:------------:|:---------------------:|
| 39,100 | 10 | 391,000 | 1,154,819 |

Video Configurations:

| Resolution | Frame Number | FPS |
|:-----------:|:------------:|:------------------------:|
| 1344x768 | 77 | 15 |
| 1008x576 | 77 | 15 |

Note: You can center-crop the videos to match the aspect ratio your video generation model expects, such as 16:9, 9:16, 4:3, or 3:4.

## 3. File Structure

```
Camera-Clone-Dataset
├── data
    ├── 0316
    │   └── traj_1_01
    │       ├── scene1_01.mp4
    │       ├── scene550_01.mp4
    │       ├── scene935_01.mp4
    │       └── scene1224_01.mp4
    ├── 0317
    ├── 0401
    ├── 0402
    ├── 0404
    ├── 0407
    └── 0410
```

## 4. Use Dataset

```bash
# Install Git LFS, which is needed to fetch the large archive parts
sudo apt-get install git-lfs
git lfs install
# Clone the dataset repository
git clone https://huggingface.co/datasets/KwaiVGI/CameraClone-Dataset
cd CameraClone-Dataset
# Concatenate the split parts into a single archive, then extract it
cat CamCloneDataset.part* > CamCloneDataset.tar.gz
tar --zstd -xvf CamCloneDataset.tar.gz
```

The "Triple Sets" information is located in the [CamCloneDataset.csv](https://huggingface.co/datasets/KwaiVGI/CameraClone-Dataset/blob/main/CamCloneDataset.csv) file, which contains the following columns:

* video_path: The path to the target video.
* caption: A description of the target video.
* ref_video_path: The path to the camera reference video.
* content_video_path: The path to the content reference video.

## Citation

If you find this dataset useful, please cite our [paper](https://arxiv.org/abs/2506.03140).
```bibtex
@misc{luo2025camclonemaster,
  title={CamCloneMaster: Enabling Reference-based Camera Control for Video Generation},
  author={Yawen Luo and Jianhong Bai and Xiaoyu Shi and Menghan Xia and Xintao Wang and Pengfei Wan and Di Zhang and Kun Gai and Tianfan Xue},
  year={2025},
  eprint={2506.03140},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.03140},
}
```

## Contact

[Yawen Luo](https://luo0207.github.io/yawenluo/) luoyw0207@gmail.com
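As a rough illustration of how the `CamCloneDataset.csv` columns listed under "Use Dataset" might be consumed, the sketch below reads rows with Python's standard `csv` module and resolves the three video paths. It assumes the CSV has a header row with exactly those column names and that the paths are relative to some dataset root; neither is confirmed here. The `load_triples` helper and the sample rows are hypothetical.

```python
import csv
import io
from pathlib import Path

# Hypothetical sample rows; the paths and caption are made up for illustration.
SAMPLE_CSV = (
    "video_path,caption,ref_video_path,content_video_path\n"
    "example/target.mp4,A person dancing in a cafe,"
    "example/camera_ref.mp4,example/content_ref.mp4\n"
)


def load_triples(csv_file, data_root):
    """Read triple-set rows and resolve each video path against data_root."""
    root = Path(data_root)
    triples = []
    for row in csv.DictReader(csv_file):
        triples.append(
            {
                "target": root / row["video_path"],
                "camera_ref": root / row["ref_video_path"],
                "content_ref": root / row["content_video_path"],
                "caption": row["caption"],
            }
        )
    return triples


triples = load_triples(io.StringIO(SAMPLE_CSV), "CameraClone-Dataset")
for triple in triples:
    print(triple["target"], "<-", triple["camera_ref"], "+", triple["content_ref"])
```

To read the real file, pass a handle opened with `open("CamCloneDataset.csv", newline="", encoding="utf-8")` in place of the in-memory sample, and adjust `data_root` to wherever the archive was extracted.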


Provider:

maas

Created:

2025-09-08
