KlingTeam/Scene-Decoupled-Video-dataset
Hugging Face | Updated 2026-03-08 | Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/KlingTeam/Scene-Decoupled-Video-dataset
Resource description:
---
task_categories:
- video-generation
- text-to-video
language:
- en
tags:
- video
- synthetic
- cinematic
- panoramic image
pretty_name: Scene-Decoupled Video Dataset
size_categories:
- 150G<n<200G
arxiv: 2602.06959
---
# CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation
CVPR 2026: [Arxiv](https://arxiv.org/pdf/2602.06959) | [Project Page](https://karine-huang.github.io/CineScene/)
## Scene-Decoupled Video Dataset
TL;DR: The Scene-Decoupled Video Dataset, introduced in CineScene, is a large-scale synthetic dataset for **video generation with decoupled scenes**. It covers diverse scenes, subjects, and camera movements, and contains camera trajectories, equirectangular panoramas (scene images), and videos with and without a dynamic subject. Videos and camera files are organized into "With Human" (whuman) and "Without Human" (wohuman) categories, while the panoramas are scene-decoupled and shared across both.
## 1. Directory Tree
```text
.
├── camera/                             # Camera trajectories and metadata
│   ├── whuman/                         # Sequences containing human characters
│   │   └── <scene_id>/                 # e.g., scene1_3x3_loc1_scene_AncientTempleEnv/
│   │       └── <scene_id>_cam.json     # Camera parameters
│   └── wohuman/                        # Sequences with environment only
│       └── <scene_id>/
│           └── <scene_id>_cam.json
│
├── panorama/                           # Scene-decoupled environment maps
│   └── <scene_id>/                     # Shared between whuman and wohuman
│       └── <scene_id>_pano.jpeg        # 360° equirectangular panoramic image
│
└── video/                              # Rendered video sequences (MP4)
    ├── whuman/                         # Videos with human characters
    │   └── <scene_id>/
    │       ├── <scene_id>_01_24mm.mp4  # Sub-sequences (01, 02, etc.)
    │       ├── <scene_id>_02_24mm.mp4
    │       └── ...
    └── wohuman/                        # Videos without human characters
        └── <scene_id>/
            ├── <scene_id>_01_24mm.mp4
            └── ...
```
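The layout above can be turned into path lookups with a few lines of Python. This is a minimal sketch, not part of the released scripts: the helper names `scene_files` and `list_videos` are hypothetical, and `root` is assumed to be the directory that holds `camera/`, `panorama/`, and `video/` after extraction.

```python
from pathlib import Path


def scene_files(root, scene_id, split="whuman"):
    """Return the expected camera, panorama, and video locations for one scene."""
    if split not in ("whuman", "wohuman"):
        raise ValueError(f"unknown split: {split!r}")
    root = Path(root)
    return {
        "camera": root / "camera" / split / scene_id / f"{scene_id}_cam.json",
        # Panoramas are not split: one image serves both whuman and wohuman.
        "panorama": root / "panorama" / scene_id / f"{scene_id}_pano.jpeg",
        "video_dir": root / "video" / split / scene_id,
    }


def list_videos(root, scene_id, split="whuman"):
    """List the MP4 sub-sequences of a scene, sorted by filename."""
    video_dir = scene_files(root, scene_id, split)["video_dir"]
    return sorted(video_dir.glob(f"{scene_id}_*.mp4"))
```

For example, `scene_files("data", "scene1_3x3_loc1_scene_AncientTempleEnv", "wohuman")["panorama"]` points at the same file as the `whuman` lookup, which reflects the shared panorama directory.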
## 2. Dataset Statistics
* **Total Scale**: 46,816 videos.
* **Scenes**: 3,400 scenes (comprising both *whuman* and *wohuman* scenes) across 35 high-quality 3D environments.
* **Trajectories**: 46,816 camera paths (7 distinct camera trajectories per scene).
* **Panorama**: 360° Equirectangular images for every scene, providing a complete background reference for scene conditioning.
| Property | Value |
| :--- | :--- |
| **Video Resolution** | 672 x 384 |
| **Frame Count** | 81 frames per video |
| **Frame Rate** | 15 FPS |
| **View Change Range** | Up to 75° |
| **Decoupled Scene** | 360° Equirectangular (Panorama) |
| **Panorama Resolution** | 2048 x 1024 |
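
A few derived quantities follow directly from the numbers above. The short calculation below is only a convenience sketch; the variable names are illustrative, and the inputs are taken from the statistics and table in this section.

```python
# Values copied from the statistics and table above.
NUM_VIDEOS = 46_816
FRAMES_PER_VIDEO = 81
FPS = 15
PANO_WIDTH = 2048  # panorama columns covering 360 degrees of longitude

# Length of each clip in seconds: 81 frames at 15 FPS.
duration_s = FRAMES_PER_VIDEO / FPS

# Total number of rendered frames across the dataset.
total_frames = NUM_VIDEOS * FRAMES_PER_VIDEO

# Angular width of one panorama column, in degrees.
deg_per_col = 360.0 / PANO_WIDTH

# Number of panorama columns swept by the maximum 75 degree view change.
cols_for_max_view_change = 75.0 / deg_per_col

print(f"{duration_s:.1f} s per clip, {total_frames:,} frames in total")
print(f"{deg_per_col:.4f} deg per column, {cols_for_max_view_change:.0f} columns for 75 deg")
```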
## 3. Dataset Construction
We follow the asset collection pipeline established by **ReCamMaster**, but introduce three significant enhancements to support more complex generative tasks:
1. **Decoupled Scenes**: We provide static 360° panoramic images (Equirectangular) for every scene. This allows for explicit background conditioning and facilitates novel view synthesis from any angle.
2. **Extended Camera Range**: Our dataset covers significantly larger view changes (approx. **75°**) compared to the 5–60° range provided in [previous datasets](https://huggingface.co/datasets/KlingTeam/MultiCamVideo-Dataset).
3. **Paired Subject/Background Data**: Every scene includes both "with-subject" (*whuman*) and "background-only" (*wohuman*) video sequences. This paired data is ideal for training models on subject-background decoupling, motion transfer, and cinematic composition.
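
One way to use the paired data is to match with-subject and background-only clips before training. The sketch below assumes, without confirmation from the dataset card, that paired clips carry the same filename under `video/whuman/<scene_id>/` and `video/wohuman/<scene_id>/`. The helper `pair_by_name` is hypothetical, and whether two matched clips also share a camera trajectory should be checked against the `_cam.json` files.

```python
def pair_by_name(whuman_names, wohuman_names):
    """Return filenames present in both splits, plus the ones left unmatched.

    Assumption: a with-subject clip and its background-only counterpart
    have identical filenames. Verify this on the actual data before use.
    """
    with_subject = set(whuman_names)
    background = set(wohuman_names)
    paired = sorted(with_subject & background)
    unmatched = sorted(with_subject ^ background)
    return paired, unmatched


# Usage with illustrative filenames (not taken from the dataset):
paired, unmatched = pair_by_name(
    ["sceneA_01_24mm.mp4", "sceneA_02_24mm.mp4"],
    ["sceneA_01_24mm.mp4", "sceneA_03_24mm.mp4"],
)
```

Reporting the unmatched names alongside the pairs makes it easy to spot scenes where the naming assumption does not hold.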
## 4. Useful Scripts
- Download
```bash
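# Install and enable Git LFS (needed to fetch the large archive parts)
sudo apt-get install git-lfs
git lfs install
# Clone the dataset repository
git clone https://huggingface.co/datasets/KlingTeam/Scene-Decoupled-Video-Dataset
# Enter the clone; the archive parts are assumed to sit at its top level
cd Scene-Decoupled-Video-Dataset
# Join the split parts into one archive, then extract it
cat Scene-Decoupled-Video-Dataset.part* > Scene-Decoupled-Video-Dataset.tar.gz
tar -xzvf Scene-Decoupled-Video-Dataset.tar.gz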
```
- Camera visualization
To visualize the camera trajectories, see [vis_cam.py](https://huggingface.co/datasets/KlingTeam/MultiCamVideo-Dataset/blob/main/vis_cam.py) from the MultiCamVideo-Dataset repository.
- Perspective Projection
To extract perspective frames from the panoramic images:
```bash
python extract_scene_from_panorama.py
```
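The contents of `extract_scene_from_panorama.py` are not reproduced here. As a rough illustration of the underlying geometry, the sketch below maps one pixel of a virtual pinhole view to the panorama pixel it samples. It is a generic equirectangular lookup and not the dataset's script: the function name, the angle conventions (positive yaw turns right, positive pitch looks up), and the assumption that the panorama centre corresponds to yaw 0 are all illustrative choices.

```python
import math


def perspective_to_pano_pixel(u, v, out_w, out_h, hfov_deg,
                              yaw_deg=0.0, pitch_deg=0.0,
                              pano_w=2048, pano_h=1024):
    """Map pixel (u, v) of a virtual pinhole view to (col, row) in the panorama."""
    # Focal length in pixels, from the horizontal field of view.
    f = 0.5 * out_w / math.tan(math.radians(hfov_deg) / 2.0)
    # Ray through the pixel centre in camera space (x right, y down, z forward).
    x = u - (out_w - 1) / 2.0
    y = v - (out_h - 1) / 2.0
    z = f
    # Rotate the ray: pitch about the x-axis, then yaw about the y-axis.
    p, t = math.radians(pitch_deg), math.radians(yaw_deg)
    y, z = y * math.cos(p) - z * math.sin(p), y * math.sin(p) + z * math.cos(p)
    x, z = x * math.cos(t) + z * math.sin(t), -x * math.sin(t) + z * math.cos(t)
    # Convert the ray to longitude/latitude (y points down, so "up" is -y).
    lon = math.atan2(x, z)
    lat = math.atan2(-y, math.hypot(x, z))
    # Longitude wraps around horizontally; latitude is clamped at the poles.
    col = int((lon / (2.0 * math.pi) + 0.5) * pano_w) % pano_w
    row = min(max(int((0.5 - lat / math.pi) * pano_h), 0), pano_h - 1)
    return col, row
```

Calling this for every `(u, v)` of the output image and copying the panorama pixel at `(col, row)` gives a nearest-neighbour perspective crop. A practical implementation would vectorize the loop and interpolate between panorama pixels.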
Provider: KlingTeam



