KlingTeam/Scene-Decoupled-Video-dataset

Name: KlingTeam/Scene-Decoupled-Video-dataset
Creator: KlingTeam
Published: 2026-03-08 05:03:02
License: No description provided

Source: Hugging Face, updated 2026-03-08, indexed 2026-04-05

Download link:

https://hf-mirror.com/datasets/KlingTeam/Scene-Decoupled-Video-dataset

Resource description:

---
task_categories:
- video-generation
- text-to-video
language:
- en
tags:
- video
- synthetic
- cinematic
- panoramic image
pretty_name: Scene-Decoupled Video Dataset
size_categories:
- 150G<n<200G
arxiv: 2602.06959
---

# CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation

CVPR 2026: [arXiv](https://arxiv.org/pdf/2602.06959) | [Project Page](https://karine-huang.github.io/CineScene/)

## Scene-Decoupled Video Dataset

TL;DR: The Scene-Decoupled Video Dataset, introduced in CineScene, is a large-scale synthetic dataset for **video generation with decoupled scenes**. It covers diverse scenes, subjects, and camera movements. The dataset contains camera trajectories, equirectangular panoramas (scene images), and videos with and without dynamic subjects. The data is organized into "With Human" (`whuman`) and "Without Human" (`wohuman`) categories. The panoramas are scene-decoupled and shared across both categories.

## 1. Directory Tree

```text
.
├── camera/                              # Camera trajectories and metadata
│   ├── whuman/                          # Sequences containing human characters
│   │   └── <scene_id>/                  # e.g., scene1_3x3_loc1_scene_AncientTempleEnv/
│   │       └── <scene_id>_cam.json      # Camera parameters
│   └── wohuman/                         # Sequences with environment only
│       └── <scene_id>/
│           └── <scene_id>_cam.json
│
├── panorama/                            # Scene-decoupled environment maps
│   └── <scene_id>/                      # Shared between whuman and wohuman
│       └── <scene_id>_pano.jpeg         # 360° equirectangular panoramic image
│
└── video/                               # Rendered video sequences (MP4)
    ├── whuman/                          # Videos with human characters
    │   └── <scene_id>/
    │       ├── <scene_id>_01_24mm.mp4   # Sub-sequences (01, 02, etc.)
    │       ├── <scene_id>_02_24mm.mp4
    │       └── ...
    └── wohuman/                         # Videos without human characters
        └── <scene_id>/
            ├── <scene_id>_01_24mm.mp4
            └── ...
```

## 2. Dataset Statistics

* **Total Scale**: 46,816 videos.
* **Scenes**: 3,400 scenes (comprising both *whuman* and *wohuman* scenes) across 35 high-quality 3D environments.
* **Trajectories**: 46,816 camera paths (7 distinct camera trajectories per scene).
* **Panorama**: a 360° equirectangular image for every scene, providing a complete background reference for scene conditioning.

| Property | Value |
| :--- | :--- |
| **Video Resolution** | 672 x 384 |
| **Frame Count** | 81 frames per video |
| **Frame Rate** | 15 FPS |
| **View Change Range** | Up to 75° |
| **Decoupled Scene** | 360° Equirectangular (Panorama) |
| **Panorama Resolution** | 2048 x 1024 |

## 3. Dataset Construction

We follow the asset collection pipeline established by **ReCamMaster** and add three significant enhancements to support more complex generative tasks:

1. **Decoupled Scenes**: We provide a static 360° equirectangular panoramic image for every scene. This enables explicit background conditioning and supports novel view synthesis from any angle.
2. **Extended Camera Range**: Our dataset covers significantly larger view changes (approx. **75°**) than the 5–60° range of [previous datasets](https://huggingface.co/datasets/KlingTeam/MultiCamVideo-Dataset).
3. **Paired Subject/Background Data**: Every scene includes both "with-subject" (*whuman*) and "background-only" (*wohuman*) video sequences. This paired data is well suited to training models for subject-background decoupling, motion transfer, and cinematic composition.

## 4. Useful Scripts

- **Download**

  ```bash
  sudo apt-get install git-lfs
  git lfs install
  git clone https://huggingface.co/datasets/KlingTeam/Scene-Decoupled-Video-Dataset
  cd Scene-Decoupled-Video-Dataset   # the .part* files are inside the cloned repo
  cat Scene-Decoupled-Video-Dataset.part* > Scene-Decoupled-Video-Dataset.tar.gz
  tar -xvf Scene-Decoupled-Video-Dataset.tar.gz
  ```

- **Camera visualization**: to visualize the camera trajectories, see [vis_cam.py](https://huggingface.co/datasets/KlingTeam/MultiCamVideo-Dataset/blob/main/vis_cam.py) in the MultiCamVideo-Dataset repository.

- **Perspective projection**: to extract perspective frames from the panoramic images, run:

  ```bash
  python extract_scene_from_panorama.py
  ```
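
The split-archive step of the download script can also be done from Python, for example on a machine without a POSIX shell. The block below is a minimal sketch and is not shipped with the dataset. The helper names `reassemble_parts` and `extract_archive` are hypothetical. Like the `cat` glob, the sketch assumes that the `.part*` files come out in the correct order when sorted by name.

```python
"""Rebuild a split archive and extract it: a stdlib sketch of the
`cat <name>.part* > <name>.tar.gz` and `tar -xvf <name>.tar.gz` steps."""
import glob
import os
import shutil
import tarfile


def reassemble_parts(pattern: str, output_path: str) -> list:
    """Concatenate every file matching `pattern` into `output_path`.

    Parts are joined in sorted (lexicographic) name order, which for names
    such as `.part00`, `.part01` matches the order the shell glob produces.
    Returns the list of part paths that were used.
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Copy in 16 MiB chunks so large parts are never fully in memory.
                shutil.copyfileobj(src, out, 16 * 1024 * 1024)
    return parts


def extract_archive(archive_path: str, dest_dir: str) -> None:
    """Extract a (possibly gzip-compressed) tar archive into `dest_dir`."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, mode="r:*") as tar:  # compression auto-detected
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter blocks absolute paths and "..".
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

Run the two helpers from inside the cloned repository:

1. `reassemble_parts("Scene-Decoupled-Video-Dataset.part*", "Scene-Decoupled-Video-Dataset.tar.gz")`
2. `extract_archive("Scene-Decoupled-Video-Dataset.tar.gz", ".")`

Together they mirror the two shell commands. `shutil.copyfileobj` streams each part, so memory use stays flat regardless of part size.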
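
Panoramas are stored per `<scene_id>`, while cameras and videos are split by category. A small index therefore helps pair a scene's `whuman` and `wohuman` clips with their shared panorama. The sketch below walks the tree from Section 1 under two assumptions:

- File names follow the `<scene_id>_<NN>_24mm.mp4` pattern shown in the tree.
- Clips with the same sub-sequence number `<NN>` in both categories form a pair.

The card does not state the second point, so verify it (for example, by comparing the two `_cam.json` files) before relying on it. `SceneRecord`, `index_dataset`, and `paired_clips` are hypothetical names.

```python
"""Index the Section 1 layout and pair whuman/wohuman clips (a sketch).

Assumed layout (from the directory tree above):
    camera/<category>/<scene_id>/<scene_id>_cam.json
    panorama/<scene_id>/<scene_id>_pano.jpeg
    video/<category>/<scene_id>/<scene_id>_<NN>_24mm.mp4
"""
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterator, Optional, Tuple, Union

CATEGORIES = ("whuman", "wohuman")
# Pattern inferred from names like "<scene_id>_01_24mm.mp4"; the greedy
# scene group keeps underscores inside IDs such as "scene1_3x3_loc1_...".
VIDEO_RE = re.compile(r"^(?P<scene>.+)_(?P<seq>\d+)_(?P<focal>\d+)mm\.mp4$")


@dataclass
class SceneRecord:
    scene_id: str
    panorama: Optional[Path]
    cameras: Dict[str, Path] = field(default_factory=dict)  # category -> cam.json
    videos: Dict[str, Dict[str, Path]] = field(default_factory=dict)  # category -> NN -> mp4


def index_dataset(root: Union[str, Path]) -> Dict[str, SceneRecord]:
    """Collect camera files, videos and the shared panorama per scene_id."""
    root = Path(root)
    records: Dict[str, SceneRecord] = {}

    def get(scene_id: str) -> SceneRecord:
        if scene_id not in records:
            pano = root / "panorama" / scene_id / f"{scene_id}_pano.jpeg"
            records[scene_id] = SceneRecord(scene_id, pano if pano.is_file() else None)
        return records[scene_id]

    for cat in CATEGORIES:
        for cam in sorted((root / "camera" / cat).glob("*/*_cam.json")):
            get(cam.parent.name).cameras[cat] = cam
        for mp4 in sorted((root / "video" / cat).glob("*/*.mp4")):
            m = VIDEO_RE.match(mp4.name)
            if m and m["scene"] == mp4.parent.name:  # skip names that don't fit
                get(m["scene"]).videos.setdefault(cat, {})[m["seq"]] = mp4
    return records


def paired_clips(
    records: Dict[str, SceneRecord],
) -> Iterator[Tuple[str, str, Path, Path, Optional[Path]]]:
    """Yield (scene_id, NN, whuman_mp4, wohuman_mp4, panorama) for every
    sub-sequence number present in both categories (ASSUMED to be a pair)."""
    for rec in records.values():
        wh, wo = rec.videos.get("whuman", {}), rec.videos.get("wohuman", {})
        for seq in sorted(wh.keys() & wo.keys()):
            yield rec.scene_id, seq, wh[seq], wo[seq], rec.panorama
```

A scene that appears in only one category, or a panorama that is missing, still gets a `SceneRecord`. This makes gaps easy to spot before training.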
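
The card reports view changes of up to 75°. One plausible way to measure such a change from camera poses is the geodesic rotation angle between the rotation blocks of two poses, `theta = arccos((trace(R_a^T R_b) - 1) / 2)`. The card does not document the layout of `<scene_id>_cam.json`. The sketch below therefore assumes the poses have already been parsed into 4x4 matrices (nested lists). `rotation_angle_deg`, `max_view_change_deg`, and the test helper `yaw_pose` are hypothetical names.

```python
"""Rotation-based view change between camera poses (a sketch).

ASSUMPTION: poses are 4x4 (or 3x4) matrices as nested lists whose top-left
3x3 block is the rotation. Parsing <scene_id>_cam.json into this form is not
shown because its layout is not documented on this card.
"""
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def rotation_angle_deg(pose_a: Matrix, pose_b: Matrix) -> float:
    """Geodesic angle between the rotation blocks R_a and R_b, in degrees.

    Uses trace(R_a^T R_b) = 1 + 2 cos(theta); the trace equals the sum of
    element-wise products of R_a and R_b.
    """
    trace = sum(pose_a[r][c] * pose_b[r][c] for r in range(3) for c in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_theta))


def max_view_change_deg(poses: Sequence[Matrix]) -> float:
    """Largest rotation of any pose relative to the first pose."""
    return max(rotation_angle_deg(poses[0], p) for p in poses)


def yaw_pose(yaw_deg: float) -> List[List[float]]:
    """Hypothetical test helper: a pure rotation about the y axis."""
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    return [[c, 0.0, s, 0.0], [0.0, 1.0, 0.0, 0.0], [-s, 0.0, c, 0.0], [0.0, 0.0, 0.0, 1.0]]
```

Because the trace term is symmetric in transposition, the angle is the same whether the matrices are camera-to-world or world-to-camera. The card does not say whether the 75° figure is measured within one trajectory or between two trajectories. `rotation_angle_deg` applies to either pair of poses.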
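
The card references `extract_scene_from_panorama.py` but does not reproduce it. The sketch below illustrates the underlying geometry with a generic equirectangular-to-perspective (gnomonic) re-projection using nearest-neighbour sampling. It is a stand-in technique, not the dataset's script. Its axis conventions (stated in the docstring) are assumptions and may not match how the dataset's panoramas are oriented relative to its camera files. All function names are hypothetical.

```python
"""Equirectangular -> perspective (gnomonic) re-projection, stdlib sketch.

Not the dataset's extract_scene_from_panorama.py. ASSUMED conventions:
longitude spans the panorama width from -180 deg (left) to +180 deg (right),
latitude spans +90 deg (top row) to -90 deg (bottom row); camera x is right,
y is up, z is forward; positive yaw turns right, positive pitch looks up.
"""
import math


def pixel_to_lonlat(x, y, out_w, out_h, fov_deg, yaw_deg, pitch_deg):
    """(lon, lat) in radians of the ray through the centre of output pixel (x, y)."""
    f = (out_w / 2) / math.tan(math.radians(fov_deg) / 2)  # focal length in pixels
    dx, dy, dz = (x + 0.5) - out_w / 2, out_h / 2 - (y + 0.5), f
    p, t = math.radians(pitch_deg), math.radians(yaw_deg)
    # Pitch: rotate the ray about the x axis.
    dy, dz = dy * math.cos(p) + dz * math.sin(p), -dy * math.sin(p) + dz * math.cos(p)
    # Yaw: rotate the ray about the y axis.
    dx, dz = dx * math.cos(t) + dz * math.sin(t), -dx * math.sin(t) + dz * math.cos(t)
    return math.atan2(dx, dz), math.atan2(dy, math.hypot(dx, dz))


def lonlat_to_pano(lon, lat, pano_w, pano_h):
    """Nearest panorama pixel (u, v) for a direction, wrapping u horizontally."""
    u = int((lon / (2 * math.pi) + 0.5) * pano_w) % pano_w
    v = min(max(int((0.5 - lat / math.pi) * pano_h), 0), pano_h - 1)
    return u, v


def perspective_from_pano(pano, out_w, out_h, fov_deg=90.0, yaw_deg=0.0, pitch_deg=0.0):
    """Render an out_w x out_h pinhole view from `pano`, a list of pixel rows.

    `fov_deg` is the horizontal field of view; sampling is nearest-neighbour.
    """
    pano_h, pano_w = len(pano), len(pano[0])
    view = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            lon, lat = pixel_to_lonlat(x, y, out_w, out_h, fov_deg, yaw_deg, pitch_deg)
            u, v = lonlat_to_pano(lon, lat, pano_w, pano_h)
            row.append(pano[v][u])
        view.append(row)
    return view
```

Per-pixel Python loops are adequate for checking conventions but slow at the panorama's 2048 x 1024 resolution. A vectorised NumPy version with bilinear sampling would be the usual next step. Loading the JPEG into rows of pixels (for example, with Pillow) is omitted to keep the block stdlib-only.

The `24mm` in the video file names may denote a full-frame (36 mm wide sensor) focal length, though the card does not confirm this. If so, the matching horizontal field of view would be `2 * atan(36 / (2 * 24))`, about 73.7°.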

