
KlingTeam/Scene-Decoupled-Video-dataset

Source: Hugging Face. Updated 2026-03-08; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/KlingTeam/Scene-Decoupled-Video-dataset
---
task_categories:
- video-generation
- text-to-video
language:
- en
tags:
- video
- synthetic
- cinematic
- panoramic image
pretty_name: Scene-Decoupled Video Dataset
size_categories:
- 150G<n<200G
arxiv: 2602.06959
---

# CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation

CVPR 2026: [arXiv](https://arxiv.org/pdf/2602.06959) | [Project Page](https://karine-huang.github.io/CineScene/)

## Scene-Decoupled Video Dataset

TL;DR: The Scene-Decoupled Video Dataset, introduced in CineScene, is a large-scale synthetic dataset for **video generation with decoupled scenes**. It covers diverse scenes, subjects, and camera movements, and contains camera trajectories, equirectangular panoramas (scene images), and videos with and without a dynamic subject. The data is organized into "With Human" (whuman) and "Without Human" (wohuman) categories, while the panoramas are scene-decoupled and shared across both.

## 1. Directory Tree

```text
.
├── camera/                             # Camera trajectories and metadata
│   ├── whuman/                         # Sequences containing human characters
│   │   └── <scene_id>/                 # e.g., scene1_3x3_loc1_scene_AncientTempleEnv/
│   │       └── <scene_id>_cam.json     # Camera parameters
│   └── wohuman/                        # Sequences with environment only
│       └── <scene_id>/
│           └── <scene_id>_cam.json
│
├── panorama/                           # Scene-decoupled environment maps
│   └── <scene_id>/                     # Shared between whuman and wohuman
│       └── <scene_id>_pano.jpeg        # 360° Equirectangular panoramic image
│
└── video/                              # Rendered video sequences (MP4)
    ├── whuman/                         # Videos with human characters
    │   └── <scene_id>/
    │       ├── <scene_id>_01_24mm.mp4  # Sub-sequences (01, 02, etc.)
    │       ├── <scene_id>_02_24mm.mp4
    │       └── ...
    └── wohuman/                        # Videos without human characters
        └── <scene_id>/
            ├── <scene_id>_01_24mm.mp4
            └── ...
```

## 2. Dataset Statistics

* **Total Scale**: 46,816 videos.
* **Scenes**: 3,400 scenes (comprising both *whuman* and *wohuman* scenes) across 35 high-quality 3D environments.
* **Trajectories**: 46,816 camera paths (7 distinct camera trajectories per scene).
* **Panorama**: 360° Equirectangular images for every scene, providing a complete background reference for scene conditioning.

| Property | Value |
| :--- | :--- |
| **Video Resolution** | 672 x 384 |
| **Frame Count** | 81 frames per video |
| **Frame Rate** | 15 FPS |
| **View Change Range** | Up to 75° |
| **Decoupled Scene** | 360° Equirectangular (Panorama) |
| **Panorama Resolution** | 2048 x 1024 |

## 3. Dataset Construction

We follow the asset collection pipeline established by **RecamMaster**, but introduce three significant enhancements to support more complex generative tasks:

1. **Decoupled Scenes**: We provide a static 360° panoramic image (Equirectangular) for every scene. This allows explicit background conditioning and facilitates novel view synthesis from any angle.
2. **Extended Camera Range**: Our dataset covers significantly larger view changes (approx. **75°**) than the 5–60° range provided in [previous datasets](https://huggingface.co/datasets/KlingTeam/MultiCamVideo-Dataset).
3. **Paired Subject/Background Data**: Every scene includes both "with-subject" (*whuman*) and "background-only" (*wohuman*) video sequences. This paired data is well suited to training models for subject-background decoupling, motion transfer, and cinematic composition.

## 4. Useful Scripts

- Download

```bash
sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/datasets/KlingTeam/Scene-Decoupled-Video-Dataset
cat Scene-Decoupled-Video-Dataset.part* > Scene-Decoupled-Video-Dataset.tar.gz
tar -xvf Scene-Decoupled-Video-Dataset.tar.gz
```

- Camera Visualization

To visualize the camera trajectories, see [vis_cam.py](https://huggingface.co/datasets/KlingTeam/MultiCamVideo-Dataset/blob/main/vis_cam.py) in the MultiCamVideo-Dataset repository.

- Perspective Projection

To extract perspective frames from the panoramic images:

```bash
python extract_scene_from_panorama.py
```
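- Indexing the extracted files (illustrative)

Once the archive is extracted, the layout in the Directory Tree section can be walked to pair each panorama with its camera files and videos. The following is a minimal sketch rather than an official loader: the function name `index_scenes` and the shape of the returned dictionary are hypothetical, and it assumes that the same `<scene_id>` folder name appears under `camera/`, `panorama/`, and `video/`, as the tree suggests.

```python
from pathlib import Path


def index_scenes(root):
    """Map each scene_id to its panorama, camera files, and videos.

    Assumes the folder layout shown in the Directory Tree section.
    The name and return shape are illustrative, not part of the dataset.
    """
    root = Path(root)
    index = {}
    # Panoramas are shared between whuman and wohuman, so use them as the key.
    for pano in sorted(root.glob("panorama/*/*_pano.jpeg")):
        scene_id = pano.parent.name
        entry = {"panorama": pano}
        for split in ("whuman", "wohuman"):
            cam = root / "camera" / split / scene_id / f"{scene_id}_cam.json"
            video_dir = root / "video" / split / scene_id
            # A split may be missing for a scene; record that as None / [].
            videos = sorted(video_dir.glob("*.mp4")) if video_dir.is_dir() else []
            entry[split] = {
                "camera": cam if cam.exists() else None,
                "videos": videos,
            }
        index[scene_id] = entry
    return index
```

Calling `index_scenes("path/to/extracted/dataset")` returns one entry per scene folder found under `panorama/`; the path here is a placeholder for wherever the archive was extracted.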

Provider: KlingTeam