zhening/CamxTime

Source: Hugging Face. Updated 2026-04-15; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/zhening/CamxTime
Resource description:
---
license: apache-2.0
task_categories:
- video-to-video
language:
- en
tags:
- video-generation
- camera-control
- space-time
- 4d
- evaluation
- cvpr2026
pretty_name: CamxTime Evaluation Benchmark
size_categories:
- 1B<n<10B
---

# CamxTime Evaluation Benchmark

This is the evaluation dataset for the **Cam×Time** benchmark introduced in:

> **SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time**
>
> Zhening Huang, Hyeonho Jeong, Xuelin Chen, Yulia Gryaditskaya, Tuanfeng Y. Wang, Joan Lasenby, Chun-Hao Huang
>
> *CVPR 2026*
>
> [![arXiv](https://img.shields.io/badge/arXiv-2512.25075-b31b1b.svg)](https://arxiv.org/abs/2512.25075) [![Project Page](https://img.shields.io/badge/Project-SpaceTimePilot-blue.svg)](https://zheninghuang.github.io/Space-Time-Pilot/)

## What is this dataset?

The Cam×Time benchmark evaluates a model's ability to simultaneously control **camera viewpoint** and **temporal motion** in a dynamic scene, which is the core task of SpaceTimePilot.

The dataset contains **32 dynamic scenes**, each rendered across a full 120×120 camera×time grid. From this grid, ground-truth videos are extracted for 5 moving-camera evaluation patterns and preprocessed to match the SpaceTimePilot inference format.

---

## Folder Structure

```
CamxTime_eval/
├── full_grid_renders/          Raw full-grid renders (source)
├── eval_input/                 Source input videos + camera files for inference
├── eval_gt/                    Ground-truth pattern videos (native resolution)
├── eval_gt_wan2.1_format/      GT videos preprocessed to match network output
├── process_full_grid_to_gt.py  Script: full_grid_renders → eval_gt
└── preprocess_gt_videos.py     Script: eval_gt → eval_gt_wan2.1_format
```

### `full_grid_renders/`

Raw renders from a 120×120 camera×time grid per scene.

- **32 scenes**, each with 120 camera positions along an arc trajectory
- Per camera: one 120-frame MP4 (1080×1080, 30fps) + `camera_data.json` with c2w/w2c poses and intrinsics

### `eval_input/`

Source data used as input to the SpaceTimePilot model during inference.

- `videos/`: 32 source MP4s (one per scene)
- `src_cam/`: per-scene source camera poses (`camera_data.json`)
- `metadata.csv`: scene list with text captions

### `eval_gt/`

Ground-truth pattern videos at native resolution (1080×1080, 81 frames), extracted from `full_grid_renders` by slicing the camera×time grid along 5 trajectories:

| Pattern | Camera axis | Time axis |
|---|---|---|
| `moving_forward` | cam 0 → 80 | frame 0 → 80 |
| `moving_backward` | cam 0 → 80 | frame 80 → 0 |
| `moving_zigzag` | cam 0 → 80 | 0 → 40 → 0 |
| `moving_bullettime` | cam 0 → 80 | frame 40 (frozen) |
| `moving_slowmo` | cam 0 → 80 | 0, 0, 1, 1, …, 40 |

Generated by [`process_full_grid_to_gt.py`](#generating-eval_gt).

### `eval_gt_wan2.1_format/`

GT videos preprocessed to exactly match SpaceTimePilot network output format: **832×480**, 81 frames, 30fps H264 (aspect-ratio crop then center-crop from 1080×1080).

Generated by [`preprocess_gt_videos.py`](#generating-eval_gt_wan21_format).

---

## Downloading the dataset

Each folder is distributed as a single zip archive.
Download and unzip with:

```bash
# Install huggingface_hub if needed: pip install huggingface_hub
python - <<'EOF'
from huggingface_hub import hf_hub_download
import zipfile, os

REPO = "zhening/CamxTime"
zips = [
    "full_grid_renders.zip",
    "eval_gt.zip",
    "eval_gt_wan2.1_format.zip",
    "eval_input.zip",
]
for z in zips:
    print(f"Downloading {z} ...")
    path = hf_hub_download(repo_id=REPO, filename=z, repo_type="dataset")
    print(f"Extracting {z} ...")
    with zipfile.ZipFile(path, "r") as zf:
        zf.extractall(".")
    print(f"  done → {z.replace('.zip', '/')}")
EOF
```

Or download individually via the CLI:

```bash
huggingface-cli download zhening/CamxTime full_grid_renders.zip --repo-type dataset --local-dir .
huggingface-cli download zhening/CamxTime eval_gt.zip --repo-type dataset --local-dir .
huggingface-cli download zhening/CamxTime eval_gt_wan2.1_format.zip --repo-type dataset --local-dir .
huggingface-cli download zhening/CamxTime eval_input.zip --repo-type dataset --local-dir .
```

Then unzip:

```bash
unzip full_grid_renders.zip
unzip eval_gt.zip
unzip eval_gt_wan2.1_format.zip
unzip eval_input.zip
```

> **Note:** All paths inside each zip are relative to `CamxTime_eval/`, so extract from the repo root and the folder structure will be restored automatically.

---

## Generating `eval_gt`

**Script:** `CamxTime_eval/process_full_grid_to_gt.py`

Extracts the 5 GT pattern videos per scene from the full-grid renders.
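
As a rough illustration of what that extraction involves, the five patterns in the `eval_gt/` table can be written as per-frame (camera, time) index pairs into the grid. This is a minimal sketch, not the script's actual code: the function name `pattern_indices` is hypothetical, and the exact per-frame mapping for `moving_zigzag` and `moving_slowmo` is inferred from the table's shorthand, so the real script may differ in detail.

```python
# Hypothetical helper: maps each output frame of a pattern to a
# (camera_index, time_index) cell of the camera x time grid.
PATTERNS = (
    "moving_forward",
    "moving_backward",
    "moving_zigzag",
    "moving_bullettime",
    "moving_slowmo",
)


def pattern_indices(pattern: str, num_frames: int = 81) -> list[tuple[int, int]]:
    """Return one (camera_index, time_index) pair per output frame."""
    last = num_frames - 1  # 80 for the 81-frame GT videos
    mid = last // 2        # 40
    frames = range(num_frames)

    if pattern == "moving_forward":
        times = [i for i in frames]                    # 0 -> 80
    elif pattern == "moving_backward":
        times = [last - i for i in frames]             # 80 -> 0
    elif pattern == "moving_zigzag":
        # Assumed mapping: rise 0 -> 40, then fall back to 0.
        times = [i if i <= mid else last - i for i in frames]
    elif pattern == "moving_bullettime":
        times = [mid for _ in frames]                  # frozen at frame 40
    elif pattern == "moving_slowmo":
        # Assumed mapping: each time step is held for two frames.
        times = [i // 2 for i in frames]               # 0, 0, 1, 1, ..., 40
    else:
        raise ValueError(f"unknown pattern: {pattern}")

    # The camera index sweeps 0 -> 80 in every pattern.
    return list(zip(frames, times))


if __name__ == "__main__":
    for name in PATTERNS:
        pairs = pattern_indices(name)
        print(f"{name}: first={pairs[0]}, last={pairs[-1]}, frames={len(pairs)}")
```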
Run the script from the repo root:

```bash
python CamxTime_eval/process_full_grid_to_gt.py \
    --input   CamxTime_eval/full_grid_renders \
    --output  CamxTime_eval/eval_gt \
    --src_cam CamxTime_eval/eval_input/src_cam
```

| Flag | Default | Description |
|---|---|---|
| `--workers N` | `ncpu // 8` | Parallel scene processes |
| `--threads N` | `8` | ffmpeg threads per scene |
| `--scenes s1 s2` | all | Limit to specific scenes |

Output per scene: `moving_{pattern}.mp4` + `.json` + `.txt` + `camera_data.json`

---

## Generating `eval_gt_wan2.1_format`

**Script:** `CamxTime_eval/preprocess_gt_videos.py`

Applies the same spatial transforms as the SpaceTimePilot inference pipeline to the GT videos: scale to cover 832×480 → CenterCrop → pad to 81 frames → 30fps H264.

```bash
python CamxTime_eval/preprocess_gt_videos.py \
    --input  CamxTime_eval/eval_gt \
    --output CamxTime_eval/eval_gt_wan2.1_format
```

| Flag | Default | Description |
|---|---|---|
| `--workers N` | `min(32, ncpu)` | Parallel scene processes |
| `--scenes s1 s2` | all | Limit to specific scenes |

Both scripts are **resumable**: already completed scenes are skipped automatically.

---

## Citation

```bibtex
@inproceedings{huang2026spacetimopilot,
  title={SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time},
  author={Huang, Zhening and Jeong, Hyeonho and Chen, Xuelin and Gryaditskaya, Yulia and Wang, Tuanfeng Y. and Lasenby, Joan and Huang, Chun-Hao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
```
Provided by:
zhening