zhening/CamxTime
Hugging Face · Updated 2026-04-15 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/zhening/CamxTime
Description:
---
license: apache-2.0
task_categories:
- video-to-video
language:
- en
tags:
- video-generation
- camera-control
- space-time
- 4d
- evaluation
- cvpr2026
pretty_name: CamxTime Evaluation Benchmark
size_categories:
- 1B<n<10B
---
# CamxTime Evaluation Benchmark
This is the evaluation dataset for the **Cam×Time** benchmark introduced in:
> **SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time**
> Zhening Huang, Hyeonho Jeong, Xuelin Chen, Yulia Gryaditskaya, Tuanfeng Y. Wang, Joan Lasenby, Chun-Hao Huang
> *CVPR 2026*
> [arXiv](https://arxiv.org/abs/2512.25075) [Project Page](https://zheninghuang.github.io/Space-Time-Pilot/)
## What is this dataset?
The Cam×Time benchmark evaluates a model's ability to simultaneously control **camera viewpoint** and **temporal motion** in a dynamic scene — the core task of SpaceTimePilot.
The dataset contains **32 dynamic scenes**, each rendered across a full 120×120 camera×time grid. From this grid, ground-truth videos are extracted for 5 moving-camera evaluation patterns and preprocessed to match the SpaceTimePilot inference format.
---
## Folder Structure
```
CamxTime_eval/
├── full_grid_renders/          Raw full-grid renders (source)
├── eval_input/                 Source input videos + camera files for inference
├── eval_gt/                    Ground-truth pattern videos (native resolution)
├── eval_gt_wan2.1_format/      GT videos preprocessed to match network output
├── process_full_grid_to_gt.py  Script: full_grid_renders → eval_gt
└── preprocess_gt_videos.py     Script: eval_gt → eval_gt_wan2.1_format
```
### `full_grid_renders/`
Raw renders from a 120×120 camera×time grid per scene.
- **32 scenes**, each with 120 camera positions along an arc trajectory
- Per camera: one 120-frame MP4 (1080×1080, 30fps) + `camera_data.json` with c2w/w2c poses and intrinsics
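Each `camera_data.json` is described as holding both c2w (camera-to-world) and w2c (world-to-camera) poses. The card does not document its key names or matrix shapes, so the sketch below assumes plain 4×4 rigid transforms. It shows the relationship between the two (w2c is the inverse of c2w) and can be used to sanity-check a loaded pair with `np.allclose`. `c2w_to_w2c` is a hypothetical helper, not part of the dataset scripts.

```python
import numpy as np


def c2w_to_w2c(c2w: np.ndarray) -> np.ndarray:
    """Invert a 4x4 rigid camera-to-world matrix into world-to-camera form.

    Assumes c2w = [[R, t], [0, 0, 0, 1]] with R a rotation matrix, so the
    inverse is [[R^T, -R^T t], [0, 0, 0, 1]] (cheaper and more stable than a
    general matrix inverse).
    """
    c2w = np.asarray(c2w, dtype=np.float64)
    if c2w.shape != (4, 4):
        raise ValueError(f"expected a 4x4 matrix, got shape {c2w.shape}")
    R = c2w[:3, :3]
    t = c2w[:3, 3]
    w2c = np.eye(4)
    w2c[:3, :3] = R.T
    w2c[:3, 3] = -R.T @ t
    return w2c


if __name__ == "__main__":
    # Example pose: camera rotated 90 degrees about the y-axis, placed at (2, 0, 0).
    theta = np.pi / 2
    c2w = np.array([
        [np.cos(theta), 0.0, np.sin(theta), 2.0],
        [0.0, 1.0, 0.0, 0.0],
        [-np.sin(theta), 0.0, np.cos(theta), 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])
    print(np.allclose(c2w_to_w2c(c2w) @ c2w, np.eye(4)))  # True
```

The inversion does not depend on the axis convention (OpenCV vs. OpenGL), so it applies whichever convention the JSON uses.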
### `eval_input/`
Source data used as input to the SpaceTimePilot model during inference.
- `videos/` — 32 source MP4s (one per scene)
- `src_cam/` — per-scene source camera poses (`camera_data.json`)
- `metadata.csv` — scene list with text captions
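The column names of `metadata.csv` are not listed in this card, so the sketch below makes no assumption about them: it keys each row by whatever header row the file has. `load_metadata` and `parse_metadata` are hypothetical helper names.

```python
from __future__ import annotations

import csv
from pathlib import Path
from typing import Iterable


def parse_metadata(lines: Iterable[str]) -> list[dict]:
    """Parse CSV lines into one dict per row, keyed by the header row.

    csv.DictReader handles quoted fields, so captions containing commas stay intact.
    """
    return list(csv.DictReader(lines))


def load_metadata(csv_path: str | Path) -> list[dict]:
    """Read metadata.csv from disk (newline='' is what the csv module expects)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return parse_metadata(f)


if __name__ == "__main__":
    path = Path("CamxTime_eval/eval_input/metadata.csv")
    if path.is_file():
        rows = load_metadata(path)
        print(f"{len(rows)} rows, columns: {list(rows[0]) if rows else []}")
```

Using the `csv` module rather than splitting on commas matters here because free-text captions commonly contain commas.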
### `eval_gt/`
Ground-truth pattern videos at native resolution (1080×1080, 81 frames), extracted from `full_grid_renders` by slicing the camera×time grid along 5 trajectories:
| Pattern | Camera index | Time (frame) index |
|---|---|---|
| `moving_forward` | 0 → 80 | 0 → 80 |
| `moving_backward` | 0 → 80 | 80 → 0 |
| `moving_zigzag` | 0 → 80 | 0 → 40 → 0 |
| `moving_bullettime` | 0 → 80 | 40 (frozen) |
| `moving_slowmo` | 0 → 80 | 0, 0, 1, 1, …, 40 (half speed) |
Generated by [`process_full_grid_to_gt.py`](#generating-eval_gt).
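The table above can be read as a list of (camera index, time index) pairs per output frame. The sketch below is one reading of it, not the code in `process_full_grid_to_gt.py`; in particular, it interprets the slow-motion sequence "0, 0, 1, 1, …, 40" as `i // 2`. `pattern_indices` is a hypothetical helper.

```python
from __future__ import annotations

NUM_FRAMES = 81  # frames per GT pattern video (camera 0 -> 80)


def pattern_indices(pattern: str) -> list[tuple[int, int]]:
    """Return (camera_index, time_index) pairs for one GT pattern, per the table above."""
    cams = list(range(NUM_FRAMES))  # every pattern sweeps camera 0 -> 80
    last = NUM_FRAMES - 1  # 80
    mid = last // 2  # 40
    if pattern == "moving_forward":
        times = list(cams)  # 0 -> 80
    elif pattern == "moving_backward":
        times = [last - i for i in cams]  # 80 -> 0
    elif pattern == "moving_zigzag":
        times = [i if i <= mid else last - i for i in cams]  # 0 -> 40 -> 0
    elif pattern == "moving_bullettime":
        times = [mid] * NUM_FRAMES  # frozen at frame 40
    elif pattern == "moving_slowmo":
        times = [i // 2 for i in cams]  # 0, 0, 1, 1, ..., 40
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    return list(zip(cams, times))
```

For example, `pattern_indices("moving_zigzag")[-1]` is `(80, 0)`: the camera has reached the end of its sweep while time has returned to the first frame.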
### `eval_gt_wan2.1_format/`
GT videos preprocessed to exactly match SpaceTimePilot network output format:
**832×480**, 81 frames, 30fps H264 (each 1080×1080 frame is scaled to cover 832×480, then center-cropped).
Generated by [`preprocess_gt_videos.py`](#generating-eval_gt_wan21_format).
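To confirm that a preprocessed file matches the stated format (832×480, 81 frames, 30fps, H264), one option is to query it with `ffprobe`, which ships with ffmpeg. The sketch below assumes `ffprobe` is on `PATH`; `probe_video` and `check_stream` are hypothetical helpers, and the example path uses a `<scene>` placeholder because the per-scene file layout inside `eval_gt_wan2.1_format/` is not spelled out here.

```python
from __future__ import annotations

import json
import subprocess
from fractions import Fraction

EXPECTED = {"width": 832, "height": 480, "frames": 81, "fps": 30, "codec": "h264"}


def probe_video(path: str) -> dict:
    """Run ffprobe on the first video stream and return its stream info."""
    cmd = [
        "ffprobe", "-v", "error", "-select_streams", "v:0", "-count_frames",
        "-show_entries", "stream=codec_name,width,height,r_frame_rate,nb_read_frames",
        "-of", "json", path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)["streams"][0]


def check_stream(stream: dict, expected: dict = EXPECTED) -> list[str]:
    """Compare ffprobe stream info with the expected format; return any mismatches."""
    actual = {
        "width": int(stream["width"]),
        "height": int(stream["height"]),
        "frames": int(stream["nb_read_frames"]),  # decoded frame count (needs -count_frames)
        "fps": float(Fraction(stream["r_frame_rate"])),  # e.g. "30/1" -> 30.0
        "codec": stream["codec_name"],
    }
    return [f"{k}: expected {expected[k]}, got {actual[k]}" for k in expected if actual[k] != expected[k]]
```

Usage: `check_stream(probe_video("CamxTime_eval/eval_gt_wan2.1_format/<scene>/moving_forward.mp4"))` returns an empty list when the file matches. `-count_frames` decodes the whole stream, which is slower than reading container metadata but gives an exact frame count.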
---
## Downloading the dataset
Each folder is distributed as a single zip archive. Download and unzip with:
```bash
# Install huggingface_hub if needed: pip install huggingface_hub
python - <<'EOF'
from huggingface_hub import hf_hub_download
import zipfile

REPO = "zhening/CamxTime"
zips = [
    "full_grid_renders.zip",
    "eval_gt.zip",
    "eval_gt_wan2.1_format.zip",
    "eval_input.zip",
]

for z in zips:
    print(f"Downloading {z} ...")
    # hf_hub_download caches the file locally and returns its path
    path = hf_hub_download(repo_id=REPO, filename=z, repo_type="dataset")
    print(f"Extracting {z} ...")
    with zipfile.ZipFile(path, "r") as zf:
        zf.extractall(".")  # extract into the current working directory
    print(f"  done → {z.replace('.zip', '/')}")
EOF
```
Or download individually via the CLI:
```bash
huggingface-cli download zhening/CamxTime full_grid_renders.zip --repo-type dataset --local-dir .
huggingface-cli download zhening/CamxTime eval_gt.zip --repo-type dataset --local-dir .
huggingface-cli download zhening/CamxTime eval_gt_wan2.1_format.zip --repo-type dataset --local-dir .
huggingface-cli download zhening/CamxTime eval_input.zip --repo-type dataset --local-dir .
```
Then unzip:
```bash
unzip full_grid_renders.zip
unzip eval_gt.zip
unzip eval_gt_wan2.1_format.zip
unzip eval_input.zip
```
> **Note:** All paths inside each zip are relative to `CamxTime_eval/`, so extract from the repo root and the folder structure will be restored automatically.
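After extraction, a quick check that the four folders exist and that `eval_input/videos/` holds the 32 source MP4s can catch a partial download. This is a minimal sketch assuming the folders end up under `CamxTime_eval/` as the note above describes; pass a different `root` if they land elsewhere. `check_layout` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path

EXPECTED_DIRS = ("full_grid_renders", "eval_input", "eval_gt", "eval_gt_wan2.1_format")
EXPECTED_SOURCE_VIDEOS = 32  # eval_input/videos/ holds one source MP4 per scene


def check_layout(root: str | Path = "CamxTime_eval") -> dict:
    """Report which expected sub-folders exist under `root` and how many source MP4s were found."""
    root = Path(root)
    report: dict = {name: (root / name).is_dir() for name in EXPECTED_DIRS}
    videos = root / "eval_input" / "videos"
    report["n_source_videos"] = len(list(videos.glob("*.mp4"))) if videos.is_dir() else 0
    return report


if __name__ == "__main__":
    report = check_layout()
    print(report)
    if report["n_source_videos"] != EXPECTED_SOURCE_VIDEOS:
        print(f"warning: expected {EXPECTED_SOURCE_VIDEOS} source videos")
```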
---
## Generating `eval_gt`
**Script:** `CamxTime_eval/process_full_grid_to_gt.py`
Extracts the 5 GT pattern videos per scene from the full-grid renders.
Run from the repo root:
```bash
python CamxTime_eval/process_full_grid_to_gt.py \
--input CamxTime_eval/full_grid_renders \
--output CamxTime_eval/eval_gt \
--src_cam CamxTime_eval/eval_input/src_cam
```
| Flag | Default | Description |
|---|---|---|
| `--workers N` | `ncpu // 8` | Parallel scene processes |
| `--threads N` | `8` | ffmpeg threads per scene |
| `--scenes s1 s2` | all | Limit to specific scenes |
Output per scene: `moving_{pattern}.mp4` + `.json` + `.txt` + `camera_data.json`
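Conceptually, each GT pattern video is assembled by taking output frame `i` from frame `times[i]` of camera `cams[i]`'s render. Because every pattern sweeps cameras 0 → 80, each output frame comes from a different camera video, which is why the work parallelizes well across scenes. The sketch below illustrates that slicing only; it is not how `process_full_grid_to_gt.py` works internally (that script uses ffmpeg), and the OpenCV reader plus the `video_paths` list (one MP4 per camera) are assumptions.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable, Sequence


def slice_grid(read_frame: Callable[[int, int], Any], cams: Sequence[int], times: Sequence[int]) -> list:
    """Assemble one pattern video: output frame i is frame times[i] of camera cams[i]."""
    if len(cams) != len(times):
        raise ValueError("cams and times must have the same length")
    return [read_frame(c, t) for c, t in zip(cams, times)]


def make_cv2_reader(video_paths: Sequence[Path]) -> Callable[[int, int], Any]:
    """Frame reader backed by OpenCV; video_paths[c] is camera c's 120-frame MP4 (hypothetical layout)."""
    import cv2  # imported lazily so slice_grid works without OpenCV installed

    def read_frame(cam: int, t: int):
        cap = cv2.VideoCapture(str(video_paths[cam]))
        try:
            cap.set(cv2.CAP_PROP_POS_FRAMES, t)  # frame-accurate seeking depends on codec/backend
            ok, frame = cap.read()
        finally:
            cap.release()
        if not ok:
            raise IOError(f"could not read frame {t} from camera {cam}")
        return frame

    return read_frame
```

For example, a bullet-time clip would be `slice_grid(make_cv2_reader(paths), list(range(81)), [40] * 81)`, where `paths` is your own list of per-camera MP4 paths.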
---
## Generating `eval_gt_wan2.1_format`
**Script:** `CamxTime_eval/preprocess_gt_videos.py`
Applies the same spatial transforms as the SpaceTimePilot inference pipeline to the GT videos:
scale to cover 832×480 → CenterCrop → pad to 81 frames → 30fps H264.
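The geometry of "scale to cover, then center-crop" can be worked out directly. For a 1080×1080 source the scale factor is max(832/1080, 480/1080) ≈ 0.770, giving an 832×832 frame from which a 480-row band is cut starting 176 rows from the top. The sketch below computes this; the padding rule (repeat the last frame, truncate longer clips) is an assumption, since the card only says "pad to 81 frames". Both helpers are hypothetical.

```python
from __future__ import annotations


def cover_crop_geometry(src_w: int, src_h: int, dst_w: int = 832, dst_h: int = 480) -> tuple[int, int, int, int]:
    """Return (resized_w, resized_h, left, top) for scale-to-cover followed by a center crop."""
    scale = max(dst_w / src_w, dst_h / src_h)  # smallest scale that covers the target
    new_w = max(dst_w, round(src_w * scale))
    new_h = max(dst_h, round(src_h * scale))
    left = (new_w - dst_w) // 2  # center the crop window horizontally
    top = (new_h - dst_h) // 2  # and vertically
    return new_w, new_h, left, top


def pad_frame_indices(n_frames: int, target: int = 81) -> list[int]:
    """Source-frame index for each of `target` output frames.

    Assumption: short clips are padded by repeating the last frame;
    clips longer than `target` are truncated.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return [min(i, n_frames - 1) for i in range(target)]


if __name__ == "__main__":
    print(cover_crop_geometry(1080, 1080))  # (832, 832, 0, 176)
```

The same geometry in ffmpeg is roughly `-vf "scale=832:480:force_original_aspect_ratio=increase,crop=832:480"` (the `crop` filter centers by default). The resize interpolation and rounding used by the actual pipeline may differ. Since `eval_gt` clips already have 81 frames, padding is expected to be a no-op for them.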
```bash
python CamxTime_eval/preprocess_gt_videos.py \
--input CamxTime_eval/eval_gt \
--output CamxTime_eval/eval_gt_wan2.1_format
```
| Flag | Default | Description |
|---|---|---|
| `--workers N` | `min(32, ncpu)` | Parallel scene processes |
| `--scenes s1 s2` | all | Limit to specific scenes |
Both scripts are **resumable** — already completed scenes are skipped automatically.
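How the scripts decide that a scene is complete is not documented here. To see which scenes remain before a rerun, a hypothetical check like the one below can help. It assumes each scene's outputs sit in `<output_root>/<scene>/moving_{pattern}.mp4`, and treats a scene as done once all five pattern MP4s exist. The resulting list can be passed to `--scenes`.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable

PATTERNS = ("moving_forward", "moving_backward", "moving_zigzag", "moving_bullettime", "moving_slowmo")


def pending_scenes(scenes: Iterable[str], output_root: str | Path) -> list[str]:
    """Return scenes whose output folder is missing any of the 5 pattern MP4s (hypothetical layout)."""
    out = Path(output_root)
    return [s for s in scenes if not all((out / s / f"{p}.mp4").is_file() for p in PATTERNS)]


if __name__ == "__main__":
    grid = Path("CamxTime_eval/full_grid_renders")
    if grid.is_dir():
        scenes = sorted(p.name for p in grid.iterdir() if p.is_dir())
        print(" ".join(pending_scenes(scenes, "CamxTime_eval/eval_gt")))
```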
---
## Citation
```bibtex
@inproceedings{huang2026spacetimopilot,
title={SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time},
author={Huang, Zhening and Jeong, Hyeonho and Chen, Xuelin and Gryaditskaya, Yulia and Wang, Tuanfeng Y. and Lasenby, Joan and Huang, Chun-Hao},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2026}
}
```
Provided by:
zhening