Eyeline-Labs/Vista4D-Eval-Data
Hugging Face · Updated 2026-04-24 · Indexed 2026-05-10
Download link: https://hf-mirror.com/datasets/Eyeline-Labs/Vista4D-Eval-Data
Description:
---
license: apache-2.0
language:
- en
---
# Vista4D: Video Reshooting with 4D Point Clouds (CVPR 2026 Highlight) – Evaluation Dataset
[Project Page](https://eyeline-labs.github.io/Vista4D)
[arXiv](https://arxiv.org/abs/2604.21915)
[Model](https://huggingface.co/Eyeline-Labs/Vista4D)
[Dataset](https://huggingface.co/datasets/Eyeline-Labs/Vista4D-Eval-Data)
[Kuan Heng Lin](https://kuanhenglin.github.io)<sup>1,3∗</sup>, [Zhizheng Liu](https://bosmallear.github.io)<sup>1,4∗</sup>, [Pablo Salamanca](https://pablosalaman.ca)<sup>1,2</sup>, [Yash Kant](https://yashkant.github.io)<sup>1,2</sup>, [Ryan Burgert](https://ryanndagreat.github.io)<sup>1,2,5∗</sup>, [Yuancheng Xu](https://yuancheng-xu.github.io)<sup>1,2</sup>, [Koichi Namekata](https://kmcode1.github.io)<sup>1,2,6∗</sup>, [Yiwei Zhao](https://zhaoyw007.github.io)<sup>2</sup>, [Bolei Zhou](https://boleizhou.github.io)<sup>4</sup>, [Micah Goldblum](https://goldblum.github.io)<sup>3</sup>, [Paul Debevec](https://www.pauldebevec.com)<sup>1,2</sup>, [Ning Yu](https://ningyu1991.github.io)<sup>1,2</sup> <br/>
<sup>1</sup>Eyeline Labs, <sup>2</sup>Netflix, <sup>3</sup>Columbia University, <sup>4</sup>UCLA, <sup>5</sup>Stony Brook University, <sup>6</sup>University of Oxford<br>
<sup>∗</sup>*Work done during an internship at Eyeline Labs*
<div align="center">
<video controls autoplay muted style="width: 100%;" src="https://media.githubusercontent.com/media/Eyeline-Labs/Vista4D/website/media/vista4d.mp4"></video>
</div>
**Vista4D** is a *video reshooting* framework that synthesizes the dynamic scene represented by an input source video from novel camera trajectories and viewpoints. We bridge the distribution shift between training and inference for point-cloud-grounded video reshooting: by training on noisy, reconstructed multiview videos, Vista4D becomes robust to point cloud artifacts from imprecise 4D reconstruction of real-world videos. Our 4D point cloud with temporally persistent static points also explicitly preserves scene content and improves camera control. Vista4D generalizes to real-world applications such as dynamic scene expansion (casual video capture of a scene as background reference), 4D scene recomposition (point cloud editing), and long video inference with memory.
This is the Hugging Face repository containing our evaluation dataset. We provide 110 video-camera pairs to evaluate Vista4D, built from 51 source videos: 13 from [DAVIS](https://davischallenge.org/) and 38 from [Pexels](https://www.pexels.com/). We use [Pi3](https://yyfz.github.io/pi3/) for 4D reconstruction and [Grounded SAM 2](https://github.com/IDEA-Research/Grounded-SAM-2) for dynamic pixel segmentation. For each video, we then hand-design two to three target cameras using our camera UI.
To download the dataset, from the root directory of the project, run
```bash
huggingface-cli download Eyeline-Labs/Vista4D-Eval-Data --repo-type dataset --local-dir eval_data
```
to download the Vista4D evaluation dataset into `./eval_data/` and then run
```bash
tar -xvf eval_data/eval_data.tar -C eval_data/
```
to extract the contents. It should have the following structure:
```
eval_data/
    metadata.csv
    recon_and_seg/  # 4D reconstruction and dynamic mask segmentation
        avocado-slice/  # There should be 51 total videos
            cameras.npz  # Source intrinsics and extrinsics
            video.mp4
            depths/
                00000.exr
                ...
            dynamic_mask/
                00000.png
                ...
            sky_mask/  # Sky segmentation (to set them to a large depth)
                00000.png
                ...
        [video_name]/
            ...
        ...
    cameras/
        avocado-slice/  # Two to three target cameras per video
            close-crane-above.npz
            left-front-zoom.npz
        [video_name]/
            [camera_name].npz
            ...
        ...
```
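After extraction, a quick sanity check of the layout can catch a partial download early. The following is a minimal sketch, not part of the Vista4D codebase: the helper name `summarize_layout` is hypothetical, and treating `sky_mask/` as optional is an assumption based on the per-video `do_sky_seg` flag in `metadata.csv`.

```python
from pathlib import Path

# Entries listed in the directory tree above. sky_mask/ is left out on purpose:
# we assume (not confirmed by the dataset card) that it only exists for videos with sky.
REQUIRED = ("cameras.npz", "video.mp4", "depths", "dynamic_mask")


def summarize_layout(root):
    """Map each video name to the sorted names of its target cameras.

    Raises FileNotFoundError if a video folder lacks one of the REQUIRED entries.
    """
    root = Path(root)
    summary = {}
    video_dirs = sorted(p for p in (root / "recon_and_seg").iterdir() if p.is_dir())
    for video_dir in video_dirs:
        missing = [name for name in REQUIRED if not (video_dir / name).exists()]
        if missing:
            raise FileNotFoundError(f"{video_dir.name}: missing {missing}")
        camera_dir = root / "cameras" / video_dir.name
        # Camera names are the .npz file names without the extension.
        summary[video_dir.name] = sorted(p.stem for p in camera_dir.glob("*.npz"))
    return summary


# Example usage (run from the project root after extracting the archive):
#   layout = summarize_layout("eval_data")
#   print(len(layout), "videos,", sum(len(c) for c in layout.values()), "cameras")
```

If the download and extraction succeeded, the summary should contain 51 videos and 110 target cameras in total, matching the counts stated above.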
`metadata.csv` contains the following information:
- `name`: Name of video-camera pair, in the format `[video]_[camera]`
- `video`: Name of the source video; its 4D reconstruction and segmentation are in `eval_data/recon_and_seg/[video]/`
- `camera`: Name of the target camera for that `video`, stored at `eval_data/cameras/[video]/[camera].npz`
- `seed`: Randomly-generated fixed seed for evaluation
- `prompt`: Prompt for the video-camera pair, usually just the prompt of the source video
- `dynamic`: Dynamic keywords used to obtain the segmentation map
- `do_sky_seg`: Whether the video contains sky (and thus we need to segment it separately)
- `source`: Source of the video, `davis` or `pexels`
- `video_id`: For videos from `pexels` only, the original ID of the video on Pexels; the full link is `https://www.pexels.com/video/[video_id]`
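
The columns above are enough to resolve every file a video-camera pair needs. Below is a minimal sketch using only the Python standard library; `load_pairs` and the added keys `recon_dir`, `camera_path`, and `source_url` are hypothetical names introduced for illustration, and all CSV values are kept as raw strings because the card does not specify how fields such as `do_sky_seg` are encoded.

```python
import csv
from pathlib import Path


def load_pairs(root):
    """Read metadata.csv and attach the paths each video-camera pair refers to."""
    root = Path(root)
    pairs = []
    with open(root / "metadata.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            row = dict(row)
            # Paths follow the patterns given in the column descriptions.
            row["recon_dir"] = root / "recon_and_seg" / row["video"]
            row["camera_path"] = root / "cameras" / row["video"] / f"{row['camera']}.npz"
            # Only Pexels rows carry a video_id; other rows get None.
            if row["source"] == "pexels" and row.get("video_id"):
                row["source_url"] = f"https://www.pexels.com/video/{row['video_id']}"
            else:
                row["source_url"] = None
            pairs.append(row)
    return pairs


# Example usage (run from the project root after extracting the archive):
#   for pair in load_pairs("eval_data"):
#       print(pair["name"], pair["camera_path"], pair["seed"])
```

Keeping `seed` and the other fields as strings leaves any type conversion to the evaluation script that consumes them.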
Instructions on how to use this dataset, model weights, more results, and paper can be found on our [project page](https://eyeline-labs.github.io/Vista4D/) and [GitHub repository](https://github.com/Eyeline-Labs/Vista4D/tree/main).
Provided by:
Eyeline-Labs



