
nvidia/ffs_stereo4d

Source: Hugging Face | Updated 2026-03-25 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/nvidia/ffs_stereo4d
Description:
---
license: cc-by-nc-4.0
size_categories:
- 1M<n<10M
task_categories:
- depth-estimation
pretty_name: FFS Stereo4D
tags:
- stereo-matching
- disparity
- stereo4d
- foundationstereo
---

# FFS Stereo4D

[[Project Page]](https://nvlabs.github.io/Fast-FoundationStereo/) [[Paper]](https://huggingface.co/papers/2512.11130) [[Code]](https://github.com/NVlabs/Fast-FoundationStereo)

Disparity maps for stereo matching, generated from the [Stereo4D](https://github.com/niconielsen32/Stereo4D) dataset using [FoundationStereo](https://github.com/NVlabs/FoundationStereo).

## Dataset Structure

```
data/train/
  metadata.csv
  0000000.zip   (first 50,000 images)
  0000001.zip   (next 50,000 images)
  ...
  0000025.zip
```

Each zip contains disparity PNG files named `{vid_id}_frame_{frame_idx:06d}.png`.

- **Disparity images**: 3-channel uint8 784×784 PNG files encoding per-pixel disparity. Decode with: `disp = (R * 255*255 + G * 255 + B) / 1000.0`. See also: https://github.com/NVlabs/FoundationStereo/blob/master/scripts/vis_dataset.py
- **metadata.csv**: Links each disparity image back to its source YouTube video, with a `zip_file` column indicating which zip contains the image.

### Metadata Columns

| Column | Description |
|---|---|
| `file_name` | Disparity image filename (inside the zip) |
| `zip_file` | Which zip file contains this image |
| `vid_id` | Clip identifier (matches the `.npz` calibration file) |
| `frame_idx` | Frame index in the rectified stereo output |
| `youtube_video_id` | YouTube video ID of the source 360 video |
| `timestamp_us` | Timestamp in microseconds in the original video |
| `timestamp_sec` | Timestamp in seconds |
| `video_frame_index` | Estimated frame number in the original video |
| `fps` | FPS of the source video |

## Retrieving Source RGB Frames

This dataset contains **disparity maps only**. Because the source videos are copyrighted, users must download them themselves. The corresponding left/right RGB stereo pairs can be recovered as follows:

1. Follow the [stereo4d toolkit](https://github.com/Stereo4d/stereo4d-code) to download the YouTube video using `youtube_video_id`.
2. Seek to `timestamp_sec` (or `video_frame_index`) to locate the source frame.
3. Apply equirectangular rectification using the Stereo4D calibration `.npz` files to obtain the left and right perspective images.

## Generation Pipeline

1. **Source**: YouTube 360 videos from the Stereo4D dataset.
2. **Rectification**: Equirectangular frames are rectified and cropped to 1024×1024 perspective stereo pairs.
3. **Disparity estimation**: FoundationStereo computes dense disparity at 784×784 resolution (the 1024×1024 input is resized with `scale=0.765625`).

### Camera Parameters

The rectified stereo pairs are generated at 1024×1024 with the following pinhole camera model:

| Parameter | Value (1024×1024 rectified) | Value (784×784 disparity) | Source / formula |
|---|---|---|---|
| HFOV | 60° | 60° | `output_hfov` in `batch_rectify.py` |
| Baseline | 0.063 m | 0.063 m | Assumed interpupillary distance for VR180 cameras |
| fx, fy | 886.8 px | 679.0 px | `size * 0.5 / tan(0.5 * HFOV * pi/180)` |
| cx, cy | 512 px | 392 px | Image center |

Depth is derived as: `depth = fx * baseline / disparity`.
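
Before the depth formula can be applied, each PNG has to be decoded into disparity values. The following is a minimal sketch of the decoding rule given under Dataset Structure. It assumes the image has already been loaded as an (H, W, 3) uint8 array in RGB order; the function name `decode_disparity` is illustrative and is not part of the dataset's tooling.

```python
import numpy as np


def decode_disparity(rgb: np.ndarray, scale: float = 1000.0) -> np.ndarray:
    """Decode an (H, W, 3) uint8 RGB array into disparity in pixels.

    Implements disp = (R * 255*255 + G * 255 + B) / 1000.0 from the card.
    """
    if rgb.ndim != 3 or rgb.shape[-1] != 3:
        raise ValueError("expected an array of shape (H, W, 3)")
    # Widen before multiplying: uint8 arithmetic would wrap around.
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (r * 255 * 255 + g * 255 + b) / scale


# Synthetic two-pixel image used only to illustrate the encoding.
# (0, 39, 55): 39 * 255 + 55 = 10000, i.e. 10.0 px
# (1, 0, 0):   255 * 255     = 65025, i.e. 65.025 px
pixels = np.array([[[0, 39, 55], [1, 0, 0]]], dtype=np.uint8)
disparity = decode_disparity(pixels)
```

With Pillow installed, the input array could be obtained with something like `np.array(Image.open(path).convert("RGB"))`, where `path` points to a PNG extracted from one of the zip files.
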
Since disparity is computed at 784×784 resolution (scale factor 784/1024 = 0.765625 of the 1024×1024 input), use the 784×784 camera parameters when converting disparity to depth:

```python
import numpy as np

hfov = 60         # horizontal field of view, degrees
baseline = 0.063  # meters
imw = 784         # width of the disparity map, pixels
fx = imw * 0.5 / np.tan(0.5 * np.radians(hfov))  # ~679.0 px
# `disparity` is the decoded disparity map in pixels (see Dataset Structure)
depth = fx * baseline / disparity  # meters
```

## Citation

If you use this dataset, please consider citing:

```bibtex
@article{wen2026fastfoundationstereo,
  title={Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching},
  author={Bowen Wen and Shaurya Dewan and Stan Birchfield},
  journal={CVPR},
  year={2026}
}

@article{wen2025foundationstereo,
  title={FoundationStereo: Zero-Shot Stereo Matching},
  author={Wen, Bowen and Trepte, Matthew and Aribido, Joseph and Kautz, Jan and Birchfield, Stan and Wan, Yao},
  journal={CVPR},
  year={2025}
}

@inproceedings{jin2025stereo4d,
  title={{Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos}},
  author={Jin, Linyi and Tucker, Richard and Li, Zhengqi and Fouhey, David and Snavely, Noah and Holynski, Aleksander},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025},
}
```
Provided by:
nvidia