ZhengGuangze/Stereo4D_vlbm

Name: ZhengGuangze/Stereo4D_vlbm
Creator: ZhengGuangze
Published: 2026-03-27 18:53:59
License: not listed (the dataset card declares cc-by-nc-sa-4.0)

Hugging Face · Updated 2026-03-27 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/ZhengGuangze/Stereo4D_vlbm

Resource description:

---
license: cc-by-nc-sa-4.0
---

# Stereo4D (converted to VLBM format)

This dataset contains 4,687 sequences from the Stereo4D dataset converted to the VLBM-compatible format using `preprocess_stereo4d.py`. The sequences have been compressed into `.tar.gz` archives in chunks of 50 sequences per archive.

## Dataset Description

- **Source**: [Stereo4D](https://stereo4d.github.io/) (converted)
- **Format**: VLBM-compatible per-sequence layout
- **Contents**: RGB images, sparse depth maps (from 3D trajectories), 2D/3D trajectories, camera intrinsics and extrinsics, visibilities, and scene metadata

### Scale

| Metric | Value |
|---|---|
| Total sequences | 4,687 |
| Image resolution | 512 x 512 px |
| Depth type | Sparse (projected from tracked 3D points) |

## Dataset Structure

Each sequence directory follows this layout:

```
<seq_name>/
├── rgbs/
│   ├── rgb_00000.jpg
│   ├── rgb_00001.jpg
│   └── ...
├── depths/
│   ├── depth_00000.npz
│   ├── depth_00001.npz
│   └── ...
├── intrinsics.npy
├── extrinsics.npy
├── trajs_2d.npy
├── trajs_3d.npy
├── visibilities.npy
└── scene_info.json
```

### File Descriptions

- `rgbs/`: RGB frames from the left rectified video saved as JPEG (`rgb_XXXXX.jpg`). Resolution is 512x512 pixels.
- `depths/`: Sparse depth maps saved as compressed NumPy archives (`depth_XXXXX.npz`). Each archive stores a float16 array under the key `depth` of shape `(H, W)` in meters. Depth is computed from 3D trajectory points projected into camera space; values are 0 where no trajectory point is observed.
- `intrinsics.npy`: Camera intrinsic matrices for each frame `(T, 3, 3)` float16. Computed from a 60° horizontal field of view.
- `extrinsics.npy`: World-to-camera extrinsic matrices (W2C) for each frame `(T, 4, 4)` float16. Computed as the inverse of the camera-to-world matrices provided in the source data.
- `trajs_2d.npy`: 2D trajectories `(T, N, 2)` float16 -- pixel coordinates (x, y). Projected from 3D world coordinates using extrinsics and intrinsics.
- `trajs_3d.npy`: 3D trajectories `(T, N, 3)` float16 -- world-space coordinates (x, y, z); zero-filled where invisible.
- `visibilities.npy`: Visibility flags `(T, N)` float16 (1.0 visible, 0.0 not visible).
- `scene_info.json`: JSON file with per-sequence metadata including `num_frames`, `image_size`, `num_trajectories`, `source`, `depth_range`, `depth_type`, and `original_sequence`.

## Conversion Details

Stereo4D provides sparse track annotations in a compact format:

- `track_lengths`: Number of frames each track appears in
- `track_indices`: Frame indices for each observation
- `track_coordinates`: 3D world coordinates for each observation
- `camera2world`: Camera-to-world transformation matrices

The conversion expands these sparse tracks into dense `(T, N, 3)` arrays, projects to 2D, and computes sparse depth maps from the visible 3D points.

## Data Specifications

- **Image format**: JPEG (RGB), 512x512 px
- **Depth format**: NPZ (float16), sparse (from tracked 3D points, 0 = unknown)
- **Annotation format**: Individual `.npy` files (float16)
- **Extrinsics**: World-to-camera (W2C) 4x4 matrices
- **Horizontal FOV**: 60° (assumed)

## Usage Example (Python)

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image

seq_dir = Path("data/stereo4d_vlbm/-5JaYFNtYlM_115181849")

# Load annotations
trajs_2d = np.load(seq_dir / "trajs_2d.npy")      # (T, N, 2)
trajs_3d = np.load(seq_dir / "trajs_3d.npy")      # (T, N, 3)
vis = np.load(seq_dir / "visibilities.npy")       # (T, N)
intrinsics = np.load(seq_dir / "intrinsics.npy")  # (T, 3, 3)
extrinsics = np.load(seq_dir / "extrinsics.npy")  # (T, 4, 4)

# Load an image and depth map
frame_idx = 0
rgb = Image.open(seq_dir / "rgbs" / f"rgb_{frame_idx:05d}.jpg")
depth_npz = np.load(seq_dir / "depths" / f"depth_{frame_idx:05d}.npz")
depth = depth_npz['depth']  # float16 array (H, W), 0 = no observation

# Load scene info
with open(seq_dir / "scene_info.json", 'r') as f:
    scene_info = json.load(f)
print(scene_info)
```
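The conversion steps described under Conversion Details (expand sparse tracks, project to 2D, rasterize sparse depth) can be illustrated with the minimal sketch below. It is not the code of `preprocess_stereo4d.py`, which is not shown here. The function names are hypothetical, and several points are assumptions: that `track_indices` and `track_coordinates` are concatenated track by track, that pixels are square with a centered principal point, and that projected points are rounded to the nearest pixel. The toy data at the bottom is synthetic.

```python
import numpy as np

def intrinsics_from_hfov(width, height, hfov_deg=60.0):
    # Assumption: square pixels and a centered principal point.
    fx = 0.5 * width / np.tan(np.radians(hfov_deg) / 2.0)
    return np.array([[fx, 0.0, width / 2.0],
                     [0.0, fx, height / 2.0],
                     [0.0, 0.0, 1.0]])

def expand_tracks(track_lengths, track_indices, track_coordinates, num_frames):
    # Assumption: observations are stored flat, one track after another.
    num_tracks = len(track_lengths)
    trajs_3d = np.zeros((num_frames, num_tracks, 3))
    vis = np.zeros((num_frames, num_tracks))
    track_ids = np.repeat(np.arange(num_tracks), track_lengths)
    trajs_3d[track_indices, track_ids] = track_coordinates
    vis[track_indices, track_ids] = 1.0
    return trajs_3d, vis

def project(trajs_3d, w2c, K):
    # World -> camera with the W2C matrices, then a pinhole projection.
    cam = np.einsum("tij,tnj->tni", w2c[:, :3, :3], trajs_3d) + w2c[:, None, :3, 3]
    z = cam[..., 2]
    safe_z = np.where(z > 0, z, 1.0)  # avoid dividing by zero for unseen points
    uv = np.einsum("tij,tnj->tni", K, cam)[..., :2] / safe_z[..., None]
    return uv, z

def sparse_depth(uv, z, vis, height, width):
    # Write camera-space depth at the nearest pixel; 0 means no observation.
    depth = np.zeros((uv.shape[0], height, width), dtype=np.float16)
    cols = np.round(uv[..., 0]).astype(int)
    rows = np.round(uv[..., 1]).astype(int)
    ok = (vis > 0) & (z > 0) & (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    t_idx, n_idx = np.nonzero(ok)
    depth[t_idx, rows[t_idx, n_idx], cols[t_idx, n_idx]] = z[t_idx, n_idx]
    return depth

# Synthetic toy input: 3 frames, 2 static points, identity camera poses.
T, H, W = 3, 512, 512
track_lengths = np.array([3, 2])
track_indices = np.array([0, 1, 2, 1, 2])
track_coordinates = np.array([[0.0, 0.0, 2.0]] * 3 + [[0.5, 0.0, 4.0]] * 2)
camera2world = np.tile(np.eye(4), (T, 1, 1))

trajs_3d, vis = expand_tracks(track_lengths, track_indices, track_coordinates, T)
w2c = np.linalg.inv(camera2world)  # W2C is the inverse of camera-to-world
K = np.tile(intrinsics_from_hfov(W, H), (T, 1, 1))
uv, z = project(trajs_3d, w2c, K)
depth = sparse_depth(uv, z, vis, H, W)
```

The sketch computes in float64 for clarity. When applying the same projection to the released float16 arrays, casting them to float32 or float64 first is likely a good idea, since float16 resolves pixel coordinates between 256 and 512 only to 0.25 px.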
## Citation

Please cite the original Stereo4D dataset when using the converted data:

```bibtex
@article{chen2024stereo4d,
  title={Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos},
  author={Chen, Linyi and Herrmann, Charles and Sun, Deqing and Jampani, Varun and Yang, Ming-Hsuan and Fleet, David J. and Rubinstein, Michael and Dekel, Tali and Barron, Jonathan T.},
  year={2024}
}
```

Provided by:

ZhengGuangze
