ZhengGuangze/Stereo4D_vlbm_old

Name: ZhengGuangze/Stereo4D_vlbm_old
Creator: ZhengGuangze
Published: 2026-02-22 02:23:41
License: No description provided

Hugging Face · Updated 2026-02-22 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/ZhengGuangze/Stereo4D_vlbm_old

Resource description:

---
license: apache-2.0
---

# Stereo4D (converted to VLBM format, quality top-5%)

This dataset contains a quality-filtered subset of Stereo4D sequences converted to the VLBM/Flock4D-compatible format using the conversion tool `stereo4d2vlbm.py`.

## Dataset Description

- **Source**: [Stereo4D](https://stereo4d.github.io/) (converted)
- **Format**: VLBM / Flock4D-compatible per-sequence layout
- **Contents**: RGB images, pseudo-depth maps, 2D/3D trajectories, camera intrinsics and extrinsics, and scene metadata
- **License**: See the original Stereo4D license; conversion outputs and metadata are shared here under Apache-2.0

### Scale of Source and Filtered Dataset

The original Stereo4D dataset contains **98,112 sequences** sourced from in-the-wild stereo videos. To focus on high-quality dynamic content, we applied an automated video quality detection pipeline (`quality_detect.py`) and retained only the top 5% of sequences by composite quality score, resulting in **4,908 sequences** in this dataset.
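The composite-score ranking and top-5% cut can be sketched in a few lines of Python. This is not `quality_detect.py` itself: the sub-score names, the assumption that each sub-score is already normalized to 0–100, and the flat `ISSUE_PENALTY` value are all hypothetical, since the card only gives the weights (listed under Quality Filtering Pipeline) and says that flagged videos are penalized.

```python
import math
from typing import Dict, List

# Weights from the card: motion 40%, sharpness 35%, brightness balance 15%,
# temporal stability 10%. The dictionary keys are hypothetical names.
WEIGHTS = {"motion": 0.40, "sharpness": 0.35, "brightness": 0.15, "stability": 0.10}

# Hypothetical flat penalty for videos flagged with any issue; the real value is not documented.
ISSUE_PENALTY = 20.0


def composite_score(subscores: Dict[str, float], has_issue: bool) -> float:
    """Weighted sum of sub-scores (each assumed to lie in 0-100), clamped to 0-100."""
    score = sum(weight * subscores[name] for name, weight in WEIGHTS.items())
    if has_issue:
        score -= ISSUE_PENALTY
    return max(0.0, min(100.0, score))


def select_top_fraction(scores: Dict[str, float], fraction: float = 0.05) -> List[str]:
    """Return the IDs of the highest-scoring `fraction` of sequences (at least one)."""
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]
```

With 98,112 sequences, `ceil(0.05 × 98,112)` gives 4,906, two fewer than the 4,908 sequences reported above, so the real cutoff rule (tie handling or rounding, for example) presumably differs from this sketch.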
## Quality Filtering Pipeline

Quality detection was performed with `quality_detect.py`, which uniformly samples 16 frames per video and runs the following checks:

| Issue | Criterion |
|---|---|
| STATIC / NEAR_STATIC | Mean inter-frame pixel difference below threshold |
| HARD_CUT | Single frame-difference spike above threshold |
| FADE / DISSOLVE | Monotonic brightness drift or sustained moderate difference |
| DARK / OVEREXPOSED | Mean brightness outside the valid range |
| BLURRY | Median Laplacian variance too low |
| FLICKER | Standard deviation of per-frame brightness too high |
| DUPLICATE_FRAMES | High ratio of near-identical consecutive frames |
| LOOP | First and last frames nearly identical |
| TOO_SHORT | Fewer than 16 frames in total |
| BLACK_FRAMES | High ratio of near-black frames |
| LOW_RESOLUTION | Width or height below 64 px |

Each video also receives a composite quality score (0–100) that combines motion magnitude (40%), sharpness (35%), brightness balance (15%), and temporal stability (10%). Videos with any detected issue are penalized. The top 5% by score are selected for conversion.

## Pseudo-Depth Generation

Stereo4D does not provide ground-truth depth maps. Instead, `stereo4d2vlbm.py` computes sparse pseudo-depth from the provided 3D point tracks and camera poses:

1. **Load annotations**: read `camera2world` (T, 3, 4), `track_lengths`, `track_indices`, and `track_coordinates` from the per-sequence `.npz` file.
2. **Compute intrinsics**: derive a pinhole intrinsic matrix from the horizontal FOV (`fov_bounds`) and the image resolution.
3. **Compute extrinsics**: invert the `camera2world` matrix to obtain world-to-camera (W2C) transforms.
4. **Dense track array**: convert the sparse track representation into a dense `(T, N, 3)` array of world coordinates and a boolean visibility mask.
5. **Project to 2D**: apply the W2C transform and the pinhole projection to obtain `(T, N, 2)` image-space coordinates.
6. **Sparse depth map**: transform visible 3D points into camera space; the Z component gives metric depth. Points are rounded to the nearest pixel and written into a sparse `(H, W)` depth map (zero = unknown).

The resulting depth maps are sparse: only pixels covered by tracked 3D points carry valid depth values.

## Dataset Structure

Each sequence directory follows this layout:

```
{sequence_id}/
├── rgbs/
│   ├── rgb_00000.jpg
│   ├── rgb_00001.jpg
│   └── ...
├── depths/
│   ├── depth_00000.npz
│   ├── depth_00001.npz
│   └── ...
├── annotations.npz
└── scene_info.json
```

### File Descriptions

- `rgbs/`: RGB frames extracted from the left-rectified video and saved as JPEG (`rgb_XXXXX.jpg`). Resolution is 512×512 pixels.
- `depths/`: Sparse pseudo-depth maps saved as compressed NumPy archives (`depth_XXXXX.npz`). Each archive stores a float32 array of shape `(H, W)` under the key `depth`; zero values indicate unknown depth.
- `annotations.npz`: Compressed NumPy archive containing the following float16 arrays:
  - `trajs_2d`: `(T, N, 2)` 2D trajectories in pixel coordinates (x, y).
  - `trajs_3d`: `(T, N, 3)` 3D trajectories in world coordinates (x, y, z); zero-filled where invisible.
  - `visibilities`: `(T, N)` visibility flags (1.0 visible, 0.0 not visible).
  - `intrinsics`: `(T, 3, 3)` per-frame camera intrinsic matrices.
  - `extrinsics`: `(T, 4, 4)` per-frame world-to-camera (W2C) extrinsic matrices.
- `scene_info.json`: JSON file with per-sequence metadata. Fields: `source`, `num_frames`, `image_size`, `num_trajectories`, `depth_range`, `depth_type`, `original_sequence`.
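Steps 2, 3, 5, and 6 of the pseudo-depth pipeline can be sketched with NumPy as below. This is an illustration, not code taken from `stereo4d2vlbm.py`, and every function name is hypothetical. It assumes a single horizontal FOV in radians (how `fov_bounds` encodes the FOV is not described in this card), square pixels with the principal point at the image center, rigid `camera2world` poses, and that the nearest point wins when several land on the same pixel. Step 4 is omitted because the exact layout of `track_lengths`, `track_indices`, and `track_coordinates` is not documented here.

```python
import numpy as np


def intrinsics_from_hfov(hfov_rad: float, width: int, height: int) -> np.ndarray:
    """Pinhole K from a horizontal FOV (radians), assuming square pixels and a centered principal point."""
    fx = (width / 2.0) / np.tan(hfov_rad / 2.0)
    return np.array([[fx, 0.0, width / 2.0],
                     [0.0, fx, height / 2.0],
                     [0.0, 0.0, 1.0]])


def camera2world_to_w2c(c2w: np.ndarray) -> np.ndarray:
    """Invert rigid (T, 3, 4) camera-to-world poses into (T, 4, 4) world-to-camera matrices."""
    rot_t = np.transpose(c2w[:, :3, :3], (0, 2, 1))  # R^T, the inverse of a rotation
    w2c = np.tile(np.eye(4), (c2w.shape[0], 1, 1))
    w2c[:, :3, :3] = rot_t
    w2c[:, :3, 3] = -np.einsum("tij,tj->ti", rot_t, c2w[:, :3, 3])  # -R^T t
    return w2c


def world_to_camera(points_world: np.ndarray, w2c: np.ndarray) -> np.ndarray:
    """Apply one (4, 4) W2C matrix to (..., 3) world points."""
    return points_world @ w2c[:3, :3].T + w2c[:3, 3]


def project(points_cam: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Pinhole projection of (..., 3) camera-space points (Z > 0) to (..., 2) pixel coordinates."""
    uvw = points_cam @ K.T
    return uvw[..., :2] / uvw[..., 2:3]


def sparse_depth_map(points_world, visible, K, w2c, height, width):
    """Splat one frame's visible (N, 3) world points into an (H, W) float32 depth map, 0 = unknown."""
    pts_cam = world_to_camera(points_world[visible], w2c)
    pts_cam = pts_cam[pts_cam[:, 2] > 0]  # drop points behind the camera
    uv = project(pts_cam, K)
    u = np.rint(uv[:, 0]).astype(np.int64)  # round to the nearest pixel
    v = np.rint(uv[:, 1]).astype(np.int64)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], pts_cam[inside, 2]
    depth = np.full((height, width), np.inf, dtype=np.float32)
    np.minimum.at(depth, (v, u), z.astype(np.float32))  # keep the nearest point per pixel
    depth[np.isinf(depth)] = 0.0
    return depth
```

`np.minimum.at` is used instead of plain fancy-index assignment because NumPy does not guarantee which value wins when an index repeats in an assignment. Applied to `trajs_3d[t]` with the stored `intrinsics[t]` and `extrinsics[t]`, the same `world_to_camera` and `project` pair should roughly reproduce `trajs_2d[t]` for visible points (up to float16 rounding), provided the stored matrices are the ones used during conversion.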
## Data Specifications

- **Image format**: JPEG (RGB), 512×512 px
- **Depth format**: NPZ (float32), sparse (zero = unknown)
- **Annotation format**: `annotations.npz` (float16 arrays for compact storage)
- **Frames per sequence**: about 199 (varies slightly by sequence)
- **Points per sequence**: tens of thousands of tracked 3D points

## Usage Example (Python)

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image

seq_dir = Path("stereo4d_vlbm/<sequence_id>")

# Load annotations (stored as float16; cast to float32 before doing geometry)
with np.load(seq_dir / "annotations.npz") as annotations:
    trajs_2d = annotations["trajs_2d"].astype(np.float32)      # (T, N, 2)
    trajs_3d = annotations["trajs_3d"].astype(np.float32)      # (T, N, 3)
    vis = annotations["visibilities"] > 0.5                    # (T, N) boolean mask
    intrinsics = annotations["intrinsics"].astype(np.float32)  # (T, 3, 3)
    extrinsics = annotations["extrinsics"].astype(np.float32)  # (T, 4, 4), world-to-camera

# Load an image and its sparse depth map
frame_idx = 0
rgb = Image.open(seq_dir / "rgbs" / f"rgb_{frame_idx:05d}.jpg").convert("RGB")
with np.load(seq_dir / "depths" / f"depth_{frame_idx:05d}.npz") as depth_npz:
    depth = depth_npz["depth"]  # float32 array (H, W), 0 = unknown

# Load scene info
with open(seq_dir / "scene_info.json", "r") as f:
    scene_info = json.load(f)
print(scene_info)
```

## Conversion Script

The full conversion pipeline is provided in `stereo4d2vlbm.py`. It supports single-sequence, batch, and resume-from-checkpoint modes:

```bash
# Convert a single sequence
python stereo4d2vlbm.py --seq "_0be62W7ndY_15081748"

# Batch-convert the top-5% sequences (uses the quality filter file)
python stereo4d2vlbm.py --batch --num_workers 8 \
    --top5_file tmp/data/stereo4d/quality_top5_full/top5pct_videos.txt

# Batch-convert all available sequences
python stereo4d2vlbm.py --batch --num_workers 8
```

## Citation

Please cite the original Stereo4D dataset when using the converted data. If you use the VLBM/Flock4D conversion, please also cite this repository.

## Contact

If you encounter issues with the conversion or the converted files, please open an issue in the repository.

Provider:

ZhengGuangze
