ZhengGuangze/Stereo4D_vlbm_old
Source: Hugging Face | Updated: 2026-02-22 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/ZhengGuangze/Stereo4D_vlbm_old
Resource description:
---
license: apache-2.0
---
# Stereo4D (converted to VLBM format, quality top-5%)
This dataset contains a quality-filtered subset of Stereo4D sequences converted to the VLBM/Flock4D-compatible format using the conversion tool `stereo4d2vlbm.py`.
## Dataset Description
- **Source**: [Stereo4D](https://stereo4d.github.io/) (converted)
- **Format**: VLBM / Flock4D-compatible per-sequence layout
- **Contents**: RGB images, pseudo-depth maps, 2D/3D trajectories, camera intrinsics and extrinsics, and scene metadata
- **License**: See original Stereo4D license; conversion outputs and metadata are shared under Apache-2.0 here
### Scale of Source and Filtered Dataset
The original Stereo4D dataset contains **98,112 sequences** sourced from in-the-wild stereo videos. To focus on high-quality dynamic content, we ran an automated video quality detection pipeline (`quality_detect.py`) and kept only the top 5% of sequences by composite quality score, which leaves the **4,908 sequences** included in this dataset.
## Quality Filtering Pipeline
Quality detection was performed with `quality_detect.py`, which uniformly samples 16 frames per video and runs the following checks:
| Issue | Criterion |
|---|---|
| STATIC / NEAR_STATIC | Mean inter-frame pixel difference below threshold |
| HARD_CUT | Single frame-diff spike above threshold |
| FADE / DISSOLVE | Monotonic brightness drift or sustained moderate diff |
| DARK / OVEREXPOSED | Mean brightness out of the valid range |
| BLURRY | Median Laplacian variance too low |
| FLICKER | Std of per-frame brightness too high |
| DUPLICATE_FRAMES | High ratio of near-identical consecutive frames |
| LOOP | First and last frames nearly identical |
| TOO_SHORT | Fewer than 16 total frames |
| BLACK_FRAMES | High ratio of near-black frames |
| LOW_RESOLUTION | Width or height below 64 px |
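A minimal sketch of how a few of the checks in the table could be computed with NumPy alone. It is not the code of `quality_detect.py`: the function names, the 4-neighbour Laplacian, and every threshold except the 64 px resolution limit are assumptions chosen for illustration.

```python
import numpy as np

def sample_indices(num_frames, num_samples=16):
    """Evenly spaced frame indices covering the whole clip."""
    return np.linspace(0, num_frames - 1, num_samples).round().astype(int)

def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian; low values suggest blur."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def detect_issues(frames, static_thr=1.0, dark_thr=30.0,
                  bright_thr=225.0, blur_thr=50.0):
    """frames: (S, H, W) grayscale samples in 0..255.

    All thresholds here are placeholders, not the values used by the dataset authors.
    """
    f = frames.astype(np.float64)
    issues = []
    # STATIC: mean absolute difference between consecutive sampled frames
    if np.abs(np.diff(f, axis=0)).mean() < static_thr:
        issues.append("STATIC")
    # DARK / OVEREXPOSED: mean brightness outside the valid range
    brightness = f.mean()
    if brightness < dark_thr:
        issues.append("DARK")
    elif brightness > bright_thr:
        issues.append("OVEREXPOSED")
    # BLURRY: median Laplacian variance across the sampled frames
    if np.median([laplacian_variance(x) for x in f]) < blur_thr:
        issues.append("BLURRY")
    # LOW_RESOLUTION: width or height below 64 px
    if f.shape[1] < 64 or f.shape[2] < 64:
        issues.append("LOW_RESOLUTION")
    return issues
```

For a clip with roughly 199 frames, `sample_indices(199)` picks 16 indices from frame 0 to frame 198; the grayscale frames at those indices would then be stacked and passed to `detect_issues`.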
Each video also receives a composite quality score (0–100) combining motion magnitude (40%), sharpness (35%), brightness balance (15%), and temporal stability (10%). Videos with any detected issue are penalized. The top-5% by score are selected for conversion.
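The weighting can be sketched as follows. Only the four weights and the top-5% cut come from the description above; the assumption that each sub-score is normalized to [0, 1], the flat per-issue penalty, and the function names are hypothetical.

```python
def composite_score(motion, sharpness, brightness, stability,
                    num_issues=0, penalty_per_issue=10.0):
    """Weighted 0-100 score from four sub-scores, each assumed to lie in [0, 1].

    The penalty scheme is a placeholder; the source only says that
    videos with detected issues are penalized.
    """
    base = 100.0 * (0.40 * motion + 0.35 * sharpness
                    + 0.15 * brightness + 0.10 * stability)
    return max(0.0, base - penalty_per_issue * num_issues)

def top_fraction(scores, fraction=0.05):
    """Return the ids of the highest-scoring videos (at least one)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, int(round(len(ranked) * fraction)))
    return ranked[:keep]
```

`top_fraction` takes a mapping from video id to score, so the selected ids can be written straight to a list file for batch conversion.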
## Pseudo-Depth Generation
Stereo4D does not provide ground-truth depth maps. Instead, sparse pseudo-depth is computed from the provided 3D point tracks and camera poses via `stereo4d2vlbm.py`:
1. **Load annotations**: read `camera2world` (T, 3, 4), `track_lengths`, `track_indices`, `track_coordinates` from the per-sequence `.npz` file.
2. **Compute intrinsics**: derive a pinhole intrinsic matrix from the horizontal FOV (`fov_bounds`) and image resolution.
3. **Compute extrinsics**: extend each `camera2world` matrix from 3×4 to a 4×4 homogeneous transform and invert it to obtain the world-to-camera transform (W2C).
4. **Dense track array**: convert the sparse track representation to a dense `(T, N, 3)` world-coordinates array and a boolean visibility mask.
5. **Project to 2D**: apply W2C and projection to obtain `(T, N, 2)` image-space coordinates.
6. **Sparse depth map**: transform visible 3D points to camera space; the Z-component gives metric depth. Points are rounded to the nearest pixel and written to a sparse `(H, W)` depth map (zero = unknown).
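Steps 2 to 5 can be sketched in NumPy as below. This is an illustration under stated assumptions, not the code of `stereo4d2vlbm.py`: the horizontal FOV is passed in degrees rather than read from `fov_bounds`, pixels are square with a centered principal point, the camera looks along +Z, and the sparse track layout (track `i` owns the next `track_lengths[i]` entries of `track_indices` and `track_coordinates`) is assumed. All function names are hypothetical.

```python
import numpy as np

def intrinsics_from_hfov(hfov_deg, width, height):
    """Pinhole K from a horizontal FOV (assumes square pixels, centered principal point)."""
    fx = 0.5 * width / np.tan(0.5 * np.deg2rad(hfov_deg))
    return np.array([[fx, 0.0, width / 2.0],
                     [0.0, fx, height / 2.0],
                     [0.0, 0.0, 1.0]])

def invert_c2w(c2w):
    """(T, 3, 4) camera-to-world -> (T, 4, 4) world-to-camera via the rigid inverse."""
    rot_t = np.transpose(c2w[:, :, :3], (0, 2, 1))  # R^T
    w2c = np.tile(np.eye(4), (c2w.shape[0], 1, 1))
    w2c[:, :3, :3] = rot_t
    w2c[:, :3, 3] = -np.einsum("tij,tj->ti", rot_t, c2w[:, :, 3])  # -R^T t
    return w2c

def densify_tracks(track_lengths, track_indices, track_coordinates, num_frames):
    """Sparse tracks -> dense (T, N, 3) array plus (T, N) visibility mask.

    Assumed layout: entries are concatenated track by track, and
    track_indices holds the frame index of each entry.
    """
    n = len(track_lengths)
    dense = np.zeros((num_frames, n, 3), dtype=np.float32)
    visible = np.zeros((num_frames, n), dtype=bool)
    track_id = np.repeat(np.arange(n), track_lengths)
    dense[track_indices, track_id] = track_coordinates
    visible[track_indices, track_id] = True
    return dense, visible

def project(points_world, w2c, K):
    """(T, N, 3) world points -> (T, N, 2) pixel coordinates and (T, N) camera-space Z."""
    cam = np.einsum("tij,tnj->tni", w2c[:, :3, :3], points_world) + w2c[:, None, :3, 3]
    z = cam[..., 2]
    safe_z = np.where(np.abs(z) > 1e-8, z, 1.0)  # avoid division by zero for zero-filled points
    u = K[0, 0] * cam[..., 0] / safe_z + K[0, 2]
    v = K[1, 1] * cam[..., 1] / safe_z + K[1, 2]
    return np.stack([u, v], axis=-1), z
```

Projected coordinates for invisible (zero-filled) points are meaningless and should be masked with the visibility array before use.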
The resulting depth maps are sparse — only pixels covered by tracked 3D points carry valid depth values.
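Writing the sparse depth map for one frame might look like the sketch below. The function name is hypothetical, and keeping the nearest depth when several points round to the same pixel is an assumption; the source does not say how collisions are resolved.

```python
import numpy as np

def rasterize_sparse_depth(uv, z, visible, height, width):
    """Scatter per-point depth into an (H, W) map; zero marks unknown pixels.

    uv: (N, 2) pixel coordinates (x, y), z: (N,) camera-space depth,
    visible: (N,) boolean mask.
    """
    depth = np.full((height, width), np.inf, dtype=np.float32)
    keep = visible & np.isfinite(uv).all(axis=1) & np.isfinite(z) & (z > 0)
    cols = np.rint(uv[keep, 0]).astype(np.int64)  # x -> column
    rows = np.rint(uv[keep, 1]).astype(np.int64)  # y -> row
    zk = z[keep].astype(np.float32)
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    # Unbuffered minimum so the nearest point wins on pixel collisions (assumption)
    np.minimum.at(depth, (rows[inside], cols[inside]), zk[inside])
    depth[np.isinf(depth)] = 0.0
    return depth
```

The fraction of valid pixels in a frame is then `np.count_nonzero(depth) / depth.size`, which is typically small because only tracked points contribute.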
## Dataset Structure
Each sequence directory follows this layout:
```
{sequence_id}/
├── rgbs/
│ ├── rgb_00000.jpg
│ ├── rgb_00001.jpg
│ └── ...
├── depths/
│ ├── depth_00000.npz
│ ├── depth_00001.npz
│ └── ...
├── annotations.npz
└── scene_info.json
```
### File Descriptions
- `rgbs/`: RGB frames extracted from the left-rectified video and saved as JPEG (`rgb_XXXXX.jpg`). Resolution is 512×512 pixels.
- `depths/`: Sparse pseudo-depth maps saved as compressed NumPy archives (`depth_XXXXX.npz`). Each archive stores a float32 array under the key `depth` of shape `(H, W)`; zero values indicate unknown depth.
- `annotations.npz`: NumPy compressed file containing the following float16 arrays:
  - `trajs_2d`: 2D trajectories `(T, N, 2)`, pixel coordinates (x, y).
  - `trajs_3d`: 3D trajectories `(T, N, 3)`, world-space coordinates (x, y, z); zero-filled where invisible.
  - `visibilities`: `(T, N)`, visibility flags (1.0 visible, 0.0 not visible).
  - `intrinsics`: `(T, 3, 3)`, camera intrinsic matrices for each frame.
  - `extrinsics`: `(T, 4, 4)`, world-to-camera extrinsic matrices (W2C) for each frame.
- `scene_info.json`: JSON file with per-sequence metadata. Fields: `source`, `num_frames`, `image_size`, `num_trajectories`, `depth_range`, `depth_type`, `original_sequence`.
## Data Specifications
- **Image format**: JPEG (RGB), 512×512 px
- **Depth format**: NPZ (float32), sparse (zero = unknown)
- **Annotation format**: `annotations.npz` (float16 arrays for compact storage)
- **Frames per sequence**: ~199 frames (varies slightly by sequence)
- **Points per sequence**: tens of thousands of tracked 3D points
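Because the annotations are stored as float16, values are quantized: between 256 and 512 the representable step is 0.25, so pixel coordinates in the right and lower half of a 512×512 image are only accurate to about a quarter pixel. The sketch below shows the effect and upcasts the arrays to float32 before further arithmetic. The helper name is hypothetical, and upcasting does not recover the lost precision; it only avoids compounding it.

```python
import numpy as np

ANNOTATION_KEYS = ("trajs_2d", "trajs_3d", "visibilities", "intrinsics", "extrinsics")

def upcast_annotations(annotations, keys=ANNOTATION_KEYS):
    """Copy the float16 annotation arrays into float32 for downstream math."""
    return {k: np.asarray(annotations[k], dtype=np.float32) for k in keys}

# Round-tripping a pixel coordinate through float16 snaps it to a 0.25 grid
stored = np.float16(np.float32(300.3))
print(float(stored))  # 300.25
```

`upcast_annotations` accepts the object returned by `np.load` on `annotations.npz` or any mapping with the same keys.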
## Usage Example (Python)
```python
import numpy as np
from PIL import Image
from pathlib import Path
import json
seq_dir = Path("stereo4d_vlbm/<sequence_id>")
# Load annotations
annotations = np.load(seq_dir / "annotations.npz")  # plain numeric arrays, so allow_pickle is not needed
trajs_2d = annotations['trajs_2d'] # (T, N, 2)
trajs_3d = annotations['trajs_3d'] # (T, N, 3)
vis = annotations['visibilities'] # (T, N)
intrinsics = annotations['intrinsics'] # (T, 3, 3)
extrinsics = annotations['extrinsics'] # (T, 4, 4)
# Load an image and sparse depth map
frame_idx = 0
rgb = Image.open(seq_dir / "rgbs" / f"rgb_{frame_idx:05d}.jpg")
depth_npz = np.load(seq_dir / "depths" / f"depth_{frame_idx:05d}.npz")
depth = depth_npz['depth'] # float32 array (H, W), 0 = unknown
# Load scene info
with open(seq_dir / "scene_info.json", 'r') as f:
    scene_info = json.load(f)
print(scene_info)
```
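As a sanity check on the loaded arrays, the 3D trajectories can be reprojected with the stored intrinsics and extrinsics and compared against `trajs_2d`. The sketch below assumes a standard pinhole model with the camera looking along +Z and upcasts to float64 first; the function name is hypothetical. Expect small nonzero errors on real data because of float16 storage.

```python
import numpy as np

def reprojection_error(trajs_3d, trajs_2d, vis, intrinsics, extrinsics, frame_idx):
    """Per-point pixel error and camera-space depth for the visible points of one frame."""
    K = intrinsics[frame_idx].astype(np.float64)
    w2c = extrinsics[frame_idx].astype(np.float64)
    mask = vis[frame_idx] > 0.5
    pts = trajs_3d[frame_idx][mask].astype(np.float64)
    # World -> camera: X_cam = R @ X_world + t
    cam = pts @ w2c[:3, :3].T + w2c[:3, 3]
    # Camera -> pixels: divide by Z after applying K
    proj = cam @ K.T
    uv = proj[:, :2] / proj[:, 2:3]
    target = trajs_2d[frame_idx][mask].astype(np.float64)
    return np.linalg.norm(uv - target, axis=1), cam[:, 2]
```

With the arrays from the example above, `err, z = reprojection_error(trajs_3d, trajs_2d, vis, intrinsics, extrinsics, 0)` gives one error value and one depth value per visible track in frame 0.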
## Conversion Script
The full conversion pipeline is provided in `stereo4d2vlbm.py`. It supports single-sequence, batch, and resume-from-checkpoint modes:
```bash
# Convert a single sequence
python stereo4d2vlbm.py --seq "_0be62W7ndY_15081748"
# Batch convert top-5% sequences (uses quality filter file)
python stereo4d2vlbm.py --batch --num_workers 8 \
    --top5_file tmp/data/stereo4d/quality_top5_full/top5pct_videos.txt
# Batch convert all available sequences
python stereo4d2vlbm.py --batch --num_workers 8
```
## Citation
Please cite the original Stereo4D dataset when using the converted data. If you use the VLBM/Flock4D conversion, please also cite this repository.
## Contact
If you encounter issues with the conversion or the converted files, please open an issue in the repository.
Provider:
ZhengGuangze



