ZhengGuangze/Stereo4D_vlbm
Source: Hugging Face. Updated 2026-03-27; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/ZhengGuangze/Stereo4D_vlbm
Resource description:
---
license: cc-by-nc-sa-4.0
---
# Stereo4D (converted to VLBM format)
This dataset contains 4,687 sequences from the Stereo4D dataset, converted to the VLBM-compatible format with `preprocess_stereo4d.py`. The sequences are packed into `.tar.gz` archives of 50 sequences each.
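The archives have to be unpacked before the per-sequence layout is usable. A minimal sketch, assuming the `.tar.gz` files have already been downloaded into a local folder; the function name `extract_archives` and any directory names passed to it are hypothetical, and since the archive file names are not documented here the code simply globs for `*.tar.gz`:

```python
import tarfile
from pathlib import Path


def extract_archives(archive_dir, out_dir):
    """Unpack every .tar.gz found in archive_dir into out_dir.

    Returns the number of archives processed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Use the safer "data" extraction filter when this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

    archives = sorted(Path(archive_dir).glob("*.tar.gz"))
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(out_dir, **kwargs)
    return len(archives)
```

At 50 sequences per archive, 4,687 sequences correspond to 94 archives, so a return value of 94 would indicate a complete download.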
## Dataset Description
- **Source**: [Stereo4D](https://stereo4d.github.io/) (converted)
- **Format**: VLBM-compatible per-sequence layout
- **Contents**: RGB images, sparse depth maps (from 3D trajectories), 2D/3D trajectories, camera intrinsics and extrinsics, visibilities, and scene metadata
### Scale
| Metric | Value |
|---|---|
| Total sequences | 4,687 |
| Image resolution | 512 x 512 px |
| Depth type | Sparse (projected from tracked 3D points) |
## Dataset Structure
Each sequence directory follows this layout:
```
<seq_name>/
├── rgbs/
│   ├── rgb_00000.jpg
│   ├── rgb_00001.jpg
│   └── ...
├── depths/
│   ├── depth_00000.npz
│   ├── depth_00001.npz
│   └── ...
├── intrinsics.npy
├── extrinsics.npy
├── trajs_2d.npy
├── trajs_3d.npy
├── visibilities.npy
└── scene_info.json
```
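After extraction it can be useful to confirm that a sequence directory matches the layout above. A small sketch; the helper name `missing_entries` is hypothetical, and it only checks that the expected files and folders exist, not their contents:

```python
from pathlib import Path

# Top-level files listed in the layout above.
EXPECTED_FILES = [
    "intrinsics.npy",
    "extrinsics.npy",
    "trajs_2d.npy",
    "trajs_3d.npy",
    "visibilities.npy",
    "scene_info.json",
]
EXPECTED_DIRS = ["rgbs", "depths"]


def missing_entries(seq_dir):
    """Return the names of expected files/folders absent from seq_dir."""
    seq_dir = Path(seq_dir)
    missing = [name for name in EXPECTED_FILES if not (seq_dir / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (seq_dir / name).is_dir()]
    return missing
```

An empty list means the directory has every entry the layout names.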
### File Descriptions
- `rgbs/`: RGB frames from the left rectified video saved as JPEG (`rgb_XXXXX.jpg`). Resolution is 512x512 pixels.
- `depths/`: Sparse depth maps saved as compressed NumPy archives (`depth_XXXXX.npz`). Each archive stores a float16 array of shape `(H, W)` under the key `depth`, in meters. Depth is computed from the 3D trajectory points projected into camera space; pixels where no trajectory point is observed are 0.
- `intrinsics.npy`: Camera intrinsic matrices for each frame `(T, 3, 3)` float16. Computed from a 60° horizontal field of view.
- `extrinsics.npy`: World-to-camera extrinsic matrices (W2C) for each frame `(T, 4, 4)` float16. Computed as the inverse of the camera-to-world matrices provided in the source data.
- `trajs_2d.npy`: 2D trajectories `(T, N, 2)` float16 -- pixel coordinates (x, y). Projected from 3D world coordinates using extrinsics and intrinsics.
- `trajs_3d.npy`: 3D trajectories `(T, N, 3)` float16 -- world-space coordinates (x, y, z); zero-filled where invisible.
- `visibilities.npy`: Visibility flags `(T, N)` float16 (1.0 visible, 0.0 not visible).
- `scene_info.json`: JSON file with per-sequence metadata including `num_frames`, `image_size`, `num_trajectories`, `source`, `depth_range`, `depth_type`, and `original_sequence`.
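The descriptions above say that the intrinsics come from a 60° horizontal field of view and that the 2D trajectories are projections of the 3D world points through the extrinsics and intrinsics. The following is a sketch of that pinhole relationship, not the conversion script itself. It assumes square pixels, a principal point at the image center, and a camera looking along +z; none of these are confirmed by this card, and both function names are hypothetical.

```python
import numpy as np


def intrinsics_from_hfov(width, height, hfov_deg=60.0):
    """Build a 3x3 pinhole matrix from a horizontal field of view.

    Assumes square pixels and a centered principal point.
    """
    focal = 0.5 * width / np.tan(np.deg2rad(hfov_deg) / 2.0)
    return np.array(
        [[focal, 0.0, width / 2.0],
         [0.0, focal, height / 2.0],
         [0.0, 0.0, 1.0]],
        dtype=np.float64,
    )


def project_points(points_world, w2c, K):
    """Project (N, 3) world points into a single frame.

    w2c is a 4x4 world-to-camera matrix, K a 3x3 intrinsic matrix.
    Returns (N, 2) pixel coordinates and (N,) camera-space depth.
    """
    # Upcast first: the stored arrays are float16.
    pts = np.asarray(points_world, dtype=np.float64)
    w2c = np.asarray(w2c, dtype=np.float64)
    K = np.asarray(K, dtype=np.float64)

    cam = pts @ w2c[:3, :3].T + w2c[:3, 3]  # world -> camera
    z = cam[:, 2]
    with np.errstate(divide="ignore", invalid="ignore"):
        uv = (cam @ K.T)[:, :2] / z[:, None]  # perspective divide
    return uv, z
```

Under these assumptions a 512 px wide image gives a focal length of about 443.4 px. Because float16 can only represent values between 256 and 512 in steps of 0.25, stored pixel coordinates may differ from a recomputed projection by a fraction of a pixel.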
## Conversion Details
Stereo4D provides sparse track annotations in a compact format:
- `track_lengths`: Number of frames each track appears in
- `track_indices`: Frame indices for each observation
- `track_coordinates`: 3D world coordinates for each observation
- `camera2world`: Camera-to-world transformation matrices
The conversion expands these sparse tracks into dense `(T, N, 3)` arrays, projects to 2D, and computes sparse depth maps from the visible 3D points.
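The expansion step can be sketched as below. This is not the code of `preprocess_stereo4d.py`, which is not shown here. It assumes that `track_indices` and `track_coordinates` are flat arrays that concatenate the observations of all tracks in order, with `track_lengths[i]` entries belonging to track `i`; the function name `expand_tracks` is hypothetical.

```python
import numpy as np


def expand_tracks(track_lengths, track_indices, track_coordinates, num_frames):
    """Scatter compact track observations into dense per-frame arrays.

    Returns trajs_3d of shape (T, N, 3), zero-filled where a track is
    not observed, and visibilities of shape (T, N) with 1.0 / 0.0.
    """
    lengths = np.asarray(track_lengths, dtype=np.int64)
    frames = np.asarray(track_indices, dtype=np.int64)
    coords = np.asarray(track_coordinates, dtype=np.float32)
    num_tracks = len(lengths)

    # Track id of every observation, e.g. lengths [2, 1] -> [0, 0, 1].
    track_ids = np.repeat(np.arange(num_tracks), lengths)

    trajs_3d = np.zeros((num_frames, num_tracks, 3), dtype=np.float32)
    visibilities = np.zeros((num_frames, num_tracks), dtype=np.float32)
    trajs_3d[frames, track_ids] = coords
    visibilities[frames, track_ids] = 1.0
    return trajs_3d, visibilities
```

Working in float32 and casting to float16 only when saving keeps rounding out of the intermediate steps.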
## Data Specifications
- **Image format**: JPEG (RGB), 512x512 px
- **Depth format**: NPZ (float16), sparse (from tracked 3D points, 0 = unknown)
- **Annotation format**: Individual `.npy` files (float16)
- **Extrinsics**: World-to-camera (W2C) 4x4 matrices
- **Horizontal FOV**: 60° (assumed)
## Usage Example (Python)
```python
import numpy as np
from PIL import Image
from pathlib import Path
import json
seq_dir = Path("data/stereo4d_vlbm/-5JaYFNtYlM_115181849")
# Load annotations
trajs_2d = np.load(seq_dir / "trajs_2d.npy") # (T, N, 2)
trajs_3d = np.load(seq_dir / "trajs_3d.npy") # (T, N, 3)
vis = np.load(seq_dir / "visibilities.npy") # (T, N)
intrinsics = np.load(seq_dir / "intrinsics.npy") # (T, 3, 3)
extrinsics = np.load(seq_dir / "extrinsics.npy") # (T, 4, 4)
# Load an image and depth map
frame_idx = 0
rgb = Image.open(seq_dir / "rgbs" / f"rgb_{frame_idx:05d}.jpg")
depth_npz = np.load(seq_dir / "depths" / f"depth_{frame_idx:05d}.npz")
depth = depth_npz['depth'] # float16 array (H, W), 0 = no observation
# Load scene info
with open(seq_dir / "scene_info.json", "r") as f:
    scene_info = json.load(f)
print(scene_info)
```
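Since invisible trajectory entries are zero-filled and unobserved depth pixels are 0, most downstream code needs to mask both. A small sketch of that filtering; the helper names `visible_pixels` and `valid_depth` are hypothetical, and the 512x512 default simply follows the resolution stated above:

```python
import numpy as np


def visible_pixels(trajs_2d, vis, frame_idx, width=512, height=512):
    """Return pixel coordinates and track ids of the tracks that are
    flagged visible and fall inside the image at frame_idx."""
    xy = np.asarray(trajs_2d[frame_idx], dtype=np.float32)  # upcast from float16
    visible = np.asarray(vis[frame_idx]) > 0.5
    in_bounds = (
        (xy[:, 0] >= 0) & (xy[:, 0] < width)
        & (xy[:, 1] >= 0) & (xy[:, 1] < height)
    )
    keep = visible & in_bounds
    return xy[keep], np.flatnonzero(keep)


def valid_depth(depth):
    """Return the observed depth values (in meters) and their boolean mask."""
    depth = np.asarray(depth, dtype=np.float32)
    mask = depth > 0  # 0 marks pixels with no observation
    return depth[mask], mask
```

With the arrays loaded in the example above, `visible_pixels(trajs_2d, vis, frame_idx)` and `valid_depth(depth)` give the usable points and depth samples for one frame.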
## Citation
Please cite the original Stereo4D dataset when using the converted data:
```bibtex
@article{jin2024stereo4d,
title={Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos},
author={Jin, Linyi and Tucker, Richard and Li, Zhengqi and Fouhey, David and Snavely, Noah and Holynski, Aleksander},
year={2024}
}
```
Provider:
ZhengGuangze



