ZhengGuangze/Kubric_vlbm_old
Hugging Face · Updated 2026-01-29 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/ZhengGuangze/Kubric_vlbm_old
Resource overview:
---
license: apache-2.0
---
# Kubric CoTracker3 Dataset
This dataset contains 3D point tracking sequences generated from Kubric synthetic scenes, converted to the Flock4D format. The dataset includes RGB images, depth maps, 2D/3D trajectories, camera parameters, and scene metadata for each sequence.
## Dataset Description
This dataset contains point tracking sequences with the following characteristics:
- **Total Sequences**: 5,869 sequences
- **Format**: Flock4D-compatible format
- **Source**: Generated from Kubric synthetic scenes using CoTracker3
- **Resolution**: 512×512 pixels
- **License**: Apache 2.0 (per the card metadata); please also refer to the original Kubric and CoTracker3 licenses
## Dataset Structure
Each sequence directory contains:
```
{sequence_id}/
├── rgbs/
│   ├── rgb_00000.jpg
│   ├── rgb_00001.jpg
│   └── ...
├── depths/
│   ├── depth_00000.npz
│   ├── depth_00001.npz
│   └── ...
├── annotations.npz
└── scene_info.json
```
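To iterate over this layout, a small helper can collect sequence directories and their per-frame files. This is a minimal sketch that assumes only the directory and file naming shown above; `list_sequences` and `list_sequence_frames` are hypothetical helper names, not part of any released tooling. Because frame indices are zero-padded to five digits, a plain lexicographic sort yields frame order.

```python
from pathlib import Path


def list_sequences(root):
    """Return sequence directories under `root` that contain annotations.npz."""
    root = Path(root)
    return sorted(p for p in root.iterdir() if (p / "annotations.npz").is_file())


def list_sequence_frames(seq_dir):
    """Return sorted RGB and depth frame paths for one sequence directory.

    Assumes the layout rgbs/rgb_XXXXX.jpg and depths/depth_XXXXX.npz.
    """
    seq_dir = Path(seq_dir)
    # Zero-padded names sort lexicographically in frame order.
    rgb_paths = sorted((seq_dir / "rgbs").glob("rgb_*.jpg"))
    depth_paths = sorted((seq_dir / "depths").glob("depth_*.npz"))
    if len(rgb_paths) != len(depth_paths):
        raise ValueError(
            f"{seq_dir}: {len(rgb_paths)} RGB frames but {len(depth_paths)} depth frames"
        )
    return rgb_paths, depth_paths
```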
### File Descriptions
- **rgbs/**: RGB images in JPEG format (512×512 pixels)
- **depths/**: Depth maps in npz format (512×512 pixels). The depth values originate from Kubric's metric depth output (in meters).
- **annotations.npz**: NumPy compressed file containing:
  - `trajs_2d`: 2D trajectories `(T, N, 2)` - T frames, N points, (x, y) coordinates (float16)
  - `trajs_3d`: 3D trajectories `(T, N, 3)` - T frames, N points, (x, y, z) coordinates in camera space (float16)
  - `visibilities`: Visibility flags `(T, N)` - 1.0 for visible, 0.0 for occluded or out of frame (float16)
  - `intrinsics`: Camera intrinsic matrices `(T, 3, 3)` - one per frame (float16)
  - `extrinsics`: Camera extrinsic matrices `(T, 4, 4)` - world-to-camera transformation (float16)
- **scene_info.json**: Scene metadata including sensor parameters
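Because the annotation arrays are stored as float16, it is often convenient to check their shapes and upcast them once at load time. The sketch below assumes only the key names and shapes listed above; `load_annotations` and `EXPECTED_KEYS` are hypothetical names.

```python
import numpy as np

EXPECTED_KEYS = ("trajs_2d", "trajs_3d", "visibilities", "intrinsics", "extrinsics")


def load_annotations(path):
    """Load annotations.npz, check shapes against the documented layout,
    and upcast the float16 arrays to float32 for arithmetic."""
    with np.load(path) as data:
        missing = [k for k in EXPECTED_KEYS if k not in data.files]
        if missing:
            raise KeyError(f"missing arrays: {missing}")
        ann = {k: data[k].astype(np.float32) for k in EXPECTED_KEYS}

    # T and N are taken from trajs_2d; every other array must agree with them.
    T, N, _ = ann["trajs_2d"].shape
    expected = {
        "trajs_2d": (T, N, 2),
        "trajs_3d": (T, N, 3),
        "visibilities": (T, N),
        "intrinsics": (T, 3, 3),
        "extrinsics": (T, 4, 4),
    }
    for key, shape in expected.items():
        if ann[key].shape != shape:
            raise ValueError(f"{key}: expected {shape}, got {ann[key].shape}")
    return ann
```

Upcasting doubles the in-memory size of the arrays; if memory is tight, keep float16 and cast per frame instead.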
### Data Specifications
- **Image Resolution**: 512 × 512 pixels
- **Number of Points**: 32,768 points per sequence
- **Number of Frames**: 120 frames per sequence
- **Data Types**:
  - Images: JPEG (RGB)
  - Depth maps: NPZ (NumPy compressed)
  - Annotations: float16 for efficient storage
- **Depth Source**: The depth maps originate from Kubric's metric depth output (meters).
- **Coordinate Systems**:
  - 2D trajectories: Image pixel coordinates (x, y)
  - 3D trajectories: Camera coordinate system (x, y, z)
  - Extrinsics: World-to-camera transformation matrices (4×4)
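The 3D and 2D trajectories should be related through the per-frame intrinsics. A minimal sketch of that projection, assuming a standard pinhole camera looking along +z with zero skew (the card does not state the axis convention); `project_to_pixels` is a hypothetical helper:

```python
import numpy as np


def project_to_pixels(points_cam, K):
    """Project (N, 3) camera-space points to (N, 2) pixel coordinates.

    Assumes a pinhole model looking along +z:
    u = fx * x / z + cx,  v = fy * y / z + cy.
    """
    points_cam = np.asarray(points_cam, dtype=np.float64)
    K = np.asarray(K, dtype=np.float64)
    uvw = points_cam @ K.T            # each row is K @ p = (u*z, v*z, z)
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide by z
```

With the arrays loaded as in the Usage section, comparing `project_to_pixels(trajs_3d[t], intrinsics[t])` with `trajs_2d[t]` at points where `visibilities[t] > 0.5` is a cheap consistency check. Large errors would suggest a different convention (for example, a camera looking along -z) than the one assumed here; small residuals are expected from float16 storage.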
## Usage
### Loading Data in Python
```python
import json
from pathlib import Path

import numpy as np
from PIL import Image

# Load a sequence
seq_dir = Path("data/kubric_cotracker3/0000")

# Load annotations (stored as float16; upcast before doing math)
annotations = np.load(seq_dir / "annotations.npz")
trajs_2d = annotations['trajs_2d']          # (T, N, 2)
trajs_3d = annotations['trajs_3d']          # (T, N, 3)
visibilities = annotations['visibilities']  # (T, N)
intrinsics = annotations['intrinsics']      # (T, 3, 3)
extrinsics = annotations['extrinsics']      # (T, 4, 4)

# Load the RGB image and depth map for one frame
frame_idx = 0
rgb_img = Image.open(seq_dir / "rgbs" / f"rgb_{frame_idx:05d}.jpg")
depth_npz = np.load(seq_dir / "depths" / f"depth_{frame_idx:05d}.npz")
# The array name inside each depth .npz is not documented, so take the first one
depth = depth_npz[depth_npz.files[0]]       # (512, 512), meters

# Load scene info
with open(seq_dir / "scene_info.json", 'r') as f:
    scene_info = json.load(f)
```
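A quick way to sanity-check a frame is to overlay its visible tracks on the RGB image. This sketch assumes `trajs_2d` is expressed in the same pixel frame as the 512×512 JPEGs and treats a visibility value above 0.5 as visible; `draw_visible_points` and its parameters are hypothetical.

```python
import numpy as np
from PIL import Image, ImageDraw


def draw_visible_points(rgb_img, trajs_2d, visibilities, frame_idx,
                        max_points=2000, radius=1, color=(255, 0, 0), seed=0):
    """Return a copy of `rgb_img` with a random subset of the points
    visible in `frame_idx` drawn as small filled circles."""
    vis = np.asarray(visibilities[frame_idx], dtype=np.float32) > 0.5
    xy = np.asarray(trajs_2d[frame_idx], dtype=np.float32)[vis]
    # Subsample so that 32,768 points do not cover the whole image.
    if len(xy) > max_points:
        rng = np.random.default_rng(seed)
        xy = xy[rng.choice(len(xy), size=max_points, replace=False)]
    out = rgb_img.convert("RGB")  # convert() returns a new image; the input is untouched
    draw = ImageDraw.Draw(out)
    for x, y in xy.tolist():
        draw.ellipse([x - radius, y - radius, x + radius, y + radius], fill=color)
    return out
```

For example, `draw_visible_points(rgb_img, trajs_2d, visibilities, frame_idx).save("overlay.png")` writes an overlay for the frame loaded above.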
### Converting 3D Points to World Coordinates
```python
import numpy as np

# Transform 3D points from camera to world coordinates.
# np.linalg does not support float16, so upcast before inverting.
extrinsics_inv = np.linalg.inv(extrinsics[frame_idx].astype(np.float64))  # camera-to-world
R_inv = extrinsics_inv[:3, :3]
t_inv = extrinsics_inv[:3, 3]
points_3d_camera = trajs_3d[frame_idx].astype(np.float64)  # (N, 3) in camera coordinates
points_3d_world = (R_inv @ points_3d_camera.T).T + t_inv  # (N, 3) in world coordinates
```
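The same transform can be applied to every frame at once, since `np.linalg.inv` accepts a stack of matrices. A vectorized sketch, assuming the extrinsics are world-to-camera as documented; `camera_to_world` is a hypothetical helper:

```python
import numpy as np


def camera_to_world(trajs_3d, extrinsics):
    """Map (T, N, 3) camera-space points to world space for every frame.

    `extrinsics` are (T, 4, 4) world-to-camera matrices, so each frame uses
    the inverse (camera-to-world) transform. Inputs are upcast to float64
    because np.linalg does not support float16.
    """
    pts = np.asarray(trajs_3d, dtype=np.float64)
    cam_to_world = np.linalg.inv(np.asarray(extrinsics, dtype=np.float64))  # (T, 4, 4)
    R = cam_to_world[:, :3, :3]   # (T, 3, 3)
    t = cam_to_world[:, :3, 3]    # (T, 3)
    # Per frame: p_world = R @ p_cam + t, applied to all N points at once.
    return np.einsum("tij,tnj->tni", R, pts) + t[:, None, :]
```

For static scene points, the world-space tracks should stay nearly constant across frames, which makes this a useful check on the extrinsics convention.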
## Dataset Statistics
- **Total Sequences**: 5,869
- **Frames per Sequence**: 120 frames
- **Points per Sequence**: 32,768 points
- **Image Format**: RGB JPEG (512×512)
- **Depth Format**: NPZ (512×512)
- **Total Size**: ~313 GB (uncompressed)
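The statistics allow a rough per-sequence size estimate. This is only arithmetic on the numbers stated above: it counts the raw float16 annotation arrays, ignores the RGB and depth files, and reads "GB" as 10^9 bytes (an assumption).

```python
# Back-of-the-envelope sizes implied by the statistics above.
T, N = 120, 32_768          # frames and points per sequence
BYTES_F16 = 2               # annotations are stored as float16

trajs_2d = T * N * 2 * BYTES_F16
trajs_3d = T * N * 3 * BYTES_F16
visibilities = T * N * BYTES_F16
cameras = T * (3 * 3 + 4 * 4) * BYTES_F16   # intrinsics + extrinsics
annotation_bytes = trajs_2d + trajs_3d + visibilities + cameras

total_bytes = 313e9         # "~313 GB", read here as 10**9 bytes
per_sequence = total_bytes / 5_869

print(f"raw annotation arrays: {annotation_bytes / 1e6:.1f} MB per sequence")
print(f"average total size:    {per_sequence / 1e6:.1f} MB per sequence")
```

This prints about 47.2 MB of raw annotation arrays against an average of about 53.3 MB per sequence.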
## Dataset Details
### Annotations Format
The `annotations.npz` file contains the following arrays:
- **`trajs_2d`** `(T, N, 2)`: 2D pixel coordinates for each point across all frames
  - T: number of frames (120)
  - N: number of points (32,768)
  - Coordinates: (x, y) in pixel space
- **`trajs_3d`** `(T, N, 3)`: 3D coordinates in camera space
  - Coordinates: (x, y, z) in the camera coordinate system
- **`visibilities`** `(T, N)`: Visibility flags
  - 1.0: point is visible in the frame
  - 0.0: point is occluded or outside the frame
- **`intrinsics`** `(T, 3, 3)`: Camera intrinsic matrices (one per frame)
  - Format: standard pinhole intrinsic matrix `[[fx, 0, cx], [0, fy, cy], [0, 0, 1]]`
  - Relates camera-space 3D points to 2D pixel coordinates (projection, and back-projection when combined with depth)
- **`extrinsics`** `(T, 4, 4)`: Camera extrinsic matrices (world-to-camera transformation)
  - Format: 4×4 homogeneous transformation matrix
  - Maps world coordinates to camera coordinates; its inverse maps camera coordinates back to world coordinates
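The intrinsics can also lift a depth map back into camera space. A minimal sketch, assuming a zero-skew pinhole model, integer pixel coordinates (set `pixel_offset=0.5` if your convention places samples at pixel centres), and that the depth values are z-depth along the optical axis. Kubric's MOVi documentation describes its raw depth as distance from the camera centre instead, and this card does not say which convention the converted files use, so the hypothetical `is_ray_distance` flag handles both cases. `backproject_depth` is a hypothetical helper name.

```python
import numpy as np


def backproject_depth(depth, K, pixel_offset=0.0, is_ray_distance=False):
    """Lift an (H, W) depth map to an (H, W, 3) array of camera-space points."""
    depth = np.asarray(depth, dtype=np.float64)
    K = np.asarray(K, dtype=np.float64)
    H, W = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # v indexes rows (y), u indexes columns (x).
    v, u = np.meshgrid(np.arange(H) + pixel_offset,
                       np.arange(W) + pixel_offset, indexing="ij")
    xn = (u - cx) / fx   # normalized image coordinates
    yn = (v - cy) / fy
    if is_ray_distance:
        # Distance from the camera centre -> depth along the optical axis.
        z = depth / np.sqrt(xn**2 + yn**2 + 1.0)
    else:
        z = depth
    return np.stack([xn * z, yn * z, z], axis=-1)
```

Applied to the `depth` array and `intrinsics[frame_idx]` from the Usage section, sampling the result at `trajs_2d` locations of visible points and comparing with `trajs_3d` is a way to tell which depth convention the files follow.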
### Scene Information
The `scene_info.json` file contains metadata about each scene:
```json
{
"sensor_width": 32.0,
"sensor_height": 32.0,
"focal_length": <focal_length>,
"assets": [],
"character": []
}
```
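Under a standard pinhole model, the sensor and focal-length fields give the focal length in pixels as `focal_length * image_width / sensor_width`. A sketch, assuming `focal_length` and the sensor dimensions share a unit (Kubric's camera uses millimetres for both) and that the sensor spans the full 512×512 image; `focal_length_pixels` is a hypothetical helper:

```python
def focal_length_pixels(scene_info, image_width=512, image_height=512):
    """Convert the physical focal length in scene_info to pixels (fx, fy)."""
    f = float(scene_info["focal_length"])
    fx = f * image_width / float(scene_info["sensor_width"])
    fy = f * image_height / float(scene_info["sensor_height"])
    return fx, fy
```

Comparing the result with `intrinsics[frame_idx][0, 0]` is a quick check; if the values differ, the stored matrices may follow another convention (for example, normalized rather than pixel units).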
## Citation
If you use this dataset, please cite:
1. The original [Kubric](https://github.com/google-research/kubric) paper
2. [CoTracker3](https://github.com/facebookresearch/co-tracker) paper
3. [Flock4D](https://huggingface.co/datasets/ZhengGuangze/Flock4D) dataset (if applicable)
## Contact
For questions or issues regarding this dataset, please open an issue in the repository.
Provided by: ZhengGuangze



