ZhengGuangze/Kubric_vlbm_old

Name: ZhengGuangze/Kubric_vlbm_old
Creator: ZhengGuangze
Published: 2026-01-29 07:05:53
License: No description available

Hugging Face, updated 2026-01-29, indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/ZhengGuangze/Kubric_vlbm_old

Resource description:

---
license: apache-2.0
---

# Kubric CoTracker3 Dataset

This dataset contains 3D point tracking sequences generated from Kubric synthetic scenes, converted to the Flock4D format. The dataset includes RGB images, depth maps, 2D/3D trajectories, camera parameters, and scene metadata for each sequence.

## Dataset Description

This dataset contains point tracking sequences with the following characteristics:

- **Total Sequences**: 5,869 sequences
- **Format**: Flock4D-compatible format
- **Source**: Generated from Kubric synthetic scenes using CoTracker3
- **Resolution**: 512×512 pixels
- **License**: Please refer to the original Kubric and CoTracker3 licenses

## Dataset Structure

Each sequence directory contains:

```
{sequence_id}/
├── rgbs/
│   ├── rgb_00000.jpg
│   ├── rgb_00001.jpg
│   └── ...
├── depths/
│   ├── depth_00000.npz
│   ├── depth_00001.npz
│   └── ...
├── annotations.npz
└── scene_info.json
```

### File Descriptions

- **rgbs/**: RGB images in JPEG format (512×512 pixels)
- **depths/**: Depth maps in npz format (512×512 pixels). The depth values originate from Kubric's metric depth output (in meters).
- **annotations.npz**: NumPy compressed file containing:
  - `trajs_2d`: 2D trajectories `(T, N, 2)` - T frames, N points, (x, y) coordinates (float16)
  - `trajs_3d`: 3D trajectories `(T, N, 3)` - T frames, N points, (x, y, z) coordinates in camera space (float16)
  - `visibilities`: Visibility flags `(T, N)` - 1.0 for visible, 0.0 for invisible (float16)
  - `intrinsics`: Camera intrinsic matrices `(T, 3, 3)` - one per frame (float16)
  - `extrinsics`: Camera extrinsic matrices `(T, 4, 4)` - world-to-camera transformation (float16)
- **scene_info.json**: Scene metadata including sensor parameters

### Data Specifications

- **Image Resolution**: 512 × 512 pixels
- **Number of Points**: 32,768 points per sequence
- **Number of Frames**: 120 frames per sequence
- **Data Types**:
  - Images: JPEG (RGB); depth maps: NPZ
  - Annotations: float16 for efficient storage
- **Depth Source**: The depth maps originate from Kubric's metric depth output (meters).
- **Coordinate Systems**:
  - 2D trajectories: Image pixel coordinates (x, y)
  - 3D trajectories: Camera coordinate system (x, y, z)
  - Extrinsics: World-to-camera transformation matrices (4×4)

## Usage

### Loading Data in Python

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image

# Load a sequence
seq_dir = Path("data/kubric_cotracker3/0000")

# Load annotations (all arrays are stored as float16)
annotations = np.load(seq_dir / "annotations.npz", allow_pickle=True)
trajs_2d = annotations['trajs_2d']  # (T, N, 2)
trajs_3d = annotations['trajs_3d']  # (T, N, 3)
visibilities = annotations['visibilities']  # (T, N)
intrinsics = annotations['intrinsics']  # (T, 3, 3)
extrinsics = annotations['extrinsics']  # (T, 4, 4)

# Load an RGB frame
frame_idx = 0
rgb_img = Image.open(seq_dir / "rgbs" / f"rgb_{frame_idx:05d}.jpg")

# Depth maps are stored as .npz archives, not PNG images.
# The array key inside the archive is not documented here, so take the first one.
depth_npz = np.load(seq_dir / "depths" / f"depth_{frame_idx:05d}.npz")
depth = depth_npz[depth_npz.files[0]]

# Load scene info
with open(seq_dir / "scene_info.json", 'r') as f:
    scene_info = json.load(f)
```

### Converting 3D Points to World Coordinates

```python
# Transform 3D points from camera to world coordinates.
# Cast to float32 first: np.linalg.inv does not accept float16 input.
extrinsics_inv = np.linalg.inv(extrinsics[frame_idx].astype(np.float32))
R_inv = extrinsics_inv[:3, :3]
t_inv = extrinsics_inv[:3, 3]
points_3d_camera = trajs_3d[frame_idx].astype(np.float32)  # (N, 3) in camera coordinates
points_3d_world = (R_inv @ points_3d_camera.T).T + t_inv  # (N, 3) in world coordinates
```

## Dataset Statistics

- **Total Sequences**: 5,869
- **Frames per Sequence**: 120 frames
- **Points per Sequence**: 32,768 points
- **Image Format**: RGB JPEG (512×512)
- **Depth Format**: NPZ (512×512)
- **Total Size**: ~313 GB (uncompressed)

## Dataset Details

### Annotations Format

The `annotations.npz` file contains the following arrays:

- **`trajs_2d`** `(T, N, 2)`: 2D pixel coordinates for each point across all frames
  - T: number of frames (120)
  - N: number of points (32,768)
  - Coordinates: (x, y) in pixel space
- **`trajs_3d`** `(T, N, 3)`: 3D coordinates in camera space
  - Coordinates: (x, y, z) in camera coordinate system
- **`visibilities`** `(T, N)`: Visibility flags
  - 1.0: point is visible in the frame
  - 0.0: point is occluded or outside the frame
- **`intrinsics`** `(T, 3, 3)`: Camera intrinsic matrices (one per frame)
  - Format: standard camera intrinsic matrix
  - Used for 2D to 3D conversion
- **`extrinsics`** `(T, 4, 4)`: Camera extrinsic matrices (world-to-camera transformation)
  - Format: 4×4 transformation matrix
  - Used for coordinate system transformation

### Scene Information

The `scene_info.json` file contains metadata about each scene:

```json
{
  "sensor_width": 32.0,
  "sensor_height": 32.0,
  "focal_length": <focal_length>,
  "assets": [],
  "character": []
}
```

## Citation

If you use this dataset, please cite:

1. The original [Kubric](https://github.com/google-research/kubric) paper
2. [CoTracker3](https://github.com/facebookresearch/co-tracker) paper
3. [Flock4D](https://huggingface.co/datasets/ZhengGuangze/Flock4D) dataset (if applicable)

## Contact

For questions or issues regarding this dataset, please open an issue in the repository.
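
As a supplement to the annotations format described above, the following is a minimal sketch of how the `intrinsics` and `trajs_3d` arrays could be related to `trajs_2d` through a pinhole projection. It assumes a standard intrinsic matrix with the camera looking along +z; the dataset card does not state the axis convention, so comparing the result against `trajs_2d` on a real sequence is the way to confirm it. The helper name `project_to_pixels` and the synthetic values are illustrative, not part of the dataset.

```python
import numpy as np


def project_to_pixels(points_cam, K):
    """Project (N, 3) camera-space points to (N, 2) pixel coordinates.

    Assumes a pinhole model with +z pointing forward (an assumption,
    not something the dataset card confirms).
    """
    # Annotations are stored as float16; promote to float32 for the arithmetic.
    points_cam = np.asarray(points_cam, dtype=np.float32)
    K = np.asarray(K, dtype=np.float32)
    uvw = points_cam @ K.T           # (N, 3) homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide by depth


# Synthetic stand-ins for intrinsics[frame_idx] and trajs_3d[frame_idx]
K = np.array(
    [[560.0, 0.0, 256.0],
     [0.0, 560.0, 256.0],
     [0.0, 0.0, 1.0]],
    dtype=np.float16,
)
points_cam = np.array([[0.0, 0.0, 2.0], [1.0, -0.5, 4.0]], dtype=np.float16)

pixels = project_to_pixels(points_cam, K)
print(pixels)
```

On a real sequence, `project_to_pixels(trajs_3d[frame_idx], intrinsics[frame_idx])` can be compared with `trajs_2d[frame_idx]` for points where `visibilities[frame_idx]` is 1.0. Expect small differences from the float16 storage.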

Provider:

ZhengGuangze
