Hanchiao/PlanaReLoc
Source: Hugging Face. Updated 2026-03-24; indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/Hanchiao/PlanaReLoc
Resource description:
---
license: cc-by-nc-sa-3.0
size_categories:
- 10B<n<100B
---
# Data Organization
## 1. Overview
The PlanaReLoc dataset is curated for camera relocalization with the plane-centric pipeline introduced in the paper ["PlanaReLoc: Camera Relocalization in 3D Planar Primitives via Region-Based Structure Matching"](https://arxiv.org/abs/2603.20818). The dataset consists of a collection of scenes, each represented as **an untextured map formed by an arrangement of multiple 3D planar primitives**. For each scene, a set of RGB images is provided as queries (i.e., images to be relocalized), each associated with ground-truth annotations such as plane segmentation and the camera pose. The motivation behind this dataset is to place a premium on planar primitives and to investigate the use of 3D planar maps for leaner camera relocalization in structured environments.
Note that the dataset is built upon the [ScanNet](http://www.scan-net.org/) and [12Scenes](http://www.graphics.stanford.edu/projects/reloc/) datasets. **Users are required to agree to and comply with the terms of use of these datasets before using the PlanaReLoc dataset.**
## 2. Dataset Resources
[Code](https://github.com/3dv-casia/PlanaReLoc) <span style="display:inline-block;width:2px;height:20px;background:#d0d0d0;vertical-align:top;margin:0 10px;"></span> [Paper](https://arxiv.org/abs/2603.20818)
## 3. Dataset Structure
The dataset contains two parts:
1. `scannet_planareloc_dataset`: built upon [ScanNet](http://www.scan-net.org/). It consists of 1210 scenes with 45802 query images for training, and 303 scenes with 7735 query images for testing/validation. The total size is around 17.2 GB.
2. `s12scenes_planareloc_dataset`: built upon [12Scenes](http://www.graphics.stanford.edu/projects/reloc/). It consists of 12 scenes with 1023 query images and is intended ONLY for cross-dataset evaluation. The total size is around 350 MB.
Here is the directory structure of the `scannet_planareloc_dataset`:
```
scannet_planareloc_dataset/
├── caches/                                         # batched into Arrow chunks for efficient loading
│   ├── maps/
│   │   ├── train_scene0000_00-scene0564_02.arrow   # 1210 scenes
│   │   ├── test_scene0575_00-scene0706_00.arrow    # 303 scenes
│   │   └── val_scene0581_00-scene0698_00.arrow     # 3 scenes
│   └── queries/
│       ├── train_000_003999.arrow
│       ├── train_001_007999.arrow
│       ├── ...
│       ├── test_000_003999.arrow
│       ├── test_001_007734.arrow
│       └── val_000_000103.arrow                    # 104 queries from 3 scenes for validation during training
├── map_glbs/                                       # in glb format for visualization and optional use
│   ├── scenexxxx_xx.glb
│   └── ...
├── cache_set_val_split.json                        # json files record identifiers of maps and queries within different splits.
├── cache_set_test_split.json
└── cache_set_train_split.json
```
`s12scenes_planareloc_dataset` follows a similar structure.
## 4. Dataset Details
### JSON Files for Dataset Splits
JSON files named `cache_set_{split}_split.json` (e.g., `cache_set_train_split.json`) record the identifiers of the maps and queries included in each dataset split (train, test, val). These identifiers are used to retrieve data from the Arrow files under `./caches/`. Each JSON file contains the following fields:
- `queries`: a list of identifiers for the query samples included in the split. Each identifier corresponds to a unique query image and is typically in the format `f"{map_id}_{view_id}"` (e.g., "scene0575_00_000000").
- `maps`: a list of meta information for each scene included in the split, containing the unique identifier of the scene (e.g., "scene0575_00") together with the number of query samples associated with that scene and the list of their indices.
- `meta`: a dictionary containing metadata about the split, including:
  - `num_maps`: the total number of unique scenes included in the split.
  - `num_queries`: the total number of query images included in the split.
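As a quick sanity check, the counts in `meta` can be compared against the lengths of the `queries` and `maps` lists. The sketch below is only an illustration: the helper name `check_split_summary` and the toy data are hypothetical, and it assumes that `num_queries` and `num_maps` equal the lengths of the two lists, which follows from the field descriptions above but has not been verified against the released files.

```python
def check_split_summary(summary: dict) -> bool:
    """Return True if the counts in `meta` match the lengths of the lists."""
    meta = summary["meta"]
    return (
        len(summary["queries"]) == meta["num_queries"]
        and len(summary["maps"]) == meta["num_maps"]
    )


# Toy summary with made-up identifiers; the per-scene entries of `maps`
# are left as empty dicts because their key names are not documented here.
toy_summary = {
    "queries": ["scene0575_00_000000", "scene0575_00_000001"],
    "maps": [{}],
    "meta": {"num_maps": 1, "num_queries": 2},
}
print(check_split_summary(toy_summary))
```

Running the same check on a loaded `cache_set_{split}_split.json` is a cheap way to detect a truncated download before training starts.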
### Map Data
Map data is stored in Arrow files under `./caches/maps/`, which contain the following fields:
- `id`: a unique identifier for each scene (e.g., "scene0575_00").
- `primitives`: a list of planar primitives in that scene, where each primitive includes:
  - `params`: the plane parameters in the world coordinate space, represented as a list of four values $[a, b, c, d]$ corresponding to the plane equation $ax + by + cz = d$. These parameters are normalized such that the normal vector $\mathbf{n} =(a, b, c)^T$ is a unit vector.
  - `verts_2d`: the vertices of the planar primitive, projected from 3D to 2D using a projection matrix $\mathbf{J}$ and stored as a nested list of 2D coordinates.
  - `proj_mat`: the projection matrix $\mathbf{J}$ used to project coplanar 3D vertices to 2D. The 3D vertices can be restored by multiplying the 2D vertices (as row vectors) with the transpose of the projection matrix and adding the plane offset along the normal: $\mathbf{p}_{3d} = \mathbf{p}_{2d}\,\mathbf{J}^T + d\,\mathbf{n}^T$
  - `faces`: the mesh faces of the planar primitive, represented as a nested list of vertex indices.
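The relation between `verts_2d`, `proj_mat` and `params` can be illustrated with a small round trip on synthetic data. This is a sketch under assumptions, not the dataset's construction code: it assumes that $\mathbf{J}$ is a 3×2 matrix whose columns are orthonormal and orthogonal to $\mathbf{n}$, and that projection subtracts the offset $d\,\mathbf{n}$ before multiplying by $\mathbf{J}$. The helper names `plane_basis`, `project_to_plane` and `restore_from_plane` are hypothetical.

```python
import numpy as np


def plane_basis(n: np.ndarray) -> np.ndarray:
    """Build a (3, 2) matrix with orthonormal columns orthogonal to n (assumed form of J)."""
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(n, helper)
    u = u / np.linalg.norm(u)
    v = np.cross(n, u)  # unit length because n and u are orthonormal
    return np.stack([u, v], axis=1)


def project_to_plane(verts_3d: np.ndarray, J: np.ndarray, n: np.ndarray, d: float) -> np.ndarray:
    """Assumed forward projection: remove the offset d*n, then express in the 2D basis."""
    return (verts_3d - d * n) @ J


def restore_from_plane(verts_2d: np.ndarray, J: np.ndarray, n: np.ndarray, d: float) -> np.ndarray:
    """Restoration as documented above: p_3d = p_2d J^T + d n."""
    return verts_2d @ J.T + d * n


# Synthetic square on the plane z = 2, i.e. params = [0, 0, 1, 2]
n, d = np.array([0.0, 0.0, 1.0]), 2.0
square = np.array([[0, 0, 2], [1, 0, 2], [1, 1, 2], [0, 1, 2]], dtype=float)

J = plane_basis(n)
restored = restore_from_plane(project_to_plane(square, J, n, d), J, n, d)
print(np.allclose(restored, square))  # round trip recovers the vertices
print(np.allclose(restored @ n, d))   # restored vertices satisfy ax + by + cz = d
```

The second check is also useful on real data: every restored vertex of a primitive should satisfy its own plane equation up to numerical precision.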
### Query Data
Query data is stored in Arrow files under `./caches/queries/`, which contain the following fields:
- `id`: a unique identifier for each query sample, typically in the format of `f"{map_id}_{view_id}"` (e.g., "scene0575_00_000000").
- `map_id`: a unique identifier for the scene to which the query view belongs (e.g., "scene0575_00").
- `image`: the RGB image of the query view, encoded as bytes (JPEG format), with a fixed resolution of 480×640 (H×W).
- `depth`: the raw depth map of the query view, encoded as bytes (PNG format). The depth values are stored as 16-bit unsigned integers; dividing the stored value by 1000 gives the depth in meters (e.g., a stored value of 1500 corresponds to a depth of 1.5 meters). The depth map has the same resolution as the RGB image (480×640, H×W) and is pre-aligned with it, so no additional geometric transformation is required. This field is not used in PlanaReLoc's default pipeline.
- `depth_from_plane`: the depth map derived from plane parameters in the query space, encoded as bytes (PNG format). As with `depth`, the values are stored as 16-bit unsigned integers and can be converted to meters by dividing by 1000. This field is used in the first training phase of PlanaReLoc.
- `pose_c2w`: the **camera-to-world** transformation matrix of the query view, represented as a 4×4 nested list, with the translation component in meters. This is provided as the ground truth and is used only for training and evaluation purposes, not for inference.
- `K`: the intrinsic matrix of the query view, represented as a 3×3 nested list in the form of `[[fx, 0, cx], [0, fy, cy], [0, 0, 1]]`.
- `plane_annos`: a list of plane annotations for the query view, which is provided as the ground truth and is used only for training and evaluation purposes, not for inference. Each annotation corresponds to an observed plane in the query view and includes:
  - `rle`: the run-length encoding of the plane mask, which can be decoded by `pycocotools.mask.decode()` to obtain the binary mask of the plane in the query view.
  - `params_c`: the plane parameters in the camera coordinate system, represented as a list of four values $[a, b, c, d]$ corresponding to the plane equation $ax + by + cz = d$. These parameters are normalized such that the normal vector $\mathbf{n} =(a, b, c)^T$ is a unit vector.
  - `params_w`: the plane parameters in the world coordinate system.
  - `map_prim_id`: the index of the corresponding planar primitive in the map, which is used to establish correspondences between query primitives and map primitives.
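Given the plane convention $ax + by + cz = d$ and a camera-to-world pose $\mathbf{x}_w = \mathbf{R}\,\mathbf{x}_c + \mathbf{t}$, the world and camera plane parameters are related by $\mathbf{n}_c = \mathbf{R}^T\mathbf{n}_w$ and $d_c = d_w - \mathbf{n}_w \cdot \mathbf{t}$. The sketch below illustrates this relation on synthetic values. The function name `plane_world_to_camera` is hypothetical, and the stored `params_c` may differ from this result by a global sign if the dataset orients normals in a particular way, which the description above does not specify.

```python
import numpy as np


def plane_world_to_camera(params_w, pose_c2w) -> np.ndarray:
    """Map plane parameters [a, b, c, d] from world to camera coordinates."""
    params_w = np.asarray(params_w, dtype=float)
    pose = np.asarray(pose_c2w, dtype=float)
    R, t = pose[:3, :3], pose[:3, 3]   # x_w = R @ x_c + t
    n_w, d_w = params_w[:3], params_w[3]
    n_c = R.T @ n_w                    # rotate the normal into the camera frame
    d_c = d_w - n_w @ t                # shift the offset by the camera position
    return np.append(n_c, d_c)


# Synthetic pose: 90 degree rotation about the z axis, camera centre at (1, 2, 3)
pose_c2w = np.array([
    [0.0, -1.0, 0.0, 1.0],
    [1.0,  0.0, 0.0, 2.0],
    [0.0,  0.0, 1.0, 3.0],
    [0.0,  0.0, 0.0, 1.0],
])
params_w = [0.0, 0.0, 1.0, 5.0]        # world plane z = 5
params_c = plane_world_to_camera(params_w, pose_c2w)

# A camera-frame point on the transformed plane must land on the world plane
x_c = np.array([0.3, -0.7, params_c[3]])
x_w = pose_c2w[:3, :3] @ x_c + pose_c2w[:3, 3]
print(params_c, np.isclose(x_w @ np.array(params_w[:3]), params_w[3]))
```

Applied to a real query, the same function gives a way to cross-check `params_w`, `params_c` and `pose_c2w` against each other.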
## 5. Uses
### How to download?
```sh
# change to the directory where you want to store the dataset, e.g.,
mkdir datasets && cd datasets
# download the dataset with the Hugging Face CLI (`hf`)
hf download hanchiao/PlanaReLoc --repo-type dataset --local-dir .
```
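If only one of the two parts is needed (for example the much smaller 12Scenes part), the download can also be restricted from Python. The following is a sketch, not an official recipe: it assumes that `huggingface_hub` is installed and that the repository stores the two parts as top-level folders named as in the structure section above. The helper names `subset_patterns` and `download_subset` are hypothetical.

```python
from typing import List, Literal


def subset_patterns(name: Literal["scannet", "s12scenes"]) -> List[str]:
    """Glob patterns selecting one part of the repository (assumed folder layout)."""
    return [f"{name}_planareloc_dataset/*"]


def download_subset(name: Literal["scannet", "s12scenes"], local_dir: str = "datasets") -> str:
    """Download only the files matching the patterns and return the local path."""
    # Imported lazily so that this module can be imported without huggingface_hub
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="Hanchiao/PlanaReLoc",
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=subset_patterns(name),
    )


# Example (performs network access, so it is left commented out):
# download_subset("s12scenes")
print(subset_patterns("s12scenes"))
```

If the repository layout turns out to differ from this assumption, the patterns need to be adjusted accordingly; the plain `hf download` command above always fetches everything.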
### How to use?
```python
from typing import Literal
from datasets import load_dataset

# specify the dataset and the split, e.g., to load the test split of the ScanNet-based dataset:
dataset: Literal["scannet", "s12scenes"] = "scannet"  # or "s12scenes"
split: Literal["train", "test", "val"] = "test"  # or "train", "val"
cache_path = f"datasets/{dataset}_planareloc_dataset/caches/"

queries = load_dataset(
    "arrow",
    # glob over all Arrow chunks of the chosen split
    data_files={split: cache_path + f"queries/{split}_*.arrow"},
    split=split,  # return a Dataset instead of a DatasetDict
    # cache_dir=".cache/huggingface/datasets"  # specify the cache directory if needed
)
maps = load_dataset(
    "arrow",
    data_files={split: cache_path + f"maps/{split}_*.arrow"},
    split=split,
    # cache_dir=".cache/huggingface/datasets"  # specify the cache directory if needed
)

# build dicts that map identifiers to indices in the loaded datasets for retrieval
q_key2idx = {k: i for i, k in enumerate(queries["id"])}
m_key2idx = {k: i for i, k in enumerate(maps["id"])}
```
You can retrieve any query sample that is recorded in the JSON file for that split, e.g., `cache_set_test_split.json` for the test split:
```python
import json

# load the JSON file for the specified dataset and split
json_file = f"datasets/{dataset}_planareloc_dataset/cache_set_{split}_split.json"
with open(json_file, "r") as f:
    summary = json.load(f)

for d in summary["queries"]:
    query = queries[q_key2idx[d]]
    scene_map = maps[m_key2idx[query["map_id"]]]
    ...  # use the retrieved data for training or evaluation
```
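When working per scene, it can be handy to group the query identifiers by their map. The sketch below relies on the `f"{map_id}_{view_id}"` format described above and on the assumption that `view_id` itself contains no underscore. Since the format is only described as typical, reading `query["map_id"]` from the loaded sample is the more robust option. The function name `group_queries_by_scene` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_queries_by_scene(query_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group identifiers of the form '{map_id}_{view_id}' by their map_id."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for qid in query_ids:
        # split at the last underscore only, because map_id contains underscores
        map_id, view_id = qid.rsplit("_", 1)
        groups[map_id].append(view_id)
    return dict(groups)


# Made-up identifiers following the documented pattern
example_ids = ["scene0575_00_000000", "scene0575_00_000001", "scene0581_00_000000"]
grouped = group_queries_by_scene(example_ids)
print(grouped)
```

With a loaded split, `group_queries_by_scene(summary["queries"])` would give the per-scene view lists in the same way.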
Then, to use the retrieved query and map data, refer to the field descriptions in the [Map Data](#map-data) and [Query Data](#query-data) sections above. For example, you can decode the RGB image, depth map and plane masks of a query sample as follows:
```python
import cv2
import numpy as np
from pycocotools import mask as cocomask

# decode the JPEG bytes into an (H, W, 3) uint8 array; OpenCV returns BGR channel order
image = cv2.imdecode(np.frombuffer(query['image'], dtype=np.uint8), cv2.IMREAD_COLOR)
# decode the 16-bit PNG and convert the stored values to meters
depth = cv2.imdecode(np.frombuffer(query['depth'], dtype=np.uint8), cv2.IMREAD_UNCHANGED).astype(np.float32) / 1000.0
# per-pixel index into query['plane_annos']; -1 marks non-plane pixels
pan_seg_gt = np.full(image.shape[:2], -1, dtype=np.int32)  # (H, W)
for i, anno in enumerate(query['plane_annos']):
    pan_seg_gt[cocomask.decode(anno["rle"]) != 0] = i
```
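The plane parameters in `params_c` and the intrinsics `K` determine a depth value for every pixel: a pixel $(u, v)$ back-projects to the ray $\mathbf{r} = \mathbf{K}^{-1}(u, v, 1)^T$, and the point $z\,\mathbf{r}$ lies on the plane when $z = d / (\mathbf{n} \cdot \mathbf{r})$. The sketch below computes such a depth map for a single plane. It illustrates the geometry only; whether `depth_from_plane` was generated exactly this way (e.g., with integer pixel coordinates rather than pixel centers) is an assumption, and the function name `plane_depth` is hypothetical.

```python
import numpy as np


def plane_depth(params_c, K, height: int, width: int) -> np.ndarray:
    """Per-pixel z-depth (meters) of the plane n.x = d given in camera coordinates."""
    n = np.asarray(params_c[:3], dtype=float)
    d = float(params_c[3])
    u, v = np.meshgrid(np.arange(width), np.arange(height))          # each (H, W)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    rays = pixels @ np.linalg.inv(np.asarray(K, dtype=float)).T      # rays with z = 1
    denom = rays @ n                                                 # n . r per pixel
    depth = np.full((height, width), np.nan)
    valid = np.abs(denom) > 1e-8                                     # skip rays parallel to the plane
    depth[valid] = d / denom[valid]
    return depth


# Toy intrinsics for a 3x5 image with the principal point at pixel (u=2, v=1)
K_toy = [[2.0, 0.0, 2.0], [0.0, 2.0, 1.0], [0.0, 0.0, 1.0]]
frontal = plane_depth([0.0, 0.0, 1.0, 2.0], K_toy, 3, 5)   # plane z = 2
tilted = plane_depth([0.0, 0.6, 0.8, 1.6], K_toy, 3, 5)    # tilted about the x axis
print(frontal[1, 2], tilted[1, 2])
```

For a real query, such a map would be evaluated only inside the decoded mask of the corresponding annotation, and negative values (plane behind the camera) would need to be discarded.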
Moreover, you can recover the 3D vertices of each planar primitive in the map by:
```python
primitives = []
for p in scene_map["primitives"]:
    params = np.array(p['params'])
    verts_3d = np.array(p['verts_2d']) @ np.array(p['proj_mat']).T + params[:3] * params[3]  # (N, 3)
    new_p = {
        "params": params,
        "verts_3d": verts_3d,
        "faces": np.array(p['faces'])
    }
    primitives.append(new_p)
```
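For visualization or export, the recovered primitives can be merged into one vertex array and one face array. The sketch below assumes that `faces` holds triangles whose indices are local to each primitive, which is not stated explicitly above, so the indices are shifted by the number of vertices that precede each primitive. The function name `merge_primitives` is hypothetical.

```python
import numpy as np


def merge_primitives(primitives):
    """Concatenate per-primitive vertices and faces into a single mesh."""
    all_verts, all_faces, offset = [], [], 0
    for p in primitives:
        verts = np.asarray(p["verts_3d"], dtype=float).reshape(-1, 3)
        faces = np.asarray(p["faces"], dtype=np.int64).reshape(-1, 3)  # assumes triangles
        all_verts.append(verts)
        all_faces.append(faces + offset)   # make the local indices global
        offset += len(verts)
    return np.concatenate(all_verts), np.concatenate(all_faces)


# Two toy primitives: one triangle and one quad split into two triangles
toy_primitives = [
    {"verts_3d": [[0, 0, 0], [1, 0, 0], [0, 1, 0]], "faces": [[0, 1, 2]]},
    {"verts_3d": [[0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]], "faces": [[0, 1, 2], [0, 2, 3]]},
]
merged_verts, merged_faces = merge_primitives(toy_primitives)
print(merged_verts.shape, merged_faces.tolist())
```

The resulting arrays can be passed to any mesh library that accepts vertices and triangle indices; the ready-made files under `map_glbs` remain the simpler option when only visualization is needed.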
## 6. Annotations
### Annotation process
- [ ] To be updated
Provider: Hanchiao