bhatvineet/bopask-train
Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.
Download link: https://hf-mirror.com/datasets/bhatvineet/bopask-train
---
license: mit
task_categories:
- visual-question-answering
- question-answering
language:
- en
tags:
- robotics
- 6dof-pose
- grasping
- spatial-reasoning
- trajectory
- depth-estimation
- bop-challenge
size_categories:
- 10M<n<100M
pretty_name: BOPASK-Train
---
# BOPASK-Train
**BOPASK** is a large-scale visual-question-answering dataset for
robotic spatial understanding, built on top of four BOP-Challenge datasets:
**HANDAL**, **HOPE**, **LineMOD**, and **YCB-V**.
This release contains **32.68 M training QA pairs** across **172 K unique RGB images**,
covering **8 question types**.
## Contents
| Family | QA pairs | Unique images |
|----------|-----------:|--------------:|
| handal | 442,729 | 4,416 |
| hope | 18,323,007 | 82,229 |
| linemod | 13,165,385 | 61,883 |
| ycbv | 749,461 | 23,524 |
| **Total**| **32,680,582** | **172,052** |
### Question-type breakdown (all families combined)
| Question type | Count |
|------------------------|------------:|
| trajectory | 13,552,164 |
| spatial_reasoning | 8,888,686 |
| depth_relative | 7,256,308 |
| pose | 1,438,641 |
| grasp | 853,401 |
| depth_absolute | 446,658 |
| object_rearrangement | 157,095 |
| camera (extrinsics) | 87,629 |
## Layout
```
bopask-train/
├── handal/
│   ├── bopask-handal-train.jsonl
│   ├── images/        # RGB frames (.png)
│   ├── depth_maps/    # aligned depth (.png, uint16 mm)
│   └── masks/         # per-object binary masks (.png)
├── hope/              (same structure)
├── linemod/           (same structure)
└── ycbv/              (same structure)
```
All paths inside each jsonl are **family-relative**:
- `images/<basename>.png`
- `depth_maps/<basename>_depth.png`
- `masks/<basename>_..._mask.png` (comma-separated if multiple objects)
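
Because the stored paths are family-relative and follow a fixed naming pattern, a few lines of path handling are enough to locate the files on disk. The sketch below is illustrative only: `resolve` and `depth_path_for` are hypothetical helper names, not part of any dataset tooling, and the depth-path derivation assumes the `<basename>_depth.png` pattern listed above holds for every image.

```python
from pathlib import Path, PurePosixPath


def resolve(family_root, rel_path):
    """Join a family-relative path (as stored in the jsonl) onto the family folder."""
    return Path(family_root) / rel_path


def depth_path_for(image_rel):
    """Derive the family-relative depth path from a family-relative image path.

    Assumes the images/<basename>.png -> depth_maps/<basename>_depth.png pattern.
    """
    stem = PurePosixPath(image_rel).stem  # file name without the .png suffix
    return f"depth_maps/{stem}_depth.png"


image_rel = "images/scene_000000_frame_000000.png"
print(resolve("bopask-train/handal", image_rel))
print(depth_path_for(image_rel))
```

In practice the `depths` field already carries the depth path, so `depth_path_for` is mainly useful as a consistency check.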
## Record format (LLaVA-style)
Each line of a `*-train.jsonl` is a JSON object:
```json
{
  "conversations": [
    {"from": "user", "value": "<image>\n<question>"},
    {"from": "gpt", "value": "<answer>"}
  ],
  "images": ["images/scene_000000_frame_000000.png"],
  "depths": ["depth_maps/scene_000000_frame_000000_depth.png"],
  "masks": "masks/scene_000000_frame_000000_obj_000018_mask.png",
  "question_type": "pose",
  "question_subtype": "matrix",
  "object_id": 18
}
```
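
As a minimal sketch of how a record like the one above might be unpacked, the snippet below pulls the question and answer out of `conversations` and strips the leading `<image>` sentinel. The helper name `question_and_answer` and the sample question and answer text are placeholders invented for illustration, and the code assumes every record holds exactly two turns with the user turn first.

```python
import json

IMAGE_TOKEN = "<image>"


def question_and_answer(record):
    """Return (question, answer) from a record's two-turn conversation."""
    user_turn, assistant_turn = record["conversations"]
    question = user_turn["value"]
    # Drop the leading sentinel and the newline that follows it, if present.
    if question.startswith(IMAGE_TOKEN):
        question = question[len(IMAGE_TOKEN):].lstrip("\n")
    return question, assistant_turn["value"]


# Placeholder record; the question and answer strings are made up.
line = json.dumps({
    "conversations": [
        {"from": "user", "value": "<image>\nWhere is the mug?"},
        {"from": "gpt", "value": "On the table."},
    ],
    "question_type": "spatial_reasoning",
})
print(question_and_answer(json.loads(line)))
```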
### Field notes
- `conversations`: a user/assistant turn pair. The user prompt starts with the
`<image>` sentinel token used by many VLMs (e.g. LLaVA / Qwen-VL).
- `images`, `depths`: lists of paths relative to the family folder.
- `masks`: a single string. If multiple masks are relevant (e.g. pairwise
`trajectory`, `spatial_reasoning`, `object_rearrangement` questions) they are
comma-separated: `"masks/...target..._mask.png,masks/...goal..._mask.png"`.
Some rows have `masks: null` when the question does not target a specific object
(e.g. `camera` extrinsics). Masks are **optional** for most downstream uses.
- `object_id`: integer for single-object questions, or a `"target,goal"` string
for pairwise ones. Absent for `camera` questions.
- `question_type`: one of `pose`, `grasp`, `camera`, `depth_absolute`,
`depth_relative`, `spatial_reasoning`, `trajectory`, `object_rearrangement`.
- `question_subtype`: further specifies the answer format (e.g.
`matrix` / `quaternion` / `2dbbox` / `3dbbox` for `pose`; `2d` / `3d` for
`trajectory`; etc.).
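
Since `masks` and `object_id` change shape between single-object, pairwise, and `camera` questions, it can help to normalise both into lists before further processing. The following is a sketch under stated assumptions: `split_masks` and `parse_object_ids` are hypothetical helpers, the example paths are invented, and the pairwise `object_id` string is assumed to contain comma-separated integer IDs.

```python
def split_masks(masks):
    """Return the mask paths as a list; [] when the field is null or missing."""
    if not masks:
        return []
    return [p.strip() for p in masks.split(",") if p.strip()]


def parse_object_ids(object_id):
    """Return object IDs as a list of ints.

    Accepts an int (single-object question), a comma-separated string
    (pairwise question), or None (field absent, as for camera questions).
    Assumes each comma-separated item is an integer ID.
    """
    if object_id is None:
        return []
    if isinstance(object_id, int):
        return [object_id]
    return [int(part) for part in object_id.split(",")]


# Invented pairwise example: two masks and a "target,goal" ID string.
record = {
    "masks": "masks/a_obj_000018_mask.png,masks/a_obj_000005_mask.png",
    "object_id": "18,5",
}
print(split_masks(record.get("masks")))
print(parse_object_ids(record.get("object_id")))
```

Using `record.get(...)` rather than indexing keeps the same code path working for `camera` rows, where `object_id` is absent and `masks` may be `null`.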
## Known caveats
- A very small number of depth / mask files (≈0.004% of rows, mostly in
LineMOD scenes 12 & 39) are missing because the originals were not recoverable.
The JSONL files still reference them, so you may want to handle
`FileNotFoundError` gracefully in your loader.
- `masks` are not strictly required for most VQA training setups; downstream
users who only need RGB + depth + the conversations can safely ignore them.
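
One way to tolerate the missing files is to treat depth maps and masks as optional and return `None` when a referenced file does not exist. The sketch below is only an illustration: `read_asset` and `load_optional_assets` are hypothetical names, and the functions return raw bytes so that decoding the PNG is left to whichever image library you already use.

```python
from pathlib import Path


def read_asset(family_root, rel_path):
    """Return the file's bytes, or None if the referenced file is missing."""
    if rel_path is None:
        return None
    try:
        return (Path(family_root) / rel_path).read_bytes()
    except FileNotFoundError:
        return None


def load_optional_assets(record, family_root):
    """Read the first depth map and all masks, tolerating missing files."""
    depths = record.get("depths") or []
    masks = record.get("masks") or ""  # null or absent becomes an empty string
    return {
        "depth": read_asset(family_root, depths[0]) if depths else None,
        "masks": [read_asset(family_root, m) for m in masks.split(",") if m],
    }
```

A training loop can then skip or down-weight samples whose `depth` entry is `None` instead of crashing partway through an epoch.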
## Quick start
```python
from datasets import load_dataset

# Map each split name to that family's jsonl file inside the dataset repo.
ds = load_dataset(
    "bhatvineet/bopask-train",
    data_files={
        "handal": "handal/bopask-handal-train.jsonl",
        "hope": "hope/bopask-hope-train.jsonl",
        "linemod": "linemod/bopask-linemod-train.jsonl",
        "ycbv": "ycbv/bopask-ycbv-train.jsonl",
    },
)
print(ds)
```
Or stream one family at a time from a local copy:
```python
import json

# Path to one family's annotation file, relative to the dataset root.
path = "handal/bopask-handal-train.jsonl"
with open(path, encoding="utf-8") as f:
    for line in f:
        rec = json.loads(line)
        # rec["images"][0] is relative to the "handal/" folder
        ...
```
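
The same line-by-line pattern extends to simple bookkeeping, such as tallying how many records of each `question_type` a family contains without loading the whole file into memory. This is a minimal sketch; `count_question_types` is a hypothetical helper name.

```python
import json
from collections import Counter


def count_question_types(jsonl_path):
    """Stream a family's jsonl once and tally records per question_type."""
    counts = Counter()
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines defensively
                counts[json.loads(line)["question_type"]] += 1
    return counts
```

For example, `count_question_types("handal/bopask-handal-train.jsonl").most_common()` would list that family's question types from most to least frequent.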
## Citation
If you use this dataset, please cite the [BOPASK](https://arxiv.org/abs/2511.16857) paper
and the underlying BOP-Challenge object-pose datasets
(HANDAL, HOPE, LineMOD, YCB-V).
## License
The question-answer annotations are released under the MIT License.
The underlying RGB, depth, and mask assets inherit the licenses of their source
BOP-Challenge datasets.
Provider: bhatvineet



