
bhatvineet/bopask-train

Hugging Face, updated 2026-04-19, indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/bhatvineet/bopask-train
Description:
---
license: mit
task_categories:
- visual-question-answering
- question-answering
language:
- en
tags:
- robotics
- 6dof-pose
- grasping
- spatial-reasoning
- trajectory
- depth-estimation
- bop-challenge
size_categories:
- 10M<n<100M
pretty_name: BOPASK-Train
---

# BOPASK-Train

**BOPASK** is a large-scale visual-question-answering dataset for robotic spatial understanding, built on top of four BOP-Challenge datasets: **HANDAL**, **HOPE**, **LineMOD**, and **YCB-V**.

This release contains **32.68 M training QA pairs** across **172 K unique RGB images**, covering **8 question types**.

## Contents

| Family    | QA pairs       | Unique images |
|-----------|---------------:|--------------:|
| handal    | 442,729        | 4,416         |
| hope      | 18,323,007     | 82,229        |
| linemod   | 13,165,385     | 61,883        |
| ycbv      | 749,461        | 23,524        |
| **Total** | **32,680,582** | **172,052**   |

### Question-type breakdown (all families combined)

| Question type          | Count       |
|------------------------|------------:|
| trajectory             | 13,552,164  |
| spatial_reasoning      | 8,888,686   |
| depth_relative         | 7,256,308   |
| pose                   | 1,438,641   |
| grasp                  | 853,401     |
| depth_absolute         | 446,658     |
| object_rearrangement   | 157,095     |
| camera (extrinsics)    | 87,629      |

## Layout

```
bopask-train/
├── handal/
│   ├── bopask-handal-train.jsonl
│   ├── images/        # RGB frames (.png)
│   ├── depth_maps/    # aligned depth (.png, uint16 mm)
│   └── masks/         # per-object binary masks (.png)
├── hope/      (same structure)
├── linemod/   (same structure)
└── ycbv/      (same structure)
```

All paths inside each jsonl are **family-relative**:

- `images/<basename>.png`
- `depth_maps/<basename>_depth.png`
- `masks/<basename>_..._mask.png` (comma-separated if multiple objects)

## Record format (LLaVA-style)

Each line of a `*-train.jsonl` is a JSON object:

```json
{
  "conversations": [
    {"from": "user", "value": "<image>\n<question>"},
    {"from": "gpt", "value": "<answer>"}
  ],
  "images": ["images/scene_000000_frame_000000.png"],
  "depths": ["depth_maps/scene_000000_frame_000000_depth.png"],
  "masks": "masks/scene_000000_frame_000000_obj_000018_mask.png",
  "question_type": "pose",
  "question_subtype": "matrix",
  "object_id": 18
}
```

### Field notes

- `conversations`: a user/assistant turn pair. The user prompt starts with the `<image>` sentinel token used by many VLMs (e.g. LLaVA / Qwen-VL).
- `images`, `depths`: lists of paths relative to the family folder.
- `masks`: a single string. If multiple masks are relevant (e.g. pairwise `trajectory`, `spatial_reasoning`, `object_rearrangement` questions) they are comma-separated: `"masks/...target..._mask.png,masks/...goal..._mask.png"`. Some rows have `masks: null` when the question does not target a specific object (e.g. `camera` extrinsics). Masks are **optional** for most downstream uses.
- `object_id`: integer for single-object questions, or a `"target,goal"` string for pairwise ones. Absent for `camera` questions.
- `question_type`: one of `pose`, `grasp`, `camera`, `depth_absolute`, `depth_relative`, `spatial_reasoning`, `trajectory`, `object_rearrangement`.
- `question_subtype`: further specifies the answer format (e.g. `matrix` / `quaternion` / `2dbbox` / `3dbbox` for `pose`; `2d` / `3d` for `trajectory`; etc.).

## Known caveats

- A very small number of depth / mask files (≈0.004% of rows, mostly in LineMOD scenes 12 & 39) are absent because the originals were not recoverable. The JSONLs still reference them, so you may want to handle `FileNotFoundError` gracefully in your loader.
- `masks` are not strictly required for most VQA training setups; downstream users who only need RGB + depth + the conversations can safely ignore them.

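
The field notes and caveats above imply a little normalization work in a loader: `masks` may be `null`, one path, or a comma-separated string; `object_id` may be an integer, a `"target,goal"` string, or absent; and a few referenced files do not exist on disk. The following is a minimal sketch of one way to handle this, using only the standard library. The helper names `split_field`, `normalize_record`, and `iter_usable_records` are hypothetical and not part of the dataset, and object ids are kept as strings because the card does not specify the exact format of the pairwise form.

```python
import json
from pathlib import Path


def split_field(value):
    """Turn None, an int, or a comma-separated string into a list of strings."""
    if value is None:
        return []
    return [part.strip() for part in str(value).split(",") if part.strip()]


def normalize_record(rec, family_root):
    """Resolve family-relative paths and list referenced files that are missing.

    `family_root` is the folder that holds the jsonl (e.g. ".../handal").
    """
    root = Path(family_root)
    images = [root / p for p in (rec.get("images") or [])]
    depths = [root / p for p in (rec.get("depths") or [])]
    masks = [root / p for p in split_field(rec.get("masks"))]
    return {
        "conversations": rec.get("conversations", []),
        "question_type": rec.get("question_type"),
        "question_subtype": rec.get("question_subtype"),
        "images": images,
        "depths": depths,
        "masks": masks,
        # Kept as strings; convert to int yourself if your ids are numeric.
        "object_ids": split_field(rec.get("object_id")),
        "missing": [p for p in images + depths + masks if not p.exists()],
    }


def iter_usable_records(jsonl_path, require_masks=False):
    """Yield normalized records, skipping rows whose required files are absent.

    Masks are only treated as required when `require_masks` is True.
    """
    jsonl_path = Path(jsonl_path)
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            out = normalize_record(json.loads(line), jsonl_path.parent)
            required = out["images"] + out["depths"]
            if require_masks:
                required = required + out["masks"]
            missing = set(out["missing"])
            if not any(p in missing for p in required):
                yield out
```

Checking for files up front, rather than catching `FileNotFoundError` at decode time, is a design choice made here so that the skip policy lives in one place; catching the exception inside the image-loading code would work equally well.
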
## Quick start

```python
from datasets import load_dataset

# Each family is exposed as its own split, keyed by family name.
ds = load_dataset(
    "bhatvineet/bopask-train",
    data_files={
        "handal": "handal/bopask-handal-train.jsonl",
        "hope": "hope/bopask-hope-train.jsonl",
        "linemod": "linemod/bopask-linemod-train.jsonl",
        "ycbv": "ycbv/bopask-ycbv-train.jsonl",
    },
)
print(ds)
```

Or stream one family at a time from a local copy:

```python
import json

path = "handal/bopask-handal-train.jsonl"
with open(path, encoding="utf-8") as f:
    for line in f:
        rec = json.loads(line)
        # rec["images"][0] is relative to the "handal/" folder
        ...
```

## Citation

If you use this dataset, please cite the [BOPASK](https://arxiv.org/abs/2511.16857) paper and the underlying BOP-Challenge object-pose datasets (HANDAL, HOPE, LineMOD, YCB-V).

## License

The question-answer annotations are released under the MIT License. The underlying RGB, depth, and mask assets inherit the license of their source BOP-Challenge datasets.

Provider:
bhatvineet