bhatvineet/bopask-train

Name: bhatvineet/bopask-train
Creator: bhatvineet
Published: 2026-04-19 03:56:51
License: No description provided

Hugging Face · Updated 2026-04-19 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/bhatvineet/bopask-train

Resource description:

---
license: mit
task_categories:
- visual-question-answering
- question-answering
language:
- en
tags:
- robotics
- 6dof-pose
- grasping
- spatial-reasoning
- trajectory
- depth-estimation
- bop-challenge
size_categories:
- 10M<n<100M
pretty_name: BOPASK-Train
---

# BOPASK-Train

**BOPASK** is a large-scale visual-question-answering dataset for robotic spatial understanding, built on top of four BOP-Challenge datasets: **HANDAL**, **HOPE**, **LineMOD**, and **YCB-V**. This release contains **32.68 M training QA pairs** across **172 K unique RGB images**, covering **8 question types**.

## Contents

| Family    |       QA pairs | Unique images |
|-----------|---------------:|--------------:|
| handal    |        442,729 |         4,416 |
| hope      |     18,323,007 |        82,229 |
| linemod   |     13,165,385 |        61,883 |
| ycbv      |        749,461 |        23,524 |
| **Total** | **32,680,582** |   **172,052** |

### Question-type breakdown (all families combined)

| Question type          |      Count |
|------------------------|-----------:|
| trajectory             | 13,552,164 |
| spatial_reasoning      |  8,888,686 |
| depth_relative         |  7,256,308 |
| pose                   |  1,438,641 |
| grasp                  |    853,401 |
| depth_absolute         |    446,658 |
| object_rearrangement   |    157,095 |
| camera (extrinsics)    |     87,629 |

## Layout

```
bopask-train/
├── handal/
│   ├── bopask-handal-train.jsonl
│   ├── images/        # RGB frames (.png)
│   ├── depth_maps/    # aligned depth (.png, uint16, mm)
│   └── masks/         # per-object binary masks (.png)
├── hope/              (same structure)
├── linemod/           (same structure)
└── ycbv/              (same structure)
```

All paths inside each JSONL file are **family-relative**:

- `images/<basename>.png`
- `depth_maps/<basename>_depth.png`
- `masks/<basename>_..._mask.png` (comma-separated if multiple objects)

## Record format (LLaVA-style)

Each line of a `*-train.jsonl` file is a JSON object:

```json
{
  "conversations": [
    {"from": "user", "value": "<image>\n<question>"},
    {"from": "gpt", "value": "<answer>"}
  ],
  "images": ["images/scene_000000_frame_000000.png"],
  "depths": ["depth_maps/scene_000000_frame_000000_depth.png"],
  "masks": "masks/scene_000000_frame_000000_obj_000018_mask.png",
  "question_type": "pose",
  "question_subtype": "matrix",
  "object_id": 18
}
```

### Field notes

- `conversations`: a user/assistant turn pair (the assistant turn is tagged `"from": "gpt"`). The user prompt starts with the `<image>` sentinel token used by many VLMs (e.g. LLaVA, Qwen-VL).
- `images`, `depths`: lists of paths relative to the family folder.
- `masks`: a single string. When multiple masks are relevant (e.g. pairwise `trajectory`, `spatial_reasoning`, and `object_rearrangement` questions), they are comma-separated: `"masks/...target..._mask.png,masks/...goal..._mask.png"`. Some rows have `"masks": null` when the question does not target a specific object (e.g. `camera` extrinsics). Masks are **optional** for most downstream uses.
- `object_id`: an integer for single-object questions, or a `"target,goal"` string for pairwise ones. Absent for `camera` questions.
- `question_type`: one of `pose`, `grasp`, `camera`, `depth_absolute`, `depth_relative`, `spatial_reasoning`, `trajectory`, `object_rearrangement`.
- `question_subtype`: further specifies the answer format (e.g. `matrix` / `quaternion` / `2dbbox` / `3dbbox` for `pose`; `2d` / `3d` for `trajectory`; etc.).

## Known caveats

- A very small number of depth and mask files (≈0.004% of rows, mostly in LineMOD scenes 12 and 39) are missing because the originals could not be recovered. The JSONL files still reference them, so your loader should handle `FileNotFoundError` gracefully.
- `masks` are not strictly required for most VQA training setups; downstream users who only need RGB, depth, and the conversations can safely ignore them.
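The field notes and caveats above can be combined into a small record parser. This is a minimal sketch for illustration, not tooling shipped with the dataset: `ParsedRecord`, `split_masks`, `parse_object_ids`, `parse_record`, and `read_optional` are hypothetical names, the code assumes both parts of a `"target,goal"` `object_id` are integer IDs, and it only resolves paths and reads raw bytes (decoding the PNGs would need an image library such as Pillow).

```python
import os
from dataclasses import dataclass
from typing import List, Optional, Union

# Hypothetical helpers for illustration; not shipped with the dataset.


@dataclass
class ParsedRecord:
    question: str
    answer: str
    image_paths: List[str]
    depth_paths: List[str]
    mask_paths: List[str]
    object_ids: List[int]
    question_type: str
    question_subtype: Optional[str]


def split_masks(masks: Optional[str]) -> List[str]:
    """`masks` is one string, comma-separated for pairwise questions, or null."""
    if not masks:
        return []
    return [m.strip() for m in masks.split(",") if m.strip()]


def parse_object_ids(object_id: Union[int, str, None]) -> List[int]:
    """int for single-object questions, "target,goal" string for pairwise ones,
    absent for camera questions. Assumes both pairwise parts are integer IDs."""
    if object_id is None:
        return []
    if isinstance(object_id, int):
        return [object_id]
    return [int(part) for part in object_id.split(",") if part.strip()]


def parse_record(rec: dict, family_root: str) -> ParsedRecord:
    """Flatten one JSONL record and prefix its family-relative paths with `family_root`."""
    user_turn, assistant_turn = rec["conversations"][0], rec["conversations"][1]
    question = user_turn["value"]
    if question.startswith("<image>"):
        # Drop the <image> sentinel and the newline that follows it.
        question = question[len("<image>"):].lstrip("\n")

    def resolve(rel: str) -> str:
        return os.path.join(family_root, rel)

    return ParsedRecord(
        question=question,
        answer=assistant_turn["value"],
        image_paths=[resolve(p) for p in rec.get("images") or []],
        depth_paths=[resolve(p) for p in rec.get("depths") or []],
        mask_paths=[resolve(p) for p in split_masks(rec.get("masks"))],
        object_ids=parse_object_ids(rec.get("object_id")),
        question_type=rec["question_type"],
        question_subtype=rec.get("question_subtype"),
    )


def read_optional(path: str) -> Optional[bytes]:
    """Return raw file bytes, or None for the few files known to be missing."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None
```

A training loader could call `read_optional` on each depth or mask path and skip or down-weight samples where it returns `None`, instead of failing on the whole shard.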
## Quick start

```python
from datasets import load_dataset

# Each key in data_files becomes a split of the returned DatasetDict.
ds = load_dataset(
    "bhatvineet/bopask-train",
    data_files={
        "handal": "handal/bopask-handal-train.jsonl",
        "hope": "hope/bopask-hope-train.jsonl",
        "linemod": "linemod/bopask-linemod-train.jsonl",
        "ycbv": "ycbv/bopask-ycbv-train.jsonl",
    },
)
print(ds)
```

Or iterate over one family at a time, reading its JSONL line by line from a local copy:

```python
import json
import os

family = "handal"
path = os.path.join(family, f"bopask-{family}-train.jsonl")
with open(path, encoding="utf-8") as f:
    for line in f:
        rec = json.loads(line)
        # rec["images"][0] is relative to the "handal/" folder.
        image_path = os.path.join(family, rec["images"][0])
        ...
```

## Citation

If you use this dataset, please cite the [BOPASK](https://arxiv.org/abs/2511.16857) paper and the underlying BOP-Challenge object-pose datasets (HANDAL, HOPE, LineMOD, YCB-V).

## License

The question-answer annotations are released under the MIT License. The underlying RGB, depth, and mask assets inherit the licenses of their source BOP-Challenge datasets.
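To check a downloaded family against the Contents and question-type tables, the line-by-line loop from the Quick start can be extended to tally `question_type` and `question_subtype` values with `collections.Counter`. This is a minimal sketch: `count_question_types` and `count_file` are hypothetical names, and the code assumes the family's JSONL file is available locally.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def count_question_types(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Tally question_type, and (question_type, question_subtype) pairs,
    over an iterable of JSONL lines. Blank lines are skipped."""
    by_type: Counter = Counter()
    by_subtype: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        qtype = rec["question_type"]
        by_type[qtype] += 1
        # question_subtype may be missing; .get() records it as None.
        by_subtype[(qtype, rec.get("question_subtype"))] += 1
    return by_type, by_subtype


def count_file(path: str) -> Tuple[Counter, Counter]:
    """Stream one family's JSONL from disk without loading it all into memory."""
    with open(path, encoding="utf-8") as f:
        return count_question_types(f)
```

For example, after `by_type, _ = count_file("handal/bopask-handal-train.jsonl")`, the value of `sum(by_type.values())` should match the handal row of the Contents table (442,729) if the local copy is complete. The breakdown table labels one row `camera (extrinsics)`, but the field value itself is `camera`.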

Provider:

bhatvineet