pixmo-point-explanations

Name: pixmo-point-explanations
Creator: maas
Published: 2025-11-27 16:35:11
License: No description provided

ModelScope community; updated 2025-11-27; indexed 2025-02-15

Download link:

https://modelscope.cn/datasets/allenai/pixmo-point-explanations



Resource description:

# PixMo-Point-Explanations

PixMo-Point-Explanations is a dataset of images, questions, and answers with explanations that can include in-line points that refer to parts of the image. It can be used to train vision language models to respond to questions through a mixture of text and points.

PixMo-Point-Explanations is part of the [PixMo dataset collection](https://huggingface.co/collections/allenai/pixmo-674746ea613028006285687b) and was used to train the [Molmo family of models](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19).

We consider this dataset experimental: while these explanations can be very informative, we have also seen that models can hallucinate more when generating outputs of this sort. For that reason, the Molmo models are trained to generate outputs like this only when specifically requested by prefixing input questions with "point_qa:". This mode can be used in the [Molmo demo](https://multimodal-29mpz7ym.vercel.app/share/2921825e-ef44-49fa-a6cb-1956da0be62a).

Quick links:
- 📃 [Paper](https://molmo.allenai.org/paper.pdf)
- 🎥 [Blog with Videos](https://molmo.allenai.org/blog)

## Loading

```python
import datasets

data = datasets.load_dataset("allenai/pixmo-point-explanations")
```

## Data Format

Images are stored as URLs. The in-line points use a format from the LLM/annotators that does not exactly match the Molmo format. The data includes some fields derived from these responses to make them easier to parse; these fields can be null if the original response could not be parsed.

- `parsed_response`: the response with the text "<|POINT|>" where the inline point annotations were
- `alt_text`: the alt text for each point annotation in the response
- `inline_text`: the inline text for each point annotation in the response
- `points`: the list-of-list of points for each point annotation

## Checking Image Hashes

Image hashes are included to support double-checking that the downloaded image matches the annotated image. It can be checked like this:

```python
from hashlib import sha256
import requests

# load_dataset without a split returns a DatasetDict, so index a split first.
# Assumption: the examples live in a "train" split; adjust if the name differs.
example = data["train"][0]
image_bytes = requests.get(example["image_url"]).content
byte_hash = sha256(image_bytes).hexdigest()
assert byte_hash == example["image_sha256"]
```

## License

This dataset is licensed under ODC-BY-1.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). This dataset includes data generated from Claude, which are subject to Anthropic's [terms of service](https://www.anthropic.com/legal/commercial-terms) and [usage policy](https://www.anthropic.com/legal/aup).
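
As an illustration of how the derived fields described under Data Format might be used together, here is a minimal sketch that splits `parsed_response` on the "<|POINT|>" placeholder and pairs each placeholder with its entry in `alt_text`, `inline_text`, and `points`. It assumes the three lists are aligned in order with the placeholders (one entry per placeholder), which the field descriptions suggest but do not state outright. The helper name `interleave_points`, the sample record, and the coordinate layout inside `points` are all made up for the example and are not taken from the dataset.

```python
POINT_TOKEN = "<|POINT|>"


def interleave_points(parsed_response, alt_text, inline_text, points):
    """Return alternating ("text", str) and ("point", dict) segments.

    Returns None when any derived field is null, which the dataset card
    says can happen if the original response could not be parsed.
    """
    if None in (parsed_response, alt_text, inline_text, points):
        return None

    chunks = parsed_response.split(POINT_TOKEN)
    n_annotations = len(chunks) - 1

    # Assumption: one list entry per placeholder, in order of appearance.
    if not (len(alt_text) == len(inline_text) == len(points) == n_annotations):
        raise ValueError("placeholder count does not match annotation lists")

    segments = []
    for i, chunk in enumerate(chunks):
        if chunk:
            segments.append(("text", chunk))
        if i < n_annotations:
            segments.append(("point", {
                "inline_text": inline_text[i],
                "alt_text": alt_text[i],
                "points": points[i],
            }))
    return segments


# Hypothetical record for illustration only; the real coordinate format
# inside `points` may differ.
example = {
    "parsed_response": "The <|POINT|> is left of the <|POINT|>.",
    "alt_text": ["red mug", "laptop"],
    "inline_text": ["mug", "laptop"],
    "points": [[[31.5, 62.0]], [[70.2, 55.1]]],
}

segments = interleave_points(
    example["parsed_response"],
    example["alt_text"],
    example["inline_text"],
    example["points"],
)
for kind, value in segments:
    print(kind, value)
```

Keeping the point payload opaque (the function never inspects individual coordinates) means the sketch does not depend on how each point is encoded; converting the segments into the Molmo point format would be a separate step that this example does not attempt.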


Provider:

maas

Created:

2025-05-27

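Collected summary

Dataset introduction

Background and challenges

Background overview

PixMo-Point-Explanations is a dataset of images, questions, and answers in which the answers can embed point-based explanations that refer to parts of the image. It is used to train vision language models to respond to questions with a mixture of text and points. The dataset belongs to the PixMo collection and supports training of the Molmo models. Its data is stored as image URLs together with parsed point annotations, it is released under the ODC-BY-1.0 license, and it is intended for research and educational use.

The content above was collected and summarized by 遇见数据集.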