
pixmo-ask-model-anything

ModelScope Community | Updated: 2025-12-05 | Indexed: 2025-02-15
Download link:
https://modelscope.cn/datasets/allenai/pixmo-ask-model-anything
Resource description:
# PixMo-AskModelAnything

PixMo-AskModelAnything is an instruction-tuning dataset for vision-language models. It contains human-authored question-answer pairs about diverse images with long-form answers.

PixMo-AskModelAnything is part of the [PixMo dataset collection](https://huggingface.co/collections/allenai/pixmo-674746ea613028006285687b) and was used to train the [Molmo family of models](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19).

Quick links:
- 📃 [Paper](https://molmo.allenai.org/paper.pdf)
- 🎥 [Blog with Videos](https://molmo.allenai.org/blog)

## Loading
```python
import datasets

data = datasets.load_dataset("allenai/pixmo-ask-model-anything", split="train")
```

## Data Format
Each row contains an image URL and a Q/A pair. Note that image URLs can be repeated, since many images have multiple Q/A pairs.

## Image Checking
Image hashes are included so that you can double-check that the downloaded image matches the annotated image. The check looks like this:

```python
from hashlib import sha256
import requests

# `data` is the dataset loaded in the Loading section above.
example = data[0]
image_bytes = requests.get(example["image_url"]).content
byte_hash = sha256(image_bytes).hexdigest()
assert byte_hash == example["image_sha256"]
```

## License
This dataset is licensed under ODC-BY-1.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). This dataset includes data generated from Claude, which are subject to Anthropic's [terms of service](https://www.anthropic.com/legal/commercial-terms) and [usage policy](https://www.anthropic.com/legal/aup).
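Because image URLs repeat across rows, the hash check described under Image Checking can be run once per unique URL rather than once per row. The following is a minimal sketch of that idea, not part of the dataset's official tooling: the helper names `matches_hash`, `group_by_url`, and `verify_rows` are hypothetical, and only the `image_url` and `image_sha256` fields come from the dataset description. The `fetch` argument is an assumed callable that returns the image bytes for a URL.

```python
from collections import defaultdict
from hashlib import sha256


def matches_hash(image_bytes: bytes, expected_sha256: str) -> bool:
    """Return True if the SHA-256 hex digest of the bytes equals the expected value."""
    return sha256(image_bytes).hexdigest() == expected_sha256.lower()


def group_by_url(rows):
    """Map each image URL to the list of rows that reference it (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["image_url"]].append(row)
    return dict(groups)


def verify_rows(rows, fetch):
    """Fetch each unique URL once and split the rows into (verified, failed).

    `fetch` is any callable that takes a URL and returns bytes; it is an
    assumption of this sketch, not something the dataset provides.
    """
    verified, failed = [], []
    for url, group in group_by_url(rows).items():
        try:
            image_bytes = fetch(url)
        except Exception:
            # Treat any download error as a failed check for all rows of this URL.
            failed.extend(group)
            continue
        for row in group:
            if matches_hash(image_bytes, row["image_sha256"]):
                verified.append(row)
            else:
                failed.append(row)
    return verified, failed
```

With the `requests` library, `fetch` could be something like `lambda url: requests.get(url, timeout=30).content`. Rows that end up in `failed` point to images that could not be downloaded or that have changed since annotation.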

Provider:
maas
Created:
2025-05-27