pixmo-docs
Source: ModelScope community. Updated 2026-01-06; indexed 2025-05-31.
Download link: https://modelscope.cn/datasets/allenai/pixmo-docs
# PixMo-Docs
We now recommend using [CoSyn-400k](https://huggingface.co/datasets/allenai/CoSyn-400k) and [CoSyn-point](https://huggingface.co/datasets/allenai/CoSyn-point) over these
datasets. They are improved versions with more image categories and an improved generation pipeline.
PixMo-Docs is a collection of synthetic question-answer pairs about various kinds of computer-generated images, including charts, tables, diagrams, and documents.
The data was created by using the [Claude large language model](https://claude.ai/) to generate code that can be executed to render an image,
and using [GPT-4o mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to generate Q/A pairs based on the code (without using the rendered image).
The code used to generate this data is [open source](https://github.com/allenai/pixmo-docs).
PixMo-Docs is part of the [PixMo dataset collection](https://huggingface.co/collections/allenai/pixmo-674746ea613028006285687b)
and was used to train the [Molmo family of models](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19).
Quick links:
- 📃 [Paper](https://molmo.allenai.org/paper.pdf)
- 🎥 [Blog with Videos](https://molmo.allenai.org/blog)
## Loading
The dataset has four subsets:
- `charts`: Charts and figures
- `diagrams`: Diagrams and flowcharts
- `tables`: Tables
- `other`: Other kinds of documents
Use `config_name` to specify which one to load; by default, `charts` is loaded. For example:
```python
import datasets
table_dataset = datasets.load_dataset("allenai/pixmo-docs", "tables", split="train")
```
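Since the subset names are fixed, a small wrapper can reject a mistyped name before any download starts. The following is a minimal sketch, not part of the dataset's tooling: `SUBSETS` and `load_subset` are hypothetical names introduced here, and the only library call assumed is `datasets.load_dataset` as shown above.

```python
# Subset names as listed in this card.
SUBSETS = ("charts", "diagrams", "tables", "other")


def load_subset(name="charts", split="train"):
    """Load one PixMo-Docs subset after checking the name (hypothetical helper)."""
    if name not in SUBSETS:
        raise ValueError(f"unknown subset {name!r}; expected one of {SUBSETS}")
    import datasets  # imported lazily so the name check works without the library

    return datasets.load_dataset("allenai/pixmo-docs", name, split=split)
```

Calling `load_subset("diagrams")` would then behave like the one-line example above with `"diagrams"` in place of `"tables"`.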
## Data Format
The rendered image is included in the dataset directly:
```python
print(table_dataset[0]["image"])
# >>> <PIL.PngImagePlugin.PngImageFile image mode=RGB size=2400x1200 at 0x7F362070CEB0>
```
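Rendered images can be large (the example above is 2400x1200), so it may help to downscale them before feeding them to a model. The sketch below only computes the target size; `fit_within` is a hypothetical helper, and the choice of 1024 pixels in the usage note is an arbitrary assumption.

```python
def fit_within(size, max_side):
    """Return (width, height) scaled so the longer side is at most max_side."""
    width, height = size
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough: keep the original size.
        return width, height
    # Scale both sides by the same factor to preserve the aspect ratio.
    return (
        max(1, round(width * max_side / longest)),
        max(1, round(height * max_side / longest)),
    )
```

With a PIL image such as `image = table_dataset[0]["image"]`, the result can be passed to `image.resize(fit_within(image.size, 1024))`.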
Each image is matched with multiple question-answer pairs:
```python
for q, a in zip(table_dataset[0]["questions"]["question"], table_dataset[0]["questions"]["answer"]):
    print(q, a)
# >>>
# What is the waist circumference range for adult females? 64-88 cm
# What is the weight range for children aged 2-12 years? 10-45 kg
# Is the BMI range for infants provided in the table? No
# Which age group has the highest resting heart rate range? Infants (0-1 year)
# What is the difference in lung capacity range between adolescents and elderly? Maximum difference: 0.5 L, Minimum difference: 1.5 L
# Do adult males have a higher blood pressure range than adolescents? Yes
# What is the average height of elderly females compared to male adolescents? Male adolescents are taller by 10 cm
# Does the table provide a consistent BMI range across all groups for females? Yes
# Which gender has a lower average hip circumference range among the elderly? Females have a lower average hip circumference
```
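As the loop above shows, the `questions` field stores questions and answers as two parallel lists. For training code that expects one record per pair, the columns can be zipped into a list of dictionaries. This is a minimal sketch: `qa_pairs` is a hypothetical helper, and `sample` is a hand-written stand-in for a dataset row that reuses two pairs from the output above.

```python
def qa_pairs(example):
    """Return the example's questions as a list of {'question', 'answer'} dicts."""
    questions = example["questions"]
    return [
        {"question": q, "answer": a}
        for q, a in zip(questions["question"], questions["answer"])
    ]


# Hand-written stand-in for one row (no download needed).
sample = {
    "questions": {
        "question": [
            "What is the waist circumference range for adult females?",
            "Is the BMI range for infants provided in the table?",
        ],
        "answer": ["64-88 cm", "No"],
    }
}
pairs = qa_pairs(sample)
```

On a loaded dataset, the same call would be `qa_pairs(table_dataset[0])`.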
## Splits
The data is divided into validation and train splits. These splits are "unofficial" because we do not generally use this data for evaluation anyway. However,
they reflect what was used when training the Molmo models, which were only trained on the train splits.
## License
This dataset is licensed under ODC-BY-1.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use).
This dataset includes output images derived from code generated by Claude that are subject to Anthropic's [terms of service](https://www.anthropic.com/legal/commercial-terms) and [usage policy](https://www.anthropic.com/legal/aup).
The questions were generated by GPT-4o mini and are subject to [separate terms](https://openai.com/policies/row-terms-of-use) governing their use.
Provider: maas
Created: 2025-05-28



