allenai/pixmo-docs
Hugging Face | Updated 2024-12-05 | Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/allenai/pixmo-docs
Description:
PixMo-Docs is a collection of synthetic question-answer pairs about computer-generated images such as charts, tables, diagrams, and documents. The data was created by prompting the Claude large language model to generate code that renders each image, then using GPT-4o mini to generate Q/A pairs from that code alone, without access to the rendered image. PixMo-Docs is part of the PixMo dataset collection and was used to train the Molmo family of models. The dataset has four subsets: charts, diagrams, tables, and other. Each image is paired with multiple question-answer pairs. Each subset has train and validation splits, though these splits are unofficial because the data is not generally used for evaluation. The dataset is licensed under ODC-BY-1.0 for research and educational use.
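As a rough illustration of the subset and split layout described above, the sketch below loads one subset with the Hugging Face `datasets` library. It assumes that the four subsets are exposed as configurations named `charts`, `diagrams`, `tables`, and `other`, and that the splits are named `train` and `validation`. The helper `load_pixmo_docs` is hypothetical, and the record field names are not stated in this listing, so the example only prints whatever keys the first record has.

```python
# Minimal sketch, not an official loader. Assumed (not confirmed by this
# listing): the config names and split names below match the repository.
SUBSETS = ("charts", "diagrams", "tables", "other")
SPLITS = ("train", "validation")
REPO_ID = "allenai/pixmo-docs"


def load_pixmo_docs(subset, split="train", streaming=True):
    """Hypothetical helper: validate the arguments, then load one subset."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so argument validation works without the library installed.
    from datasets import load_dataset

    # streaming=True avoids downloading the whole subset before iterating.
    return load_dataset(REPO_ID, subset, split=split, streaming=streaming)


if __name__ == "__main__":
    dataset = load_pixmo_docs("charts", split="validation")
    first_record = next(iter(dataset))
    # Field names are not documented in this listing, so just inspect them.
    print(sorted(first_record.keys()))
```

Because the splits are described as unofficial, the `validation` split here is best treated as a convenience hold-out rather than a benchmark.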
Provider:
allenai



