ArxivQA

Name: ArxivQA
Creator: maas
Published: 2025-12-05 16:22:54
License: No description provided

ModelScope community. Updated 2025-12-05; indexed 2025-02-15.

Download link:

https://modelscope.cn/datasets/MMInstruction/ArxivQA


Resource description:

# Dataset Card for Multimodal Arxiv QA

## Dataset Loading Instructions

Each line of the `arxivqa.jsonl` file is one example:

```
{"id": "cond-mat-2862", "image": "images/0805.4509_1.jpg", "options": ["A) The ordering temperatures for all materials are above the normalized temperature \\( T/T_c \\) of 1.2.", "B) The magnetic ordering temperatures decrease for Dy, Tb, and Ho as the normalized temperature \\( T/T_c \\) approaches 1.", "C) The magnetic ordering temperatures for all materials are the same across the normalized temperature \\( T/T_c \\).", "D) The magnetic ordering temperature is highest for Yttrium (Y) and decreases for Dy, Tb, and Ho."], "question": "What can be inferred about the magnetic ordering temperatures of the materials tested as shown in the graph?", "label": "B", "rationale": "The graph shows a sharp decline in frequency as the normalized temperature \\( T/T_c \\) approaches 1 for Dy, Tb, and Ho, indicating that their magnetic ordering temperatures decrease. No such data is shown for Yttrium (Y), thus we can't infer it has the highest magnetic ordering temperature."}
```

- Download `arxivqa.jsonl` and `images.tgz` to your machine.
- Decompress the images: `tar -xzvf images.tgz`.
- Load the dataset and process the samples according to your needs.

```python3
import json

# Each line of the file is one JSON object, so parse it line by line.
with open("arxivqa.jsonl", "r", encoding="utf-8") as fr:
    arxiv_qa = [json.loads(line) for line in fr if line.strip()]

sample = arxiv_qa[0]
print(sample["image"])  # relative path to the image file
```

## Dataset details

**Dataset type**: ArxivQA is a set of GPT-4V-generated VQA samples based on figures from arXiv papers.

**Papers or resources for more information**: https://mm-arxiv.github.io/

**License**: CC-BY-SA-4.0. Use of the dataset should also abide by OpenAI's policy: https://openai.com/policies/terms-of-use

**Intended use**:

Primary intended uses: The primary use of ArxivQA is research on large multimodal models.

Primary intended users: The primary intended users of the dataset are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
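As a rough illustration of "process the samples according to your needs", the sketch below turns one record into a multiple-choice prompt, resolves its image path, and checks a predicted answer. The helper names (`build_prompt`, `image_path`, `is_correct`), the instruction sentence appended to the prompt, and the `root` parameter are hypothetical and not part of the dataset. The sketch also assumes that every record has the fields shown in the example above and that `label` starts with the option letter.

```python
from pathlib import Path


def build_prompt(sample: dict) -> str:
    """Join the question and its options into one prompt string.

    Assumes the options already carry their "A) ", "B) " prefixes,
    as in the example record from the dataset card.
    """
    lines = [sample["question"], *sample["options"], "Answer with the option letter."]
    return "\n".join(lines)


def image_path(sample: dict, root: str = ".") -> Path:
    """Resolve the relative "image" field against the directory
    where images.tgz was extracted (hypothetical `root` parameter)."""
    return Path(root) / sample["image"]


def is_correct(sample: dict, predicted_letter: str) -> bool:
    """Compare the first letter of a prediction with the first letter of "label".

    Assumption: "label" starts with the option letter, e.g. "B".
    """
    pred = predicted_letter.strip().upper()[:1]
    gold = sample["label"].strip().upper()[:1]
    return bool(pred) and pred == gold
```

With the records loaded into `arxiv_qa`, `build_prompt(arxiv_qa[0])` would give the text to send alongside the image at `image_path(arxiv_qa[0])`, and `is_correct(arxiv_qa[0], answer)` could feed a simple accuracy count.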


Provider:

maas

Created:

2025-02-09
