MathVista

Name: MathVista
Creator: maas
Published: 2026-05-16 12:14:29
License: no description provided

ModelScope Community; updated 2026-05-16; indexed 2025-09-06

Download link:

https://modelscope.cn/datasets/evalscope/MathVista

Resource overview:

# Dataset Card for MathVista

- [Dataset Description](https://huggingface.co/datasets/AI4Math/MathVista/blob/main/README.md#dataset-description)
- [Paper Information](https://huggingface.co/datasets/AI4Math/MathVista/blob/main/README.md#paper-information)
- [Dataset Examples](https://huggingface.co/datasets/AI4Math/MathVista/blob/main/README.md#dataset-examples)
- [Leaderboard](https://huggingface.co/datasets/AI4Math/MathVista/blob/main/README.md#leaderboard)
- [Dataset Usage](https://huggingface.co/datasets/AI4Math/MathVista/blob/main/README.md#dataset-usage)
  - [Data Downloading](https://huggingface.co/datasets/AI4Math/MathVista/blob/main/README.md#data-downloading)
  - [Data Format](https://huggingface.co/datasets/AI4Math/MathVista/blob/main/README.md#data-format)
  - [Data Visualization](https://huggingface.co/datasets/AI4Math/MathVista/blob/main/README.md#data-visualization)
  - [Data Source](https://huggingface.co/datasets/AI4Math/MathVista/blob/main/README.md#data-source)
  - [Automatic Evaluation](https://huggingface.co/datasets/AI4Math/MathVista/blob/main/README.md#automatic-evaluation)
- [License](https://huggingface.co/datasets/AI4Math/MathVista/blob/main/README.md#license)
- [Citation](https://huggingface.co/datasets/AI4Math/MathVista/blob/main/README.md#citation)

## Dataset Description

**MathVista** is a consolidated Mathematical reasoning benchmark within Visual contexts. It consists of **three newly created datasets, IQTest, FunctionQA, and PaperQA**, which address the missing visual domains and are tailored to evaluate logical reasoning on puzzle test figures, algebraic reasoning over functional plots, and scientific reasoning with academic paper figures, respectively. It also incorporates **9 MathQA datasets** and **19 VQA datasets** from the literature, which significantly enrich the diversity and complexity of visual perception and mathematical reasoning challenges within our benchmark.
In total, **MathVista** includes **6,141 examples** collected from **31 different datasets**.

## Paper Information

- Paper: https://arxiv.org/abs/2310.02255
- Code: https://github.com/lupantech/MathVista
- Project: https://mathvista.github.io/
- Visualization: https://mathvista.github.io/#visualization
- Leaderboard: https://mathvista.github.io/#leaderboard

## Dataset Examples

Examples of our newly annotated datasets: IQTest, FunctionQA, and PaperQA:

<img src="https://raw.githubusercontent.com/lupantech/MathVista/main/assets/our_new_3_datasets.png" style="zoom:40%;" />

<details>
<summary>🔍 Click to expand/collapse more examples</summary>

Examples of seven mathematical reasoning skills:

1. Arithmetic Reasoning

<img src="https://raw.githubusercontent.com/lupantech/MathVista/main/assets/skills/ari.png" style="zoom:40%;" />

2. Statistical Reasoning

<img src="https://raw.githubusercontent.com/lupantech/MathVista/main/assets/skills/sta.png" style="zoom:40%;" />

3. Algebraic Reasoning

<img src="https://raw.githubusercontent.com/lupantech/MathVista/main/assets/skills/alg.png" style="zoom:40%;" />

4. Geometry Reasoning

<img src="https://raw.githubusercontent.com/lupantech/MathVista/main/assets/skills/geo.png" style="zoom:40%;" />

5. Numeric Common Sense

<img src="https://raw.githubusercontent.com/lupantech/MathVista/main/assets/skills/num.png" style="zoom:40%;" />

6. Scientific Reasoning

<img src="https://raw.githubusercontent.com/lupantech/MathVista/main/assets/skills/sci.png" style="zoom:40%;" />

7. Logical Reasoning

<img src="https://raw.githubusercontent.com/lupantech/MathVista/main/assets/skills/log.png" style="zoom:40%;" />

</details>

## Leaderboard

🏆 The leaderboard for the *testmini* set (1,000 examples) is available [here](https://mathvista.github.io/#leaderboard).

🏆 The leaderboard for the *test* set (5,141 examples) and the automatic evaluation on [CodaLab](https://codalab.org/) are under construction.
## Dataset Usage

### Data Downloading

All the data examples are divided into two subsets: *testmini* and *test*.

- **testmini**: 1,000 examples used for model development, validation, or for those with limited computing resources.
- **test**: 5,141 examples for standard evaluation. Notably, the answer labels for test will NOT be publicly released.

You can download this dataset with the following command (make sure that you have installed [Hugging Face Datasets](https://huggingface.co/docs/datasets/quickstart)):

```python
from datasets import load_dataset

dataset = load_dataset("AI4Math/MathVista")
```

Here are some examples of how to access the downloaded dataset:

```python
# print the first example on the testmini set
print(dataset["testmini"][0])
print(dataset["testmini"][0]['pid'])       # print the problem id
print(dataset["testmini"][0]['question'])  # print the question text
print(dataset["testmini"][0]['query'])     # print the query text
print(dataset["testmini"][0]['image'])     # print the image path
print(dataset["testmini"][0]['answer'])    # print the answer
dataset["testmini"][0]['decoded_image']    # display the image

# print the first example on the test set
print(dataset["test"][0])
```

### Data Format

The dataset is provided in JSON format and contains the following attributes:

```json
{
    "question": [string] The question text,
    "image": [string] A file path pointing to the associated image,
    "choices": [list] Choice options for multiple-choice problems. For free-form problems, this could be a 'none' value,
    "unit": [string] The unit associated with the answer, e.g., "m^2", "years". If no unit is relevant, it can be a 'none' value,
    "precision": [integer] The number of decimal places the answer should be rounded to,
    "answer": [string] The correct answer for the problem,
    "question_type": [string] The type of question: "multi_choice" or "free_form",
    "answer_type": [string] The format of the answer: "text", "integer", "float", or "list",
    "pid": [string] Problem ID, e.g., "1",
    "metadata": {
        "split": [string] Data split: "testmini" or "test",
        "language": [string] Question language: "English", "Chinese", or "Persian",
        "img_width": [integer] The width of the associated image in pixels,
        "img_height": [integer] The height of the associated image in pixels,
        "source": [string] The source dataset from which the problem was taken,
        "category": [string] The category of the problem: "math-targeted-vqa" or "general-vqa",
        "task": [string] The task of the problem, e.g., "geometry problem solving",
        "context": [string] The visual context type of the associated image,
        "grade": [string] The grade level of the problem, e.g., "high school",
        "skills": [list] A list of mathematical reasoning skills that the problem tests
    },
    "query": [string] the query text used as input (prompt) for the evaluation model
}
```

### Data Visualization

🎰 You can explore the dataset in an interactive way [here](https://mathvista.github.io/#visualization).

<details>
<summary>Click to expand/collapse the visualization page screenshot.</summary>

<img src="https://raw.githubusercontent.com/lupantech/MathVista/main/assets/data_visualizer.png" style="zoom:40%;" />

</details>

### Data Source

The **MathVista** dataset is derived from three newly collected datasets: IQTest, FunctionQA, and PaperQA, as well as 28 other source datasets. Details can be found in the [source.json](https://huggingface.co/datasets/AI4Math/MathVista/blob/main/source.json) file. All these source datasets have been preprocessed and labeled for evaluation purposes.
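
Since every example records its origin in `metadata`, one way to inspect how examples are spread across source datasets is to tally that field. The following is a minimal sketch and not part of the official tooling: `count_by_metadata` is a hypothetical helper, the `sample` records are made-up values for illustration, and it assumes each record exposes `metadata` as a nested dict with the keys listed under Data Format.

```python
from collections import Counter


def count_by_metadata(records, key):
    """Count records by one field of their nested `metadata` dict."""
    return Counter(record["metadata"][key] for record in records)


# Hypothetical records that mimic the documented format (values are made up).
sample = [
    {"pid": "1", "question_type": "multi_choice",
     "metadata": {"source": "FunctionQA", "category": "math-targeted-vqa"}},
    {"pid": "2", "question_type": "free_form",
     "metadata": {"source": "IQTest", "category": "math-targeted-vqa"}},
    {"pid": "3", "question_type": "free_form",
     "metadata": {"source": "FunctionQA", "category": "math-targeted-vqa"}},
]

# Tally the sample by source dataset and print the most common first.
by_source = count_by_metadata(sample, "source")
for source, count in by_source.most_common():
    print(f"{source}: {count}")
```

The same helper could presumably be applied to `dataset["testmini"]` after loading, provided that iterating the split yields dicts with a `metadata` mapping. If `metadata` arrives as a JSON string instead, it would need to be decoded with `json.loads` first.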
### Automatic Evaluation

🔔 To automatically evaluate a model on the dataset, please refer to our GitHub repository [here](https://github.com/lupantech/MathVista/tree/main).

## License

The new contributions to our dataset are distributed under the [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license, including:

- The creation of three datasets: IQTest, FunctionQA, and PaperQA;
- The filtering and cleaning of source datasets;
- The standard formalization of instances for evaluation purposes;
- The annotations of metadata.

The copyright of the images and the questions belongs to the original authors, and the source of every image and original question can be found in the `metadata` field and in the [source.json](https://huggingface.co/datasets/AI4Math/MathVista/blob/main/source.json) file. Alongside this license, the following conditions apply:

- **Purpose:** The dataset was primarily designed for use as a test set.
- **Commercial Use:** The dataset can be used commercially as a test set, but using it as a training set is prohibited.

By accessing or using this dataset, you acknowledge and agree to abide by these terms in conjunction with the [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license.

## Citation

If you use the **MathVista** dataset in your work, please kindly cite the paper using this BibTeX:

```
@inproceedings{lu2024mathvista,
  author    = {Lu, Pan and Bansal, Hritik and Xia, Tony and Liu, Jiacheng and Li, Chunyuan and Hajishirzi, Hannaneh and Cheng, Hao and Chang, Kai-Wei and Galley, Michel and Gao, Jianfeng},
  title     = {MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2024}
}
```

Provider: maas

Created: 2025-09-01
