MathArena/arxivlean-0326_outputs
Hugging Face | Updated 2026-04-21 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/MathArena/arxivlean-0326_outputs
Resource description:
---
dataset_info:
  features:
  - name: problem_idx
    dtype: string
  - name: problem
    dtype: string
  - name: model_name
    dtype: string
  - name: model_config
    dtype: string
  - name: idx_answer
    dtype: int64
  - name: all_messages
    dtype: string
  - name: user_message
    dtype: string
  - name: answer
    dtype: string
  - name: input_tokens
    dtype: int64
  - name: output_tokens
    dtype: int64
  - name: cost
    dtype: float64
  - name: input_cost_per_tokens
    dtype: float64
  - name: output_cost_per_tokens
    dtype: float64
  - name: source
    dtype: string
  - name: gold_answer
    dtype: string
  - name: formal_statement
    dtype: string
  - name: correct
    dtype: bool
  splits:
  - name: train
    num_bytes: 86745713
    num_examples: 249
  download_size: 31897727
  dataset_size: 86745713
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: cc-by-sa-4.0
language:
- en
pretty_name: Model Outputs ArXivLean March 2026
size_categories:
- 1K<n<10K
---
### Homepage and repository
- **Homepage:** [https://matharena.ai/](https://matharena.ai/)
- **Repository:** [https://github.com/eth-sri/matharena](https://github.com/eth-sri/matharena)
### Dataset Summary
This dataset contains model answers to the questions from ArXivLean March 2026 generated using the MathArena GitHub repository.
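A minimal sketch of loading the data, assuming the Hugging Face `datasets` library is installed and that the repository ID matches the title of this page (`MathArena/arxivlean-0326_outputs`). The helper names `load_outputs` and `check_row` are illustrative and not part of MathArena. The expected column types are copied from the `dataset_info` metadata at the top of this card.

```python
# Expected columns and their Python types, taken from the dataset_info metadata.
EXPECTED_COLUMNS = {
    "problem_idx": str,
    "problem": str,
    "model_name": str,
    "model_config": str,
    "idx_answer": int,
    "all_messages": str,
    "user_message": str,
    "answer": str,
    "input_tokens": int,
    "output_tokens": int,
    "cost": float,
    "input_cost_per_tokens": float,
    "output_cost_per_tokens": float,
    "source": str,
    "gold_answer": str,
    "formal_statement": str,
    "correct": bool,
}


def load_outputs(repo_id="MathArena/arxivlean-0326_outputs"):
    """Download the single `train` split (needs network access and `datasets`)."""
    # Imported lazily so that check_row below can be used offline.
    from datasets import load_dataset

    return load_dataset(repo_id, split="train")


def check_row(row):
    """Return a list of mismatches for one row; an empty list means it matches."""
    problems = []
    for name, expected_type in EXPECTED_COLUMNS.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif row[name] is not None and not isinstance(row[name], expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(row[name]).__name__}"
            )
    return problems
```

Calling `check_row(load_outputs()[0])` on a machine with network access is one way to confirm that the downloaded data matches the declared schema before building on it.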
### Data Fields
The fields of the dataset are described below.
- `problem_idx` (str): Index of the problem in the competition
- `problem` (str): Full problem statement
- `formal_statement` (str): Formal statement of the problem
- `source` (str): Source of the data
- `gold_answer` (str): Ground-truth answer to the question
- `model_name` (str): Name of the model as presented on the MathArena website
- `model_config` (str): Path to the config file in the MathArena GitHub repository
- `idx_answer` (int): Index of the attempt. Each model answered every question multiple times
- `user_message` (str): User message presented to the model. Contains a competition-specific instruction along with the problem statement
- `answer` (str): Full model answer
- `all_messages` (str): Full message history of the model response
- `correct` (bool): Whether the answer is correct, as evaluated by the MathArena parser
- `input_tokens` (int): Number of input tokens. 0 when the value is missing
- `output_tokens` (int): Number of output tokens. 0 when the value is missing
- `cost` (float): Total cost. 0 when the value is missing
- `input_cost_per_tokens` (float): Cost per one million input tokens
- `output_cost_per_tokens` (float): Cost per one million output tokens
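The fields above are enough to compute simple per-model statistics. The sketch below groups rows by `model_name`, reports the fraction of correct attempts, counts the problems solved in at least one attempt, and sums `cost`. It works on any iterable of row dictionaries; the function name `summarize` and the toy rows are illustrative only and are not taken from the dataset.

```python
from collections import defaultdict


def summarize(rows):
    """Aggregate attempts per model: accuracy, problems solved at least once, total cost."""
    attempts = defaultdict(int)
    correct = defaultdict(int)
    solved = defaultdict(set)
    cost = defaultdict(float)
    for row in rows:
        model = row["model_name"]
        attempts[model] += 1
        cost[model] += row["cost"]  # recorded as 0 when the cost is missing
        if row["correct"]:
            correct[model] += 1
            solved[model].add(row["problem_idx"])
    return {
        model: {
            "attempts": attempts[model],
            "accuracy": correct[model] / attempts[model],
            "problems_solved": len(solved[model]),
            "total_cost": cost[model],
        }
        for model in attempts
    }


# Toy rows for illustration only; real rows come from the dataset's `train` split.
toy_rows = [
    {"model_name": "model-a", "problem_idx": "1", "idx_answer": 0, "correct": True, "cost": 0.5},
    {"model_name": "model-a", "problem_idx": "1", "idx_answer": 1, "correct": False, "cost": 0.25},
    {"model_name": "model-a", "problem_idx": "2", "idx_answer": 0, "correct": False, "cost": 0.25},
    {"model_name": "model-b", "problem_idx": "1", "idx_answer": 0, "correct": True, "cost": 0.0},
]
stats = summarize(toy_rows)
print(stats["model-a"])
```

Because missing costs and token counts are stored as 0 rather than as nulls, a total of 0 for a model may mean "not recorded" rather than "free", so cost totals are best read alongside the token columns.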
### Licensing Information
This dataset is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0). Please abide by the license when using the provided data.
### Citation Information
```
@misc{balunovic_srimatharena_2025,
  title = {MathArena: Evaluating LLMs on Uncontaminated Math Competitions},
  author = {Mislav Balunović and Jasper Dekoninck and Ivo Petrov and Nikola Jovanović and Martin Vechev},
  copyright = {MIT},
  url = {https://matharena.ai/},
  publisher = {SRI Lab, ETH Zurich},
  month = feb,
  year = {2025},
}
```
Provider:
MathArena



