
MathArena/arxivlean-0326_outputs

Hugging Face | Updated 2026-04-21 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/MathArena/arxivlean-0326_outputs
Resource description:
---
dataset_info:
  features:
  - name: problem_idx
    dtype: string
  - name: problem
    dtype: string
  - name: model_name
    dtype: string
  - name: model_config
    dtype: string
  - name: idx_answer
    dtype: int64
  - name: all_messages
    dtype: string
  - name: user_message
    dtype: string
  - name: answer
    dtype: string
  - name: input_tokens
    dtype: int64
  - name: output_tokens
    dtype: int64
  - name: cost
    dtype: float64
  - name: input_cost_per_tokens
    dtype: float64
  - name: output_cost_per_tokens
    dtype: float64
  - name: source
    dtype: string
  - name: gold_answer
    dtype: string
  - name: formal_statement
    dtype: string
  - name: correct
    dtype: bool
  splits:
  - name: train
    num_bytes: 86745713
    num_examples: 249
  download_size: 31897727
  dataset_size: 86745713
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: cc-by-sa-4.0
language:
- en
pretty_name: Model Outputs ArXivLean March 2026
size_categories:
- 1K<n<10K
---

### Homepage and repository

- **Homepage:** [https://matharena.ai/](https://matharena.ai/)
- **Repository:** [https://github.com/eth-sri/matharena](https://github.com/eth-sri/matharena)

### Dataset Summary

This dataset contains model answers to the questions from ArXivLean March 2026, generated using the MathArena GitHub repository.

### Data Fields

Each field in the dataset is described below.

- `problem_idx` (str): Index of the problem in the competition
- `problem` (str): Full problem statement
- `gold_answer` (str): Ground-truth answer to the question
- `model_name` (str): Name of the model as presented on the MathArena website
- `model_config` (str): Path to the config file in the MathArena GitHub repo
- `idx_answer` (int): Each model answered every question multiple times. This index indicates which attempt this is
- `user_message` (str): User message presented to the model. Contains a competition-specific instruction along with the problem statement
- `answer` (str): Full model answer
- `all_messages` (str): Full message history of the model response
- `source` (str): Data source
- `formal_statement` (str): Formal statement of the problem
- `correct` (bool): Indicates whether the answer is correct as evaluated by the MathArena parser
- `input_tokens` (int): Number of input tokens. Is 0 when this value is missing
- `output_tokens` (int): Number of output tokens. Is 0 when this value is missing
- `cost` (float): Total cost. Is 0 when this value is missing
- `input_cost_per_tokens` (float): Cost per one million input tokens
- `output_cost_per_tokens` (float): Cost per one million output tokens

### Licensing Information

This dataset is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. Please abide by the license when using the provided data.

### Citation Information

```
@misc{balunovic_srimatharena_2025,
  title = {MathArena: Evaluating LLMs on Uncontaminated Math Competitions},
  author = {Mislav Balunović and Jasper Dekoninck and Ivo Petrov and Nikola Jovanović and Martin Vechev},
  copyright = {MIT},
  url = {https://matharena.ai/},
  publisher = {SRI Lab, ETH Zurich},
  month = feb,
  year = {2025},
}
```
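### Example: summarizing attempts per model

The fields described in this card lend themselves to simple per-model summaries. The sketch below is illustrative only and is not part of the MathArena tooling: the function names `summarize_by_model` and `estimated_cost` and the sample rows are hypothetical. It also assumes that `cost` relates to the token counts through the per-million prices, which the card does not state explicitly.

```python
from collections import defaultdict


def summarize_by_model(rows):
    """Aggregate attempts per model: attempt count, accuracy, total cost.

    `rows` is any iterable of dicts keyed by the field names in this card.
    """
    stats = defaultdict(lambda: {"attempts": 0, "correct": 0, "cost": 0.0})
    for row in rows:
        s = stats[row["model_name"]]
        s["attempts"] += 1
        s["correct"] += int(row["correct"])
        s["cost"] += row["cost"]  # recorded as 0 when the value is missing
    return {
        model: {
            "attempts": s["attempts"],
            "accuracy": s["correct"] / s["attempts"],
            "total_cost": s["cost"],
        }
        for model, s in stats.items()
    }


def estimated_cost(row):
    """Estimate the cost of one attempt from its token counts.

    Assumption: both price fields are expressed per one million tokens,
    as the field descriptions say; whether `cost` is computed exactly
    this way is not confirmed by the card.
    """
    return (
        row["input_tokens"] * row["input_cost_per_tokens"]
        + row["output_tokens"] * row["output_cost_per_tokens"]
    ) / 1_000_000


# Hypothetical rows for illustration only; not taken from the dataset.
sample_rows = [
    {"model_name": "model-a", "problem_idx": "1", "idx_answer": 0,
     "correct": True, "cost": 0.009, "input_tokens": 1000,
     "output_tokens": 2000, "input_cost_per_tokens": 1.0,
     "output_cost_per_tokens": 4.0},
    {"model_name": "model-a", "problem_idx": "1", "idx_answer": 1,
     "correct": False, "cost": 0.009, "input_tokens": 1000,
     "output_tokens": 2000, "input_cost_per_tokens": 1.0,
     "output_cost_per_tokens": 4.0},
    {"model_name": "model-b", "problem_idx": "1", "idx_answer": 0,
     "correct": True, "cost": 0.0, "input_tokens": 0,
     "output_tokens": 0, "input_cost_per_tokens": 2.0,
     "output_cost_per_tokens": 8.0},
]

summary = summarize_by_model(sample_rows)
```

With the Hugging Face `datasets` library installed, real rows could presumably be obtained with `load_dataset("MathArena/arxivlean-0326_outputs", split="train")` and passed to `summarize_by_model` in place of `sample_rows`. Because missing token counts and costs are stored as 0, totals computed this way are lower bounds rather than exact figures.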
Provider:
MathArena