
MathArena/arxivmath-0226_outputs

Hugging Face | Updated 2026-03-25 | Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/MathArena/arxivmath-0226_outputs
Resource description:
---
dataset_info:
  features:
  - name: problem_idx
    dtype: string
  - name: problem
    dtype: string
  - name: model_name
    dtype: string
  - name: model_config
    dtype: string
  - name: idx_answer
    dtype: int64
  - name: all_messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
    - name: type
      dtype: string
  - name: user_message
    dtype: string
  - name: answer
    dtype: string
  - name: input_tokens
    dtype: int64
  - name: output_tokens
    dtype: int64
  - name: cost
    dtype: float64
  - name: input_cost_per_tokens
    dtype: float64
  - name: output_cost_per_tokens
    dtype: float64
  - name: source
    dtype: string
  - name: gold_answer
    dtype: string
  - name: parsed_answer
    dtype: string
  - name: correct
    dtype: bool
  splits:
  - name: train
    num_bytes: 241478866
    num_examples: 2135
  download_size: 104839644
  dataset_size: 241478866
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: cc-by-sa-4.0
language:
- en
pretty_name: Model Outputs ArXivMath February 2026
size_categories:
- 1K<n<10K
---

### Homepage and repository

- **Homepage:** [https://matharena.ai/](https://matharena.ai/)
- **Repository:** [https://github.com/eth-sri/matharena](https://github.com/eth-sri/matharena)

### Dataset Summary

This dataset contains model answers to the questions from ArXivMath February 2026, generated using the MathArena GitHub repository.

### Data Fields

The fields in the dataset are described below.

- `problem_idx` (str): Index of the problem in the competition
- `problem` (str): Full problem statement
- `gold_answer` (str): Ground-truth answer to the question
- `model_name` (str): Name of the model as presented on the MathArena website
- `model_config` (str): Path to the config file in the MathArena GitHub repo
- `idx_answer` (int): Each model answered every question multiple times. This index indicates which attempt this is
- `user_message` (str): User message presented to the model. Contains a competition-specific instruction along with the problem statement
- `answer` (str): Full model answer
- `parsed_answer` (str): Answer as it was parsed by the MathArena parser. Note: a direct string comparison between `parsed_answer` and `gold_answer` will give false negatives when measuring correctness.
- `correct` (bool): Indicates whether the answer is correct as evaluated by the MathArena parser
- `input_tokens` (int): Number of input tokens. Is 0 when this value is missing
- `output_tokens` (int): Number of output tokens. Is 0 when this value is missing
- `cost` (float): Total cost. Is 0 when this value is missing
- `input_cost_per_tokens` (float): Cost per one million input tokens
- `output_cost_per_tokens` (float): Cost per one million output tokens

### Licensing Information

This dataset is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. Please abide by the license when using the provided data.

### Citation Information

```
@misc{balunovic_srimatharena_2025,
  title = {MathArena: Evaluating LLMs on Uncontaminated Math Competitions},
  author = {Mislav Balunović and Jasper Dekoninck and Ivo Petrov and Nikola Jovanović and Martin Vechev},
  copyright = {MIT},
  url = {https://matharena.ai/},
  publisher = {SRI Lab, ETH Zurich},
  month = feb,
  year = {2025},
}
```
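The note on `parsed_answer` in the Data Fields section suggests that accuracy is better computed from the `correct` flag than from a string comparison. The following is a minimal sketch of that idea, not part of the MathArena tooling. The helper names `accuracy_by_model` and `mean_output_tokens` are hypothetical, and `sample_rows` is made-up data that only mimics the documented field names; it assumes the parser accepts an equivalent form such as `0.5` for `1/2`.

```python
from collections import defaultdict


def accuracy_by_model(rows):
    """Return {model_name: fraction of attempts whose `correct` flag is True}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        totals[row["model_name"]] += 1
        hits[row["model_name"]] += bool(row["correct"])
    return {name: hits[name] / totals[name] for name in totals}


def mean_output_tokens(rows):
    """Average `output_tokens`, skipping rows where 0 marks a missing value."""
    known = [row["output_tokens"] for row in rows if row["output_tokens"] > 0]
    return sum(known) / len(known) if known else None


# Hypothetical rows for illustration only; these are NOT taken from the dataset.
sample_rows = [
    {"model_name": "model-a", "idx_answer": 0, "gold_answer": "1/2",
     "parsed_answer": "0.5", "correct": True, "output_tokens": 1200},
    {"model_name": "model-a", "idx_answer": 1, "gold_answer": "1/2",
     "parsed_answer": "2", "correct": False, "output_tokens": 0},
    {"model_name": "model-b", "idx_answer": 0, "gold_answer": "1/2",
     "parsed_answer": "1/2", "correct": True, "output_tokens": 800},
]

# Naive string matching misses the equivalent answer "0.5" in the first row.
naive_accuracy = sum(
    row["parsed_answer"] == row["gold_answer"] for row in sample_rows
) / len(sample_rows)

# Using the `correct` flag counts that row as a hit.
flag_accuracy = sum(row["correct"] for row in sample_rows) / len(sample_rows)

print(accuracy_by_model(sample_rows))
print(mean_output_tokens(sample_rows))
print(naive_accuracy, flag_accuracy)
```

To run the same helpers on the real data, the rows could be obtained with the Hugging Face `datasets` library, for example `load_dataset("MathArena/arxivmath-0226_outputs", split="train")`, which yields one dictionary per row.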
Provider:
MathArena