mathfish-tasks

Name: mathfish-tasks
Creator: maas
Published: 2025-07-16 16:30:20
License: No description provided

Source: ModelScope community. Updated 2025-07-16; indexed 2025-05-31.

Download link:

https://modelscope.cn/datasets/allenai/mathfish-tasks


Description:

# Dataset Card for MathFish Tasks

This dataset is a derivative of [MathFish](https://huggingface.co/datasets/allenai/mathfish), in which dev set examples are inserted into prompts to assess models' abilities to verify and tag standards in math problems. See [MathFish](https://huggingface.co/datasets/allenai/mathfish) for more details on sources, creation, and uses of this data.

This data can be used in conjunction with our model API wrapper included in this [GitHub repository](https://github.com/allenai/mathfish/tree/main).

## Dataset Details

### Dataset Description

- **Curated by:** Lucy Li, Tal August, Rose E Wang, Luca Soldaini, Courtney Allison, Kyle Lo
- **Funded by:** The Gates Foundation
- **Language(s) (NLP):** English
- **License:** ODC-By 1.0

## Dataset Structure

Files are named in the following manner:

```
data_{task format}-{mathfish data split}_{other parameters}_{prompt number}_{table format}.jsonl
```

Each line in a tagging file is formatted as follows:

```
{
    "id": unique instance ID,
    "dataset": some grouping of instances within a given task format,
    "messages": [
        {
            "role": "user",
            "prompt_template": "",
            "options": [
                # a list of tagging options
            ],
            "problem_activity": "",
        },
        {
            "role": "assistant",
            "response_template": "{option}",
            "response_format": "", # e.g. json or comma-separated list
            "correct_option_index": [
                # integer indices here that correspond to "options" above
            ]
        }
    ]
}
```

Each instance may also include keys indicating few-shot exemplars.

Note that files labeled with `entailment` are inputs for the task we call "verification" in our paper. In verification files, the format is similar to the tagging format above, but instead of an `options` key there is a `standards_description` key containing a natural language description of a math standard, and the assistant's dictionary includes a yes/no entry for whether the given problem `aligns` with the described standard.
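As a rough illustration of the tagging format, the sketch below resolves `correct_option_index` back to the option strings for a single JSONL line. The record is made up for illustration and is not taken from the dataset, `gold_options` is a hypothetical helper name, and the sketch assumes the indices are 0-based, which the card does not state.

```python
import json

# Hypothetical record that follows the tagging schema;
# all field values are invented for illustration only.
example_line = json.dumps({
    "id": "example-0",
    "dataset": "example-group",
    "messages": [
        {
            "role": "user",
            "prompt_template": "Which standards does this problem address?",
            "options": ["Standard A", "Standard B", "Standard C"],
            "problem_activity": "Add 2 and 3.",
        },
        {
            "role": "assistant",
            "response_template": "{option}",
            "response_format": "comma-separated list",
            "correct_option_index": [0, 2],
        },
    ],
})


def gold_options(line: str) -> list[str]:
    """Return the option strings selected by correct_option_index.

    Assumes 0-based indices into the user message's "options" list.
    """
    record = json.loads(line)
    # Index messages by role so the lookup does not depend on their order.
    by_role = {message["role"]: message for message in record["messages"]}
    options = by_role["user"]["options"]
    indices = by_role["assistant"]["correct_option_index"]
    return [options[i] for i in indices]


print(gold_options(example_line))
```

To process a whole tagging file, the same function could be applied to each non-empty line read from the `.jsonl` file. Verification (`entailment`) files would need different handling, since they carry `standards_description` instead of `options`.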
## Dataset Creation

The prompts in this repository were selected by testing 15 possible prompts from [this file](https://github.com/allenai/mathfish/blob/main/mathfish/datasets/prompts.json) across three models: Llama 2 70B, Mixtral 8x7B, and GPT-4-turbo. This repo includes each model's top three performing prompts on the tagging and verification tasks, to facilitate reproducibility of the findings in our paper (link TBD).

## Citation

```
@misc{lucy2024evaluatinglanguagemodelmath,
    title={Evaluating Language Model Math Reasoning via Grounding in Educational Curricula},
    author={Li Lucy and Tal August and Rose E. Wang and Luca Soldaini and Courtney Allison and Kyle Lo},
    year={2024},
    eprint={2408.04226},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2408.04226},
}
```

## Dataset Card Contact

kylel@allenai.org


Provider:

maas

Created:

2025-05-28
