
mathfish-tasks

Source: ModelScope (魔搭社区). Updated: 2025-07-16. Indexed: 2025-05-31.
Download link:
https://modelscope.cn/datasets/allenai/mathfish-tasks
Description:
# Dataset Card for MathFish Tasks

This dataset is a derivative of [MathFish](https://huggingface.co/datasets/allenai/mathfish), in which dev set examples are inserted into prompts that assess models' abilities to verify and tag standards in math problems. See [MathFish](https://huggingface.co/datasets/allenai/mathfish) for more details on sources, creation, and uses of this data.

This data can be used in conjunction with our model API wrapper included in this [GitHub repository](https://github.com/allenai/mathfish/tree/main).

## Dataset Details

### Dataset Description

- **Curated by:** Lucy Li, Tal August, Rose E Wang, Luca Soldaini, Courtney Allison, Kyle Lo
- **Funded by:** The Gates Foundation
- **Language(s) (NLP):** English
- **License:** ODC-By 1.0

## Dataset Structure

Files are named in the following manner:

```
data_{task format}-{mathfish data split}_{other parameters}_{prompt number}_{table format}.jsonl
```

Each line in a tagging file is formatted as follows:

```
{
    "id": unique instance ID,
    "dataset": some grouping of instances within a given task format,
    "messages": [
        {
            "role": "user",
            "prompt_template": "",
            "options": [
                # a list of tagging options
            ],
            "problem_activity": "",
        },
        {
            "role": "assistant",
            "response_template": "{option}",
            "response_format": "", # e.g. json or comma-separated list
            "correct_option_index": [
                # integer indices here that correspond to "options" above
            ]
        }
    ]
}
```

Each instance may also include keys indicating few-shot exemplars.

Note that files labeled with `entailment` are inputs for the task we call "verification" in our paper. In verification files, the format is similar to the tagging format above, but instead of an `options` key there is a `standards_description` key containing a natural language description of a math standard, and the assistant's dictionary includes a yes/no entry for whether the given problem `aligns` with the described standard.
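
As a rough illustration of the tagging format described above, the following sketch parses one JSONL line and resolves `correct_option_index` into the corresponding entries of `options`. The record contents (`example-0`, `standard A`, and so on) and the helper name `gold_options` are hypothetical and only mirror the key layout shown in the schema; real files may carry additional keys, such as the few-shot exemplar keys mentioned above.

```python
import json

# Hypothetical tagging record that follows the schema above.
# All values are invented for illustration; they are not taken from the dataset.
example_line = json.dumps({
    "id": "example-0",
    "dataset": "example-group",
    "messages": [
        {
            "role": "user",
            "prompt_template": "",
            "options": ["standard A", "standard B", "standard C"],
            "problem_activity": "Add 2 and 3.",
        },
        {
            "role": "assistant",
            "response_template": "{option}",
            "response_format": "comma-separated list",
            "correct_option_index": [0, 2],
        },
    ],
})


def gold_options(line):
    """Return the option strings selected by correct_option_index.

    Assumes one JSON object per line with exactly one "user" and one
    "assistant" message, as in the schema above.
    """
    record = json.loads(line)
    by_role = {message["role"]: message for message in record["messages"]}
    options = by_role["user"]["options"]
    indices = by_role["assistant"]["correct_option_index"]
    return [options[i] for i in indices]


print(gold_options(example_line))
```

To process a whole file, the same helper could be applied to each non-empty line of an opened `.jsonl` file. Verification (`entailment`) files would need a different accessor, since they replace `options` with `standards_description`.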
## Dataset Creation

The prompts in this repository were selected by testing 15 possible prompts from [this file](https://github.com/allenai/mathfish/blob/main/mathfish/datasets/prompts.json) across three models: Llama 2 70B, Mixtral 8x7B, and GPT-4-turbo. This repo includes each model's top three performing prompts on the tagging and verification tasks, to facilitate reproducibility of the findings in our paper (link TBD).

## Citation

```
@misc{lucy2024evaluatinglanguagemodelmath,
      title={Evaluating Language Model Math Reasoning via Grounding in Educational Curricula},
      author={Li Lucy and Tal August and Rose E. Wang and Luca Soldaini and Courtney Allison and Kyle Lo},
      year={2024},
      eprint={2408.04226},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.04226},
}
```

## Dataset Card Contact

kylel@allenai.org

Provider: maas
Created: 2025-05-28