mathfish-tasks
ModelScope community. Updated 2025-07-16; indexed 2025-05-31.
Download link:
https://modelscope.cn/datasets/allenai/mathfish-tasks
Resource description:
# Dataset Card for MathFish Tasks
This dataset is a derivative of [MathFish](https://huggingface.co/datasets/allenai/mathfish), in which dev set examples are inserted into prompts that assess models' abilities to verify and tag standards in math problems.
See [MathFish](https://huggingface.co/datasets/allenai/mathfish) for more details on sources, creation, and uses of this data.
This data can be used in conjunction with our model API wrapper included in this [GitHub repository](https://github.com/allenai/mathfish/tree/main).
## Dataset Details
### Dataset Description
- **Curated by:** Lucy Li, Tal August, Rose E Wang, Luca Soldaini, Courtney Allison, Kyle Lo
- **Funded by:** The Gates Foundation
- **Language(s) (NLP):** English
- **License:** ODC-By 1.0
## Dataset Structure
Files are named in the following manner:
```
data_{task format}-{mathfish data split}_{other parameters}_{prompt number}_{table format}.jsonl
```
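As a rough illustration of the naming pattern above, the sketch below splits a file name into its five fields. It assumes that the task format contains no hyphen, that the data split, prompt number, and table format contain no underscores, and that any extra underscores belong to the "other parameters" field. The file name used in the usage note is made up for illustration and may not correspond to a real file in this dataset.

```python
import re

# Assumed reading of the documented pattern:
# data_{task format}-{mathfish data split}_{other parameters}_{prompt number}_{table format}.jsonl
FILENAME_RE = re.compile(
    r"^data_(?P<task_format>[^-]+)-(?P<split>[^_]+)_"
    r"(?P<other_parameters>.*)_(?P<prompt_number>[^_]+)_(?P<table_format>[^_]+)\.jsonl$"
)

def parse_filename(filename):
    """Return the five name fields as a dict, or None if the name does not match."""
    match = FILENAME_RE.match(filename)
    return match.groupdict() if match else None
```

For example, `parse_filename("data_tagging-dev_a_b_p3_markdown.jsonl")` (a hypothetical name) would put `a_b` in `other_parameters`, because only the last two underscore-separated pieces are treated as the prompt number and table format.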
Each line in a tagging file is formatted as the following:
```
{
  "id": unique instance ID,
  "dataset": some grouping of instances within a given task format,
  "messages": [
    {
      "role": "user",
      "prompt_template": "",
      "options": [
        # a list of tagging options
      ],
      "problem_activity": ""
    },
    {
      "role": "assistant",
      "response_template": "{option}",
      "response_format": "", # e.g. json or comma-separated list
      "correct_option_index": [
        # integer indices here that correspond to "options" above
      ]
    }
  ]
}
```
Each instance may also include keys indicating few-shot exemplars.
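The tagging format above can be read with the standard `json` module. The following is a minimal sketch, assuming each line is a standalone JSON object with exactly the keys shown; the helper names `read_jsonl` and `gold_options` and the example record are hypothetical and not part of the MathFish codebase.

```python
import json

def read_jsonl(path):
    """Yield one parsed record per non-empty line of a .jsonl file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)

def gold_options(record):
    """Map "correct_option_index" back to the option strings it points at."""
    user = next(m for m in record["messages"] if m["role"] == "user")
    assistant = next(m for m in record["messages"] if m["role"] == "assistant")
    return [user["options"][i] for i in assistant["correct_option_index"]]

# Made-up record that only mirrors the documented keys.
example_record = json.loads(json.dumps({
    "id": "example-0",
    "dataset": "example-group",
    "messages": [
        {
            "role": "user",
            "prompt_template": "",
            "options": ["option A", "option B", "option C"],
            "problem_activity": "",
        },
        {
            "role": "assistant",
            "response_template": "{option}",
            "response_format": "",
            "correct_option_index": [0, 2],
        },
    ],
}))

print(gold_options(example_record))
```

Looking up messages by `role` rather than by list position keeps the sketch working if few-shot exemplar keys or additional messages are present.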
Note that files labeled with `entailment` are inputs for the task we call "verification" in our paper. In verification files, the format is similar to tagging above, but instead of an `options` key, there is a `standards_description` key including a natural language description of a math standard, and the assistant's dictionary includes a yes/no entry for whether the given problem `aligns` with the described standard.
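Since the two file types differ in which key the user message carries, a loader can tell them apart by inspecting that message. The sketch below assumes only what is stated above (`options` for tagging, `standards_description` for verification); the function name `task_kind` and both example records are hypothetical.

```python
def task_kind(record):
    """Classify a record as "tagging", "verification", or "unknown"."""
    user = next(m for m in record["messages"] if m["role"] == "user")
    if "options" in user:
        return "tagging"
    if "standards_description" in user:
        return "verification"
    return "unknown"

# Made-up records containing only the keys needed for the check.
tagging_record = {
    "messages": [{"role": "user", "options": ["option A"], "problem_activity": ""}]
}
verification_record = {
    "messages": [
        {"role": "user", "standards_description": "", "problem_activity": ""}
    ]
}

print(task_kind(tagging_record), task_kind(verification_record))
```

The sketch deliberately does not read the assistant's yes/no entry, because the exact value type of that entry is not specified here.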
## Dataset Creation
The prompts in this repository were selected by testing 15 candidate prompts from [this file](https://github.com/allenai/mathfish/blob/main/mathfish/datasets/prompts.json) across three models: Llama 2 70B, Mixtral 8x7B, and GPT-4-turbo. This repo includes each model's top three performing prompts on the tagging and verification tasks, to facilitate reproducibility of the findings in our paper (link TBD).
## Citation
```
@misc{lucy2024evaluatinglanguagemodelmath,
title={Evaluating Language Model Math Reasoning via Grounding in Educational Curricula},
author={Li Lucy and Tal August and Rose E. Wang and Luca Soldaini and Courtney Allison and Kyle Lo},
year={2024},
eprint={2408.04226},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2408.04226},
}
```
## Dataset Card Contact
kylel@allenai.org
Provider:
maas
Created:
2025-05-28



