MathInstruct
Source: ModelScope community | Updated: 2026-05-15 | Indexed: 2024-05-15
Download link:
https://modelscope.cn/datasets/AI-ModelScope/MathInstruct
Resource description:
# 🦣 MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
MathInstruct is a meticulously curated instruction tuning dataset that is lightweight yet generalizable. MathInstruct is compiled from 13 math rationale datasets, six of which are newly curated by this work. It uniquely focuses on the hybrid use of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and ensures extensive coverage of diverse mathematical fields.
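
To make the difference between the two rationale styles concrete, here is a minimal sketch. The problem, both rationales, and the helper `run_pot` are hypothetical illustrations rather than records or tooling from MathInstruct. A CoT rationale is natural-language reasoning that states the answer, whereas a PoT rationale is a program that must be executed to obtain it.

```python
import contextlib
import io

# Hypothetical toy problem, not taken from MathInstruct.
question = "A shop sells pencils at 3 dollars each. How much do 14 pencils cost?"

# Chain-of-thought (CoT): the rationale is natural-language reasoning.
cot_rationale = (
    "Each pencil costs 3 dollars, so 14 pencils cost 14 * 3 = 42 dollars. "
    "The answer is 42."
)

# Program-of-thought (PoT): the rationale is a program whose printed
# output is the answer.
pot_rationale = """
price_per_pencil = 3
num_pencils = 14
print(price_per_pencil * num_pencils)
"""


def run_pot(program: str) -> str:
    """Execute a PoT program and return what it prints (hypothetical helper)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})  # only run trusted programs this way
    return buffer.getvalue().strip()


pot_answer = run_pot(pot_rationale)
print(pot_answer)
```

Offloading the arithmetic to an interpreter is what separates PoT from CoT: the model only has to write a correct program, not carry out the computation itself.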
Project Page: [https://tiger-ai-lab.github.io/MAmmoTH/](https://tiger-ai-lab.github.io/MAmmoTH/)
Paper: [https://arxiv.org/pdf/2309.05653.pdf](https://arxiv.org/pdf/2309.05653.pdf)
Code: [https://github.com/TIGER-AI-Lab/MAmmoTH](https://github.com/TIGER-AI-Lab/MAmmoTH)
Models:
| | **Base Model: Llama-2** | **Base Model: Code Llama** |
|---|---|---|
| 7B | 🦣 [MAmmoTH-7B](https://huggingface.co/TIGER-Lab/MAmmoTH-7B) | 🦣 [MAmmoTH-Coder-7B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-7B) |
| 13B | 🦣 [MAmmoTH-13B](https://huggingface.co/TIGER-Lab/MAmmoTH-13B) | 🦣 [MAmmoTH-Coder-13B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-13B)|
| 34B | - | 🦣 [MAmmoTH-Coder-34B](https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-34B)|
| 70B | 🦣 [MAmmoTH-70B](https://huggingface.co/TIGER-Lab/MAmmoTH-70B) | - |
## **License**
Please check out the license of each subset in our curated dataset MathInstruct.
| Dataset Name | License Type |
|---|---|
| GSM8K | MIT |
| GSM8K-RFT | Not listed |
| AQuA-RAT | Apache 2.0 |
| MATH | MIT |
| TheoremQA | MIT |
| Camel-Math | Attribution-NonCommercial 4.0 International |
| NumGLUE | Apache-2.0 |
| MathQA | Apache-2.0 |
| Our Curated | MIT |
## **Example Code**
```python
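from modelscope import MsDataset
from modelscope.utils.constant import DownloadMode

# Load the train split; FORCE_REDOWNLOAD ignores any locally cached copy.
ds = MsDataset.load('AI-ModelScope/MathInstruct', subset_name='default', split='train', download_mode=DownloadMode.FORCE_REDOWNLOAD)
# Print the first record to inspect its fields.
print(next(iter(ds)))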
```
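
Because the dataset mixes CoT and PoT rationales, it can be useful to separate the two styles after loading. The following is a minimal sketch under assumptions that are not confirmed by this page: that each record is a dict with a `source` field, and that the source path contains `/CoT/` or `/PoT/`. The sample records and the helper `rationale_style` are hypothetical; check the field names against the record printed by the loading code before relying on them.

```python
from collections import Counter


def rationale_style(record: dict) -> str:
    """Guess the rationale style from the (assumed) `source` field."""
    source = record.get("source", "")
    if "/PoT/" in source:
        return "PoT"
    if "/CoT/" in source:
        return "CoT"
    return "unknown"


# Hypothetical records standing in for rows of the loaded dataset.
sample_records = [
    {"source": "data/CoT/example_a.json", "instruction": "q1", "output": "a1"},
    {"source": "data/PoT/example_b.json", "instruction": "q2", "output": "a2"},
    {"source": "data/CoT/example_c.json", "instruction": "q3", "output": "a3"},
    {"instruction": "q4", "output": "a4"},  # no source field at all
]

# Count how many records fall into each style.
style_counts = Counter(rationale_style(record) for record in sample_records)
print(dict(style_counts))
```

If the assumptions hold, the same generator expression can be run over the loaded dataset in place of `sample_records`.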
## **Citation**
Please cite our paper if you use our data, model or code. Please also kindly cite the original dataset papers.
```
@article{yue2023mammoth,
title={MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning},
author={Xiang Yue and Xingwei Qu and Ge Zhang and Yao Fu and Wenhao Huang and Huan Sun and Yu Su and Wenhu Chen},
journal={arXiv preprint arXiv:2309.05653},
year={2023}
}
```
Provider: maas
Created: 2023-12-04



