
GSM8K Chinese Dataset

ModelScope community, updated 2026-05-19, indexed 2025-02-15
Download link:
https://modelscope.cn/datasets/testUser/GSM8K_zh
Resource description:
# Dataset

`GSM8K_zh` is a dataset for mathematical reasoning in Chinese. Its question-answer pairs are translated from GSM8K (https://github.com/openai/grade-school-math/tree/master) by `GPT-3.5-Turbo` with few-shot prompting.

The dataset consists of 7473 training samples and 1319 testing samples. The former are for **supervised fine-tuning**, while the latter are for **evaluation**.

For training samples, `question_zh` and `answer_zh` are the question and answer keys, respectively; for testing samples, only the translated questions are provided (`question_zh`).

# Citation

If you find the `GSM8K_zh` dataset useful for your projects/papers, please cite the following paper.

```bibtex
@article{yu2023metamath,
  title={MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models},
  author={Yu, Longhui and Jiang, Weisen and Shi, Han and Yu, Jincheng and Liu, Zhengying and Zhang, Yu and Kwok, James T and Li, Zhenguo and Weller, Adrian and Liu, Weiyang},
  journal={arXiv preprint arXiv:2309.12284},
  year={2023}
}
```
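As a minimal sketch of how the two documented keys might be used, the snippet below turns a record into a prompt/response pair for supervised fine-tuning and leaves the response empty for test records, which carry only `question_zh`. The prompt template, the `to_sft_example` helper, and both sample records are hypothetical illustrations; they are not part of the dataset or its card.

```python
from typing import Optional, TypedDict


class SftExample(TypedDict):
    prompt: str
    response: Optional[str]


# Hypothetical prompt template; the dataset card does not prescribe one.
PROMPT_TEMPLATE = "请解答下面的数学题。\n问题:{question}\n解答:"


def to_sft_example(record: dict) -> SftExample:
    """Turn one GSM8K_zh-style record into a prompt/response pair.

    Training records carry both `question_zh` and `answer_zh`;
    test records carry only `question_zh`, so `response` is None for them.
    """
    question = record["question_zh"].strip()
    answer = record.get("answer_zh")
    return {
        "prompt": PROMPT_TEMPLATE.format(question=question),
        "response": answer.strip() if answer else None,
    }


# Made-up records for illustration only; they are not taken from the dataset.
train_record = {
    "question_zh": "小明有3个苹果,又买了2个,现在一共有几个?",
    "answer_zh": "3 + 2 = 5,所以一共有5个苹果。",
}
test_record = {"question_zh": "一本书有10页,读了4页,还剩几页?"}

train_example = to_sft_example(train_record)
test_example = to_sft_example(test_record)
```

Keeping `response` as `None` for test records, rather than an empty string, makes it harder to mix evaluation questions into a fine-tuning set by accident.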
Provider:
maas
Created:
2025-02-09
Collected summary
Dataset introduction
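Background and challenges
Background overview
The GSM8K Chinese dataset is built specifically for mathematical reasoning in Chinese. It was translated from the English GSM8K dataset with GPT-3.5-Turbo and contains 7473 training samples and 1319 test samples. Training samples provide Chinese questions and answers and are suited to supervised fine-tuning; test samples provide only the questions and are used for model evaluation. The dataset supports research on Chinese-language processing for mathematical reasoning tasks.
The content above was collected and summarized by 遇见数据集.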