
tokyotech-llm/swallow-math-v2

Source: Hugging Face. Updated 2025-11-06; indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/tokyotech-llm/swallow-math-v2
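Overview:
SwallowMath-v2 is a large-scale mathematical dataset of approximately 32 billion tokens, intended to support research and development on mathematical reasoning in large language models (LLMs). It builds on SwallowMath-v1 and uses an LLM-driven rewriting method to turn mathematical content into clear, step-by-step explanations. It offers two rewriting styles, Textbook and Q&A, to suit different use cases. SwallowMath-v2 is released under the Apache 2.0 license, which makes it convenient for both researchers and commercial users.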

SwallowMath-v2 is a large-scale mathematical dataset containing approximately 32 billion tokens, developed to support open and reproducible research on mathematical reasoning for large language models (LLMs). Building on the success of v1, this release aims to construct a larger-scale and more permissively licensed corpus. SwallowMath-v2 employs an LLM-driven rewriting approach, removing boilerplate, restoring missing context, and reformatting solutions into clear, step-by-step explanations. Additionally, we explored multiple rewriting styles and adopted the two most effective ones—Textbook and Q&A—in the final synthesis stage, yielding higher consistency and reasoning quality. Empirical evaluations demonstrate that models trained with SwallowMath-v2 achieve stronger performance on GSM-Plus and BBH, surpassing other open mathematical datasets.
Provider:
tokyotech-llm