tokyotech-llm/swallow-math-v2
Source: Hugging Face. Updated: 2025-11-06. Indexed: 2025-11-15.
Download link:
https://hf-mirror.com/datasets/tokyotech-llm/swallow-math-v2
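Description:
SwallowMath-v2 is a large-scale mathematical dataset of approximately 32 billion tokens, intended to support research and development on mathematical reasoning in large language models (LLMs). It builds on SwallowMath-v1 and uses an LLM-driven rewriting approach that turns mathematical content into clear, step-by-step explanations. The dataset offers two rewriting styles, Textbook and Q&A, to suit different use cases. SwallowMath-v2 is released under the Apache 2.0 license, which makes it usable by both researchers and commercial users.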
SwallowMath-v2 is a large-scale mathematical dataset containing approximately 32 billion tokens, developed to support open and reproducible research on mathematical reasoning for large language models (LLMs). Building on the success of v1, this release aims to construct a larger-scale and more permissively licensed corpus. SwallowMath-v2 employs an LLM-driven rewriting approach, removing boilerplate, restoring missing context, and reformatting solutions into clear, step-by-step explanations. Additionally, we explored multiple rewriting styles and adopted the two most effective ones—Textbook and Q&A—in the final synthesis stage, yielding higher consistency and reasoning quality. Empirical evaluations demonstrate that models trained with SwallowMath-v2 achieve stronger performance on GSM-Plus and BBH, surpassing other open mathematical datasets.
Provider:
tokyotech-llm



