SynLogic

Name: SynLogic
Creator: maas
Published: 2026-05-06 11:49:26
License: No description available

ModelScope Community; updated 2026-05-06; indexed 2025-05-31

Download link:

https://modelscope.cn/datasets/MiniMax/SynLogic

Resource description:

# SynLogic Dataset

SynLogic is a comprehensive synthetic logical reasoning dataset designed to enhance logical reasoning capabilities in Large Language Models (LLMs) through reinforcement learning with verifiable rewards.

* 🐙 **GitHub Repo:** [https://github.com/MiniMax-AI/SynLogic](https://github.com/MiniMax-AI/SynLogic)
* 📜 **Paper (arXiv):** [https://arxiv.org/abs/2505.19641](https://arxiv.org/abs/2505.19641)

## Dataset Description

SynLogic contains 35 diverse logical reasoning tasks with automatic verification capabilities, making it ideal for reinforcement learning training.

### Key Features

- **35 Task Types**: Including Sudoku, Game of 24, Cipher, Arrow Maze, Cryptarithm, and more
- **Verifiable Rewards**: All samples have automatic verifiers for correctness checking
- **Controllable Difficulty**: Adjustable difficulty parameters for each task
- **Two Versions**: Easy (7B models) and Hard (32B models)

## Dataset Configurations

### SynLogic-Easy

- **Target**: 7B parameter models
- **Tasks**: 27 tasks
- **Samples**: ~16,000 training instances

### SynLogic-Hard

- **Target**: 32B parameter models
- **Tasks**: All 35 tasks
- **Samples**: ~33,000 training instances

## Usage

```python
from datasets import load_dataset

# Load easy version for 7B models
dataset_easy = load_dataset("MiniMaxAI/SynLogic", "easy")

# Load hard version for 32B models
dataset_hard = load_dataset("MiniMaxAI/SynLogic", "hard")
```

## News

**[2025-07-02]** Fixed extra info information for boolean_expressions

**[2025-06-12]** Fixed duplicate prompts in ARC AGI tasks. Updated both training and validation datasets to remove redundant prompt content. See commit [3416fc5](https://huggingface.co/datasets/MiniMaxAI/SynLogic/commit/3416fc57cf3ee47cac749eaf2c0d430617a99041) for details.

## Performance

Models trained on SynLogic achieve state-of-the-art logical reasoning performance:

- **BBEH**: 25.5% (+6 points vs DeepSeek-R1-Distill-Qwen-32B)
- **KOR-Bench**: 62.2%
- Strong transfer to mathematical reasoning tasks

## Citation

Please cite our paper if you find our work helpful:

```bibtex
@misc{liu2025synlogic,
  title={SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond},
  author={Junteng Liu and Yuanxiang Fan and Zhuo Jiang and Han Ding and Yongyi Hu and Chi Zhang and Yiqi Shi and Shitong Weng and Aili Chen and Shiqi Chen and Yunan Huang and Mozhi Zhang and Pengyu Zhao and Junjie Yan and Junxian He},
  year={2025},
  eprint={2505.19641},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2505.19641},
}
```
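
To make the "Verifiable Rewards" feature listed under Key Features more concrete, here is a minimal sketch of what a rule-based verifier for a Game of 24 sample could look like. It is an illustration only, not SynLogic's actual verifier (those live in the GitHub repo and may work differently). The function name `verify_game_of_24`, the plain arithmetic-expression answer format, and the 0.0/1.0 reward convention are all assumptions made for this example.

```python
import ast
from collections import Counter
from fractions import Fraction

# Binary operators allowed in an answer (assumed rule set for this sketch).
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def _evaluate(node, used):
    """Evaluate a parsed expression exactly, recording every integer literal."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    # Anything else (names, calls, unary operators, floats) is rejected.
    raise ValueError("unsupported syntax in answer")


def verify_game_of_24(numbers, answer, target=24):
    """Hypothetical verifier: 1.0 if `answer` uses each number once and hits `target`."""
    try:
        tree = ast.parse(answer.strip(), mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    # The multiset of literals must match the puzzle's numbers exactly.
    if Counter(used) != Counter(numbers):
        return 0.0
    return 1.0 if value == target else 0.0


if __name__ == "__main__":
    print(verify_game_of_24([4, 7, 8, 8], "(7 - 8 / 8) * 4"))
    print(verify_game_of_24([1, 2, 3, 4], "4 * 6"))
```

The sketch parses the answer with `ast` rather than calling `eval`, so arbitrary code in a model's output is never executed, and it uses `Fraction` so that answers relying on non-integer intermediate values, such as `8 / (3 - 8 / 3)`, are checked exactly. A binary reward like this is one simple way to plug a verifier into reinforcement learning; partial-credit schemes are equally possible.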

Provider:

maas

Created:

2025-05-29
