
OmniThought

ModelScope community; updated 2026-01-06; indexed 2025-05-31
Download link:
https://modelscope.cn/datasets/PAI/OmniThought
Description:
# OmniThought: A Large-Scale Chain-of-Thought Dataset for Advancing Large Reasoning Models

## Overview

The rise of **Large Reasoning Models (LRMs)** has revolutionized **Natural Language Processing (NLP)**, enabling breakthroughs in complex tasks like **mathematical problem-solving** and **code generation**. These models rely on **Chain-of-Thought (CoT)** processes to mimic human-like reasoning. However, progress in LRMs is limited by the scarcity of **high-quality, large-scale CoT datasets**. Existing resources often lack:

- **Diverse reasoning problems** with well-structured CoT processes.
- **Multi-teacher distillation** to ensure reasoning quality.
- **Fine-grained annotations** describing CoT properties.

To bridge this gap, we introduce **`OmniThought`**, a **2-million-scale CoT dataset** generated and validated by **multiple powerful LRMs**. Each CoT process is annotated with:

- **Reasoning Verbosity (RV)**: Measures the optimal verbosity of reasoning traces.
- **Cognitive Difficulty (CD)**: Assesses the difficulty and complexity of reasoning traces for model comprehension.

We also propose a **customized training-data construction method** that ensures high-quality reasoning traces compatible with the target model's cognitive ability. For a given target model, we determine appropriate CD and RV ranges aligned with its cognitive capacity and then sample from OmniThought to construct a tailored training set for that model. **Models trained on such customized data produce reasoning outputs that align with their cognitive capacity, leading to stronger overall reasoning performance. They also exhibit adaptive thinking (shorter chains on simpler problems and longer chains on more difficult ones), thereby avoiding both over-thinking and under-thinking.**

## Key Features

✅ **2 million high-quality CoT processes** covering diverse reasoning tasks.
✅ **Multi-teacher distillation** for robust and coherent reasoning paths.
✅ **RV-CD scores** to guide model training for better reasoning performance and adaptive-thinking ability.
✅ **Customized training method** that leverages CD, RV, and the cognitive capacity of the target model.
✅ **Optimized for LRM training**: improves reasoning ability and output quality.

## Experiments & Results

Extensive experiments with **Qwen2.5 models** (various sizes) and **Qwen3 models** (various sizes) confirm that:

- Training with **RV-CD scores** enhances **LRM reasoning effectiveness**.
- Models trained on `OmniThought` achieve **stronger reasoning abilities** with **optimal CoT length and difficulty**.

Based on this dataset, we release **a series of high-performance LRMs** with superior reasoning capabilities and adaptive-thinking abilities: the ThoughtX series and the ThoughtY series. The ThoughtX series uses the Qwen2.5-Instruct series as the base model, while the ThoughtY series uses the Qwen3 series as the base model.

Performance:

| Model | AIME24 | MATH500 | GPQA Diamond | LiveCodeBench V2 | Avg. |
|---------------------|----------|---------|--------------|------------------|------|
| [DistillQwen-ThoughtY-4B](https://huggingface.co/alibaba-pai/DistillQwen-ThoughtY-4B) | 76.7 | 95.2 | 56.1 | 75.8 | 76.0 |
| [DistillQwen-ThoughtY-8B](https://huggingface.co/alibaba-pai/DistillQwen-ThoughtY-8B) | 76.7 | 94.6 | 62.1 | 78.1 | 77.9 |
| [DistillQwen-ThoughtY-32B](https://huggingface.co/alibaba-pai/DistillQwen-ThoughtY-32B) | 90.0 | 95.2 | 63.6 | 76.3 | 81.3 |
| [DistillQwen-ThoughtX-7B](https://huggingface.co/alibaba-pai/DistilQwen-ThoughtX-7B) | 56.7 | 90.2 | 50.0 | 56.8 | 63.4 |
| [DistillQwen-ThoughtX-32B](https://huggingface.co/alibaba-pai/DistilQwen-ThoughtX-32B) | 80.0 | 92.6 | 64.0 | 73.4 | 77.5 |

## Impact

`OmniThought` significantly advances **LRM development**, enabling models of all scales to tackle complex reasoning tasks more effectively.
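
To make the customized training-data construction from the Overview concrete, here is a minimal sketch of "pick CD and RV ranges, then sample". It is an illustration under stated assumptions, not the authors' procedure: the record fields (`rv`, `cd`, `question`, `reasoning`), the integer score scale, the function name `build_training_set`, and uniform sampling without replacement are all hypothetical choices made for this example. See the paper for how ranges are actually chosen and how sampling is performed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CoTRecord:
    """One reasoning trace with its annotations (field names are hypothetical)."""
    question: str
    reasoning: str
    rv: int  # Reasoning Verbosity score (scale assumed for illustration)
    cd: int  # Cognitive Difficulty score (scale assumed for illustration)


def build_training_set(records, rv_range, cd_range, sample_size, seed=0):
    """Keep records whose RV and CD fall inside the inclusive ranges,
    then draw up to `sample_size` of them uniformly without replacement."""
    rv_lo, rv_hi = rv_range
    cd_lo, cd_hi = cd_range
    eligible = [
        r for r in records
        if rv_lo <= r.rv <= rv_hi and cd_lo <= r.cd <= cd_hi
    ]
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(eligible, min(sample_size, len(eligible)))


# Toy in-memory pool standing in for the real dataset.
pool = [
    CoTRecord(f"q{i}", f"trace {i}", rv=i % 10, cd=(i * 3) % 10)
    for i in range(100)
]

# Ranges a practitioner might pick for a small target model (assumed values).
subset = build_training_set(pool, rv_range=(2, 5), cd_range=(0, 4), sample_size=8)
print(len(subset), "records selected")
```

In practice the ranges would be tuned per target model, for example narrower CD ranges for smaller models, and the pool would come from the published dataset rather than synthetic records.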

We have recently released the [OmniThought-0528 dataset](https://huggingface.co/datasets/alibaba-pai/OmniThought-0528) as a supplement and extension to OmniThought. Feel free to use both!

## Reference

For more detailed information about the customized training-data construction method, we encourage you to refer to our paper:

- **Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations**
  Wenrui Cai, Chengyu Wang, Junbing Yan, Jun Huang, Xiangzhong Fang
  [arXiv:2505.10937](https://arxiv.org/abs/2505.10937)

You can cite the paper using the following citation format:

```bibtex
@misc{cai2025reasoningomnithoughtlargecot,
  title={Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations},
  author={Wenrui Cai and Chengyu Wang and Junbing Yan and Jun Huang and Xiangzhong Fang},
  year={2025},
  eprint={2505.10937},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.10937}
}
```

Provider:
maas
Created:
2025-05-26