five

Turing-Reason-CoT-Mini

收藏
魔搭社区2025-12-03 更新2025-09-20 收录
下载链接:
https://modelscope.cn/datasets/prithivMLmods/Turing-Reason-CoT-Mini
下载链接
链接失效反馈
官方服务:
资源简介:
![19.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/w79X7Ki7nZsy57ERWSiMN.png) # **Turing-Reason-CoT-Mini** > **Turing-Reason-CoT-Mini** is a high-quality, compact chain-of-thought reasoning dataset curated for tasks in mathematics, science, and coding. While the dataset spans diverse domains, it is primarily driven by mathematical reasoning, reflecting a major share of math-focused prompts and long-form logical solutions. ## Quick Start with Hugging Face Datasets🤗 ```py pip install -U datasets ``` ```py from datasets import load_dataset dataset = load_dataset("prithivMLmods/Turing-Reason-CoT-Mini", split="train") ``` ## Overview * **Total Samples**: \~557,536 * **Split**: `train` only * **Languages**: English * **Format**: Apache Arrow (auto-converted to Parquet) * **License**: Apache-2.0 * **Tags**: `math`, `code`, `science`, `reasoning`, `longcot` ## Highlights * Structured to promote **long-form, step-by-step reasoning**, ideal for training and evaluating chain-of-thought (CoT) capable models. * Reasoning traces include natural, human-like explanations for both simple and complex problems. * Fine-tuned across math word problems, logic-based questions, and technical prompts from STEM domains. ## Dataset Structure Each entry in the dataset includes: * **`problem`** (string): A math, science, or code problem. * **`solution`** (string): A detailed step-by-step solution crafted in a long-form reasoning style. The reasoning structure in solutions helps models understand logical flow, intermediate steps, and layered deductions—making this dataset suitable for advanced LLMs requiring interpretable outputs. ## Source & Derivation **Turing-Reason-CoT-Mini** is a derivative collection synthesized and optimized from: * A custom internal modular dataset tailored for logical and numeric reasoning tasks. * Chain-of-thought style responses generated with QwQ 32B-based models, carefully filtered and structured for quality. * Math chain-of-thoughts from PrimeIntellect/NuminaMath-QwQ-CoT-5M. This dataset was created with a focus on enhancing CoT capabilities in large-scale models working on math, science, and code. ## License Apache License 2.0

![19.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/w79X7Ki7nZsy57ERWSiMN.png) # **Turing-Reason-CoT-Mini** > **Turing-Reason-CoT-Mini** 是一款高质量、轻量化的思维链(chain-of-thought, CoT)推理数据集,专为数学、科学与编程任务精心打造。尽管该数据集覆盖多元领域,但核心以数学推理为主,数学导向的提示与长格式逻辑解答占比最高。 ## 🤗 Hugging Face 数据集快速上手 py pip install -U datasets py from datasets import load_dataset dataset = load_dataset("prithivMLmods/Turing-Reason-CoT-Mini", split="train") ## 数据集概览 * **总样本数**:约557,536条 * **数据划分**:仅包含训练集(train) * **语言**:英语 * **格式**:Apache Arrow格式(自动转换为Parquet格式) * **许可证**:Apache-2.0 * **标签**:`数学`、`代码`、`科学`、`推理`、`longcot` ## 核心亮点 * 结构设计旨在支持**长格式、分步式推理**,非常适合训练与评估具备思维链(CoT)能力的模型。 * 推理轨迹包含针对简单与复杂问题的自然类人解释。 * 覆盖STEM(科学、技术、工程、数学)领域的数学应用题、逻辑类问题与技术提示词,适配微调需求。 ## 数据集结构 数据集中的每条样本包含以下字段: * **`problem`**(字符串类型):数学、科学或编程问题。 * **`solution`**(字符串类型):采用长格式推理风格编写的详细分步解答。 解答中的推理结构可帮助模型理解逻辑流程、中间步骤与分层推导过程,因此该数据集非常适合需要可解释输出的高级大语言模型(Large Language Model, LLM)。 ## 数据集来源与衍生说明 **Turing-Reason-CoT-Mini** 是从以下内容合成并优化得到的衍生数据集: * 为逻辑与数值推理任务定制的自研内部模块化数据集。 * 基于QwQ 32B模型生成的思维链风格回复,经过严格筛选与结构化处理以保障质量。 * 来自PrimeIntellect/NuminaMath-QwQ-CoT-5M的数学思维链数据。 本数据集的构建目标是提升面向数学、科学与编程任务的大规模模型的思维链能力。 ## 许可证 Apache许可证2.0版
提供机构:
maas
创建时间:
2025-09-15
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作