Turing-Reason-CoT

Name: Turing-Reason-CoT
Creator: maas
Published: 2025-12-03 17:17:25
License: 暂无描述

魔搭社区2025-12-03 更新2025-11-29 收录

下载链接：

https://modelscope.cn/datasets/prithivMLmods/Turing-Reason-CoT

下载链接

链接失效反馈

官方服务：

资源简介：

![2.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/O9MWdf7m8nBfw9E0JaspD.png) # **Turing-Reason-CoT** > **Turing-Reason-CoT** is a high-quality, compact chain-of-thought reasoning dataset curated for tasks in mathematics, science, and coding. While the dataset spans diverse domains, it is primarily driven by mathematical reasoning, reflecting a major share of math-focused prompts and long-form logical solutions. ## Quick Start with Hugging Face Datasets🤗 ```py pip install -U datasets ``` ```py from datasets import load_dataset dataset = load_dataset("prithivMLmods/Turing-Reason-CoT", split="train") ``` ## Overview * **Total Samples**: \~ 4,993,463 * **Split**: `train` only * **Languages**: English * **Format**: Apache Arrow (auto-converted to Parquet) * **License**: Apache-2.0 * **Tags**: `math`, `code`, `science`, `reasoning`, `longcot` ## Highlights * Structured to promote **long-form, step-by-step reasoning**, ideal for training and evaluating chain-of-thought (CoT) capable models. * Reasoning traces include natural, human-like explanations for both simple and complex problems. * Fine-tuned across math word problems, logic-based questions, and technical prompts from STEM domains. ## Dataset Structure Each entry in the dataset includes: * **`problem`** (string): A math, science, or code problem. * **`solution`** (string): A detailed step-by-step solution crafted in a long-form reasoning style. The reasoning structure in solutions helps models understand logical flow, intermediate steps, and layered deductions—making this dataset suitable for advanced LLMs requiring interpretable outputs. ## Source & Derivation **Turing-Reason-CoT** is a derivative collection synthesized and optimized from: * A custom internal modular dataset tailored for logical and numeric reasoning tasks. * Chain-of-thought style responses generated with QwQ 32B-based models, carefully filtered and structured for quality. * Math chain-of-thoughts from PrimeIntellect/NuminaMath-QwQ-CoT-5M. This dataset was created with a focus on enhancing CoT capabilities in large-scale models working on math, science, and code. ## License Apache License 2.0

![2.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/O9MWdf7m8nBfw9E0JaspD.png) # **Turing-Reason-CoT** > **Turing-Reason-CoT** 是一款高质量、轻量化的思维链（Chain-of-Thought，CoT）推理数据集，专为数学、科学与编码领域的任务精心遴选构建。尽管该数据集覆盖多元应用领域，但核心聚焦于数学推理，其中数学相关提示与长格式逻辑解法占比最高。 ## Hugging Face 数据集快速入门🤗 py pip install -U datasets py from datasets import load_dataset dataset = load_dataset("prithivMLmods/Turing-Reason-CoT", split="train") ## 概览 * **总样本量**：约4,993,463条 * **数据集划分**：仅包含`train`（训练集）划分 * **语言**：英语 * **数据格式**：Apache Arrow格式（自动转换为Parquet格式） * **许可证**：Apache-2.0 * **标签**：`math`、`code`、`science`、`reasoning`、`longcot` ## 核心亮点 * 采用结构化设计，支持**长格式、分步式推理**，是训练与评估具备思维链推理能力模型的理想数据集。 * 推理轨迹包含贴近人类思维的自然解释，覆盖简单与复杂各类问题场景。 * 覆盖数学应用题、逻辑推理题以及STEM（科学、技术、工程、数学）领域的技术提示词，可适配多场景微调需求。 ## 数据集结构数据集中的每条样本包含以下字段： * **`problem`**（字符串类型）：数学、科学或编码类问题。 * **`solution`**（字符串类型）：采用长格式推理风格撰写的详细分步式解法。数据集中的解法推理结构可帮助模型理解逻辑流程、中间步骤与分层推导过程，因此该数据集非常适合需要可解释输出的进阶大语言模型（Large Language Model，LLM）训练。 ## 来源与衍生 **Turing-Reason-CoT** 是一套整合并优化后的衍生数据集，其资源源自： * 专为逻辑与数值推理任务定制的内部自定义模块化数据集。 * 基于QwQ 32B模型生成的思维链风格响应，经严格筛选与结构化处理以保障质量。 * 源自`PrimeIntellect/NuminaMath-QwQ-CoT-5M`的数学思维链数据。本数据集的构建目标为提升面向数学、科学与编码领域的大规模模型的思维链推理能力。 ## 许可证 Apache许可证2.0（Apache License 2.0）

提供机构：

maas

创建时间：

2025-09-15

5,000+

优质数据集

54 个

任务类型

进入经典数据集