Turing-Reason-CoT
收藏魔搭社区2025-12-03 更新2025-11-29 收录
下载链接:
https://modelscope.cn/datasets/prithivMLmods/Turing-Reason-CoT
下载链接
链接失效反馈官方服务:
资源简介:

# **Turing-Reason-CoT**
> **Turing-Reason-CoT** is a high-quality, compact chain-of-thought reasoning dataset curated for tasks in mathematics, science, and coding. While the dataset spans diverse domains, it is primarily driven by mathematical reasoning, reflecting a major share of math-focused prompts and long-form logical solutions.
## Quick Start with Hugging Face Datasets🤗
```py
pip install -U datasets
```
```py
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/Turing-Reason-CoT", split="train")
```
## Overview
* **Total Samples**: \~ 4,993,463
* **Split**: `train` only
* **Languages**: English
* **Format**: Apache Arrow (auto-converted to Parquet)
* **License**: Apache-2.0
* **Tags**: `math`, `code`, `science`, `reasoning`, `longcot`
## Highlights
* Structured to promote **long-form, step-by-step reasoning**, ideal for training and evaluating chain-of-thought (CoT) capable models.
* Reasoning traces include natural, human-like explanations for both simple and complex problems.
* Fine-tuned across math word problems, logic-based questions, and technical prompts from STEM domains.
## Dataset Structure
Each entry in the dataset includes:
* **`problem`** (string): A math, science, or code problem.
* **`solution`** (string): A detailed step-by-step solution crafted in a long-form reasoning style.
The reasoning structure in solutions helps models understand logical flow, intermediate steps, and layered deductions—making this dataset suitable for advanced LLMs requiring interpretable outputs.
## Source & Derivation
**Turing-Reason-CoT** is a derivative collection synthesized and optimized from:
* A custom internal modular dataset tailored for logical and numeric reasoning tasks.
* Chain-of-thought style responses generated with QwQ 32B-based models, carefully filtered and structured for quality.
* Math chain-of-thoughts from PrimeIntellect/NuminaMath-QwQ-CoT-5M.
This dataset was created with a focus on enhancing CoT capabilities in large-scale models working on math, science, and code.
## License
Apache License 2.0

# **Turing-Reason-CoT**
> **Turing-Reason-CoT** 是一款高质量、轻量化的思维链(Chain-of-Thought,CoT)推理数据集,专为数学、科学与编码领域的任务精心遴选构建。尽管该数据集覆盖多元应用领域,但核心聚焦于数学推理,其中数学相关提示与长格式逻辑解法占比最高。
## Hugging Face 数据集快速入门🤗
py
pip install -U datasets
py
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/Turing-Reason-CoT", split="train")
## 概览
* **总样本量**:约4,993,463条
* **数据集划分**:仅包含`train`(训练集)划分
* **语言**:英语
* **数据格式**:Apache Arrow格式(自动转换为Parquet格式)
* **许可证**:Apache-2.0
* **标签**:`math`、`code`、`science`、`reasoning`、`longcot`
## 核心亮点
* 采用结构化设计,支持**长格式、分步式推理**,是训练与评估具备思维链推理能力模型的理想数据集。
* 推理轨迹包含贴近人类思维的自然解释,覆盖简单与复杂各类问题场景。
* 覆盖数学应用题、逻辑推理题以及STEM(科学、技术、工程、数学)领域的技术提示词,可适配多场景微调需求。
## 数据集结构
数据集中的每条样本包含以下字段:
* **`problem`**(字符串类型):数学、科学或编码类问题。
* **`solution`**(字符串类型):采用长格式推理风格撰写的详细分步式解法。
数据集中的解法推理结构可帮助模型理解逻辑流程、中间步骤与分层推导过程,因此该数据集非常适合需要可解释输出的进阶大语言模型(Large Language Model,LLM)训练。
## 来源与衍生
**Turing-Reason-CoT** 是一套整合并优化后的衍生数据集,其资源源自:
* 专为逻辑与数值推理任务定制的内部自定义模块化数据集。
* 基于QwQ 32B模型生成的思维链风格响应,经严格筛选与结构化处理以保障质量。
* 源自`PrimeIntellect/NuminaMath-QwQ-CoT-5M`的数学思维链数据。
本数据集的构建目标为提升面向数学、科学与编码领域的大规模模型的思维链推理能力。
## 许可证
Apache许可证2.0(Apache License 2.0)
提供机构:
maas
创建时间:
2025-09-15



