agentlans/OpenSeek-Text-Generation-Reasoning

Name: agentlans/OpenSeek-Text-Generation-Reasoning
Creator: agentlans
Published: 2026-04-24 11:29:02
License: No description available

Hugging Face | Updated 2026-04-24 | Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/agentlans/OpenSeek-Text-Generation-Reasoning
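If you work from a downloaded JSONL copy instead of a loader library, each line is one JSON object. The following is a minimal sketch that parses those lines and keeps only rows carrying the four fields named in the summary (question, thought, answer, source). The function name `read_records` and the file name `train.jsonl` are hypothetical; the actual file layout in the repository may differ.

```python
import json
from typing import Iterable, Iterator

# Field names as listed in the dataset summary.
REQUIRED_FIELDS = ("question", "thought", "answer", "source")


def read_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL lines, skipping blank lines and rows missing a required field.

    `read_records` is a hypothetical helper, not part of any published API.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate trailing or empty lines
        record = json.loads(line)
        if all(field in record for field in REQUIRED_FIELDS):
            yield record


# Usage with a local copy ("train.jsonl" is a hypothetical file name):
# with open("train.jsonl", encoding="utf-8") as f:
#     for record in read_records(f):
#         print(record["question"])
```

Because the helper accepts any iterable of strings, it works the same on an open file handle or an in-memory list.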


Summary:


OpenSeek-Text-Generation-Reasoning is a refined version of the CC subset of BAAI/OpenSeek-Synthetic-Reasoning-Data-Examples, focused on text generation tasks paired with detailed Chain-of-Thought (CoT) reasoning. It is adapted specifically for Supervised Fine-Tuning (SFT): each example cleanly separates the initial prompt, the step-by-step reasoning (thought), and the final output, so that models learn not just what to generate but how to produce structured text that meets strict requirements. The dataset contains 1,267,534 rows in JSONL/Parquet format, is in English, and is licensed under CC-BY-SA 4.0. Each entry has four fields: question (the user instruction), thought (the reasoning process), answer (the final generated text), and source (origin metadata). The data was deduplicated, cleaned by removing rows with missing or null fields, and formatted for high-quality instruction tuning.
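The cleaning steps and the prompt/reasoning/answer split described above can be sketched as follows. This is an illustration under stated assumptions, not the curator's actual pipeline: it treats "cleaning" as dropping rows whose fields are null or blank, "deduplication" as exact matching on (question, thought, answer), and it uses a hypothetical `<think>` tag template to place the thought before the answer in the SFT target. All function names here are hypothetical.

```python
from typing import Dict, Iterable, List

FIELDS = ("question", "thought", "answer", "source")


def is_clean(record: Dict) -> bool:
    """True if every field is present, is a string, and is not blank."""
    return all(isinstance(record.get(f), str) and record[f].strip() for f in FIELDS)


def clean_and_dedup(records: Iterable[Dict]) -> List[Dict]:
    """Drop rows with null/empty fields, then exact duplicates.

    Assumption: duplicates are identified by (question, thought, answer);
    the real dataset may have used a different key or fuzzy matching.
    """
    seen = set()
    kept = []
    for r in records:
        if not is_clean(r):
            continue
        key = (r["question"], r["thought"], r["answer"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(r)
    return kept


def to_sft_pair(record: Dict) -> Dict[str, str]:
    """Build a prompt/completion pair for SFT.

    Hypothetical template: the reasoning is wrapped in <think> tags and
    followed by the final answer; adapt it to your model's chat format.
    """
    return {
        "prompt": record["question"],
        "completion": f"<think>\n{record['thought']}\n</think>\n\n{record['answer']}",
    }


# Usage:
# pairs = [to_sft_pair(r) for r in clean_and_dedup(records)]
```

Keeping the thought and the answer in one completion string lets a standard SFT loss supervise both the reasoning and the final text; if you only want to train on the answer, mask the reasoning tokens instead.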

Provided by:

agentlans
