agentlans/OpenSeek-Text-Generation-Reasoning
收藏Hugging Face2026-04-24 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/agentlans/OpenSeek-Text-Generation-Reasoning
下载链接
链接失效反馈官方服务:
资源简介:
OpenSeek-Text-Generation-Reasoning数据集是基于BAAI/OpenSeek-Synthetic-Reasoning-Data-Examples数据集CC子集的精炼版本,专门针对带有详细思维链(CoT)推理的文本生成任务进行了优化。该数据集主要适用于监督微调(SFT),通过清晰分离初始提示、逐步推理过程(thought)和最终输出,帮助模型学习如何生成结构化的、要求严格的文本。数据集包含1,267,534行数据,格式为JSONL/Parquet,语言为英语,采用CC-BY-SA 4.0许可证。每个条目包含question(用户指令)、thought(逐步推理过程)、answer(最终生成的文本)和source(来源元数据)字段。数据集经过了去重、清理(去除缺失字段或空值)和格式化等处理,以确保高质量。
OpenSeek-Text-Generation-Reasoning is a refined version of the CC subset of BAAI/OpenSeek-Synthetic-Reasoning-Data-Examples, focusing on text generation tasks accompanied by detailed Chain-of-Thought (CoT) reasoning. Adapted specifically for Supervised Fine-Tuning (SFT), this dataset features clean structural separation of the initial prompt, step-by-step reasoning process (thought), and final output to teach models not just what to generate but how. It contains 1,267,534 rows in JSONL/Parquet format, is in English, and licensed under CC-BY-SA 4.0. Each entry includes question (user instruction), thought (reasoning process), answer (final output), and source (origin metadata). The dataset underwent deduplication, cleaning (removing null/missing fields), and formatting for high-quality instruction tuning.
提供机构:
agentlans



