
agentlans/OpenSeek-Text-Generation-Reasoning

Source: Hugging Face. Updated 2026-04-24; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/agentlans/OpenSeek-Text-Generation-Reasoning
Overview:
OpenSeek-Text-Generation-Reasoning is a refined version of the CC subset of BAAI/OpenSeek-Synthetic-Reasoning-Data-Examples, focused on text generation tasks paired with detailed Chain-of-Thought (CoT) reasoning. It is intended for Supervised Fine-Tuning (SFT): each example cleanly separates the initial prompt, the step-by-step reasoning (thought), and the final output, so that models learn not just what to generate but how to produce structured text under strict requirements. The dataset contains 1,267,534 rows in JSONL/Parquet format, is in English, and is licensed under CC-BY-SA 4.0. Each entry has four fields: question (the user instruction), thought (the step-by-step reasoning), answer (the final generated text), and source (origin metadata). Processing included deduplication, cleaning (removing entries with missing fields or null values), and formatting for instruction tuning.
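The four-field layout described above can be illustrated with a small sketch. This is not the dataset author's pipeline; it only mirrors the described steps (drop entries with missing or empty fields, deduplicate, format for SFT) on made-up sample rows. The `<think>` wrapper, the choice of (question, answer) as the deduplication key, and the function names are all assumptions introduced for illustration.

```python
import json

# Field names as listed in the dataset description.
REQUIRED_FIELDS = ("question", "thought", "answer", "source")


def is_clean(record: dict) -> bool:
    """True if every required field is present and neither None nor an empty string."""
    return all(record.get(key) not in (None, "") for key in REQUIRED_FIELDS)


def to_sft_example(record: dict) -> dict:
    """Build a prompt/completion pair. The <think> wrapper is an assumed
    convention, not something the dataset itself prescribes."""
    completion = f"<think>\n{record['thought']}\n</think>\n{record['answer']}"
    return {"prompt": record["question"], "completion": completion}


def load_jsonl(lines):
    """Parse JSONL lines, skip unclean rows and exact repeats, yield SFT pairs."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not is_clean(record):
            continue
        key = (record["question"], record["answer"])  # assumed dedup key
        if key in seen:
            continue
        seen.add(key)
        yield to_sft_example(record)


# Made-up sample rows, not taken from the dataset.
good = {
    "question": "Write a two-line poem about rain.",
    "thought": "Pick an image, keep it to two lines.",
    "answer": "Rain taps the glass.\nThe street forgets the dust.",
    "source": "example",
}
missing = {"question": "Incomplete row", "thought": None, "answer": "x", "source": "example"}
sample_lines = [json.dumps(good), json.dumps(good), json.dumps(missing)]

examples = list(load_jsonl(sample_lines))
print(len(examples))
print(examples[0]["completion"])
```

Keeping thought and answer as separate fields until the final formatting step means the same rows can be rendered into whatever reasoning template a given training setup expects.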
Provider:
agentlans