five

knowurknottty/GLM-5.1-Reasoning-1M-Cleaned

收藏
Hugging Face2026-04-29 更新2026-05-03 收录
下载链接:
https://hf-mirror.com/datasets/knowurknottty/GLM-5.1-Reasoning-1M-Cleaned
下载链接
链接失效反馈
官方服务:
资源简介:
GLM-5.1-Reasoning-1M-Cleaned是一个经过清理和重新格式化的数据集,源自Kassadin88/GLM-5.1-1000000x。它保留了原始数据集的四个子集(main、PHD-Science、Multilingual-STEM、Math),并将每个示例转换为统一的SFT-ready模式,包含明确的conversations、input、output、domain和meta字段。该数据集主要用于文本生成和问答任务,特别关注推理、思维链、指令调优和蒸馏技术。清理过程移除了不完整、重复或无法解析的记录,确保了数据质量。数据集包含746,321条记录,覆盖通用推理、研究生级科学、多语言STEM和数学等领域。

GLM-5.1-Reasoning-1M-Cleaned is a cleaned and reformatted derivative of Kassadin88/GLM-5.1-1000000x. It preserves the original four-subset layout (main, PHD-Science, Multilingual-STEM, Math) while converting every example into a unified SFT-ready schema with explicit conversations, input, output, domain, and meta fields. The dataset is designed for text-generation and question-answering tasks, with a focus on reasoning, chain-of-thought, instruction-tuning, and distillation. The cleaning process removed incomplete, duplicated, or unparseable records, resulting in a high-quality dataset. It contains 746,321 records covering general reasoning, graduate-level science, multilingual STEM, and mathematics.
提供机构:
knowurknottty
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作