insym/GLM-5.1-Reasoning-1M-Cleaned
收藏Hugging Face2026-04-21 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/insym/GLM-5.1-Reasoning-1M-Cleaned
下载链接
链接失效反馈官方服务:
资源简介:
GLM-5.1-Reasoning-1M-Cleaned 是一个经过清理和重新格式化的数据集,源自 GLM-5.1-1000000x。它保留了原始数据集的四个子集(main、PHD-Science、Multilingual-STEM、Math),并将每个示例转换为统一的 SFT-ready 模式,包含明确的 conversations、input、output、domain 和 meta 字段。该数据集主要用于文本生成和问答任务,特别关注推理、思维链、指令调整等领域。清理过程中移除了不完整、重复或无法解析的记录,确保了数据质量。数据集包含 746,321 条记录,覆盖多种语言(如英语和中文)和多个学科领域。
GLM-5.1-Reasoning-1M-Cleaned is a cleaned and reformatted derivative of the GLM-5.1-1000000x dataset. It preserves the original four-subset layout (main, PHD-Science, Multilingual-STEM, Math) while converting every example into a unified SFT-ready schema with explicit conversations, input, output, domain, and meta fields. The dataset is designed for text-generation and question-answering tasks, with a focus on reasoning, chain-of-thought, instruction-tuning, and other related areas. The cleaning process removed incomplete, duplicated, or unparseable records, ensuring high data quality. The dataset contains 746,321 records, covering multiple languages (e.g., English and Chinese) and various academic domains.
提供机构:
insym



