abhishekgahlot/GLM-5.1-1000000x
Source: Hugging Face | Updated: 2026-04-22 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/abhishekgahlot/GLM-5.1-1000000x
Resource overview:
GLM-5.1-1000000x is a dataset of reasoning traces generated by the GLM-5.1 model, containing 1,003,589 entries. Each entry consists of a full chain-of-thought reasoning trace followed by the final answer. The dataset is divided into four subsets: main (general reasoning and instruction following, 59.6%), Math (mathematics, 20.8%), PHD-Science (graduate-level physics, chemistry, and biology, 10.3%), and Multilingual-STEM (STEM in Chinese, English, and other languages, 9.3%). The estimated total is about 5.36B tokens, an average of about 5,338 tokens per record. The dataset supports English and Chinese and is intended primarily for text-generation and question-answering tasks.
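As a rough sanity check on the figures above, the following minimal Python sketch derives approximate per-subset entry counts from the stated percentages and recomputes the total token estimate from the per-record average. The variable and function names are illustrative and not part of the dataset; because the published percentages are rounded to one decimal place, the derived counts are approximations only.

```python
# Figures quoted in the dataset description.
TOTAL_ENTRIES = 1_003_589
AVG_TOKENS_PER_RECORD = 5_338  # stated as approximate

# Subset shares in percent, as listed in the description.
SUBSET_SHARES = {
    "main": 59.6,
    "Math": 20.8,
    "PHD-Science": 10.3,
    "Multilingual-STEM": 9.3,
}


def approximate_subset_sizes(total, shares):
    """Return rounded entry counts implied by the percentage shares."""
    return {name: round(total * pct / 100) for name, pct in shares.items()}


subset_sizes = approximate_subset_sizes(TOTAL_ENTRIES, SUBSET_SHARES)
estimated_total_tokens = TOTAL_ENTRIES * AVG_TOKENS_PER_RECORD

if __name__ == "__main__":
    for name, count in subset_sizes.items():
        print(f"{name}: ~{count:,} entries")
    print(f"Estimated total tokens: ~{estimated_total_tokens / 1e9:.2f}B")
```

The recomputed token total agrees with the stated figure of about 5.36B, and the derived subset counts sum to within rounding error of the full entry count.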
Provider:
abhishekgahlot



