
Jongsim/GLM-5.1-Reasoning-1M-filtered

Hugging Face · Updated 2026-04-21 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/Jongsim/GLM-5.1-Reasoning-1M-filtered
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
- ko
- multilingual
tags:
- reasoning
- math
- science
- stem
- filtered
size_categories:
- 100K<n<1M
---

# GLM-5.1-Reasoning-1M-filtered

Refusal-filtered subset of `zai-org/GLM-5.1-Reasoning-1M`.

## Filtering

Conservative regex/string match for explicit refusal patterns (e.g., "I cannot", "I can't", "I'm not able to", "as an AI", "죄송합니다", etc.) applied on the assistant response field.

Drop ratio: ~0.00% (very few refusals in source).

## Files

| File | Size | Records (kept) |
|---|---|---|
| `main.jsonl` | 19 GB | (full main subset, refusals removed) |
| `Math.jsonl` | 4.1 GB | mathematics reasoning |
| `PHD-Science.jsonl` | 3.8 GB | 103,705 |
| `Multilingual-STEM.jsonl` | 3.6 GB | 92,781 |

Total: 30 GB JSONL.

## Source

- Original: [zai-org/GLM-5.1-Reasoning-1M](https://huggingface.co/datasets/zai-org/GLM-5.1-Reasoning-1M)
- Filter script: `filter_glm_refusals.py` (sequential to avoid HDD I/O contention)

## License

Inherits Apache-2.0 from the source dataset.
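
The card describes the refusal filter only in prose, and `filter_glm_refusals.py` itself is not shown. The following is a minimal sketch of what such a filter could look like, not the author's script. It uses only the five patterns quoted in the card (the real list is longer, per the "etc."). The field name `response`, the word-boundary anchors, the case-insensitive matching, and the function names are all assumptions made for illustration.

```python
import json
import re
from typing import Iterable, Iterator

# Patterns quoted in the dataset card; the actual script's full list is unknown.
# Word boundaries on the English phrases are an assumption, added so that
# "as an AI" does not match inside text such as "has an aim".
REFUSAL_PATTERNS = [
    r"\bI cannot\b",
    r"\bI can't\b",
    r"\bI'm not able to\b",
    r"\bas an AI\b",
    r"죄송합니다",
]
REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def is_refusal(text: str) -> bool:
    """Return True if the text contains any explicit refusal pattern."""
    return REFUSAL_RE.search(text) is not None


def filter_records(lines: Iterable[str], field: str = "response") -> Iterator[dict]:
    """Yield parsed JSONL records whose `field` does not look like a refusal.

    `field` is a hypothetical name for the assistant response column.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if not is_refusal(str(record.get(field, ""))):
            yield record
```

Because `filter_records` accepts any iterable of lines, an open file handle can be passed directly, so a multi-gigabyte JSONL file is processed one record at a time rather than loaded into memory.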
Provider:
Jongsim