
reasoning-degeneration-dev/algorithmic-sft-distillation-training-data-v1

Source: Hugging Face. Updated: 2026-03-21. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/reasoning-degeneration-dev/algorithmic-sft-distillation-training-data-v1
Resource description:
---
license: mit
tags:
- algorithmic-sft
- distillation
- qwq-32b
- training-data
---

# algorithmic-sft-distillation-training-data-v1

QwQ-32B distillation training data for 5 algorithmic domains. Correct responses filtered by collect_distill_results.py with truncation rejection.

KNOWN ISSUE: countdown domain has 74.2% QwQ repetition loops (model repeats \boxed{} answer until hitting token limit). Countdown is being regenerated with v3 pipeline (32k tokens). Other 4 domains are clean (>99% quality).

## Dataset Info

- **Rows**: 24133
- **Columns**: 6

## Columns

| Column | Type | Description |
|--------|------|-------------|
| domain | Value('string') | Algorithmic task domain: countdown, formal_logic, long_arithmetic, conlang_morphology, cellular_automata |
| question | Value('string') | Full problem prompt sent to QwQ-32B |
| response | Value('string') | Raw QwQ-32B reasoning trace + answer. Includes `<think>` tags for chain-of-thought |
| response_chars | Value('int64') | Response length in characters |
| has_answer_line | Value('bool') | Whether response contains an 'Answer:' line |
| quality_flag | Value('string') | clean = good response, repetition_loop = QwQ repeats answer in loop until max_tokens, no_answer_marker = no Answer: or \boxed{} found |

## Generation Parameters

```json
{
  "script_name": "scripts/distill_offline_batch.py + scripts/collect_distill_results.py",
  "model": "Qwen/QwQ-32B",
  "student_model": "Qwen/Qwen2.5-1.5B-Instruct",
  "description": "QwQ-32B distillation training data for 5 algorithmic domains. Correct responses filtered by collect_distill_results.py with truncation rejection. KNOWN ISSUE: countdown domain has 74.2% QwQ repetition loops (model repeats \\boxed{} answer until hitting token limit). Countdown is being regenerated with v3 pipeline (32k tokens). Other 4 domains are clean (>99% quality).",
  "generation_parameters": {
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": "8192 (long_arithmetic, formal_logic) / 32768 (conlang, cellular_automata)"
  },
  "domains": {
    "countdown": {
      "examples": 4133,
      "quality": "NEEDS_RERUN",
      "max_tokens": 8192,
      "difficulty": "d7"
    },
    "formal_logic": {
      "examples": 5000,
      "quality": "CLEAN",
      "max_tokens": 8192,
      "difficulty": "d5"
    },
    "long_arithmetic": {
      "examples": 5000,
      "quality": "CLEAN",
      "max_tokens": 8192,
      "difficulty": "d4"
    },
    "conlang_morphology": {
      "examples": 5000,
      "quality": "CLEAN",
      "max_tokens": 32768,
      "difficulty": "d5+d7"
    },
    "cellular_automata": {
      "examples": 5000,
      "quality": "CLEAN",
      "max_tokens": 32768,
      "difficulty": "d5"
    }
  },
  "hyperparameters": {},
  "input_datasets": []
}
```

## Experiment Documentation

For complete experiment details, see [https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/algorithmic_sft_vs_distillation](https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/algorithmic_sft_vs_distillation)

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("reasoning-degeneration-dev/algorithmic-sft-distillation-training-data-v1", split="train")
print(f"Loaded {len(dataset)} rows")
```

---

*This dataset is tracked in [reasoning-degeneration-dev/PROJECT-MANIFEST](https://huggingface.co/datasets/reasoning-degeneration-dev/PROJECT-MANIFEST)*
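Because the card flags most countdown responses as repetition loops, it may help to drop non-clean rows before training. The following is a minimal sketch, not part of the original pipeline: it assumes each row is a plain dict with the `domain` and `quality_flag` columns described in the card, and it uses a few hand-made stand-in rows instead of downloading the dataset. The helper names `is_clean` and `flag_counts_by_domain` are hypothetical.

```python
from collections import Counter

# Per-domain example counts as listed in the card's generation parameters.
DOMAIN_EXAMPLES = {
    "countdown": 4133,
    "formal_logic": 5000,
    "long_arithmetic": 5000,
    "conlang_morphology": 5000,
    "cellular_automata": 5000,
}


def is_clean(row):
    """Return True when the row's quality_flag marks it as a good response."""
    return row["quality_flag"] == "clean"


def flag_counts_by_domain(rows):
    """Count quality_flag values per domain for an iterable of row dicts."""
    counts = {}
    for row in rows:
        counts.setdefault(row["domain"], Counter())[row["quality_flag"]] += 1
    return counts


# Hand-made rows standing in for the real dataset (illustrative only).
sample_rows = [
    {"domain": "countdown", "quality_flag": "repetition_loop"},
    {"domain": "countdown", "quality_flag": "clean"},
    {"domain": "formal_logic", "quality_flag": "clean"},
    {"domain": "long_arithmetic", "quality_flag": "no_answer_marker"},
]

clean_rows = [row for row in sample_rows if is_clean(row)]
counts = flag_counts_by_domain(sample_rows)
total_examples = sum(DOMAIN_EXAMPLES.values())

print(len(clean_rows))                         # 2
print(counts["countdown"]["repetition_loop"])  # 1
print(total_examples)                          # 24133, matching the card's row count
```

With the real dataset loaded through `load_dataset`, the same predicate should work as `dataset.filter(is_clean)`, since `Dataset.filter` passes each example to the function as a dict by default.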

Provider:
reasoning-degeneration-dev