reasoning-degeneration-dev/algorithmic-sft-distillation-training-data-v1

Name: reasoning-degeneration-dev/algorithmic-sft-distillation-training-data-v1
Creator: reasoning-degeneration-dev
Published: 2026-03-21 23:17:50
License: MIT

Source: Hugging Face. Updated: 2026-03-21. Indexed: 2026-03-29.

Download link:

https://hf-mirror.com/datasets/reasoning-degeneration-dev/algorithmic-sft-distillation-training-data-v1


Description:

---
license: mit
tags:
- algorithmic-sft
- distillation
- qwq-32b
- training-data
---

# algorithmic-sft-distillation-training-data-v1

QwQ-32B distillation training data for 5 algorithmic domains. Correct responses filtered by collect_distill_results.py with truncation rejection.

KNOWN ISSUE: countdown domain has 74.2% QwQ repetition loops (model repeats `\boxed{}` answer until hitting token limit). Countdown is being regenerated with v3 pipeline (32k tokens). Other 4 domains are clean (>99% quality).

## Dataset Info

- **Rows**: 24133
- **Columns**: 6

## Columns

| Column | Type | Description |
|--------|------|-------------|
| domain | Value('string') | Algorithmic task domain: countdown, formal_logic, long_arithmetic, conlang_morphology, cellular_automata |
| question | Value('string') | Full problem prompt sent to QwQ-32B |
| response | Value('string') | Raw QwQ-32B reasoning trace + answer. Includes `<think>` tags for chain-of-thought |
| response_chars | Value('int64') | Response length in characters |
| has_answer_line | Value('bool') | Whether response contains an 'Answer:' line |
| quality_flag | Value('string') | clean = good response, repetition_loop = QwQ repeats answer in loop until max_tokens, no_answer_marker = no Answer: or `\boxed{}` found |

## Generation Parameters

```json
{
  "script_name": "scripts/distill_offline_batch.py + scripts/collect_distill_results.py",
  "model": "Qwen/QwQ-32B",
  "student_model": "Qwen/Qwen2.5-1.5B-Instruct",
  "description": "QwQ-32B distillation training data for 5 algorithmic domains. Correct responses filtered by collect_distill_results.py with truncation rejection. KNOWN ISSUE: countdown domain has 74.2% QwQ repetition loops (model repeats \\boxed{} answer until hitting token limit). Countdown is being regenerated with v3 pipeline (32k tokens). Other 4 domains are clean (>99% quality).",
  "generation_parameters": {
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": "8192 (long_arithmetic, formal_logic) / 32768 (conlang, cellular_automata)"
  },
  "domains": {
    "countdown": {
      "examples": 4133,
      "quality": "NEEDS_RERUN",
      "max_tokens": 8192,
      "difficulty": "d7"
    },
    "formal_logic": {
      "examples": 5000,
      "quality": "CLEAN",
      "max_tokens": 8192,
      "difficulty": "d5"
    },
    "long_arithmetic": {
      "examples": 5000,
      "quality": "CLEAN",
      "max_tokens": 8192,
      "difficulty": "d4"
    },
    "conlang_morphology": {
      "examples": 5000,
      "quality": "CLEAN",
      "max_tokens": 32768,
      "difficulty": "d5+d7"
    },
    "cellular_automata": {
      "examples": 5000,
      "quality": "CLEAN",
      "max_tokens": 32768,
      "difficulty": "d5"
    }
  },
  "hyperparameters": {},
  "input_datasets": []
}
```

## Experiment Documentation

For complete experiment details, see [https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/algorithmic_sft_vs_distillation](https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/algorithmic_sft_vs_distillation)

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("reasoning-degeneration-dev/algorithmic-sft-distillation-training-data-v1", split="train")
print(f"Loaded {len(dataset)} rows")
```

---

*This dataset is tracked in [reasoning-degeneration-dev/PROJECT-MANIFEST](https://huggingface.co/datasets/reasoning-degeneration-dev/PROJECT-MANIFEST)*
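Because the card flags most countdown rows as repetition loops, it may be useful to tally `quality_flag` per `domain` and drop non-clean rows before training. The following is a minimal sketch, not part of the original pipeline: the column names and flag values come from the Columns table, while the helper names `count_flags_by_domain` and `keep_clean` and the sample rows are hypothetical and exist only for illustration.

```python
from collections import Counter

# Flag value taken from the dataset card's Columns table.
CLEAN_FLAG = "clean"


def count_flags_by_domain(rows):
    """Return a Counter keyed by (domain, quality_flag) pairs."""
    return Counter((row["domain"], row["quality_flag"]) for row in rows)


def keep_clean(rows):
    """Keep only rows whose quality_flag equals 'clean'."""
    return [row for row in rows if row["quality_flag"] == CLEAN_FLAG]


# Tiny synthetic rows for illustration only; these are NOT real dataset records.
sample_rows = [
    {"domain": "countdown", "quality_flag": "repetition_loop"},
    {"domain": "countdown", "quality_flag": "clean"},
    {"domain": "formal_logic", "quality_flag": "clean"},
    {"domain": "long_arithmetic", "quality_flag": "no_answer_marker"},
]

flag_counts = count_flags_by_domain(sample_rows)
clean_rows = keep_clean(sample_rows)

print(flag_counts)
print(f"{len(clean_rows)} of {len(sample_rows)} sample rows are clean")
```

Iterating a `datasets.Dataset` yields one dict per row, so `count_flags_by_domain(dataset)` should work on the loaded split as well. For filtering at scale, `dataset.filter(lambda row: row["quality_flag"] == "clean")` is the library-native equivalent of `keep_clean`.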


Provider:

reasoning-degeneration-dev
