reasoning-degeneration-dev/algorithmic-sft-distillation-training-data-v1
收藏Hugging Face2026-03-21 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/reasoning-degeneration-dev/algorithmic-sft-distillation-training-data-v1
下载链接
链接失效反馈官方服务:
资源简介:
---
license: mit
tags:
- algorithmic-sft
- distillation
- qwq-32b
- training-data
---
# algorithmic-sft-distillation-training-data-v1
QwQ-32B distillation training data for 5 algorithmic domains. Correct responses filtered by collect_distill_results.py with truncation rejection. KNOWN ISSUE: countdown domain has 74.2% QwQ repetition loops (model repeats \boxed{} answer until hitting token limit). Countdown is being regenerated with v3 pipeline (32k tokens). Other 4 domains are clean (>99% quality).
## Dataset Info
- **Rows**: 24133
- **Columns**: 6
## Columns
| Column | Type | Description |
|--------|------|-------------|
| domain | Value('string') | Algorithmic task domain: countdown, formal_logic, long_arithmetic, conlang_morphology, cellular_automata |
| question | Value('string') | Full problem prompt sent to QwQ-32B |
| response | Value('string') | Raw QwQ-32B reasoning trace + answer. Includes <think> tags for chain-of-thought |
| response_chars | Value('int64') | Response length in characters |
| has_answer_line | Value('bool') | Whether response contains an 'Answer:' line |
| quality_flag | Value('string') | clean = good response, repetition_loop = QwQ repeats answer in loop until max_tokens, no_answer_marker = no Answer: or \boxed{} found |
## Generation Parameters
```json
{
"script_name": "scripts/distill_offline_batch.py + scripts/collect_distill_results.py",
"model": "Qwen/QwQ-32B",
"student_model": "Qwen/Qwen2.5-1.5B-Instruct",
"description": "QwQ-32B distillation training data for 5 algorithmic domains. Correct responses filtered by collect_distill_results.py with truncation rejection. KNOWN ISSUE: countdown domain has 74.2% QwQ repetition loops (model repeats \\boxed{} answer until hitting token limit). Countdown is being regenerated with v3 pipeline (32k tokens). Other 4 domains are clean (>99% quality).",
"generation_parameters": {
"temperature": 0.6,
"top_p": 0.95,
"max_tokens": "8192 (long_arithmetic, formal_logic) / 32768 (conlang, cellular_automata)"
},
"domains": {
"countdown": {
"examples": 4133,
"quality": "NEEDS_RERUN",
"max_tokens": 8192,
"difficulty": "d7"
},
"formal_logic": {
"examples": 5000,
"quality": "CLEAN",
"max_tokens": 8192,
"difficulty": "d5"
},
"long_arithmetic": {
"examples": 5000,
"quality": "CLEAN",
"max_tokens": 8192,
"difficulty": "d4"
},
"conlang_morphology": {
"examples": 5000,
"quality": "CLEAN",
"max_tokens": 32768,
"difficulty": "d5+d7"
},
"cellular_automata": {
"examples": 5000,
"quality": "CLEAN",
"max_tokens": 32768,
"difficulty": "d5"
}
},
"hyperparameters": {},
"input_datasets": []
}
```
## Experiment Documentation
For complete experiment details, see [https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/algorithmic_sft_vs_distillation](https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/algorithmic_sft_vs_distillation)
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("reasoning-degeneration-dev/algorithmic-sft-distillation-training-data-v1", split="train")
print(f"Loaded {len(dataset)} rows")
```
---
*This dataset is tracked in [reasoning-degeneration-dev/PROJECT-MANIFEST](https://huggingface.co/datasets/reasoning-degeneration-dev/PROJECT-MANIFEST)*
许可证:MIT
标签:
- 算法监督微调(algorithmic-sft)
- 知识蒸馏(distillation)
- QwQ-32B
- 训练数据
# 算法监督微调-知识蒸馏训练数据v1
QwQ-32B知识蒸馏训练数据,覆盖5个算法领域。经collect_distill_results.py脚本过滤并剔除截断响应后,得到有效正确回复。已知问题:倒计时(countdown)领域中存在74.2%的QwQ-32B重复循环情况(模型会重复输出oxed{}格式的答案直至达到令牌上限)。当前正使用v3流水线(32k令牌)重新生成该领域数据;其余4个领域的数据均符合质量要求(质量合格率>99%)。
## 数据集概况
- **数据条数**:24133
- **列数**:6
## 字段说明
| 字段名 | 数据类型 | 字段描述 |
|--------|----------|----------|
| domain | 值类型(字符串) | 算法任务领域,可选值包括:倒计时(countdown)、形式逻辑(formal_logic)、长算术运算(long_arithmetic)、人工语言形态学(conlang_morphology)、元胞自动机(cellular_automata) |
| question | 值类型(字符串) | 发送至QwQ-32B的完整问题提示词 |
| response | 值类型(字符串) | QwQ-32B生成的原始推理轨迹与答案,包含用于思维链的<think>标签 |
| response_chars | 值类型(int64) | 响应内容的字符长度 |
| has_answer_line | 值类型(布尔值) | 响应是否包含“Answer:”标识行 |
| quality_flag | 值类型(字符串) | 质量标记:`clean`表示响应合格;`repetition_loop`表示模型循环重复答案直至达到最大令牌数;`no_answer_marker`表示未找到“Answer:”或oxed{}格式标记 |
## 生成参数
json
{
"脚本名称": "scripts/distill_offline_batch.py + scripts/collect_distill_results.py",
"教师模型": "Qwen/QwQ-32B",
"学生模型": "Qwen/Qwen2.5-1.5B-Instruct",
"描述": "QwQ-32B知识蒸馏训练数据,覆盖5个算法领域。经collect_distill_results.py脚本过滤并剔除截断响应后,得到有效正确回复。已知问题:倒计时领域中存在74.2%的QwQ-32B重复循环情况(模型会重复输出oxed{}格式的答案直至达到令牌上限)。当前正使用v3流水线(32k令牌)重新生成该领域数据;其余4个领域的数据均符合质量要求(质量合格率>99%)。",
"生成参数配置": {
"温度系数(temperature)": 0.6,
"Top-p采样率(top_p)": 0.95,
"最大令牌数(max_tokens)": "8192(长算术运算、形式逻辑) / 32768(人工语言形态学、元胞自动机)"
},
"各领域详情": {
"countdown(倒计时)": {
"样本数量": 4133,
"质量状态": "需重新生成",
"最大令牌数": 8192,
"难度等级": "d7"
},
"formal_logic(形式逻辑)": {
"样本数量": 5000,
"质量状态": "合格",
"最大令牌数": 8192,
"难度等级": "d5"
},
"long_arithmetic(长算术运算)": {
"样本数量": 5000,
"质量状态": "合格",
"最大令牌数": 8192,
"难度等级": "d4"
},
"conlang_morphology(人工语言形态学)": {
"样本数量": 5000,
"质量状态": "合格",
"最大令牌数": 32768,
"难度等级": "d5+d7"
},
"cellular_automata(元胞自动机)": {
"样本数量": 5000,
"质量状态": "合格",
"最大令牌数": 32768,
"难度等级": "d5"
}
},
"超参数": {},
"输入数据集": []
}
## 实验文档
完整实验细节请参阅:[https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/algorithmic_sft_vs_distillation](https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/algorithmic_sft_vs_distillation)
## 使用方法
python
# 从Hugging Face数据集Hub加载训练集
from datasets import load_dataset
dataset = load_dataset("reasoning-degeneration-dev/algorithmic-sft-distillation-training-data-v1", split="train")
print(f"已加载 {len(dataset)} 条数据")
---
*本数据集已在[reasoning-degeneration-dev/PROJECT-MANIFEST](https://huggingface.co/datasets/reasoning-degeneration-dev/PROJECT-MANIFEST)中进行追踪备案*
提供机构:
reasoning-degeneration-dev



