reasoning-degeneration-dev/algorithmic-sft-eval-sets-v1
Source: Hugging Face. Updated 2026-03-23; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/reasoning-degeneration-dev/algorithmic-sft-eval-sets-v1
Description:
---
license: mit
tags:
- algorithmic-sft
- evaluation-data
- algorithmic-sft-vs-distillation
---
# algorithmic-sft-eval-sets-v1
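Evaluation sets for the algorithmic SFT experiment, covering 5 domains. Each domain has four splits: in-distribution test (1000 problems), validation (200), a harder variant (500), and structural OOD (500), for 2200 problems per domain and 11000 rows in total.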
## Dataset Info
- **Rows**: 11000
- **Columns**: 9
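
As a quick sanity check, the row count above follows from the per-domain split sizes in the description. The sketch below only redoes that arithmetic; the dictionary keys reuse the `split_type` values listed in the Columns table, and the constant names are illustrative.

```python
# Per-domain split sizes as stated in the dataset description.
SPLIT_SIZES = {"test": 1000, "val": 200, "test_harder": 500, "test_ood": 500}
NUM_DOMAINS = 5

# Each domain contributes one copy of every split.
rows_per_domain = sum(SPLIT_SIZES.values())
expected_total = NUM_DOMAINS * rows_per_domain

print(f"{rows_per_domain} rows per domain, {expected_total} rows in total")
```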
## Columns
| Column | Type | Description |
|--------|------|-------------|
| question | Value('string') | The problem statement |
| answer | Value('string') | The correct answer |
| sft_trace | Value('null') | Reference algorithmic trace (golden solution); the `null` type suggests this column is empty in these evaluation sets |
| difficulty | Value('int64') | Difficulty level used for generation |
| algorithm | Value('string') | Algorithm variant |
| task | Value('string') | Domain name: cellular_automata, conlang_morphology, countdown, formal_logic, long_arithmetic |
| metadata | Value('string') | Generation metadata as JSON string (task-specific parameters, schema varies by domain) |
| split_type | Value('string') | Evaluation split: test (in-distribution), val (validation), test_harder (harder variant), test_ood (structural OOD) |
| source_file | Value('string') | Original filename |
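
The columns above suggest a common access pattern: pick rows by `task` and `split_type`, then decode the `metadata` JSON string. The following is a minimal sketch of that pattern over plain dictionaries. The helper names (`decode_metadata`, `select`, `count_by_task_and_split`), the sample rows, and the `example_param` metadata key are all hypothetical and are not taken from the dataset.

```python
import json
from collections import Counter


def decode_metadata(row):
    """Return a copy of `row` with the JSON `metadata` string parsed into a dict."""
    out = dict(row)
    raw = row.get("metadata")
    out["metadata"] = json.loads(raw) if raw else {}
    return out


def select(rows, task=None, split_type=None):
    """Keep rows matching the given task and/or split_type (None means no filter)."""
    return [
        r for r in rows
        if (task is None or r["task"] == task)
        and (split_type is None or r["split_type"] == split_type)
    ]


def count_by_task_and_split(rows):
    """Count rows per (task, split_type) pair."""
    return Counter((r["task"], r["split_type"]) for r in rows)


# Hypothetical rows for illustration only; real metadata keys vary by domain.
sample_rows = [
    {"question": "hypothetical question 1", "answer": "42", "task": "countdown",
     "split_type": "test", "metadata": '{"example_param": 4}'},
    {"question": "hypothetical question 2", "answer": "7", "task": "countdown",
     "split_type": "test_ood", "metadata": '{"example_param": 6}'},
    {"question": "hypothetical question 3", "answer": "0110", "task": "cellular_automata",
     "split_type": "test", "metadata": "{}"},
]

countdown_rows = [decode_metadata(r) for r in select(sample_rows, task="countdown")]
counts = count_by_task_and_split(sample_rows)
print(len(countdown_rows), dict(counts))
```

Because the metadata schema varies by domain, it is probably safest to filter by `task` before relying on any particular metadata key.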
## Generation Parameters
```json
{
"script_name": "upload_eval_sets.py",
"description": "Evaluation sets for algorithmic SFT experiment: test (1000/domain), val (200/domain), harder variant (500/domain), structural OOD (500/domain). 5 domains total.",
"model": "N/A (programmatically generated evaluation problems)",
"hyperparameters": {},
"input_datasets": []
}
```
## Experiment Documentation
For complete experiment details, see [https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/algorithmic_sft_vs_distillation](https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/algorithmic_sft_vs_distillation)
## Usage
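```python
from datasets import load_dataset

# The card loads the "train" split; the split_type column distinguishes
# test, val, test_harder, and test_ood rows.
dataset = load_dataset("reasoning-degeneration-dev/algorithmic-sft-eval-sets-v1", split="train")
print(f"Loaded {len(dataset)} rows")
```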
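
Once rows are loaded, one plausible way to use them is to compare model outputs against the `answer` column, grouped by `task` and `split_type`. The sketch below assumes whitespace-normalized, case-insensitive exact match as the metric. That metric is an assumption made here for illustration, not the scoring rule of the experiment; the function names and sample data are hypothetical as well.

```python
from collections import defaultdict


def normalize(text):
    """Collapse whitespace and lowercase; an assumed, deliberately simple normalization."""
    return " ".join(str(text).split()).lower()


def exact_match_by_group(rows, predictions):
    """Return exact-match accuracy keyed by (task, split_type)."""
    if len(rows) != len(predictions):
        raise ValueError("rows and predictions must have the same length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row, pred in zip(rows, predictions):
        key = (row["task"], row["split_type"])
        totals[key] += 1
        hits[key] += int(normalize(pred) == normalize(row["answer"]))
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical rows and predictions for illustration only.
eval_rows = [
    {"task": "countdown", "split_type": "test", "answer": "42"},
    {"task": "countdown", "split_type": "test_ood", "answer": "7"},
    {"task": "cellular_automata", "split_type": "test", "answer": "0110"},
]
model_predictions = ["42", " 8 ", "0110"]

scores = exact_match_by_group(eval_rows, model_predictions)
for (task, split_type), accuracy in sorted(scores.items()):
    print(f"{task:20s} {split_type:12s} {accuracy:.2f}")
```

Reporting the splits separately keeps the in-distribution, harder, and structural OOD results from being averaged together.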
---
*This dataset is tracked in [reasoning-degeneration-dev/PROJECT-MANIFEST](https://huggingface.co/datasets/reasoning-degeneration-dev/PROJECT-MANIFEST)*
Provider:
reasoning-degeneration-dev



