
reasoning-degeneration-dev/algorithmic-sft-eval-sets-v1

Source: Hugging Face. Updated 2026-03-23; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/reasoning-degeneration-dev/algorithmic-sft-eval-sets-v1
Resource description:
---
license: mit
tags:
- algorithmic-sft
- evaluation-data
- algorithmic-sft-vs-distillation
---

# algorithmic-sft-eval-sets-v1

Evaluation sets for algorithmic SFT experiment: test (1000/domain), val (200/domain), harder variant (500/domain), structural OOD (500/domain). 5 domains total.

## Dataset Info

- **Rows**: 11000
- **Columns**: 9

## Columns

| Column | Type | Description |
|--------|------|-------------|
| question | Value('string') | The problem statement |
| answer | Value('string') | The correct answer |
| sft_trace | Value('null') | Reference algorithmic trace (golden solution) |
| difficulty | Value('int64') | Difficulty level used for generation |
| algorithm | Value('string') | Algorithm variant |
| task | Value('string') | Domain name: cellular_automata, conlang_morphology, countdown, formal_logic, long_arithmetic |
| metadata | Value('string') | Generation metadata as JSON string (task-specific parameters, schema varies by domain) |
| split_type | Value('string') | Evaluation split: test (in-distribution), val (validation), test_harder (harder variant), test_ood (structural OOD) |
| source_file | Value('string') | Original filename |

## Generation Parameters

```json
{
  "script_name": "upload_eval_sets.py",
  "description": "Evaluation sets for algorithmic SFT experiment: test (1000/domain), val (200/domain), harder variant (500/domain), structural OOD (500/domain). 5 domains total.",
  "model": "N/A (programmatically generated evaluation problems)",
  "hyperparameters": {},
  "input_datasets": []
}
```

## Experiment Documentation

For complete experiment details, see [https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/algorithmic_sft_vs_distillation](https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/algorithmic_sft_vs_distillation)

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("reasoning-degeneration-dev/algorithmic-sft-eval-sets-v1", split="train")
print(f"Loaded {len(dataset)} rows")
```

---

*This dataset is tracked in [reasoning-degeneration-dev/PROJECT-MANIFEST](https://huggingface.co/datasets/reasoning-degeneration-dev/PROJECT-MANIFEST)*
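Because all four evaluation splits live in the single `train` split and are distinguished only by the `split_type` column, it can help to slice the rows by `task` and `split_type` and to decode the `metadata` JSON string. The following is a minimal sketch, not part of the original card: the helper names (`expected_total_rows`, `count_by_task_and_split`, `select`, `load_rows`) are hypothetical, and it assumes every row's `metadata` is a valid JSON string, as the column description suggests.

```python
import json
from collections import Counter

# Per-domain split sizes as stated in the dataset card.
EXPECTED_PER_DOMAIN = {"test": 1000, "val": 200, "test_harder": 500, "test_ood": 500}
NUM_DOMAINS = 5  # cellular_automata, conlang_morphology, countdown, formal_logic, long_arithmetic


def expected_total_rows():
    """Total rows implied by the card: 5 domains x 2200 rows per domain."""
    return NUM_DOMAINS * sum(EXPECTED_PER_DOMAIN.values())


def count_by_task_and_split(rows):
    """Count rows for each (task, split_type) pair."""
    return Counter((row["task"], row["split_type"]) for row in rows)


def select(rows, task, split_type):
    """Return rows for one domain and one evaluation split, with metadata decoded.

    Assumption: `metadata` holds a JSON string in every row.
    """
    selected = []
    for row in rows:
        if row["task"] == task and row["split_type"] == split_type:
            decoded = dict(row)
            decoded["metadata"] = json.loads(row["metadata"])
            selected.append(decoded)
    return selected


def load_rows():
    """Download the dataset (requires the `datasets` package and network access)."""
    from datasets import load_dataset

    return load_dataset(
        "reasoning-degeneration-dev/algorithmic-sft-eval-sets-v1", split="train"
    )
```

As a sanity check, `expected_total_rows()` matches the 11000 rows reported under Dataset Info. After `rows = load_rows()`, a call such as `select(rows, "countdown", "test_ood")` would yield the structural OOD problems for one domain; whether the per-pair counts match the card exactly has not been verified here.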

Provider:
reasoning-degeneration-dev