STORM-BORN
STORM-BORN Dataset Overview
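Dataset Introduction
- Name: STORM-BORN
- Type: Mathematical derivation dataset
- Features: Focuses on dense, approximation-rich derivations that include heuristic cues
- Source: Recent academic papers, reviewed by human mathematicians through a multi-agent, human-in-the-loop framework
- Uses:
  - Fine-tuning large language models (LLMs) to improve how well their reasoning generalizes
  - Serving as a benchmark for evaluating model reasoning ability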
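Dataset Contents
- File structure:
  - data/storm_born_top100.jsonl: the 100 most difficult problems (selected from 2,000 samples)
  - data/storm_born_top100_choice.jsonl: the data converted to multiple-choice format
- Data format (JSON, one record per entry):
    {
      "paper": "data source",
      "question": "mathematical derivation or proof problem",
      "whole_label": "human-style derivation or proof"
    }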
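As a minimal sketch of reading this format, the snippet below parses JSONL lines and checks that each record carries the three fields listed above. The helper name `parse_records` and the sample record are illustrative assumptions, not part of the repository.

```python
import json

# Field names taken from the data format described above.
REQUIRED_FIELDS = ("paper", "question", "whole_label")


def parse_records(lines):
    """Parse JSONL lines into dicts, skipping blank lines.

    Raises ValueError if a record is missing a required field.
    """
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        records.append(record)
    return records


# Made-up sample record, for illustration only.
sample = [
    json.dumps({
        "paper": "example-paper",
        "question": "Show that ...",
        "whole_label": "We start from ...",
    }),
    "",  # blank lines are ignored
]
records = parse_records(sample)
```

To read the actual file, an open file handle can be passed in place of the list, for example `with open("data/storm_born_top100.jsonl", encoding="utf-8") as f: records = parse_records(f)`.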
Data Processing and Generation
- Data cleaning:
    python data_generation/clean_data.py --input raw_outputs.jsonl --output data/storm-born.jsonl
- Data generation:
    python data_generation/generate_v1.py --config configs/gen_v1.yaml --output-dir data/tmp
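The internals of `clean_data.py` are not described here. As a rough illustration of what a cleaning pass over raw outputs might involve, the sketch below drops incomplete records and duplicate questions. Both rules and the function name `clean_records` are assumptions made for illustration, not the repository's actual logic.

```python
def clean_records(records):
    """Keep records with a non-empty question and whole_label, one per question.

    Hypothetical cleaning rules; the real clean_data.py may differ.
    """
    seen_questions = set()
    cleaned = []
    for record in records:
        question = (record.get("question") or "").strip()
        label = (record.get("whole_label") or "").strip()
        if not question or not label:
            continue  # incomplete record
        # Normalize whitespace and case so trivially different copies match.
        key = " ".join(question.lower().split())
        if key in seen_questions:
            continue  # duplicate question
        seen_questions.add(key)
        cleaned.append(record)
    return cleaned


# Made-up records, for illustration only.
raw = [
    {"paper": "p1", "question": "Derive X.", "whole_label": "Step 1 ..."},
    {"paper": "p1", "question": "derive  x.", "whole_label": "Step 1 ..."},  # duplicate
    {"paper": "p2", "question": "Prove Y.", "whole_label": ""},  # empty label
]
cleaned = clean_records(raw)
```

The first occurrence of each question is kept, so the input order decides which copy survives.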
Evaluation Methods
Benchmark Evaluation
- LLM-as-Judge:
    python data_evaluation/benchmark_evaluation/llm_as_judge.py --dataset data/storm-born.jsonl --model gpt-4 --output results/benchmark.json
- Multiple-choice evaluation:
    python data_evaluation/benchmark_evaluation/multiple_choice_eval.py --dataset data/storm-born-choice.jsonl --model gpt-4 --output results/benchmark.json
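For the multiple-choice setting, scoring usually reduces to extracting the chosen option letter from the model's reply and comparing it with the gold answer. The sketch below shows one way to do that. The A-D option set, the regular expression, and the function names are assumptions, since the format of the choice file and the logic of `multiple_choice_eval.py` are not specified here.

```python
import re

# Assumes options are labeled A-D; adjust if the choice file uses another scheme.
_LETTER = re.compile(r"\b([A-D])\b")


def extract_choice(reply):
    """Return the first standalone uppercase option letter in the reply, or None."""
    match = _LETTER.search(reply)
    return match.group(1) if match else None


def accuracy(replies, gold):
    """Fraction of replies whose extracted letter equals the gold letter."""
    if len(replies) != len(gold):
        raise ValueError("replies and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(extract_choice(r) == g for r, g in zip(replies, gold))
    return correct / len(gold)


# Made-up replies and gold answers, for illustration only.
replies = ["The answer is B.", "C", "I am not sure."]
gold = ["B", "C", "A"]
score = accuracy(replies, gold)
```

This pattern is deliberately simple: a reply that opens with the English article "A" would be read as option A, so a stricter prompt or a stricter pattern is advisable in practice.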
Downstream Task Evaluation
- In-distribution (i.i.d.) evaluation:
    python data_evaluation/i.i.d_evaluation/eval_iid.py --model_path checkpoints/storm-born-sft --dataset data/iid_task.jsonl --output results/iid_results.json
- Out-of-distribution (o.o.d.) evaluation:
    python data_evaluation/o.o.d_evaluation/eval_ood.py --model_path checkpoints/storm-born-sft --dataset data/ood_task.jsonl --output results/ood_results.json
Fine-Tuning (SFT)
- Framework: Axolotl
- Commands:
    cd train/axolotl
    python train.py --model_name_or_path elephantai/llama-13b --data_path ../../data/storm-born.jsonl --output_dir ../../checkpoints/storm-born-sft --batch_size 4 --epochs 3 --lr 2e-5
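Before fine-tuning, each record has to be turned into a prompt and a target. A minimal sketch, assuming a simple instruction-style template: the template text, the `to_sft_example` name, and the `prompt`/`completion` keys are illustrative choices, not the format that `train.py` or Axolotl requires.

```python
# Hypothetical prompt template; the wording is an assumption for illustration.
PROMPT_TEMPLATE = (
    "Solve the following derivation or proof problem step by step.\n\n"
    "Problem: {question}\n\nSolution:"
)


def to_sft_example(record):
    """Map a dataset record to a prompt/completion pair for supervised fine-tuning."""
    return {
        "prompt": PROMPT_TEMPLATE.format(question=record["question"].strip()),
        # The leading space separates the target from the "Solution:" cue.
        "completion": " " + record["whole_label"].strip(),
    }


# Made-up record, for illustration only.
record = {
    "paper": "example-paper",
    "question": "Derive X. ",
    "whole_label": "We start from ...",
}
example = to_sft_example(record)
```

Because `str.format` only interprets braces in the template itself, LaTeX such as `\frac{a}{b}` inside a question passes through unchanged.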
Citation
@inproceedings{liu2025stormborn,
  title     = {{STORM}-{BORN}: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework},
  author    = {Liu, Wenhao and Lu, Zhenyi and Hu, Xinyu and Zhang, Jerry and Li, Dailin and Cen, Jiacheng and Cao, Huilin and Wang, Haiteng and Li, Yuhan and Xie, Kun and Li, Dandan and Zhang, Pei and Zhang, Chengbo and Ren, Yuxiang and Ma, Yan and Huang, Xiaohong},
  booktitle = {The 63rd Annual Meeting of the Association for Computational Linguistics},
  year      = {2025},
  url       = {https://github.com/lwhere/STORM-BORN}
}
License
- Type: MIT License

- STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework. Beijing University of Posts and Telecommunications, 2025.



