
WeihangSu/SRA-Bench

Source: Hugging Face · Updated 2026-04-22 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/WeihangSu/SRA-Bench
Resource overview:
---
license: mit
task_categories:
- question-answering
- text-generation
language:
- en
tags:
- skill-retrieval
- llm-agents
- tool-use
- benchmark
- reasoning
pretty_name: SRA-Bench
configs:
- config_name: instances
  data_files:
  - split: theoremqa
    path: instances/theoremqa.json
  - split: logicbench
    path: instances/logicbench.json
  - split: toolqa
    path: instances/toolqa.json
  - split: medcalcbench
    path: instances/medcalcbench.json
  - split: champ
    path: instances/champ.json
  - split: bigcodebench
    path: instances/bigcodebench.json
- config_name: corpus
  data_files:
  - split: corpus
    path: corpus/corpus.json
---

# SRA-Bench

A benchmark for **skill-retrieval-augmented LLM agents** (paper: *Skill-Retrieval Augmented Agents*). Code and baselines live at [github.com/oneal2000/SR-Agents](https://github.com/oneal2000/SR-Agents).

**5,400 test instances · 636 gold skills** embedded in a skill library of **26,262 skills** (2.4% gold, 25,626 web-collected distractors).

| Dataset | Capability Type | #Inst. | #Skills | Skill Mapping | Evaluation |
|---|---|---:|---:|---|---|
| TheoremQA | Theorem Application | 747 | 320 | Single | Rule-Based |
| LogicBench | Logical Reasoning Patterns | 760 | 19 | Single | Rule-Based |
| ToolQA | Tool-Use Workflows | 1,430 | 14 | Single | Rule-Based |
| MedCalc-Bench | Medical Calculators | 1,100 | 55 | Single | Rule-Based |
| CHAMP | Mathematical Concepts | 223 | 89 | Multi | Rule-Based |
| BigCodeBench | Software Libraries | 1,140 | 139 | Multi | Execution |

## Files

```
corpus/corpus.json          # skill library (array of skills)
instances/{dataset}.json    # per-dataset test sets (array of instances)
```

- **Skill**: `{skill_id, name, description, content, tools?}`
- **Instance**: `{instance_id, dataset, question, skill_annotations, eval_data}`

`eval_data` fields vary per dataset (answer, units, tolerance, etc.) and are consumed by the per-dataset evaluator in SR-Agents.

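The headline figures can be cross-checked against the per-dataset table with a few lines of arithmetic. The sketch below simply copies the counts from the table; the names `TABLE` and `LIBRARY_SIZE` are illustrative and not part of the benchmark. It also assumes that the per-dataset `#Skills` values can be summed directly, that is, that no gold skill is counted under two datasets, which the card does not state explicitly.

```python
# Per-dataset counts copied from the table above: (instances, gold skills).
TABLE = {
    "TheoremQA": (747, 320),
    "LogicBench": (760, 19),
    "ToolQA": (1430, 14),
    "MedCalc-Bench": (1100, 55),
    "CHAMP": (223, 89),
    "BigCodeBench": (1140, 139),
}
LIBRARY_SIZE = 26_262  # total skills in the library, as stated in the card

total_instances = sum(n_inst for n_inst, _ in TABLE.values())
gold_skills = sum(n_skills for _, n_skills in TABLE.values())  # assumes no overlap
distractors = LIBRARY_SIZE - gold_skills
gold_share = gold_skills / LIBRARY_SIZE

print(total_instances, gold_skills, distractors, f"{gold_share:.1%}")
# -> 5400 636 25626 2.4%
```

The sums agree with the stated 5,400 instances, 636 gold skills, 25,626 distractors, and 2.4% gold share.
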
## Download

```python
from huggingface_hub import snapshot_download

snapshot_download(repo_id="WeihangSu/SRA-Bench", repo_type="dataset", local_dir="data/bench")
```

The resulting layout matches what SR-Agents expects under `data/bench/`.
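Once the snapshot is on disk, the files can be read with the standard library alone. The following is a minimal sketch, not the loader used by SR-Agents: the helpers `load_array`, `missing_keys`, and `load_bench` are hypothetical names introduced here, and the required-key sets are taken only from the Skill and Instance shapes listed in the Files section (with `tools` treated as optional). Nothing is assumed about the inner structure of `skill_annotations` or `eval_data`.

```python
import json
from pathlib import Path

# Top-level keys from the Files section; "tools" is optional for skills.
SKILL_KEYS = {"skill_id", "name", "description", "content"}
INSTANCE_KEYS = {"instance_id", "dataset", "question", "skill_annotations", "eval_data"}


def load_array(path):
    """Load a JSON file that is expected to hold a top-level array."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a JSON array")
    return data


def missing_keys(records, required):
    """Map record index -> sorted list of required keys that record lacks."""
    problems = {}
    for i, rec in enumerate(records):
        absent = required - set(rec)
        if absent:
            problems[i] = sorted(absent)
    return problems


def load_bench(root="data/bench", dataset="theoremqa"):
    """Return (skills, instances) for one split, e.g. 'champ' or 'toolqa'."""
    root = Path(root)
    skills = load_array(root / "corpus" / "corpus.json")
    instances = load_array(root / "instances" / f"{dataset}.json")
    return skills, instances
```

A typical call would be `skills, instances = load_bench("data/bench", "champ")`, followed by `missing_keys(instances, INSTANCE_KEYS)`, which returns an empty dict when every record carries the documented top-level fields. The `dataset` argument is assumed to match the split names in the card's YAML header (`theoremqa`, `logicbench`, `toolqa`, `medcalcbench`, `champ`, `bigcodebench`).
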
Provider:
WeihangSu