WeihangSu/SRA-Bench
Hugging Face · Updated 2026-04-22 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/WeihangSu/SRA-Bench
Resource description:
---
license: mit
task_categories:
- question-answering
- text-generation
language:
- en
tags:
- skill-retrieval
- llm-agents
- tool-use
- benchmark
- reasoning
pretty_name: SRA-Bench
configs:
- config_name: instances
  data_files:
  - split: theoremqa
    path: instances/theoremqa.json
  - split: logicbench
    path: instances/logicbench.json
  - split: toolqa
    path: instances/toolqa.json
  - split: medcalcbench
    path: instances/medcalcbench.json
  - split: champ
    path: instances/champ.json
  - split: bigcodebench
    path: instances/bigcodebench.json
- config_name: corpus
  data_files:
  - split: corpus
    path: corpus/corpus.json
---
# SRA-Bench
A benchmark for **skill-retrieval-augmented LLM agents** (paper:
*Skill-Retrieval Augmented Agents*). Code and baselines live at
[github.com/oneal2000/SR-Agents](https://github.com/oneal2000/SR-Agents).
**5,400 test instances** and **636 gold skills**, with the gold skills embedded in a skill
library of **26,262 skills** (2.4% gold; the remaining 25,626 are web-collected distractors).
| Dataset | Capability Type | #Instances | #Gold Skills | Skill Mapping | Evaluation |
|---|---|---:|---:|---|---|
| TheoremQA | Theorem Application | 747 | 320 | Single | Rule-Based |
| LogicBench | Logical Reasoning Patterns | 760 | 19 | Single | Rule-Based |
| ToolQA | Tool-Use Workflows | 1,430 | 14 | Single | Rule-Based |
| MedCalc-Bench | Medical Calculators | 1,100 | 55 | Single | Rule-Based |
| CHAMP | Mathematical Concepts | 223 | 89 | Multi | Rule-Based |
| BigCodeBench | Software Libraries | 1,140 | 139 | Multi | Execution |
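
The headline totals can be cross-checked against the table. The sketch below re-types the table's instance and skill counts by hand; `TABLE` and `LIBRARY_SIZE` are illustrative names, not values read from the dataset files.

```python
# Per-dataset counts copied by hand from the table: (instances, gold skills).
TABLE = {
    "TheoremQA": (747, 320),
    "LogicBench": (760, 19),
    "ToolQA": (1430, 14),
    "MedCalc-Bench": (1100, 55),
    "CHAMP": (223, 89),
    "BigCodeBench": (1140, 139),
}
LIBRARY_SIZE = 26_262  # total skills in the library, as stated on the card

total_instances = sum(n for n, _ in TABLE.values())
total_gold = sum(k for _, k in TABLE.values())
distractors = LIBRARY_SIZE - total_gold  # every non-gold skill is a distractor
gold_share = total_gold / LIBRARY_SIZE

print(total_instances, total_gold, distractors, f"{gold_share:.1%}")
# → 5400 636 25626 2.4%
```

Because the skill-count column sums exactly to the 636 gold skills, the table appears to count each gold skill under a single dataset.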
## Files
```
corpus/corpus.json         # skill library (array of skills)
instances/{dataset}.json   # per-dataset test sets (array of instances)
```
- **Skill**: `{skill_id, name, description, content, tools?}`
- **Instance**: `{instance_id, dataset, question, skill_annotations, eval_data}`
`eval_data` fields vary per dataset (answer, units, tolerance, etc.)
and are consumed by the per-dataset evaluator in SR-Agents.
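
A minimal standard-library sketch of reading these files, assuming each file is a single top-level JSON array (as described above) and that `skill_id` is unique across the corpus. `load_array`, `load_bench`, and the key sets are hypothetical helpers written for illustration, not part of SR-Agents; `skill_annotations` and `eval_data` are passed through untouched because their structure is dataset-specific and not documented on this card.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

# Split/file names taken from the `instances` config in the YAML header.
DATASETS = ("theoremqa", "logicbench", "toolqa", "medcalcbench", "champ", "bigcodebench")
# Required fields per the schema above; `tools` is optional on skills.
SKILL_KEYS = {"skill_id", "name", "description", "content"}
INSTANCE_KEYS = {"instance_id", "dataset", "question", "skill_annotations", "eval_data"}


def load_array(path: Path) -> list[dict[str, Any]]:
    """Read a JSON file whose top level is an array of objects."""
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a top-level JSON array")
    return data


def load_bench(root: str | Path) -> tuple[dict[str, dict], dict[str, list[dict]]]:
    """Return (skills keyed by skill_id, instances keyed by split name)."""
    root = Path(root)
    skills: dict[str, dict] = {}
    for skill in load_array(root / "corpus" / "corpus.json"):
        missing = SKILL_KEYS.difference(skill)
        if missing:
            raise KeyError(f"skill missing fields: {sorted(missing)}")
        skills[skill["skill_id"]] = skill  # assumes skill_id is unique
    instances: dict[str, list[dict]] = {}
    for name in DATASETS:
        path = root / "instances" / f"{name}.json"
        if not path.exists():  # skip splits that were not downloaded
            continue
        rows = load_array(path)
        for row in rows:
            missing = INSTANCE_KEYS.difference(row)
            if missing:
                raise KeyError(f"{name}: instance missing fields: {sorted(missing)}")
        instances[name] = rows
    return skills, instances
```

After downloading, `skills, instances = load_bench("data/bench")` gives a `skill_id` lookup table; resolving an instance's gold skills additionally requires knowing how `skill_annotations` references skill IDs, which this card does not specify.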
## Download
```python
from huggingface_hub import snapshot_download

# Fetch corpus/ and instances/ into data/bench/, the layout SR-Agents expects.
snapshot_download(repo_id="WeihangSu/SRA-Bench", repo_type="dataset",
                  local_dir="data/bench")
```
The resulting layout matches what SR-Agents expects under `data/bench/`.
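
As an alternative to `snapshot_download`, the `configs` block in the YAML header should let the Hugging Face `datasets` library load each file by config and split name. This is a hedged sketch, assuming `datasets` is installed and that Arrow can infer one schema per file; since `eval_data` fields differ between datasets and `tools` is optional on skills, type inference could fail on some splits, in which case the raw-JSON route above is safer. `load_instances` and `load_corpus` are illustrative names, not SR-Agents functions.

```python
# Config and split names mirror the `configs` block in the YAML header.
REPO_ID = "WeihangSu/SRA-Bench"
INSTANCE_SPLITS = ("theoremqa", "logicbench", "toolqa", "medcalcbench", "champ", "bigcodebench")


def load_instances(split: str):
    """Load one per-dataset test set from the `instances` config."""
    if split not in INSTANCE_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {INSTANCE_SPLITS}")
    # Imported lazily so this module loads without `datasets` installed.
    from datasets import load_dataset
    return load_dataset(REPO_ID, "instances", split=split)


def load_corpus():
    """Load the skill library from the `corpus` config (one split, `corpus`)."""
    from datasets import load_dataset
    return load_dataset(REPO_ID, "corpus", split="corpus")
```

For example, `load_instances("toolqa")` should return a `datasets.Dataset` whose length matches the ToolQA row of the table (1,430), provided the schema is inferred cleanly.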
Provider:
WeihangSu



