acp_bench
ModelScope community · updated 2025-11-27 · indexed 2025-10-04
Download link:
https://modelscope.cn/datasets/ibm-research/acp_bench
Description:
# ACP Bench
<p align="center">
<a href="https://ibm.github.io/ACPBench" target="_blank">🏠 Homepage</a> •
<a href="https://doi.org/10.1609/aaai.v39i25.34857" target="_blank">📄 Paper (ACPBench, AAAI 2025)</a> •
<a href="https://arxiv.org/abs/2503.24378" target="_blank">📄 Paper (ACPBench Hard, arXiv)</a>
</p>
ACPBench is a benchmark dataset designed to evaluate the reasoning capabilities of large language models (LLMs) in the context of Action, Change, and Planning. It spans 13 diverse domains:
* Blocksworld
* Logistics
* Grippers
* Grid
* Ferry
* FloorTile
* Rovers
* VisitAll
* Depot
* Goldminer
* Satellite
* Swap
* Alfworld
## Task Types in ACPBench
ACPBench includes the following 8 reasoning tasks:
1. Action Applicability (app)
2. Progression (prog)
3. Atom Reachability (reach)
4. Validation (val)
5. Action Reachability (areach)
6. Justification (just)
7. Landmarks (land)
8. Next Action (nexta)
## Task Formats
The first 7 tasks are available in:
* Boolean (yes/no) format
* Multiple-choice format
* Generative format
The Next Action task is provided only in generative format.
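The task and format combinations described above can be enumerated with a short sketch. The task abbreviations come from the list in this card; the format labels `bool`, `mcq`, and `gen` are hypothetical shorthand chosen for illustration and are not confirmed dataset identifiers.

```python
from itertools import product

# Task abbreviations as listed in this card; "nexta" is generative-only.
TASKS = ["app", "prog", "reach", "val", "areach", "just", "land", "nexta"]

# Hypothetical short labels for the Boolean, multiple-choice, and generative formats.
FORMATS = ["bool", "mcq", "gen"]


def task_format_pairs():
    """Return every (task, format) pair described in this section."""
    # The first 7 tasks are offered in all three formats.
    pairs = [(task, fmt) for task, fmt in product(TASKS[:-1], FORMATS)]
    # Next Action is offered only in generative format.
    pairs.append(("nexta", "gen"))
    return pairs


if __name__ == "__main__":
    for task, fmt in task_format_pairs():
        print(task, fmt)
```

Under this reading, the benchmark covers 7 × 3 + 1 = 22 task and format combinations.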
## Access
Development and test sets are available for download via:
* ACPBench GitHub Repository
* Hugging Face Dataset Hub
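
For the Hugging Face route, loading one subset might look like the sketch below. It uses the real `datasets.load_dataset` function, but the repository id (assumed here to match the ModelScope path `ibm-research/acp_bench`), the `acp_<task>_<format>` configuration naming, and the split names are all assumptions that should be checked against the dataset page before use.

```python
# Assumption: the Hugging Face repository id mirrors the ModelScope path.
REPO_ID = "ibm-research/acp_bench"


def config_name(task: str, fmt: str) -> str:
    """Build a configuration name under a hypothetical 'acp_<task>_<format>' scheme."""
    return f"acp_{task}_{fmt}"


def load_subset(task: str, fmt: str, split: str):
    """Load one task/format subset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(REPO_ID, config_name(task, fmt), split=split)
```

A call such as `load_subset("app", "bool", split="test")` would then fetch the Boolean Action Applicability questions, provided the assumed configuration and split names exist.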
```
@inproceedings{KokelKSS25ACP,
author = {Harsha Kokel and
Michael Katz and
Kavitha Srinivas and
Shirin Sohrabi},
title = {ACPBench: Reasoning about Action, Change, and Planning},
booktitle = {{AAAI}},
publisher = {{AAAI} Press},
year = {2025},
url = {https://doi.org/10.1609/aaai.v39i25.34857}
}
```
```
@misc{KokelKSS25ACPHard,
title = {ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning},
author = {Harsha Kokel and
Michael Katz and
Kavitha Srinivas and
Shirin Sohrabi},
year = {2025},
eprint = {2503.24378},
archivePrefix = {arXiv},
primaryClass = {cs.AI},
url = {https://arxiv.org/abs/2503.24378},
}
```
Provider:
maas
Created:
2025-10-03



