
acp_bench

ModelScope Community · Updated 2025-11-27 · Indexed 2025-10-04
Download link:
https://modelscope.cn/datasets/ibm-research/acp_bench
Resource description:
# ACP Bench

<p align="center"> <a href="https://ibm.github.io/ACPBench" target="_blank">🏠 Homepage</a> • <a href="https://doi.org/10.1609/aaai.v39i25.34857" target="_blank">📄 Paper (AAAI 2025)</a> • <a href="https://arxiv.org/abs/2503.24378" target="_blank">📄 Paper (ACPBench Hard, arXiv)</a> </p>

ACPBench is a benchmark dataset for evaluating how well large language models (LLMs) reason about Action, Change, and Planning. It spans 13 domains:

* Blocksworld
* Logistics
* Grippers
* Grid
* Ferry
* FloorTile
* Rovers
* VisitAll
* Depot
* Goldminer
* Satellite
* Swap
* Alfworld

## Task Types in ACPBench

ACPBench includes the following 8 reasoning tasks:

1. Action Applicability (app)
2. Progression (prog)
3. Atom Reachability (reach)
4. Validation (val)
5. Action Reachability (areach)
6. Justification (just)
7. Landmarks (land)
8. Next Action (nexta)

## Task Formats

The first 7 tasks are available in:

* Boolean (yes/no) format
* Multiple-choice format
* Generative format

The Next Action task is provided only in generative format.

## Access

Development and test sets are available for download via:

* ACPBench GitHub Repository
* Hugging Face Dataset Hub

```
@inproceedings{KokelKSS25ACP,
  author    = {Harsha Kokel and Michael Katz and Kavitha Srinivas and Shirin Sohrabi},
  title     = {ACPBench: Reasoning about Action, Change, and Planning},
  booktitle = {{AAAI}},
  publisher = {{AAAI} Press},
  year      = {2025},
  url       = {https://doi.org/10.1609/aaai.v39i25.34857}
}
```

```
@misc{KokelKSS25ACPHard,
  title         = {ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning},
  author        = {Harsha Kokel and Michael Katz and Kavitha Srinivas and Shirin Sohrabi},
  year          = {2025},
  eprint        = {2503.24378},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2503.24378},
}
```
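
The task and format availability described in the dataset card can be enumerated with a short sketch. The task abbreviations come from the card itself; the format labels (`bool`, `mcq`, `gen`) and the helper name `task_format_pairs` are hypothetical shorthand chosen for this illustration and may not match the configuration names used in the actual repositories.

```python
# Task abbreviations as listed in the dataset card.
TASKS = ["app", "prog", "reach", "val", "areach", "just", "land", "nexta"]

# Hypothetical shorthand for the three formats (Boolean, multiple-choice,
# generative); these are NOT confirmed to be official config names.
FORMATS = ["bool", "mcq", "gen"]


def task_format_pairs():
    """Return every (task, format) combination the card describes."""
    pairs = []
    for task in TASKS:
        if task == "nexta":
            # Next Action is provided only in generative format.
            pairs.append((task, "gen"))
        else:
            # The first seven tasks come in all three formats.
            pairs.extend((task, fmt) for fmt in FORMATS)
    return pairs


if __name__ == "__main__":
    for task, fmt in task_format_pairs():
        print(f"{task}\t{fmt}")
```

Under these assumptions the card describes 7 × 3 + 1 = 22 task/format combinations, which can serve as a checklist when confirming that a download is complete.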

Provider:
maas
Created:
2025-10-03