acp_bench

Name: acp_bench
Creator: maas
Published: 2025-11-27 16:50:04
License: No description available

ModelScope community · Updated 2025-11-27 · Indexed 2025-10-04

Download link:

https://modelscope.cn/datasets/ibm-research/acp_bench

Overview:

# ACP Bench

<p align="center"> <a href="https://ibm.github.io/ACPBench" target="_blank">🏠 Homepage</a> • <a href="https://doi.org/10.1609/aaai.v39i25.34857" target="_blank">📄 Paper</a> • <a href="https://arxiv.org/abs/2503.24378" target="_blank">📄 Paper</a> </p>

ACPBench is a benchmark dataset designed to evaluate the reasoning capabilities of large language models (LLMs) in the context of Action, Change, and Planning. It spans 13 diverse domains:

* Blocksworld
* Logistics
* Grippers
* Grid
* Ferry
* FloorTile
* Rovers
* VisitAll
* Depot
* Goldminer
* Satellite
* Swap
* Alfworld

## Task Types in ACPBench

ACPBench includes the following 8 reasoning tasks:

1. Action Applicability (app)
2. Progression (prog)
3. Atom Reachability (reach)
4. Validation (val)
5. Action Reachability (areach)
6. Justification (just)
7. Landmarks (land)
8. Next Action (nexta)

## Task Formats

The first 7 tasks are available in:

* Boolean (yes/no) format
* Multiple-choice format
* Generative format

The Next Action task is provided only in generative format.

## Access

Development and test sets are available for download via:

* ACPBench GitHub Repository
* Hugging Face Dataset Hub

```
@inproceedings{KokelKSS25ACP,
  author    = {Harsha Kokel and Michael Katz and Kavitha Srinivas and Shirin Sohrabi},
  title     = {ACPBench: Reasoning about Action, Change, and Planning},
  booktitle = {{AAAI}},
  publisher = {{AAAI} Press},
  year      = {2025},
  url       = {https://doi.org/10.1609/aaai.v39i25.34857}
}
```

```
@misc{KokelKSS25ACPHard,
  title         = {ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning},
  author        = {Harsha Kokel and Michael Katz and Kavitha Srinivas and Shirin Sohrabi},
  year          = {2025},
  eprint        = {2503.24378},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2503.24378},
}
```
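The task and format rules described in the README can be checked with a short enumeration. This is a minimal sketch: the task abbreviations come from the README, but the format labels `bool`, `mcq`, and `gen` and the helper `task_format_pairs` are hypothetical names chosen for illustration and are not identifiers confirmed by the dataset.

```python
from itertools import product

# Task abbreviations exactly as listed in the README.
TASKS = ["app", "prog", "reach", "val", "areach", "just", "land", "nexta"]

# Hypothetical short labels for the three formats; the README names the
# formats (Boolean, multiple-choice, generative) but gives no identifiers.
FORMATS = ["bool", "mcq", "gen"]


def task_format_pairs():
    """Return every (task, format) pair that the README describes."""
    pairs = []
    for task, fmt in product(TASKS, FORMATS):
        # Next Action is provided only in generative format.
        if task == "nexta" and fmt != "gen":
            continue
        pairs.append((task, fmt))
    return pairs


pairs = task_format_pairs()
print(len(pairs))  # 7 tasks * 3 formats + 1 generative-only task = 22
```

Under these assumptions an evaluation sweep covers 22 task and format combinations per domain set; the actual subset or file names should be taken from the repository itself.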

Provider:

maas

Created:

2025-10-03
