Plancraft
Source: arXiv, indexed 2025-09-30
Download link:
https://gautierdag.github.io/plancraft/
Description:
Plancraft is a dataset for testing the planning capabilities of large language model (LLM) agents. Agents interact with it through a text interface and through a multimodal interface built on Minecraft's crafting GUI. The dataset deliberately includes unsolvable examples, so an agent must judge whether a task can be completed at all instead of assuming that every task has a solution. It is also used to benchmark open-source and closed-source LLMs against hand-crafted planners. Its core purpose is to evaluate how well LLM agents make decisions and plan.
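To illustrate why unsolvable examples test decision-making, here is a minimal sketch of a greedy recursive crafting planner that either returns a sequence of craft steps or reports that the target cannot be made from the given inventory. It is not Plancraft's hand-crafted planner or its API. The recipe table, the function names `acquire` and `plan_or_impossible`, and the assumption of one recipe per item are all hypothetical simplifications, and the slot-level crafting GUI is abstracted away entirely.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical, simplified recipes (NOT Plancraft's recipe data):
# item -> (ingredients consumed per craft, items produced per craft)
RECIPES: dict[str, tuple[dict[str, int], int]] = {
    "planks": ({"log": 1}, 4),
    "stick": ({"planks": 2}, 4),
    "wooden_pickaxe": ({"planks": 3, "stick": 2}, 1),
}


def acquire(item: str, count: int, inv: Counter, plan: list[str], depth: int = 0) -> bool:
    """Ensure `inv` holds at least `count` of `item`, crafting recursively.

    Mutates `inv` and appends craft steps to `plan`. Returns False when the
    item cannot be obtained (no recipe and not enough of it in the inventory).
    """
    if depth > 32:  # crude guard against cyclic recipe definitions
        return False
    missing = count - inv[item]
    if missing <= 0:
        return True  # leftovers from earlier crafts are reused here
    if item not in RECIPES:
        return False  # a raw material we do not hold enough of
    ingredients, yield_per_craft = RECIPES[item]
    crafts = -(-missing // yield_per_craft)  # ceiling division
    for ingredient, n in ingredients.items():
        if not acquire(ingredient, n * crafts, inv, plan, depth + 1):
            return False
        inv[ingredient] -= n * crafts  # reserve immediately so siblings cannot reuse it
    inv[item] += crafts * yield_per_craft
    plan.append(f"craft {item} x{crafts}")
    return True


def plan_or_impossible(target: str, inventory: dict[str, int]) -> list[str] | None:
    """Return the list of craft steps, or None if the task is unsolvable."""
    inv = Counter(inventory)
    plan: list[str] = []
    return plan if acquire(target, 1, inv, plan) else None


if __name__ == "__main__":
    print(plan_or_impossible("wooden_pickaxe", {"log": 2}))
    # ['craft planks x1', 'craft planks x1', 'craft stick x1', 'craft wooden_pickaxe x1']
    print(plan_or_impossible("wooden_pickaxe", {"log": 1}))
    # None: one log yields only 4 planks, but 3 + 2 are needed
```

The second call shows the behaviour the unsolvable examples are meant to probe: a correct agent should recognise that no sequence of actions reaches the target and stop, rather than keep issuing crafting moves.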



