Café Scenario Evaluation Dataset
Source: arXiv (listed 2025-09-30)
Download link:
https://dids-ei.github.io/Project/LLM-OBTEA/
Resource description:
This dataset contains 100 instructions, each paired with its expected goal, grouped into three difficulty levels (easy, medium, and hard). It is designed to evaluate the intent-understanding capability of large language models (LLMs) and provides two metrics for scoring model output: grammatical accuracy and interpretation accuracy. The tasks cover goal interpretation and behavior-planning evaluation.
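The listing names the two metrics but does not define how they are computed. The sketch below shows one plausible way to report them per difficulty level, assuming that grammatical accuracy is the fraction of LLM outputs that pass some syntax check and that interpretation accuracy is the fraction of outputs that are both well-formed and equal to the expected goal. The `Result` record, its field names, and `accuracy_by_level` are hypothetical and not part of the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical per-instruction record; field names are assumptions.
    difficulty: str          # "easy", "medium", or "hard"
    is_well_formed: bool     # assumed: the LLM output passes a syntax check
    matches_expected: bool   # assumed: the LLM output equals the expected goal


def accuracy_by_level(results):
    """Return {level: (grammatical_accuracy, interpretation_accuracy)}."""
    counts = defaultdict(lambda: [0, 0, 0])  # [total, well_formed, correct]
    for r in results:
        c = counts[r.difficulty]
        c[0] += 1
        c[1] += r.is_well_formed
        # Assumption: an interpretation only counts if it is also well-formed.
        c[2] += r.is_well_formed and r.matches_expected
    return {
        level: (wf / total, ok / total)
        for level, (total, wf, ok) in counts.items()
    }
```

For example, two easy instructions that are both well-formed but only one of which matches its expected goal yield `(1.0, 0.5)` for the `"easy"` level. Keeping the counts per level makes it easy to compare how accuracy changes from easy to hard instructions.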



