ibm-research/nestful
Hugging Face · Updated 2025-05-22 · Indexed 2025-02-15
Download link:
https://hf-mirror.com/datasets/ibm-research/nestful
Resource description:
NESTFUL is a benchmark for evaluating LLMs on nested sequences of API calls. The dataset contains over 1,800 nested sequences drawn from two main domains: mathematical reasoning and coding tools. The mathematical reasoning portion is derived from the MathQA dataset, and the coding portion is derived from the StarCoder2-Instruct dataset. All function calls in the dataset are executable. Each sample consists of a user query, a catalog of available tools, a ground-truth sequence of function calls, and the final answer obtained by executing those calls.
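To make the idea of a nested call sequence concrete, here is a minimal Python sketch of how such a sequence might be executed: each call's output is stored under a label, and later calls can use that label as an argument. The tool names (`add`, `multiply`), the field names (`query`, `calls`, `answer`), the `$var_N` label convention, and the `execute` helper are all hypothetical illustrations, not the dataset's actual schema or evaluation code.

```python
from typing import Any, Callable

# Hypothetical tool catalog: each tool name maps to an executable Python function.
TOOLS: dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

# Hypothetical sample; the field names and the "$var_N" label convention are
# illustrative assumptions, not NESTFUL's actual schema.
sample: dict[str, Any] = {
    "query": "What is (3 + 4) * 5?",
    "calls": [
        {"name": "add", "arguments": {"a": 3, "b": 4}, "label": "$var_1"},
        # Nested call: argument "a" refers to the output of the call labeled "$var_1".
        {"name": "multiply", "arguments": {"a": "$var_1", "b": 5}, "label": "$var_2"},
    ],
    "answer": 35,
}


def execute(calls: list[dict[str, Any]], tools: dict[str, Callable[..., Any]]) -> Any:
    """Run calls in order, substituting earlier outputs for label references."""
    outputs: dict[str, Any] = {}
    result: Any = None
    for call in calls:
        # Any string argument that matches an earlier label is replaced by that call's output.
        args = {
            key: outputs[value] if isinstance(value, str) and value in outputs else value
            for key, value in call["arguments"].items()
        }
        result = tools[call["name"]](**args)
        outputs[call["label"]] = result
    return result


print(execute(sample["calls"], TOOLS) == sample["answer"])  # True
```

Comparing the value returned by the last call with the stored final answer mirrors the idea of checking an executed sequence against the ground truth; working with the real data would require following the dataset's own field names and argument-reference format.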
Provided by:
ibm-research



