zjunlp/WorFBench_test
Source: Hugging Face. Updated: 2025-02-26. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/zjunlp/WorFBench_test
Resource overview:
WorFBench is a unified benchmark for evaluating the workflow generation capabilities of large language model (LLM) agents, featuring multifaceted scenarios and intricate graph-structured workflows. The dataset is designed to accurately quantify LLM agents' workflow generation abilities and is accompanied by WorFEval, a systematic evaluation protocol that scores generated workflows using subsequence and subgraph matching algorithms.
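To illustrate what subsequence matching means when scoring a generated workflow, here is a minimal sketch. It is not the WorFEval implementation: it assumes a workflow can be flattened to an ordered list of step strings, that steps match only by exact string equality, and that the score is an F1 over the longest common subsequence. The function names `lcs_length` and `chain_f1` and the example steps are hypothetical.

```python
from typing import Sequence


def lcs_length(pred: Sequence[str], gold: Sequence[str]) -> int:
    """Length of the longest common subsequence of two step lists."""
    rows, cols = len(pred) + 1, len(gold) + 1
    # table[i][j] holds the LCS length of pred[:i] and gold[:j].
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if pred[i - 1] == gold[j - 1]:  # assumption: exact string match
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def chain_f1(pred: Sequence[str], gold: Sequence[str]) -> float:
    """F1 score built from the in-order overlap between two step lists."""
    if not pred or not gold:
        return 0.0
    matched = lcs_length(pred, gold)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)  # share of predicted steps kept in order
    recall = matched / len(gold)     # share of reference steps recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and predicted workflows (illustrative only).
gold_steps = ["open fridge", "take egg", "heat pan", "fry egg"]
pred_steps = ["take egg", "open fridge", "heat pan", "fry egg"]

print(lcs_length(pred_steps, gold_steps))
print(chain_f1(pred_steps, gold_steps))
```

The prediction contains every reference step, yet swapping the first two steps lowers the score, because only steps that appear in a consistent order count as matched. A subgraph-based score would apply the same idea to the workflow graph's edges rather than to a single linear order.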
Provided by:
zjunlp



