zjunlp/Chat2Workflow-Evaluation
Source: Hugging Face. Updated 2026-04-27. Indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/zjunlp/Chat2Workflow-Evaluation
Description:
Chat2Workflow is a benchmark for evaluating how well Large Language Models (LLMs) generate executable visual workflows from natural language instructions. It is built from a collection of real-world business workflows, and each instance is designed so that the generated output can be transformed and directly deployed to practical platforms such as Dify and Coze. It evaluates LLMs on their capacity to capture high-level intent and produce stable, executable logic, particularly under complex or evolving requirements.
Provider:
zjunlp



