Limitus/agentic-ai-eval

Name: Limitus/agentic-ai-eval
Creator: Limitus
Published: 2025-01-09 16:40:51
License: Not specified

Source: Hugging Face. Updated 2025-01-09; indexed 2025-04-26.

Download link:

https://hf-mirror.com/datasets/Limitus/agentic-ai-eval

Description:

An AI agent benchmark dataset developed by Limitus, designed to evaluate agent performance on a diverse set of tasks with varying numbers of steps, durations, and specific ordering constraints. The dataset includes dynamic elements such as resource limitations, tool dependencies, and time-sensitive operations to assess an agent's adaptability to changing scenarios. Performance is evaluated using multiple metrics: Node F1 Score, Tool F1 Score, Structural Similarity Index (SSI), Node Label Similarity, and Graph Edit Distance (GED).
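The listing names the metrics but does not define them. As a rough illustration only, the sketch below computes a set-based F1 score over node names and tool names for one predicted plan against one reference plan. The set-based definition, the `set_f1` helper, and the example plans are all assumptions made here for illustration; the dataset's own scoring may differ.

```python
def set_f1(predicted, reference):
    """F1 between two collections, each treated as a set of labels.

    Assumption: this set-overlap definition is illustrative and is not
    confirmed to be the dataset's own Node F1 / Tool F1 formula.
    """
    pred, ref = set(predicted), set(reference)
    if not pred and not ref:
        return 1.0  # two empty plans agree trivially
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical plans: each step is a (node_name, tool_name) pair.
reference_plan = [
    ("search_flights", "web_search"),
    ("book_flight", "booking_api"),
    ("send_confirmation", "email"),
]
predicted_plan = [
    ("search_flights", "web_search"),
    ("book_flight", "booking_api"),
    ("notify_user", "sms"),
]

# Node F1 compares step names; Tool F1 compares the tools those steps use.
node_f1 = set_f1([node for node, _ in predicted_plan],
                 [node for node, _ in reference_plan])
tool_f1 = set_f1([tool for _, tool in predicted_plan],
                 [tool for _, tool in reference_plan])

print(f"Node F1: {node_f1:.3f}")
print(f"Tool F1: {tool_f1:.3f}")
```

Set-based F1 ignores step ordering, which is presumably why the benchmark also lists structure-aware metrics such as SSI and GED alongside it.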

Provider:

Limitus
