WebSuite
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/nat/natbot
Description:
WebSuite is a benchmark suite for evaluating how general-purpose web agents perform across different web actions and tasks, with a focus on identifying failure modes. The suite is easy to extend, so it can be used to test additional web agents and analyze their results. The evaluation covers multiple tasks, each run eight times, and measures agent performance on both individual tasks and end-to-end tasks so that failures can be traced to specific actions.



