NeMoEval
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/microsoft/NeMoEval
Overview:
NeMoEval is a general-purpose benchmark for evaluating the effectiveness of network management systems based on large language models (LLMs). It includes components for golden answer selection, result evaluation, and logging. The benchmark also provides golden answers written by human experts and a set of queries categorized by complexity level. In terms of scale, the dataset contains a directed graph with 5,493 nodes and 6,424 edges, as well as synthetic communication graphs of varying sizes. Its task is to evaluate the code that LLMs generate in response to network management queries.
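To make the evaluation flow described above concrete, the following is a minimal sketch of how LLM-generated code might be checked against a golden answer and the outcome logged. It is not taken from the NeMoEval repository: the names `EvalRecord`, `golden_count_edges`, and `evaluate`, the adjacency-dict graph format, and the convention that generated code defines a function called `answer` are all assumptions made for illustration.

```python
import copy
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

# Hypothetical toy representation: node name -> set of successor nodes.
ToyGraph = Dict[str, Set[str]]


@dataclass
class EvalRecord:
    """One log entry per evaluated query (hypothetical structure)."""
    query: str
    passed: bool
    error: Optional[str] = None


def golden_count_edges(graph: ToyGraph) -> int:
    """Stand-in for an expert-written golden answer: count directed edges."""
    return sum(len(successors) for successors in graph.values())


def evaluate(
    query: str,
    candidate_source: str,
    golden_fn: Callable[[ToyGraph], object],
    graph: ToyGraph,
    log: List[EvalRecord],
) -> EvalRecord:
    """Run generated code, compare its result with the golden answer, log it.

    Assumes the generated code defines `answer(graph)`. exec() is used only
    to keep the sketch short; untrusted code should run in a sandbox.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)
        # Pass a copy so the candidate cannot mutate the reference graph.
        result = namespace["answer"](copy.deepcopy(graph))
        record = EvalRecord(query, result == golden_fn(graph))
    except Exception as exc:  # any failure in generated code counts as a miss
        record = EvalRecord(query, False, repr(exc))
    log.append(record)
    return record


if __name__ == "__main__":
    toy_graph = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}
    records: List[EvalRecord] = []
    generated = "def answer(g):\n    return sum(len(v) for v in g.values())\n"
    print(evaluate("How many edges?", generated, golden_count_edges, toy_graph, records))
```

Comparing results rather than source text lets differently written but equivalent programs pass, and recording exceptions in the log keeps failed runs distinguishable from wrong answers.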



