NoFunEval
Source: arXiv (indexed 2025-09-30)
Download link:
https://aka.ms/NoFunEval
Resource overview:
NoFunEval is a benchmark for evaluating code language models on both functional and non-functional requirements, covering code editing and classification tasks. It also includes custom evaluation metrics and sample generation techniques designed specifically for non-functional requirements. The benchmark has been used to evaluate 27 code language models of varying sizes, with parameter counts ranging from 1 billion to 34 billion, on these code editing and classification tasks.
Provided by:
Microsoft



