GitTaskBench
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/wanghuacan/RepoMaster
Resource overview:
GitTaskBench is a benchmark dataset for evaluating how well large language model (LLM) agents solve practical, real-world tasks by leveraging existing code repositories. It comprises 18 code repositories and 54 tasks spanning multiple domains. Its evaluation metrics are tailored to these diverse tasks and include the execution completion rate and the task pass rate.
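As a rough illustration of how the two named metrics could be aggregated, here is a minimal sketch. The listing does not define the metrics precisely, so the definitions used here are assumptions: the execution completion rate is taken as the share of tasks where the agent's run finished and produced output, and the task pass rate as the share of tasks whose output also met the task-specific check. `TaskResult` and `summarize` are hypothetical names and are not part of the GitTaskBench code.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one agent run on one task (hypothetical structure)."""
    task_id: str
    executed: bool  # assumed: the run finished and produced an output
    passed: bool    # assumed: the output met the task-specific criterion


def summarize(results):
    """Return both rates as fractions of all attempted tasks."""
    total = len(results)
    if total == 0:
        return {"execution_completion_rate": 0.0, "task_pass_rate": 0.0}
    executed = sum(1 for r in results if r.executed)
    # A task only counts as passed if it also executed to completion.
    passed = sum(1 for r in results if r.executed and r.passed)
    return {
        "execution_completion_rate": executed / total,
        "task_pass_rate": passed / total,
    }


# Made-up results for four tasks, for illustration only.
demo = [
    TaskResult("task_a", executed=True, passed=True),
    TaskResult("task_b", executed=True, passed=False),
    TaskResult("task_c", executed=False, passed=False),
    TaskResult("task_d", executed=True, passed=True),
]
print(summarize(demo))
```

Under these assumptions the pass rate can never exceed the completion rate, since a task that did not execute cannot pass.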



