gso-bench/gso
Source: Hugging Face | Updated: 2025-08-28 | Indexed: 2025-11-01
Download link:
https://hf-mirror.com/datasets/gso-bench/gso
Description:
GSO is a dataset for evaluating how well language models and LLM agents can develop high-performance software. It collects 102 software optimization tasks from 10 popular Python repositories. Evaluation uses performance tests to verify correctness, and takes the performance achieved by the expert human commit as the optimization target. The dataset was released as part of the paper "GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents".
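One possible way to fetch the dataset is sketched below. It assumes the third-party `datasets` package is installed and that the repository is publicly readable, neither of which this listing confirms. The helper name `load_gso` and its `use_mirror` flag are hypothetical and exist only for illustration; the repository id and mirror host are taken from the download link above.

```python
import os

DATASET_ID = "gso-bench/gso"  # repository id, taken from the download link
MIRROR_ENDPOINT = "https://hf-mirror.com"  # mirror host, taken from the download link


def load_gso(use_mirror: bool = False):
    """Load all available splits of the GSO dataset (hypothetical helper).

    Assumes the `datasets` package is installed and network access is
    available. Split and column names are not stated in this listing,
    so no split is requested explicitly.
    """
    if use_mirror:
        # huggingface_hub reads HF_ENDPOINT when it is first imported, so
        # this only takes effect if `datasets` has not been imported yet.
        os.environ.setdefault("HF_ENDPOINT", MIRROR_ENDPOINT)

    # Imported lazily so the endpoint override above is applied first.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)
```

Calling `load_gso(use_mirror=True)` should return a mapping from split name to dataset. Since the listing does not document the schema, inspecting `column_names` on each split is a reasonable first step before relying on any particular field.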
Provider:
gso-bench



