OpenMOSS-Team/ABC-Bench
Source: Hugging Face. Updated: 2026-01-20. Indexed: 2026-02-07.
Download link:
https://hf-mirror.com/datasets/OpenMOSS-Team/ABC-Bench
Description:
ABC-Bench is a benchmark for Agentic Backend Coding. It evaluates whether code agents can explore real repositories, edit code, configure environments, deploy containerized services, and pass external end-to-end API tests (HTTP-based integration tests) across realistic backend stacks. The benchmark includes 224 tasks curated from 127 MIT-licensed repositories, spanning 8 languages and 19 frameworks. Among these, 92 tasks require autonomous environment configuration and containerized service startup. The dataset is built via ABC-Pipeline with minimal manual intervention, enabling scalable task creation and future expansions. Even state-of-the-art models remain far from fully reliable.
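To make "external end-to-end API tests" concrete, the following is a minimal sketch of an HTTP-based check run from outside a service. It is not taken from ABC-Bench or ABC-Pipeline: the `/health` endpoint, the `HealthHandler` stand-in service, and the `check_endpoint` helper are all hypothetical names chosen for illustration, and a real task would target a containerized service started by the agent rather than an in-process server.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a deployed backend service (not from ABC-Bench)."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def check_endpoint(base_url, path, expected_status, expected_json):
    """Return True if the service answers `path` with the expected status and JSON body."""
    # Bypass any proxy settings so the request goes straight to the local service.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url + path, timeout=5) as resp:
            status = resp.status
            payload = json.loads(resp.read().decode("utf-8"))
    except Exception:
        # Connection errors, non-2xx responses, and invalid JSON all count as failures.
        return False
    return status == expected_status and payload == expected_json


def run_demo():
    """Start the stand-in service, run two external checks, and shut it down."""
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0: OS picks a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    base_url = f"http://127.0.0.1:{server.server_port}"
    try:
        passed = check_endpoint(base_url, "/health", 200, {"status": "ok"})
        missing = check_endpoint(base_url, "/missing", 200, {"status": "ok"})
    finally:
        server.shutdown()
        server.server_close()
    return passed, missing


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is that the check only observes the service over HTTP, so a task passes only if the code edits, the environment, and the service startup all work together.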
Provider:
OpenMOSS-Team



