henryen/hwe-bench
Source: Hugging Face | Updated: 2026-04-23 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/henryen/hwe-bench
Overview:
HWE-bench is a benchmark for evaluating LLM agents on real-world hardware bug repair tasks. It contains 417 cases from six open-source hardware repositories covering Verilog, SystemVerilog, and Chisel projects. Each case is a fail-to-pass task: the provided test fails on the buggy baseline and passes after the ground-truth fix. Evaluation scripts, Docker image instructions, and agent-running code are available in the project repository.
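The fail-to-pass criterion can be stated as a small check: a case is valid only if its test fails on the buggy baseline and passes once the ground-truth fix is applied. The sketch below illustrates that rule under loudly stated assumptions. The `Case` record, its field names, the dict-of-files model of a revision, and the toy adder case are all hypothetical and are not the benchmark's actual data format or harness. In HWE-bench the test would be run by the project's own scripts inside its Docker environment rather than by a Python lambda.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical model: a repository revision is a mapping from file path to file contents.
Revision = Dict[str, str]


@dataclass
class Case:
    """Illustrative record for one fail-to-pass case (field names are assumptions)."""
    case_id: str
    buggy: Revision                   # files at the buggy baseline
    fix_patch: Revision               # files overwritten by the ground-truth fix
    test: Callable[[Revision], bool]  # True if the case's test passes on a revision


def apply_patch(rev: Revision, patch: Revision) -> Revision:
    """Return a new revision with the patched files replaced; the input is not mutated."""
    merged = dict(rev)
    merged.update(patch)
    return merged


def is_fail_to_pass(case: Case) -> bool:
    """A case qualifies only if the test fails before the fix and passes after it."""
    fails_on_baseline = not case.test(case.buggy)
    passes_after_fix = case.test(apply_patch(case.buggy, case.fix_patch))
    return fails_on_baseline and passes_after_fix


# Toy example (hypothetical): the "test" is a string check standing in for an RTL simulation.
toy = Case(
    case_id="toy-0001",
    buggy={"rtl/adder.v": "assign sum = a - b;"},
    fix_patch={"rtl/adder.v": "assign sum = a + b;"},
    test=lambda rev: "a + b" in rev["rtl/adder.v"],
)

print(is_fail_to_pass(toy))  # True
```

A case whose test already passes on the baseline, or still fails after the fix, would be rejected by this check. That is the property that makes a passing test after an agent's patch meaningful evidence of a repair.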
Provided by:
henryen



