LogicStar/BaxBench
Source: Hugging Face. Updated: 2025-02-19. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/LogicStar/BaxBench
Description:
BaxBench is a coding benchmark constructed to measure the ability of code generation models and agents to generate correct and secure code. It consists of 392 backend development tasks, which are constructed by combining 28 scenarios that describe the backend functionalities to implement and 14 backend frameworks defining the implementation tools. The benchmark uses end-to-end functional tests and practical security exploits to assess the correctness and security of the solutions. The dataset contains all necessary artifacts to reproduce the evaluation prompts used in our paper and enables testing different prompt structures or models by forming new prompt types.
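The task count follows from pairing every scenario with every framework: 28 × 14 = 392. The sketch below illustrates that construction only. The scenario and framework identifiers are placeholders invented for this example, not the benchmark's real names, and the dictionary layout is an assumption rather than the dataset's actual schema.

```python
from itertools import product

# Placeholder identifiers (hypothetical): the real scenario and
# framework names are not listed in this description.
scenarios = [f"scenario_{i:02d}" for i in range(1, 29)]    # 28 scenarios
frameworks = [f"framework_{j:02d}" for j in range(1, 15)]  # 14 frameworks

# One task per (scenario, framework) pair, i.e. the Cartesian product.
tasks = [
    {"scenario": s, "framework": f}
    for s, f in product(scenarios, frameworks)
]

print(len(tasks))  # 28 * 14 = 392
```

Because each task is a distinct pair, results can be grouped by scenario or by framework.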
Provided by:
LogicStar



