m-a-p/CodeCriticBench
Source: Hugging Face. Updated: 2025-11-02. Indexed: 2025-04-08.
Download link:
https://hf-mirror.com/datasets/m-a-p/CodeCriticBench
Description:
CodeCriticBench is a comprehensive benchmark for systematically evaluating the critique capabilities of large language models (LLMs) on two kinds of task: code generation and code question answering (code QA). It includes algorithmic problems, a specialized Debug subset, and code QA tasks grounded in real-world programming scenarios, built by combining StackOverflow responses with diverse question generation. Each sample comes with a set of carefully designed evaluation checklists covering 10 distinct criteria, and samples are grouped into three difficulty levels: Easy, Medium, and Hard.
Provider:
m-a-p



