
m-a-p/CodeCriticBench

Source: Hugging Face | Updated: 2025-11-02 | Indexed: 2025-04-08
Download link:
https://hf-mirror.com/datasets/m-a-p/CodeCriticBench
Description:

CodeCriticBench is a comprehensive benchmark for systematically evaluating the critique capabilities of large language models (LLMs) on two kinds of tasks: code generation and code question answering (QA). It includes algorithmic problems, a specialized Debug subset, and code QA tasks drawn from real-world programming scenarios, which combine StackOverflow responses with diverse question generation. Each sample comes with a set of carefully designed evaluation checklists covering 10 distinct criteria, and samples are grouped into three difficulty levels: Easy, Medium, and Hard.
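Because samples are split into Easy, Medium, and Hard levels, a common first step is to check how many samples fall into each level. The snippet below is a minimal sketch of that, assuming the Hugging Face `datasets` library for loading. The column name `difficulty`, the split name `train`, and the helper names `count_by_difficulty` and `load_rows` are hypothetical; check the dataset card for the actual schema and split names.

```python
from collections import Counter
from typing import Iterable, Mapping

# Difficulty levels named in the dataset description.
LEVELS = ("Easy", "Medium", "Hard")


def count_by_difficulty(rows: Iterable[Mapping], field: str = "difficulty") -> dict:
    """Count samples per difficulty level.

    `field` is a hypothetical column name; adjust it to the real schema.
    Values are normalized (stripped, capitalized) before matching LEVELS;
    anything that does not match is counted as "Unknown".
    """
    counts = Counter()
    for row in rows:
        value = str(row.get(field, "")).strip().capitalize()
        counts[value if value in LEVELS else "Unknown"] += 1
    return {level: counts.get(level, 0) for level in (*LEVELS, "Unknown")}


def load_rows(split: str = "train"):
    """Load one split of the dataset from the Hugging Face Hub (needs network access).

    The split name "train" is an assumption; the repository may use another one.
    """
    # Imported lazily so count_by_difficulty works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("m-a-p/CodeCriticBench", split=split)
```

Usage, assuming network access and the hypothetical column name holds: `count_by_difficulty(load_rows())`. Keeping the counter separate from the loader lets it run on any iterable of dict-like rows, including a local export of the data.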
Provided by:
m-a-p