jie-jw-wu/HumanEvalComm
Source: Hugging Face. Updated 2024-12-17; indexed 2024-12-21.
Download link:
https://hf-mirror.com/datasets/jie-jw-wu/HumanEvalComm
Description:
HumanEvalComm is a benchmark dataset for evaluating the communication skills of Large Language Models (LLMs) in code generation tasks, specifically their ability to ask clarifying questions when a coding requirement is ambiguous, inconsistent, or incomplete. It is built on the widely used HumanEval benchmark and contains 762 modified problem descriptions derived from HumanEval's 164 problems. Each modification introduces ambiguity, inconsistency, or incompleteness so that a clarifying question is warranted. The dataset keeps the HumanEval structure and adds prompt fields, each corresponding to a different type of clarification. Every task includes instructions and a function signature, and the model is prompted in two rounds: in the first round it must either generate Python3 code or ask clarifying questions; in the second round it generates code using the clarifying questions and their answers.
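The two-round prompt format described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the benchmark's actual harness: the prompt wording, the `two_round` and `extract_code` helpers, and the `model` and `answer_questions` callables are all hypothetical names introduced here, and a reply is treated as a clarifying question simply when it contains no fenced code block.

```python
import re
from typing import Callable, Optional

FENCE = "`" * 3  # code-fence marker, built this way to avoid a literal fence in this listing


def extract_code(response: str) -> Optional[str]:
    """Return the body of the first fenced code block, or None if there is none."""
    pattern = FENCE + r"[A-Za-z0-9]*[ \t]*\n(.*?)" + FENCE
    match = re.search(pattern, response, re.DOTALL)
    return match.group(1).strip() if match else None


def two_round(
    problem: str,
    model: Callable[[str], str],             # hypothetical: prompt -> model reply
    answer_questions: Callable[[str], str],  # hypothetical: questions -> answers
) -> dict:
    """Run one problem through a two-round exchange (illustrative wording only)."""
    # Round 1: the model either writes code or asks clarifying questions.
    first_prompt = (
        "Either generate Python3 code in a fenced code block, "
        "or ask clarifying questions.\n\n" + problem
    )
    first_reply = model(first_prompt)
    code = extract_code(first_reply)
    if code is not None:
        # The model answered with code directly, so no second round is needed.
        return {"asked": False, "questions": None, "code": code}

    # Round 2: supply answers to the questions and ask for code.
    answers = answer_questions(first_reply)
    second_prompt = (
        problem
        + "\n\nClarifying questions:\n" + first_reply
        + "\n\nAnswers:\n" + answers
        + "\n\nNow generate Python3 code in a fenced code block."
    )
    second_reply = model(second_prompt)
    return {
        "asked": True,
        "questions": first_reply,
        "code": extract_code(second_reply),
    }
```

Passing the model and the question-answering step in as plain callables keeps the sketch independent of any particular LLM API; a real evaluation would also need to run the returned code against the problem's tests.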
Provider:
jie-jw-wu



