RaymondLi/perturbed_humaneval

Name: RaymondLi/perturbed_humaneval
Creator: RaymondLi
Published: 2023-08-23 19:41:28
License: Apache-2.0 (per the dataset card)

Source: Hugging Face. Updated 2023-08-23; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/RaymondLi/perturbed_humaneval

Resource description:

---
license: apache-2.0
---

# Dataset Card for perturbed_humaneval

## Dataset Description

- **Repository:** https://github.com/amazon-science/recode/tree/main
- **Paper:** https://arxiv.org/abs/2212.10264

### Dataset Summary

The ReCode benchmark applies code and natural-language transformations to code-generation benchmarks in order to evaluate the robustness of code-generation models.
This dataset contains the perturbed version of HumanEval released by the ReCode authors. It was automatically generated from the [HumanEval](https://huggingface.co/datasets/openai_humaneval) dataset.

### Subsets

There are four transformation categories that form the subsets of this dataset: `func_name`, `nlaugmenter`, `natgen` and `format`.

### Languages

The programming problems are written in Python and contain docstrings and comments in English.

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

- `task_id`: ID of the original HumanEval example
- `prompt`: the perturbed prompt
- `entry_point`: entry point for the test
- `canonical_solution`: solution for the problem in the `prompt`
- `test`: contains a function that tests generated code for correctness
- `seed`: seed of the perturbed prompt
- `perturbation_name`: name of the perturbation
- `partial`: partial solution to the problem. This field is only present for the transformation categories that affect a partial solution: `natgen` and `format`.

### Data Splits

The dataset only has a test split.

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

```
@article{wang2022recode,
  title={ReCode: Robustness Evaluation of Code Generation Models},
  author={Wang, Shiqi and Li, Zheng and Qian, Haifeng and Yang, Chenghao and Wang, Zijian and Shang, Mingyue and Kumar, Varun and Tan, Samson and Ray, Baishakhi and Bhatia, Parminder and others},
  journal={arXiv preprint arXiv:2212.10264},
  year={2022}
}
```

### Contributions

[More Information Needed]

Provided by:

RaymondLi

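Summary of original information

Dataset overview

Dataset description

Name: ReCode benchmark
Purpose: evaluate the robustness of code-generation models
Source: automatically generated perturbed version of the HumanEval dataset
Languages: Python, with docstrings and comments in English

Dataset structure

Data fields

task_id: ID of the original HumanEval example
prompt: the perturbed prompt
entry_point: entry point for the test
canonical_solution: solution to the problem
test: function used to test generated code for correctness
seed: seed of the perturbed prompt
perturbation_name: name of the perturbation
partial: partial solution to the problem (only present in the natgen and format categories)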
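
The fields above can be combined into a single runnable check. The sketch below is illustrative only: it assumes the `test` field defines a `check(candidate)` function, as in the original HumanEval, and that a completion is the function body continuing the `prompt`. The helper name `build_check_program` and the toy record are hypothetical and are not taken from the dataset.

```python
def build_check_program(record: dict, completion: str) -> str:
    """Join prompt, completion, and test into one Python source string.

    Assumption: record["test"] defines check(candidate), as in HumanEval.
    """
    return (
        record["prompt"]
        + completion
        + "\n\n"
        + record["test"]
        + "\n\n"
        + f"check({record['entry_point']})\n"
    )


# Hand-written toy record with the documented fields (hypothetical values).
toy_record = {
    "task_id": "Toy/0",
    "prompt": 'def add_numbers(a, b):\n    """Return the sum of a and b."""\n',
    "entry_point": "add_numbers",
    "canonical_solution": "    return a + b\n",
    "test": "def check(candidate):\n    assert candidate(1, 2) == 3\n",
    "seed": 0,
    "perturbation_name": "toy_perturbation",
}

program = build_check_program(toy_record, toy_record["canonical_solution"])

# exec raises AssertionError if the completion fails the test.
# Model-generated code should only ever be executed inside a sandbox.
exec(program, {})
```

Replacing `toy_record["canonical_solution"]` with a model completion gives a pass/fail signal for that perturbed prompt.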

Data splits

Split: test only
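
Since only a test split exists, loading a subset might look like the sketch below. It assumes the Hugging Face `datasets` library is installed and that the four transformation categories are exposed as configuration names; the card calls them subsets but does not show a loading command. The wrapper `load_subset` is a hypothetical helper.

```python
# Transformation categories listed on the dataset card.
SUBSETS = ("func_name", "nlaugmenter", "natgen", "format")


def load_subset(subset: str):
    """Load the test split of one transformation category.

    Assumption: each category is a dataset configuration name.
    """
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    # Imported lazily so the name check works without the library installed.
    from datasets import load_dataset

    return load_dataset("RaymondLi/perturbed_humaneval", subset, split="test")
```

Calling `load_subset("func_name")` needs network access or a local cache of the dataset.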

Dataset creation

Dataset subsets

Subsets: four transformation categories: func_name, nlaugmenter, natgen, format
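
Because the `partial` field is documented only for `natgen` and `format`, code that reads records may want to check which fields to expect per subset. A minimal sketch follows; the helper names `expected_fields` and `missing_fields` are hypothetical, and the field lists come from the description above rather than from inspecting the data.

```python
# Fields documented for every subset.
BASE_FIELDS = (
    "task_id",
    "prompt",
    "entry_point",
    "canonical_solution",
    "test",
    "seed",
    "perturbation_name",
)
ALL_SUBSETS = ("func_name", "nlaugmenter", "natgen", "format")
# Subsets documented as carrying the extra `partial` field.
PARTIAL_SUBSETS = frozenset({"natgen", "format"})


def expected_fields(subset: str) -> tuple:
    """Return the field names the card documents for a given subset."""
    if subset not in ALL_SUBSETS:
        raise ValueError(f"unknown subset {subset!r}")
    if subset in PARTIAL_SUBSETS:
        return BASE_FIELDS + ("partial",)
    return BASE_FIELDS


def missing_fields(record: dict, subset: str) -> list:
    """List documented fields that are absent from a record."""
    return [name for name in expected_fields(subset) if name not in record]
```

An empty list from `missing_fields` means the record carries every field documented for its subset.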

Data source

Original data: HumanEval dataset
Perturbation generation: automatic

License

License: Apache-2.0
