# CodeFuseEval
## Dataset Description
[CodeFuseEval](https://github.com/codefuse-ai/codefuse-evaluation) is a benchmark for evaluating the multilingual ability of code generation models. It consists of 820 high-quality, human-crafted samples, each with test cases, in Python, C++, Java, JavaScript, and Go, and can be used for tasks such as code generation and code translation.
## Tasks
The dataset contains coding problems for four task types: CodeCompletion, NL2Code, CodeTranslation, and CodeDataScience.
## Dataset Structure
To load the dataset, specify one of the following subset names:
* `humaneval_python`, `humaneval_python_cn`, `humaneval_js`, `humaneval_java`, `humaneval_go`, `humaneval_rust`, `humaneval_cpp`
* `mbpp`
* `codeTrans_python_to_java`, `codeTrans_python_to_cpp`, `codeTrans_cpp_to_java`, `codeTrans_cpp_to_python`, `codeTrans_java_to_python`, `codeTrans_java_to_cpp`
* `codeCompletion_matplotlib`, `codeCompletion_numpy`, `codeCompletion_pandas`, `codeCompletion_pytorch`, `codeCompletion_scipy`, `codeCompletion_sklearn`, `codeCompletion_tensorflow`
* `codeInsertion_matplotlib`, `codeInsertion_numpy`, `codeInsertion_pandas`, `codeInsertion_pytorch`, `codeInsertion_scipy`, `codeInsertion_sklearn`, `codeInsertion_tensorflow`

By default `humaneval_python` is loaded.
```python
from datasets import load_dataset
# Keep the result so the splits and features can be inspected afterwards.
dataset = load_dataset("codefuse-ai/CodeFuseEval", "humaneval_python")
```
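The code translation subsets appear to follow a `codeTrans_<source>_to_<target>` naming pattern. As a minimal sketch, assuming that pattern holds for every translation subset, the helper below splits a subset name into its source and target languages. The function name `parse_translation_subset` is hypothetical and is not part of the dataset or the `datasets` library.

```python
from typing import Tuple


def parse_translation_subset(name: str) -> Tuple[str, str]:
    """Split a 'codeTrans_<source>_to_<target>' subset name into (source, target).

    Assumption: the naming pattern is inferred from the subset list above,
    not from an official specification.
    """
    prefix = "codeTrans_"
    if not name.startswith(prefix):
        raise ValueError(f"not a code translation subset: {name!r}")
    # partition() splits on the first '_to_' and returns (head, separator, tail).
    source, separator, target = name[len(prefix):].partition("_to_")
    if not separator or not source or not target:
        raise ValueError(f"unexpected subset name format: {name!r}")
    return source, target


source, target = parse_translation_subset("codeTrans_python_to_java")
print(source, target)
```

The returned pair can then be used, for example, to pick a compiler or interpreter for the target language before running the tests.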
## Data Fields
Different subsets have different fields. You can check the fields of a subset by calling `dataset["test"].features`. For example, `humaneval_python` has the following fields:
* ``task_id``: indicates the target language and ID of the problem. Language is one of ["Python", "Java", "JavaScript", "CPP", "Go"].
* ``prompt``: the function declaration and docstring, used for code generation.
* ``declaration``: only the function declaration, used for code translation.
* ``canonical_solution``: human-crafted example solutions.
* ``test``: hidden test samples, used for evaluation.
* ``example_test``: public test samples that also appear in the prompt, used for evaluation.
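
The fields above can be combined to check a generated completion against the hidden tests. The following is a minimal sketch, not the official evaluation harness: the `sample` record is made up for illustration (real records may format `test` differently), and the helper names `build_program` and `passes` are hypothetical.

```python
def build_program(sample: dict, completion: str) -> str:
    """Join the prompt, a candidate function body, and the hidden test into one script."""
    return sample["prompt"] + completion + "\n\n" + sample["test"] + "\n"


def passes(sample: dict, completion: str) -> bool:
    """Return True if the assembled script runs without raising an exception."""
    program = build_program(sample, completion)
    try:
        # Caution: exec() runs arbitrary code. Use a sandbox for untrusted model output.
        exec(compile(program, "<candidate>", "exec"), {"__name__": "__candidate__"})
    except Exception:
        return False
    return True


# Made-up record that mirrors the field names listed above (assumption, not real data).
sample = {
    "task_id": "Python/0",
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "declaration": "def add(a, b):\n",
    "canonical_solution": "    return a + b\n",
    "test": (
        "def check(candidate):\n"
        "    assert candidate(1, 2) == 3\n"
        "    assert candidate(-1, 1) == 0\n"
        "\n"
        "check(add)\n"
    ),
    "example_test": "def check(candidate):\n    assert candidate(1, 2) == 3\n\ncheck(add)\n",
}

print(passes(sample, sample["canonical_solution"]))
print(passes(sample, "    return a - b\n"))
```

Running the candidate in the same process keeps the sketch short. A real harness would typically run each program in a separate, time-limited process.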
## Data Splits
Each subset has one split: test.
## Citation Information
Refer to https://github.com/codefuse-ai/codefuse-evaluation.