
humaneval-x

ModelScope community, updated 2026-01-07, indexed 2024-08-31
Download link:
https://modelscope.cn/datasets/ZhipuAI/humaneval-x
Resource description:
# HumanEval-X

## Dataset Description

[HumanEval-X](https://github.com/THUDM/CodeGeeX) is a benchmark for evaluating the multilingual ability of code generative models. It consists of 820 high-quality human-crafted data samples (each with test cases) in Python, C++, Java, JavaScript, and Go, and can be used for various tasks, such as code generation and translation.

## Languages

The dataset contains coding problems in 5 programming languages: Python, C++, Java, JavaScript, and Go.

## Dataset Structure

To load the dataset you need to specify a subset among the 5 existing languages `[python, cpp, go, java, js]`. By default `python` is loaded.

```python
from datasets import load_dataset

# Keep the returned DatasetDict so the next snippet can index into it.
data = load_dataset("THUDM/humaneval-x", "js")
data
# DatasetDict({
#     test: Dataset({
#         features: ['task_id', 'prompt', 'declaration', 'canonical_solution', 'test', 'example_test'],
#         num_rows: 164
#     })
# })
```

```python
# Inspect the first record of the only split, `test`.
next(iter(data["test"]))
# {'task_id': 'JavaScript/0',
#  'prompt': '/* Check if in given list of numbers, are any two numbers closer to each other than\n given threshold.\n >>> hasCloseElements([1.0, 2.0, 3.0], 0.5)\n false\n >>> hasCloseElements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n true\n */\nconst hasCloseElements = (numbers, threshold) => {\n',
#  'declaration': '\nconst hasCloseElements = (numbers, threshold) => {\n',
#  'canonical_solution': ' for (let i = 0; i < numbers.length; i++) {\n for (let j = 0; j < numbers.length; j++) {\n if (i != j) {\n let distance = Math.abs(numbers[i] - numbers[j]);\n if (distance < threshold) {\n return true;\n }\n }\n }\n }\n return false;\n}\n\n',
#  'test': 'const testHasCloseElements = () => {\n console.assert(hasCloseElements([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) === true)\n console.assert(\n hasCloseElements([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) === false\n )\n console.assert(hasCloseElements([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) === true)\n console.assert(hasCloseElements([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) === false)\n console.assert(hasCloseElements([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) === true)\n console.assert(hasCloseElements([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) === true)\n console.assert(hasCloseElements([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) === false)\n}\n\ntestHasCloseElements()\n',
#  'example_test': 'const testHasCloseElements = () => {\n console.assert(hasCloseElements([1.0, 2.0, 3.0], 0.5) === false)\n console.assert(\n hasCloseElements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3) === true\n )\n}\ntestHasCloseElements()\n'}
```

## Data Fields

* ``task_id``: indicates the target language and ID of the problem. Language is one of ["Python", "Java", "JavaScript", "CPP", "Go"].
* ``prompt``: the function declaration and docstring, used for code generation.
* ``declaration``: only the function declaration, used for code translation.
* ``canonical_solution``: human-crafted example solutions.
* ``test``: hidden test samples, used for evaluation.
* ``example_test``: public test samples (also shown in the prompt), used for evaluation.

## Data Splits

Each subset has one split: test.

## Citation Information

Refer to https://github.com/THUDM/CodeGeeX.
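The Data Fields section suggests how a record might be turned into a runnable program: `prompt` opens the function, a completion closes it, and `test` or `example_test` exercises it. The sketch below illustrates that reading in Python. It is an assumption about how the fields fit together, not the CodeGeeX evaluation harness; `parse_task_id`, `build_program`, and `TOY_SAMPLE` are hypothetical names, and the toy record is invented so the example runs offline.

```python
from typing import Dict, Tuple

# Hypothetical toy record that mimics the documented fields. It is NOT taken
# from HumanEval-X and exists only so this sketch runs without a download.
TOY_SAMPLE: Dict[str, str] = {
    "task_id": "Python/0",
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "declaration": "def add(a, b):\n",
    "canonical_solution": "    return a + b\n",
    "test": "def check(add):\n    assert add(2, 3) == 5\n\ncheck(add)\n",
    "example_test": "def check(add):\n    assert add(1, 1) == 2\n\ncheck(add)\n",
}


def parse_task_id(task_id: str) -> Tuple[str, int]:
    """Split an ID such as 'JavaScript/0' into its language and problem number."""
    language, _, index = task_id.partition("/")
    return language, int(index)


def build_program(sample: Dict[str, str], completion: str,
                  use_example_test: bool = False) -> str:
    """Join prompt, completion, and one of the two test fields into one source string.

    Assumption: plain concatenation is enough. The real harness may add
    imports or language-specific wrappers that this sketch omits.
    """
    tests = sample["example_test"] if use_example_test else sample["test"]
    return sample["prompt"] + completion + "\n" + tests


if __name__ == "__main__":
    print(parse_task_id("JavaScript/0"))  # ('JavaScript', 0)

    # Running the canonical solution against the hidden tests should not raise.
    program = build_program(TOY_SAMPLE, TOY_SAMPLE["canonical_solution"])
    exec(program, {})  # acceptable only because the toy record is trusted
    print("canonical solution passed")
```

Model-generated completions are untrusted code, so a real evaluation would run the assembled program in a sandboxed subprocess with a timeout rather than call `exec` in the current process.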

Provider:
maas
Created:
2024-08-19