kogi-jwu/jhumaneval

Name: kogi-jwu/jhumaneval
Creator: kogi-jwu
Published: 2025-10-20 11:28:26
License: No description available

Hugging Face · Updated 2025-10-20 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/kogi-jwu/jhumaneval


Resource overview:

---
language:
- ja
- en
license: mit
size_categories:
- n<1K
source_datasets:
- openai_humaneval
task_categories:
- text-generation
dataset_info:
  config_name: jhumaneval
  features:
  - name: task_id
    dtype: string
  - name: prompt_en
    dtype: string
  - name: prompt
    dtype: string
  - name: entry_point
    dtype: string
  - name: canonical_solution
    dtype: string
  - name: test
    dtype: string
  splits:
  - name: test
    num_bytes: 275012
    num_examples: 164
  download_size: 125206
  dataset_size: 275012
configs:
- config_name: jhumaneval
  data_files:
  - split: test
    path: jhumaneval/test-*
---

# Dataset Card for JHumanEval: Japanese Hand-Translated HumanEval

## Dataset Description

- **Repository:** [GitHub Repository](https://github.com/KuramitsuLab/jhuman-eval)

## Dataset Summary

This is a Japanese translation of HumanEval, the standard benchmark for the code-generation ability of LLMs. HumanEval and its evaluation harness were introduced in the paper "Evaluating Large Language Models Trained on Code".

All machine translations (DeepL, GPT-4) were revised by hand. Japanese programmers then read each translated problem to confirm that it is understandable and that code can be written from it. Errors in the original English HumanEval were deliberately left uncorrected. As with HumanEval, the benchmark therefore measures the ability to generate code from imperfect documentation. Please use it as a benchmark for Japanese LLMs.

## Languages

The programming problems are written in Python and contain English and Japanese natural-language text in comments and docstrings. The English and Japanese versions are stored separately: English in `prompt_en` and Japanese in `prompt`.

## Dataset Structure

```python
from datasets import load_dataset

load_dataset("kogi-jwu/jhumaneval")
# DatasetDict({
#     test: Dataset({
#         features: ['task_id', 'prompt_en', 'prompt', 'entry_point', 'canonical_solution', 'test'],
#         num_rows: 164
#     })
# })
```

## Data Instances

An example of a dataset instance:

```
{
    "task_id": "test/0",
    "prompt_en": "def return1():\n    \"\"\"\n    A simple function that returns the integer 1.\n    \"\"\"\n",
    "prompt": "def return1():\n    \"\"\"\n    整数1を返すシンプルな関数。\n    \"\"\"\n",
    "canonical_solution": "    return 1",
    "test": "def check(candidate):\n    assert candidate() == 1",
    "entry_point": "return1"
}
```

## Data Fields

- `task_id`: Unique identifier for a task.
- `prompt_en`: Function header and English docstring, used as model input.
- `prompt`: Function header and Japanese docstring, parallel to `prompt_en`.
- `canonical_solution`: The expected function implementation.
- `test`: A `check(candidate)` function that verifies the correctness of generated code.
- `entry_point`: Name of the function that is passed to `check` when the test runs.

## Data Splits

The dataset consists of a single test split with 164 samples.

## How to Use

Example of computing pass@1 using the reference solutions:

```python
import os

from datasets import load_dataset
from evaluate import load

# code_eval executes untrusted code, so it must be enabled explicitly
os.environ["HF_ALLOW_CODE_EVAL"] = "1"

ds = load_dataset("kogi-jwu/jhumaneval")["test"]
code_eval = load("code_eval")

candidates = []
test_cases = []
for d in ds:
    # FIXME: this inserts the reference code as-is; replace it with model-generated code
    candidates.append([d["prompt"] + d["canonical_solution"]])
    # One executable test program (a string) per problem: call check() on the entry point
    test_cases.append(d["test"] + f"\n\ncheck({d['entry_point']})\n")

pass_at_k, results = code_eval.compute(references=test_cases, predictions=candidates, k=[1])
print(pass_at_k)
```

## Additional Information

### Licensing Information

MIT License

Provider:

kogi-jwu

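Original information summary

Dataset Card for JHumanEval: Japanese Hand-Translated HumanEval

Dataset Description

Dataset Summary

This is a Japanese translation of HumanEval, a dataset for evaluating code-generation ability described in the paper "Evaluating Large Language Models Trained on Code".

Languages

The programming problems are written in Python and contain English and Japanese natural-language text in comments and docstrings.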

Dataset Structure

Loading the dataset

    from datasets import load_dataset

    load_dataset("kogi-jwu/jhumaneval")

Example output

    DatasetDict({
        test: Dataset({
            features: ['task_id', 'prompt_en', 'prompt', 'entry_point', 'canonical_solution', 'test'],
            num_rows: 164
        })
    })

Data Instances

    {
        "task_id": "test/0",
        "prompt_en": "def return1():\n    \"\"\"\n    A simple function that returns the integer 1.\n    \"\"\"\n",
        "prompt": "def return1():\n    \"\"\"\n    整数1を返すシンプルな関数。\n    \"\"\"\n",
        "canonical_solution": "    return 1",
        "test": "def check(candidate):\n    assert candidate() == 1",
        "entry_point": "return1"
    }
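The fields combine into a runnable check. Below is a minimal sketch that assembles the data instance above into one program, mirroring the concatenation in the pass@1 example: prompt plus completion, then the test source, then a call to `check(entry_point)`. `build_program` and `run_program` are hypothetical helper names. The program runs in a separate Python process with a timeout. This is not a sandbox, so only run code you trust.

```python
import subprocess
import sys

# The data instance shown above, as a Python dict
record = {
    "task_id": "test/0",
    "prompt_en": 'def return1():\n    """\n    A simple function that returns the integer 1.\n    """\n',
    "prompt": 'def return1():\n    """\n    整数1を返すシンプルな関数。\n    """\n',
    "canonical_solution": "    return 1",
    "test": "def check(candidate):\n    assert candidate() == 1",
    "entry_point": "return1",
}


def build_program(rec: dict, completion: str, use_japanese: bool = True) -> str:
    """Hypothetical helper: prompt + completion, then the test, then check(entry_point)."""
    prompt = rec["prompt"] if use_japanese else rec["prompt_en"]
    return (
        prompt
        + completion
        + "\n\n"
        + rec["test"]
        + f"\n\ncheck({rec['entry_point']})\n"
    )


def run_program(program: str, timeout: float = 3.0) -> bool:
    """Run the program in a fresh interpreter; True if it exits with status 0.

    Not a sandbox: the code runs with the current user's permissions.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Treat a hung candidate as a failure
        return False
    return proc.returncode == 0


print(run_program(build_program(record, record["canonical_solution"])))  # True
print(run_program(build_program(record, "    return 2")))  # False (assertion fails)
```

Running each program in its own interpreter keeps a crashing or hanging candidate from taking down the evaluation loop. It does not isolate the file system or network.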

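Data Fields

task_id: Unique identifier for a task.
prompt_en: Function header and English docstring, used as model input.
prompt: Function header and Japanese docstring, parallel to prompt_en.
canonical_solution: The expected function implementation.
test: Function that verifies the correctness of generated code.
entry_point: Name of the function that the test is run against.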
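The six fields can be written down as a typed record. This is a minimal sketch that assumes every field is a plain string, as the `dtype: string` entries in the card's metadata state. `JHumanEvalRecord`, `select_prompt`, and `check_record` are hypothetical names, not part of the dataset or the `datasets` library. The consistency check is an assumption of this sketch, not a guarantee made by the dataset card.

```python
from typing import List, TypedDict


class JHumanEvalRecord(TypedDict):
    """One row of the test split (every field is a string)."""

    task_id: str             # unique task identifier
    prompt_en: str           # function header + English docstring
    prompt: str              # function header + Japanese docstring
    entry_point: str         # name of the function passed to check()
    canonical_solution: str  # reference implementation (function body)
    test: str                # source code defining check(candidate)


def select_prompt(rec: JHumanEvalRecord, lang: str = "ja") -> str:
    """Return the model input for the requested language: "ja" or "en"."""
    if lang == "ja":
        return rec["prompt"]
    if lang == "en":
        return rec["prompt_en"]
    raise ValueError(f"unsupported language: {lang!r}")


def check_record(rec: JHumanEvalRecord) -> List[str]:
    """Return the problems found in a record; an empty list means none.

    Assumption (not stated by the dataset card): both prompts contain
    `def <entry_point>(` and the test source defines `check`.
    """
    problems: List[str] = []
    header = f"def {rec['entry_point']}("
    for name, text in (("prompt", rec["prompt"]), ("prompt_en", rec["prompt_en"])):
        if header not in text:
            problems.append(f"{name} does not define {rec['entry_point']}()")
    if "def check(" not in rec["test"]:
        problems.append("test does not define check()")
    return problems


# The data instance above
sample: JHumanEvalRecord = {
    "task_id": "test/0",
    "prompt_en": 'def return1():\n    """\n    A simple function that returns the integer 1.\n    """\n',
    "prompt": 'def return1():\n    """\n    整数1を返すシンプルな関数。\n    """\n',
    "entry_point": "return1",
    "canonical_solution": "    return 1",
    "test": "def check(candidate):\n    assert candidate() == 1",
}

print(select_prompt(sample, "en").splitlines()[0])  # def return1():
print(check_record(sample))  # []
```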

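Data Splits

The dataset consists of a single test split with 164 samples.

How to Use

Example of computing pass@1 (using the reference solutions as the candidates):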

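    import os

    from datasets import load_dataset
    from evaluate import load

    # code_eval executes untrusted code, so it must be enabled explicitly
    os.environ["HF_ALLOW_CODE_EVAL"] = "1"

    ds = load_dataset("kogi-jwu/jhumaneval")["test"]
    code_eval = load("code_eval")

    candidates = []
    test_cases = []
    for d in ds:
        # Reference solution used as the candidate; replace with model output
        candidates.append([d["prompt"] + d["canonical_solution"]])
        # One executable test program (a string) per problem: call check() on the entry point
        test_cases.append(d["test"] + f"\n\ncheck({d['entry_point']})\n")

    pass_at_k, results = code_eval.compute(references=test_cases, predictions=candidates, k=[1])
    print(pass_at_k)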
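The pass@1 value above comes from the `code_eval` metric. The paper cited in the summary defines pass@k with an unbiased estimator. For a problem with n generated samples of which c pass the tests, pass@k = 1 - C(n - c, k) / C(n, k), and the benchmark score is the mean over problems. The sketch below implements that formula directly with `math.comb`. It is written from the formula rather than taken from `evaluate`, and `pass_at_k_problem` and `mean_pass_at_k` are hypothetical names. With one candidate per problem, as in the example above, pass@1 is simply the fraction of problems solved.

```python
from math import comb
from typing import Sequence


def pass_at_k_problem(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k).

    n: samples generated, c: samples that passed, k: evaluation budget (k <= n).
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a passing one
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(samples: Sequence[int], correct: Sequence[int], k: int) -> float:
    """Average the per-problem estimates over all problems."""
    if not samples or len(samples) != len(correct):
        raise ValueError("samples and correct must be non-empty and of equal length")
    scores = [pass_at_k_problem(n, c, k) for n, c in zip(samples, correct)]
    return sum(scores) / len(scores)


# One candidate per problem (n = 1): pass@1 is the fraction of problems solved
print(mean_pass_at_k([1, 1, 1, 1], [1, 0, 1, 1], k=1))  # 0.75
# Ten samples for one problem, three correct: pass@1 is approximately 0.3
print(pass_at_k_problem(10, 3, 1))
```

The combinatorial form averages over all size-k subsets of the n samples. This gives a lower-variance estimate than drawing exactly k samples per problem.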

Additional Information

Licensing Information

MIT License
