JohnDoeAliceBob/ExtraMaterial

Name: JohnDoeAliceBob/ExtraMaterial
Creator: JohnDoeAliceBob
Published: 2024-12-05 13:46:06
License: No description available

Source: Hugging Face
Updated: 2024-12-05
Indexed: 2024-12-14

Download link:

https://hf-mirror.com/datasets/JohnDoeAliceBob/ExtraMaterial

Description:

This dataset supports research on how well Large Language Models (LLMs) generate code comments in different natural languages. It is divided into five subsets, one each for English, Chinese, Dutch, Polish, and Greek, and each subset contains 500 code comments. The dataset was built by retrieving data through the GitHub API, filtering files by length, and extracting and verifying comments, among other steps. Each subset contains the original files and their corresponding comments, along with each model's inference output and evaluation results. The dataset also includes several metrics that measure how closely each generated comment matches the original comment.
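The construction steps and the comment-matching idea described above can be illustrated with a minimal sketch. This is not the dataset authors' pipeline: the line-count threshold, the `//` comment syntax, the function names, and the token-overlap (Jaccard) score are all assumptions chosen for illustration, and the description does not say which metrics the dataset actually uses.

```python
import re

# Hypothetical threshold; the dataset description does not state the real limit.
MAX_LINES = 200

# Assumes C-style `//` line comments. This naive pattern also matches `//`
# inside string literals or URLs, so a real pipeline would need a proper parser.
LINE_COMMENT = re.compile(r"//\s*(.+)$")


def within_length(source: str, max_lines: int = MAX_LINES) -> bool:
    """Keep only files whose line count does not exceed the threshold."""
    return len(source.splitlines()) <= max_lines


def extract_line_comments(source: str) -> list[str]:
    """Collect the text of every `//` line comment, in file order."""
    comments = []
    for line in source.splitlines():
        match = LINE_COMMENT.search(line)
        if match:
            comments.append(match.group(1).strip())
    return comments


def token_overlap(original: str, generated: str) -> float:
    """Jaccard overlap of whitespace-separated tokens (illustrative metric only)."""
    a = set(original.lower().split())
    b = set(generated.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    sample = "int x = 1; // counter\nint y = 2;\n// 计数器\n"
    if within_length(sample):
        print(extract_line_comments(sample))
    print(token_overlap("increment the counter", "increment counter"))
```

Whitespace tokenization works poorly for languages written without spaces, such as Chinese, so a multilingual evaluation would likely need language-aware tokenization or character-level metrics.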

Provided by:

JohnDoeAliceBob
