
Multilingual paired code and comment changes

Indexed from NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/10138302
Description:
Dataset used for the master's thesis "LLMs for Code Comment Consistency." It covers Go, Java, JavaScript, TypeScript, and Python. All data is mined from permissively licensed public GitHub projects. The dataset consists of pairs of function/method code blocks and their documentation comments, captured before and after a commit.

Examples are labeled 0 if the comment is unchanged by the commit and 1 if the comment was changed. For comment consistency detection, a 1-labeled example therefore has an old comment that is inconsistent with the new code.

If you are training a code summarization or comment generation model, you can ignore the classification label. All-22k contains the training, validation, and test sets used for the models trained in the paper. The examples are balanced by language and between the positive and negative classes. Each code repository appears in only one of these sets.
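The labeling rule described above can be sketched in Python. This is only an illustration under assumptions: the `Example` class and its field names (`old_code`, `new_code`, `old_comment`, `new_comment`, `language`) are hypothetical and are not taken from the dataset files, and the real labels may have been produced with extra normalization that this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    # Hypothetical field names; the actual dataset schema may differ.
    old_code: str
    new_code: str
    old_comment: str
    new_comment: str
    language: str


def label(example: Example) -> int:
    """Return 1 if the comment changed across the commit, else 0."""
    return int(example.old_comment != example.new_comment)


def inconsistent_pairs(examples: List[Example]) -> List[Tuple[str, str]]:
    """For consistency detection: (old comment, new code) of 1-labeled examples."""
    return [(e.old_comment, e.new_code) for e in examples if label(e) == 1]


def summarization_pairs(examples: List[Example]) -> List[Tuple[str, str]]:
    """For summarization or comment generation: ignore the label entirely."""
    return [(e.new_code, e.new_comment) for e in examples]


examples = [
    Example(
        old_code="def area(r): return 3.14 * r * r",
        new_code="def area(r): return math.pi * r * r",
        old_comment="Return the area of a circle.",
        new_comment="Return the area of a circle.",
        language="Python",
    ),
    Example(
        old_code="def size(xs): return len(xs)",
        new_code="def size(xs): return len(set(xs))",
        old_comment="Return the number of items.",
        new_comment="Return the number of distinct items.",
        language="Python",
    ),
]
```

Pairing the old comment with the new code for 1-labeled examples mirrors the description: that combination is the one treated as inconsistent.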
Created:
2023-11-16