JetBrains-Research/lca-commit-message-generation
Source: Hugging Face | Updated: 2025-01-30 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/JetBrains-Research/lca-commit-message-generation
Overview:
This dataset is a benchmark for the commit message generation task and is part of the Long Code Arena benchmark. It is a manually curated subset of the Python test set from the CommitChronicle dataset, tailored for larger commits. It includes two configurations: default and labels. The default configuration provides commit details: commit hash, repository, date, license, commit message, and a list of file modifications. The labels configuration additionally provides a label and a comment for each commit. All repositories are released under permissive licenses, and data points can be removed on request. The dataset itself is licensed under Apache-2.0.
Provider:
JetBrains-Research
Summary of original information
Dataset overview
Dataset configurations
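Default configuration
- Configuration name: default
- Features:
  hash: string
  repo: string
  date: string
  license: string
  message: string
  mods: list, each element with:
    change_type: string
    old_path: string
    new_path: string
    diff: string
- Splits:
  test: 163 samples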
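The feature list above can be read as a nested record: one commit with a list of per-file modifications. The following is a minimal sketch of that shape and of one way to join the per-file diffs into a single text block. The type names `Modification` and `CommitRecord`, the helper `render_diff`, and every value in the example record (including the `MODIFY` change type) are hypothetical placeholders, not taken from the dataset.

```python
from typing import List, TypedDict


class Modification(TypedDict):
    # One entry of the `mods` list, per the feature list above.
    change_type: str
    old_path: str
    new_path: str
    diff: str


class CommitRecord(TypedDict):
    # One row of the default configuration.
    hash: str
    repo: str
    date: str
    license: str
    message: str
    mods: List[Modification]


def render_diff(record: CommitRecord) -> str:
    """Join per-file diffs into one text block, one header line per file."""
    parts = []
    for mod in record["mods"]:
        header = f"{mod['change_type']} {mod['old_path']} -> {mod['new_path']}"
        parts.append(header + "\n" + mod["diff"])
    return "\n".join(parts)


# Placeholder record: all values are invented for illustration only.
example: CommitRecord = {
    "hash": "0000000",
    "repo": "example/repo",
    "date": "1970-01-01",
    "license": "mit",
    "message": "Fix typo in README",
    "mods": [
        {
            "change_type": "MODIFY",
            "old_path": "README.md",
            "new_path": "README.md",
            "diff": "-teh\n+the",
        },
    ],
}

rendered = render_diff(example)
print(rendered)
```

In a commit message generation setup, a string like `rendered` would typically serve as the model input and `message` as the reference output; the exact input format is a design choice and is not specified by the listing.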
Labels configuration
- Configuration name: labels
- Features:
  hash: string
  repo: string
  date: string
  license: string
  message: string
  label: integer (int8)
  comment: string
- Splits:
  test: 858 samples, 272359 bytes
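Since each row of the labels configuration carries an integer `label`, a quick first step is often to count how many rows fall under each label value. The sketch below does this on a few invented rows; the helper name `label_histogram` and the sample values are hypothetical, and the listing does not say what the label values mean.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def label_histogram(rows: Iterable[Mapping]) -> Dict[int, int]:
    """Count rows per integer label, returned in ascending label order."""
    counts = Counter(int(row["label"]) for row in rows)
    return dict(sorted(counts.items()))


# Placeholder rows: values are invented for illustration only.
sample_rows = [
    {"hash": "a", "message": "first", "label": 3, "comment": ""},
    {"hash": "b", "message": "second", "label": 1, "comment": ""},
    {"hash": "c", "message": "third", "label": 1, "comment": "placeholder"},
]

histogram = label_histogram(sample_rows)
print(histogram)
```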
Data files
Default configuration
- Split: test
- Path: commitchronicle-py-long/test-*
Labels configuration
- Split: test
- Path: commitchronicle-py-long-labels/test-*
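The configuration names and file patterns above map onto the Hugging Face `datasets` library, where a configuration is selected by name and a split by the `split` argument. The following is a minimal sketch, assuming the `datasets` package is installed and the Hub is reachable; the wrapper `load_config` and the constants `REPO_ID` and `DATA_FILES` are hypothetical names introduced here for illustration.

```python
REPO_ID = "JetBrains-Research/lca-commit-message-generation"

# Configuration name -> data file pattern, as listed above.
DATA_FILES = {
    "default": "commitchronicle-py-long/test-*",
    "labels": "commitchronicle-py-long-labels/test-*",
}


def load_config(name: str = "default"):
    """Load the test split of one configuration (requires network access)."""
    if name not in DATA_FILES:
        raise ValueError(f"unknown configuration: {name!r}")
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split="test")


if __name__ == "__main__":
    commits = load_config("default")
    print(commits.column_names)
```

Both configurations expose only a test split, so there is no training data to select here.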
License
- Type: Apache-2.0



