poojaruhal/Code-comment-classification
Source: Hugging Face. Updated 2022-10-16; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/poojaruhal/Code-comment-classification
Resource overview:
Code-comment-classification is a dataset for classifying code comments. It contains class comments extracted from open-source projects written in Java, Smalltalk, and Python, and supports both single-label and multi-label text classification. Each data instance includes the class name, the comment text, and the assigned category. The dataset was created to identify the kinds of information that class comments carry across projects and programming languages. It was annotated by four evaluators, each with at least four years of programming experience, and is split for 10-fold cross-validation.
Provided by:
poojaruhal
Summary of original information
Dataset overview
Dataset name
- Name: Code-comment-classification
Dataset properties
- Language: English (en)
- Multilinguality: monolingual
- License: cc-by-nc-sa-4.0
- Size: 1K<n<10K
- Source data: original
- Tags:
  - source code comments
  - Java class comments
  - Python class comments
  - Smalltalk class comments
- Task category: text-classification
- Task IDs:
  - intent-classification
  - multi-label-classification
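Dataset content
- Summary: the dataset contains class comments extracted from several large and diverse open-source projects written in three programming languages: Java, Smalltalk, and Python.
- Supported tasks:
  - single-label text classification
  - multi-label text classification
- Programming languages: Java, Python, Smalltalk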
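The difference between the two supported tasks comes down to how the target is encoded. The following is a minimal sketch, assuming a small fixed label set; the category names (`summary`, `usage`, `expand`) and both helper functions are illustrative placeholders, since this listing does not enumerate the dataset's actual categories.

```python
from typing import List

# Illustrative category names only; the real label set is not given in this listing.
CATEGORIES = ["summary", "usage", "expand"]


def to_single_label(label: str, categories: List[str] = CATEGORIES) -> int:
    """Single-label target: the index of the one category assigned to a comment."""
    return categories.index(label)


def to_multi_hot(labels: List[str], categories: List[str] = CATEGORIES) -> List[int]:
    """Multi-label target: one 0/1 flag per category, so several can be active."""
    unknown = set(labels) - set(categories)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [1 if c in labels else 0 for c in categories]


single = to_single_label("usage")
multi = to_multi_hot(["summary", "expand"])
```

A single-label classifier predicts exactly one index per comment, while a multi-label classifier predicts an independent yes/no decision for every position in the vector.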
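Dataset structure
- Data instances: each instance contains the class name, the class comment, and related information such as its summary.
- Data fields:
  - class: the class name, including the language extension.
  - comment: the class comment.
  - categories: the category, indicating a particular type of information.
- Data splits: 10-fold cross-validation.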
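As a rough sketch of the structure described above, the snippet below builds a few hypothetical records with the three documented fields and partitions them into 10 folds. The record values, `language_of`, and `k_fold_indices` are assumptions made for illustration; how the published dataset actually stores its folds is not stated in this listing.

```python
import random
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Hypothetical records that mirror the three documented fields.
records: List[Dict[str, str]] = [
    {"class": f"Example{i}.java", "comment": f"Comment {i}", "categories": "summary"}
    for i in range(20)
]


def language_of(class_name: str) -> str:
    """Return the extension carried by the `class` field, without the dot."""
    return PurePosixPath(class_name).suffix.lstrip(".")


def k_fold_indices(
    n: int, k: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Shuffle the indices once, then return one (train, test) pair per fold."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint slices covering all indices
    return [
        ([j for f, fold in enumerate(folds) if f != t for j in fold], folds[t])
        for t in range(k)
    ]


splits = k_fold_indices(len(records), k=10)
```

Each record appears in exactly one test fold and in the training portion of the other nine, which is the defining property of k-fold cross-validation.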
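Dataset creation
- Curation rationale: identify the information embedded in class comments across different projects and programming languages.
- Source data: the dataset was extracted from open-source Java, Smalltalk, and Python projects.
- Annotations:
  - Annotation process: carried out by four evaluators, each with at least four years of programming experience. Each evaluator was responsible for a portion of the data, and each classification was reviewed by three evaluators.
  - Annotators: the authors of the paper.
- Personal and sensitive information: author information embedded in the text.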
Additional information
- Dataset curators: Pooja Rani, Ivan, Manuel
- Licensing information: cc-by-nc-sa-4.0
- Citation information:
@article{RANI2021111047,
  title = {How to identify class comment types? A multi-language approach for class comment classification},
  journal = {Journal of Systems and Software},
  volume = {181},
  pages = {111047},
  year = {2021},
  issn = {0164-1212},
  doi = {https://doi.org/10.1016/j.jss.2021.111047},
  url = {https://www.sciencedirect.com/science/article/pii/S0164121221001448},
  author = {Pooja Rani and Sebastiano Panichella and Manuel Leuenberger and Andrea {Di Sorbo} and Oscar Nierstrasz},
  keywords = {Natural language processing technique, Code comment analysis, Software documentation}
}