NLBSE'24 Tool Competition dataset
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/nlbse2024/code-comment-classification
Description:
This dataset provides binary classification data for code comments in three programming languages: Python, Java, and Pharo. It covers 19 categories across the three languages, with one binary classifier per category, so each classifier decides whether a comment belongs to its category or not. The dataset description also mentions a model selection step, and the data is partitioned using stratified sampling. The task is comment classification.
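The stratified partitioning mentioned above can be illustrated with a minimal sketch. It is not the competition's actual split procedure, which this listing does not describe. The function name `stratified_split`, the 20% held-out fraction, and the toy labels are all assumptions made for illustration. The idea is that, for a single category's binary labels, the split keeps the same proportion of positives and negatives on both sides.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each label keeps roughly the same share in both parts.

    Hypothetical helper for illustration; not taken from the dataset's repository.
    """
    rng = random.Random(seed)

    # Group example indices by their label (0 or 1 for a binary category).
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Take the same fraction from every label group.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    return sorted(train_idx), sorted(test_idx)


# Toy labels for one hypothetical category:
# 1 = the comment belongs to the category, 0 = it does not.
toy_labels = [1] * 10 + [0] * 40
train_idx, test_idx = stratified_split(toy_labels, test_fraction=0.2, seed=42)
```

With one classifier per category, a split like this would be repeated for each category's label column, since the positive rate can differ a lot from one category to the next.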
Provider:
NLBSE'24 Tool Competition



