GitHub Issue and PR Comments Bot Identification Dataset

Source: arXiv · Updated: 2021-01-19 · Indexed: 2024-06-21
Download link:
http://doi.org/10.5281/zenodo.4000388
Description:
This dataset was created by the Software Engineering Laboratory of the University of Mons, Belgium. It covers issue and pull request (PR) comments on GitHub and identifies bot accounts through manual analysis. It contains 5,000 unique GitHub accounts, 527 of which were confirmed to be bots. The dataset is used to develop and evaluate an automated classification model that distinguishes bot accounts from human accounts, mainly by analyzing each account's number of comments, comment patterns, and inequality in comment content. Applications include improving the security and efficiency of open-source software development and studying the impact of bots on the software development process.
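
The three signals named in the description (comment count, comment patterns, and inequality) can be illustrated with a minimal Python sketch. This is not the authors' model: grouping comments by normalized exact text and measuring inequality with a Gini coefficient are assumptions made for illustration, and the names `gini` and `comment_features` are hypothetical.

```python
from collections import Counter


def gini(counts):
    """Gini coefficient of non-negative counts (0.0 means perfectly equal)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on ascending values:
    # G = 2 * sum(rank * value) / (n * total) - (n + 1) / n
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def comment_features(comments):
    """Per-account features from a list of comment strings.

    Assumption: comments with the same lowercased, whitespace-normalized
    text belong to one "pattern". A real pipeline would likely use a
    fuzzier notion of similarity.
    """
    patterns = Counter(" ".join(c.lower().split()) for c in comments)
    return {
        "n_comments": len(comments),
        "n_patterns": len(patterns),
        "pattern_gini": gini(list(patterns.values())),
    }


if __name__ == "__main__":
    sample = ["LGTM", "lgtm ", "Thanks  for the PR", "lgtm"]
    print(comment_features(sample))
```

An account that posts many comments drawn from very few patterns would show a high `n_comments` and a low `n_patterns`. Whether such values indicate a bot is for a classifier trained on the labeled accounts to decide.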
Provider:
Software Engineering Laboratory, University of Mons
Created:
2020-10-07