
SE toxicity dataset

Source: arXiv | Updated: 2020-09-20 | Indexed: 2024-06-21
Download link:
https://github.com/WSU-SEAL/toxicity-dataset
Overview:
The SE Toxicity Dataset is a large dataset created by the Department of Computer Science at Wayne State University to evaluate and improve toxicity detection tools for the software engineering domain. It contains 6,533 code review comments and 4,140 Gitter messages, for a total of 10,673 records. The data come from open-source projects including Android, Chromium OS, LibreOffice, and Ethereum. Records were selected using Google's Perspective API together with a stratified sampling strategy, and were labeled manually. The dataset is intended mainly for research and development of toxicity detectors tailored to software engineering, with the aim of improving the quality and efficiency of communication in software development communities.
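
The overview mentions a stratified sampling strategy but does not describe how it was carried out. As a rough illustration only, the sketch below shows one common way to stratify by a score: group records into equal-width score bins and draw a fixed number from each. The record layout, the `score` field, the bin count, and the per-bin sample size are all assumptions made for this example, not details taken from the dataset or its authors.

```python
import random
from collections import defaultdict


def stratified_sample(records, n_bins=5, per_bin=2, seed=0):
    """Group records into equal-width score bins and draw up to per_bin from each."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    bins = defaultdict(list)
    for rec in records:
        # Scores are assumed to lie in [0, 1]; a score of 1.0 goes to the last bin.
        idx = min(int(rec["score"] * n_bins), n_bins - 1)
        bins[idx].append(rec)

    sample = []
    for idx in sorted(bins):
        pool = bins[idx]
        # Take per_bin records, or the whole bin if it is smaller than that.
        sample.extend(rng.sample(pool, min(per_bin, len(pool))))
    return sample


# Hypothetical records: placeholder text plus a made-up score in [0, 1].
records = [{"text": f"comment {i}", "score": i / 20} for i in range(21)]
picked = stratified_sample(records, n_bins=5, per_bin=2, seed=42)
```

Sampling per bin rather than from the whole pool keeps rare high-score texts represented, which matters when most messages in a corpus are not toxic.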
Provided by:
Department of Computer Science, Wayne State University
Created:
2020-09-20