Benchmark Dataset for Hate Speech Detection on Twitter

arXiv | Updated 2021-11-10 | Indexed 2024-08-06
Download link:
http://arxiv.org/abs/2106.09775v3
Description:
This dataset was created by researchers at The University of Texas at Austin to provide broader coverage of hate speech on Twitter. It contains 9,667 tweets selected for manual annotation using standard information retrieval techniques such as pooling and active learning. Multiple machine learning models generated predictions for the collected tweets, and tweets were chosen for human annotation based on those predictions. The dataset is intended to support the development of more effective hate speech detection models for social media.
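
The selection step described above can be illustrated with a minimal sketch of pooling, assuming each model assigns every tweet a hate-speech score and the annotation pool is the union of each model's top-ranked tweets. The function name `pool_for_annotation`, the `depth` parameter, and the toy scores are hypothetical and are not taken from the dataset authors' pipeline.

```python
from typing import Dict, Set


def pool_for_annotation(
    scores_by_model: Dict[str, Dict[str, float]], depth: int
) -> Set[str]:
    """Return the union of each model's top-`depth` tweet ids by predicted score.

    Hypothetical helper for illustration only; the real pipeline may differ.
    """
    pool: Set[str] = set()
    for scores in scores_by_model.values():
        # Rank tweet ids from highest to lowest predicted score for this model.
        ranked = sorted(scores, key=scores.get, reverse=True)
        pool.update(ranked[:depth])
    return pool


# Toy scores (made up): two models scoring four tweets.
scores_by_model = {
    "model_a": {"t1": 0.9, "t2": 0.2, "t3": 0.7, "t4": 0.1},
    "model_b": {"t1": 0.1, "t2": 0.8, "t3": 0.6, "t4": 0.05},
}

selected = pool_for_annotation(scores_by_model, depth=2)
print(sorted(selected))
```

Taking the union across models means a tweet ranked highly by any one model reaches the annotators, which is one way such a pool can cover more varied examples than a single model's ranking.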
Provided by:
The University of Texas at Austin
Created:
2021-06-18