NoisyHate

Source: arXiv | Updated: 2023-03-18 | Indexed: 2024-06-21
Download link:
https://github.com/YiranYe/toxic-detection-testset
Description:
NoisyHate is a dataset created by Pennsylvania State University for evaluating how well machine learning models handle human-written perturbations of toxic content on social media. It contains 1,707 human-written perturbed samples collected from real social platforms and is intended to support the development of more effective toxic speech detection models. Construction of the dataset drew on multiple perturbation strategies, such as character repetition, character deletion, and special-character substitution, and the quality of the perturbations was verified through human evaluation. Its main application is social media content monitoring, where it targets the shortcomings of existing models when confronted with human-written perturbations.
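
To make the perturbation categories named above concrete, here is a minimal Python sketch of character repetition, character deletion, and special-character substitution. It is an illustration only: the NoisyHate samples are human-written and collected from real platforms, not generated by code like this. The function names and the `SPECIAL_MAP` substitution table are assumptions made for this example and do not come from the dataset or its repository.

```python
# Toy versions of three perturbation categories (illustrative assumptions,
# not the procedure used to build NoisyHate).

# Hypothetical look-alike substitutions; the real data may use others.
SPECIAL_MAP = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}


def repeat_char(word: str, index: int, times: int = 3) -> str:
    """Make the character at `index` appear `times` times in a row."""
    return word[:index] + word[index] * times + word[index + 1:]


def delete_char(word: str, index: int) -> str:
    """Drop the character at `index`."""
    return word[:index] + word[index + 1:]


def substitute_special(word: str, mapping: dict = SPECIAL_MAP) -> str:
    """Replace each mapped letter with its special-character look-alike."""
    return "".join(mapping.get(ch.lower(), ch) for ch in word)


if __name__ == "__main__":
    word = "stupid"
    print(repeat_char(word, 2))      # stuuupid
    print(delete_char(word, 2))      # stpid
    print(substitute_special(word))  # $tup1d
```

A detector that matches only exact tokens would miss all three variants, which is the kind of weakness the dataset is meant to expose.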
Provided by:
Pennsylvania State University
Created:
2023-03-18