
ToLD-Br (Toxic Language Dataset for Brazilian Portuguese)

Source: arXiv. Updated: 2020-10-09. Indexed: 2024-06-21.
Download link:
https://github.com/JAugusto97/ToLD-Br
Description:
ToLD-Br is a large-scale dataset created by the Federal University of Santa Catarina for detecting toxic language in Brazilian Portuguese social media. It contains 21,000 tweets, each manually annotated as toxic or non-toxic and additionally labeled with the type of toxicity. The dataset was built with coverage of diverse demographic groups in mind, with the aim of reducing group bias and improving balance. Its main application is the automatic identification of toxic comments on social media, helping platform administrators and specific users, such as children, filter content and thereby limiting the spread of online toxicity.
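The relationship between the toxic/non-toxic label and the per-type toxicity labels can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the category names in `TOXICITY_TYPES`, the `Tweet` structure, and the rule that a tweet counts as toxic when any type is flagged are hypothetical and may not match the dataset's actual schema or annotation scheme.

```python
from dataclasses import dataclass, field

# Hypothetical toxicity categories; the dataset's real label names may differ.
TOXICITY_TYPES = ("insult", "obscene", "homophobia")


@dataclass
class Tweet:
    text: str
    # Maps a toxicity type to True when it was flagged (assumed schema).
    labels: dict = field(default_factory=dict)


def is_toxic(tweet: Tweet) -> bool:
    """Assumed rule: a tweet is toxic if any toxicity type is flagged."""
    return any(tweet.labels.get(kind, False) for kind in TOXICITY_TYPES)


def toxic_share(tweets: list) -> float:
    """Fraction of tweets considered toxic under the assumed rule."""
    if not tweets:
        return 0.0
    return sum(is_toxic(t) for t in tweets) / len(tweets)


# Placeholder records standing in for annotated tweets.
sample = [
    Tweet("exemplo 1", {"insult": False, "obscene": False}),
    Tweet("exemplo 2", {"insult": True}),
    Tweet("exemplo 3", {}),
    Tweet("exemplo 4", {"obscene": True, "homophobia": True}),
]

binary_labels = [is_toxic(t) for t in sample]
share = toxic_share(sample)
```

Deriving the binary label from the per-type labels keeps the two views consistent, so a filter for a specific audience could rely on either the coarse flag or an individual category.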
Provided by:
Federal University of Santa Catarina
Created:
2020-10-09