YouTube Abusive Language Dataset

Name: YouTube Abusive Language Dataset
Creator: University of Illinois Urbana-Champaign
Published: 2021-05-24 14:50:19
License: Not specified

Source: arXiv (2021-05-24); updated 2024-06-21

Download link:

https://github.com/HongyuGong/Abusive-Language-Detection-Categorization

Summary:

The YouTube Abusive Language Dataset contains more than 11,000 YouTube comments and was created by institutions including the University of Illinois Urbana-Champaign for research on abusive language online. Each comment is annotated as abusive or non-abusive, and abusive content is further classified into four categories: gender and sexual orientation; race; appearance and personal characteristics; and ideology, religion and political orientation. The annotation process was guided by theory, with the aim of producing accurate and reliable labels. The dataset is intended for developing and evaluating algorithms that automatically detect and categorize abusive language, with the broader goal of reducing abuse on online platforms and fostering a healthier online environment.
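The two-level labeling scheme described above (an abusive/non-abusive flag, with abusive comments assigned to one of four categories) can be pictured as a small data model. The following Python sketch is illustrative only: the names `AbuseCategory`, `LabeledComment`, and `category_counts` are hypothetical, the sketch assumes at most one category per abusive comment, and it does not reflect the actual file format or label encoding used in the GitHub repository.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class AbuseCategory(Enum):
    """The four abuse categories named in the dataset description."""
    GENDER_SEXUAL_ORIENTATION = "gender and sexual orientation"
    RACE = "race"
    APPEARANCE_PERSONAL = "appearance and personal characteristics"
    IDEOLOGY_RELIGION_POLITICS = "ideology, religion and political orientation"


@dataclass(frozen=True)
class LabeledComment:
    """Hypothetical record layout (assumption, not the repository's format)."""
    text: str
    is_abusive: bool
    category: Optional[AbuseCategory] = None

    def __post_init__(self) -> None:
        # Assumption of this sketch: only abusive comments carry a category.
        if self.category is not None and not self.is_abusive:
            raise ValueError("non-abusive comments carry no abuse category")


def category_counts(comments: Iterable[LabeledComment]) -> Counter:
    """Count abusive comments per category; non-abusive ones are skipped."""
    return Counter(
        c.category for c in comments if c.is_abusive and c.category is not None
    )


# Usage with made-up placeholder comments (not real dataset entries).
sample = [
    LabeledComment("example comment A", True, AbuseCategory.RACE),
    LabeledComment("example comment B", False),
    LabeledComment("example comment C", True, AbuseCategory.RACE),
]
print(category_counts(sample)[AbuseCategory.RACE])  # prints 2
```

Keeping the category optional mirrors the description: the abusive/non-abusive split is the first-level label, and the category applies only to the abusive subset.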

Provided by:

University of Illinois Urbana-Champaign

Created:

2021-05-24
