OLID (Offensive Language Identification Dataset)

Name: OLID (Offensive Language Identification Dataset)
Creator: OpenDataLab
Published: 2026-05-24 09:30:25
License: No description available

OpenDataLab. Updated 2026-05-24. Indexed 2024-05-09.

Download link:

https://opendatalab.org.cn/OpenDataLab/OLID

Resource overview:

OLID is a hierarchically annotated dataset for identifying the type and target of offensive text in social media. It was collected from Twitter and is publicly available. It contains 14,100 tweets in total: 13,240 in the training set and 860 in the test set. Each tweet is labeled at up to three levels: (A) offensive / not offensive, (B) targeted insult / untargeted, and (C) individual / group / other. The levels are hierarchical: an offensive tweet may or may not have a target, and if it is targeted, the target is an individual, a group, or some other entity. The dataset was used for the OffensEval-2019 shared task at SemEval-2019.
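The hierarchy described above can be sketched as a small consistency check. This is an illustrative sketch only, not part of the dataset's tooling: the function name is_valid_annotation is hypothetical, and the short label codes (OFF/NOT, TIN/UNT, IND/GRP/OTH) are assumed here as stand-ins for the three levels. It also assumes that a level which does not apply is represented as None, which may differ from how the distributed files encode it.

```python
from typing import Optional

# Assumed label codes for the three annotation levels.
LEVEL_A = {"OFF", "NOT"}          # offensive / not offensive
LEVEL_B = {"TIN", "UNT"}          # targeted insult / untargeted
LEVEL_C = {"IND", "GRP", "OTH"}   # individual / group / other


def is_valid_annotation(
    a: str, b: Optional[str] = None, c: Optional[str] = None
) -> bool:
    """Return True if (a, b, c) respects the three-level hierarchy.

    Level B applies only to offensive tweets, and level C applies
    only to targeted ones. Levels that do not apply must be None.
    """
    if a not in LEVEL_A:
        return False
    if a == "NOT":
        # A non-offensive tweet carries no level B or level C label.
        return b is None and c is None
    if b not in LEVEL_B:
        # An offensive tweet needs a level B label.
        return False
    if b == "UNT":
        # An untargeted tweet carries no level C label.
        return c is None
    # A targeted tweet needs a level C label.
    return c in LEVEL_C
```

For example, is_valid_annotation("OFF", "TIN", "GRP") holds, while is_valid_annotation("NOT", "TIN", "IND") does not, because a non-offensive tweet cannot have a target.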

Provider:

OpenDataLab

Created:

2022-08-19

Collected summary

Dataset introduction