tharindu/SOLID
Source: Hugging Face. Updated 2022-01-03; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/tharindu/SOLID
Resource description:
# SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification
The widespread use of offensive content in social media has led to an abundance of research on detecting language such as hate speech, cyberbullying, and cyber-aggression. Recent work presented the OLID dataset, which follows a taxonomy for offensive language identification that provides meaningful information for understanding the type and the target of offensive messages. However, OLID is limited in size, and it might be biased towards offensive language because it was collected using keywords. In this work, we present SOLID, an expanded dataset in which the tweets were collected in a more principled manner. SOLID contains over nine million English tweets labelled in a semi-supervised fashion.
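As a rough illustration of what labelling "in a semi-supervised fashion" can look like in practice, the sketch below averages the scores of several models for each tweet and thresholds the result. It is a generic example, not the authors' pipeline: the function names, the example scores, the 0.5 threshold, and the `OFF`/`NOT` label strings are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def aggregate_scores(model_scores):
    """Combine per-model offensiveness scores into (mean, population std)."""
    return mean(model_scores), pstdev(model_scores)


def to_label(avg_conf, threshold=0.5):
    """Map an averaged score to a hard label.

    The 0.5 cut-off and the label strings are assumptions for this sketch.
    """
    return "OFF" if avg_conf >= threshold else "NOT"


# Hypothetical scores from three models for two made-up tweets.
scores = {
    "tweet_a": [0.9, 0.8, 0.7],
    "tweet_b": [0.1, 0.2, 0.3],
}

labels = {}
for tweet_id, model_scores in scores.items():
    avg, std = aggregate_scores(model_scores)
    # A low std means the models agree; a high std flags an uncertain label.
    labels[tweet_id] = to_label(avg)
```

Keeping the standard deviation alongside the mean lets a user filter out examples on which the models disagree, rather than trusting every automatically assigned label equally.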
If you use this dataset, please cite the following paper:
```bibtex
@inproceedings{rosenthal-etal-2021-solid,
title = "{SOLID}: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification",
author = "Rosenthal, Sara and
Atanasova, Pepa and
Karadzhov, Georgi and
Zampieri, Marcos and
Nakov, Preslav",
booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-acl.80",
doi = "10.18653/v1/2021.findings-acl.80",
pages = "915--928",
}
```
Provided by:
tharindu
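Summary of original information
Dataset overview
Dataset name
- Name: SOLID
- Full name: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification
Dataset description
- Purpose: Identifying offensive language in social media, such as hate speech, cyberbullying, and cyber-aggression.
- Features: Contains over nine million English tweets, labelled in a semi-supervised fashion.
Dataset source
- Collection method: Tweets were collected in a more principled manner, in contrast to the earlier keyword-based collection.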
Citation information
- Citation: Rosenthal, S., Atanasova, P., Karadzhov, G., Zampieri, M., and Nakov, P. (2021). "SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification." Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 915-928. https://aclanthology.org/2021.findings-acl.80 (doi: 10.18653/v1/2021.findings-acl.80)



