
tharindu/SOLID

Source: Hugging Face | Updated: 2022-01-03 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/tharindu/SOLID
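Because the listing identifies the dataset by its Hugging Face repository id, it can likely also be loaded programmatically. The following is a minimal sketch, not an official loader: it assumes the `datasets` library is installed and that the repository loads without a configuration name, neither of which this page confirms. The helper name `load_solid` is hypothetical.

```python
from typing import Optional

# Repository id taken from the page title and download link.
DATASET_ID = "tharindu/SOLID"


def load_solid(split: Optional[str] = None):
    """Load SOLID through the Hugging Face `datasets` library.

    Assumption: the repository loads without a configuration name.
    If it defines several configurations, pass name=... to load_dataset.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)
```

Calling `load_solid()` downloads the data on first use. The available splits and column names are not documented on this page, so inspect the returned object before relying on any particular field.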
Description:
# SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification

The widespread use of offensive content in social media has led to an abundance of research in detecting language such as hate speech, cyberbullying, and cyber-aggression. Recent work presented the OLID dataset, which follows a taxonomy for offensive language identification that provides meaningful information for understanding the type and the target of offensive messages. However, it is limited in size and it might be biased towards offensive language as it was collected using keywords.

In this work, we present SOLID, an expanded dataset, where the tweets were collected in a more principled manner. SOLID contains over nine million English tweets labelled in a semi-supervised fashion.

If you are using this dataset, please cite the following paper.

```bibtex
@inproceedings{rosenthal-etal-2021-solid,
    title = "{SOLID}: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification",
    author = "Rosenthal, Sara and Atanasova, Pepa and Karadzhov, Georgi and Zampieri, Marcos and Nakov, Preslav",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.80",
    doi = "10.18653/v1/2021.findings-acl.80",
    pages = "915--928",
}
```
Provided by:
tharindu
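Summary of original information

Dataset overview

Dataset name

  • Name: SOLID
  • Full name: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification

Dataset description

  • Purpose: Identifying offensive language in social media, such as hate speech, cyberbullying, and cyber-aggression.
  • Features: Contains over nine million English tweets, labelled in a semi-supervised fashion.

Dataset source

  • Collection method: Tweets were collected in a more principled manner, in contrast to the keyword-based collection used for the earlier OLID dataset.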

Citation information

  • Reference: Rosenthal, Sara; Atanasova, Pepa; Karadzhov, Georgi; Zampieri, Marcos; Nakov, Preslav (2021). "SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification". In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 915-928, Online. Association for Computational Linguistics. https://aclanthology.org/2021.findings-acl.80, doi: 10.18653/v1/2021.findings-acl.80