valurank/offensive-multi
Hugging Face · Updated 2022-10-25 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/valurank/offensive-multi
Resource description:
---
language:
- en
license: other
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- derived
task_categories:
- text-classification
---
# Dataset Card for offensive-multi
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Dataset Creation](#dataset-creation)
- [Source Data](#source-data)
## Dataset Description
### Dataset Summary
This dataset contains a collection of text labeled as offensive (class 1) or not (class 0).
## Dataset Creation
The dataset was created by aggregating multiple publicly available datasets.
### Source Data
The following datasets were used:
* https://huggingface.co/datasets/hate_speech_offensive - Tweet text cleaned by lowercasing and removing mentions and URLs. Instances labeled 'hate speech' were dropped.
* https://sites.google.com/site/offensevalsharedtask/olid - Tweet text cleaned by lowercasing and removing mentions and URLs. The 'subtask_a' column was used for labeling.
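
The cleaning and labeling steps listed above could look roughly like the sketch below. It is illustrative only: the card does not publish the actual preprocessing code, so the regular expressions and the helper names (`clean_tweet`, `map_hso_label`, `map_olid_label`) are assumptions. The mapping also assumes the usual class encoding of `hate_speech_offensive` (0 = hate speech, 1 = offensive language, 2 = neither) and the `OFF`/`NOT` values of OLID's `subtask_a` column.

```python
import re

# Assumed patterns; the card does not state the exact rules that were used.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs and @mentions."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return WHITESPACE_RE.sub(" ", text).strip()


def map_hso_label(cls: int):
    """Map a hate_speech_offensive class to this dataset's binary label.

    Assumed encoding: 0 = hate speech, 1 = offensive language, 2 = neither.
    Returns None for hate speech, meaning the row should be dropped.
    """
    if cls == 0:
        return None
    return 1 if cls == 1 else 0


def map_olid_label(subtask_a: str) -> int:
    """Map OLID's subtask_a value ('OFF' or 'NOT') to 1 or 0."""
    return 1 if subtask_a == "OFF" else 0
```

For example, `clean_tweet("@User Check THIS out https://t.co/abc now")` yields `"check this out now"`.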
Provider:
valurank



