valurank/offensive-multi

Name: valurank/offensive-multi
Creator: valurank
Published: 2022-10-25 09:57:14
License: No description provided

Hugging Face; updated 2022-10-25; indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/valurank/offensive-multi

Resource summary:

---
language:
- en
license: other
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- derived
task_categories:
- text-classification
---

# Dataset Card for hate-multi

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Dataset Creation](#dataset-creation)
- [Source Data](#source-data)

## Dataset Description

### Dataset Summary

This dataset contains a collection of texts, each labeled as offensive (class 1) or not offensive (class 0).

## Dataset Creation

The dataset was created by aggregating multiple publicly available datasets.

### Source Data

The following datasets were used:

* https://huggingface.co/datasets/hate_speech_offensive - Tweet text was cleaned by lowercasing it and removing mentions and URLs. Instances labeled as 'hate speech' were dropped.
* https://sites.google.com/site/offensevalsharedtask/olid - Tweet text was cleaned by lowercasing it and removing mentions and URLs. The 'subtask_a' column was used for labeling.
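The preprocessing described under Source Data can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not valurank's actual pipeline: the card does not give the exact cleaning rules, so the regular expressions below are guesses, and the helper names (`clean_tweet`, `map_hso_label`, `map_olid_label`, `build_rows`) are hypothetical. The label mappings assume the source datasets' own schemes: in hate_speech_offensive the `class` column uses 0 = hate speech, 1 = offensive language, 2 = neither, and in OLID `subtask_a` uses OFF/NOT. Mapping "offensive language" and OFF to class 1 is an assumption consistent with the card's summary.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical cleaning patterns: the card says mentions and URLs were removed
# but does not specify the exact rules used.
URL_RE = re.compile(r"\b(?:https?://|www\.)\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")

# Assumed mapping for hate_speech_offensive's `class` column
# (0 = hate speech, 1 = offensive language, 2 = neither).
# Hate speech (0) has no entry, so those rows are dropped.
HSO_TO_BINARY = {1: 1, 2: 0}

# Assumed mapping for OLID's `subtask_a` column (OFF = offensive, NOT = not offensive).
OLID_TO_BINARY = {"OFF": 1, "NOT": 0}


def clean_tweet(text: str) -> str:
    """Lowercase a tweet, remove URLs and @mentions, and collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)  # URLs first, so an '@' inside a URL is removed with it
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def map_hso_label(cls: int) -> Optional[int]:
    """Return the binary label for a hate_speech_offensive row, or None to drop it."""
    return HSO_TO_BINARY.get(cls)


def map_olid_label(subtask_a: str) -> int:
    """Return the binary label for an OLID row based on its subtask_a value."""
    return OLID_TO_BINARY[subtask_a]


def build_rows(
    hso_rows: Iterable[Tuple[str, int]],
    olid_rows: Iterable[Tuple[str, str]],
) -> List[Tuple[str, int]]:
    """Aggregate both sources into (cleaned_text, label) pairs with 1 = offensive."""
    rows: List[Tuple[str, int]] = []
    for text, cls in hso_rows:
        label = map_hso_label(cls)
        if label is None:  # drop instances labeled as hate speech
            continue
        rows.append((clean_tweet(text), label))
    for text, subtask_a in olid_rows:
        rows.append((clean_tweet(text), map_olid_label(subtask_a)))
    return rows


if __name__ == "__main__":
    # Toy rows for illustration only; they are not taken from either dataset.
    hso = [
        ("RT @user: check http://t.co/abc NOW", 1),
        ("placeholder hate speech row", 0),
        ("Nice day @bob", 2),
    ]
    olid = [
        ("Example OFF tweet http://x.co", "OFF"),
        ("Have a GREAT day", "NOT"),
    ]
    for text, label in build_rows(hso, olid):
        print(label, text)
```

Running the script prints each toy row's label and cleaned text; the class-0 row is skipped, mirroring the card's note that hate-speech instances were excluded rather than relabeled.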

Provider:

valurank
