Dehumanizing Language Detection Dataset

Name: Dehumanizing Language Detection Dataset
Creator: IT University of Copenhagen
Published: 2024-02-14 03:58:24
License: No description available

arXiv, updated 2024-02-14, indexed 2024-08-06

Download link:

http://arxiv.org/abs/2402.08764v1

Description:

The Dehumanizing Language Detection Dataset was created by the IT University of Copenhagen to identify and analyze dehumanizing language. It has two components: a large, automatically collected corpus and a smaller, manually annotated dataset, both combining political discourse and movie subtitle dialogue. It covers a range of dehumanizing expressions, including negative evaluations of specific target groups, denial of agency, moral disgust, animal metaphors, and objectification. The data was curated through keyword extraction, using target group terms, animal metaphor keywords, and moral disgust vocabulary. The dataset is primarily intended for automatic classification and exploratory analysis of dehumanization patterns, with the aim of addressing online abuse.
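The keyword-based curation described above could be sketched roughly as follows. This is only an illustration: the three lexicons are tiny hypothetical placeholders rather than the dataset's actual keyword lists, and the rule that a candidate sentence needs a target group term plus an animal metaphor or moral disgust term is an assumption, not a documented part of the creators' pipeline.

```python
import re

# Hypothetical placeholder lexicons (assumptions for illustration only);
# the dataset's real keyword lists are not reproduced in this description.
TARGET_GROUP_TERMS = {"immigrants", "refugees"}
ANIMAL_METAPHOR_TERMS = {"vermin", "parasites"}
MORAL_DISGUST_TERMS = {"disgusting", "filthy"}

# Simple lowercase word tokenizer; a real pipeline would likely be more careful.
TOKEN_RE = re.compile(r"[a-z']+")


def match_keywords(sentence):
    """Return the lexicon terms found in the sentence, grouped by lexicon."""
    tokens = set(TOKEN_RE.findall(sentence.lower()))
    return {
        "target_group": sorted(tokens & TARGET_GROUP_TERMS),
        "animal_metaphor": sorted(tokens & ANIMAL_METAPHOR_TERMS),
        "moral_disgust": sorted(tokens & MORAL_DISGUST_TERMS),
    }


def is_candidate(sentence):
    """Assumed rule: a target group term co-occurs with a metaphor or disgust term."""
    hits = match_keywords(sentence)
    return bool(hits["target_group"]) and bool(
        hits["animal_metaphor"] or hits["moral_disgust"]
    )
```

A filter like this only surfaces candidate sentences. Keyword matches cannot establish that a sentence is dehumanizing, which is presumably why the dataset pairs the automatically collected corpus with a manually annotated subset.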

Provider:

IT University of Copenhagen

Created:

2024-02-14
