NiGuLa/Russian_Sensitive_Topics

Name: NiGuLa/Russian_Sensitive_Topics
Creator: NiGuLa
Published: 2023-05-12 13:36:44
License: No description provided in this header (see the License entry under Dataset overview)

Source: Hugging Face; updated 2023-05-12; indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/NiGuLa/Russian_Sensitive_Topics
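A minimal loading sketch, assuming the Hugging Face `datasets` library is installed and the repository loads with its default configuration (not confirmed on this page). The link above points to a mirror host; `huggingface_hub` picks the host from the `HF_ENDPOINT` environment variable when it is first imported, so the endpoint is set before `datasets` is imported. The helper names `use_endpoint` and `load_topics` are hypothetical.

```python
import os

# Repository ID taken from the download link.
REPO_ID = "NiGuLa/Russian_Sensitive_Topics"
# Mirror host from the download link; "https://huggingface.co" is the official Hub.
MIRROR_ENDPOINT = "https://hf-mirror.com"


def use_endpoint(endpoint: str = MIRROR_ENDPOINT) -> str:
    """Point huggingface_hub at `endpoint` unless HF_ENDPOINT is already set.

    huggingface_hub reads HF_ENDPOINT when it is first imported, so call this
    before `datasets` or `huggingface_hub` is imported anywhere in the process.
    Returns the endpoint that is actually in effect.
    """
    return os.environ.setdefault("HF_ENDPOINT", endpoint)


def load_topics(repo_id: str = REPO_ID):
    """Download and load the dataset (needs `pip install datasets` and network access)."""
    # Imported lazily so that use_endpoint() can run first.
    from datasets import load_dataset

    return load_dataset(repo_id)
```

Usage: call `use_endpoint()` and then `ds = load_topics()`; printing `ds` shows the available splits and column names, which this page does not document. Using `setdefault` means an endpoint the user has already exported is left untouched.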

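Summary:

This dataset is for classifying inappropriate messages on sensitive topics. It covers 18 sensitive topics, such as homophobia, politics, and racism. The first version of the dataset was released at the Balto-Slavic NLP workshop at EACL 2021; the current version is larger and has been filtered.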

Provider:

NiGuLa

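Original information summary

Dataset overview

Language

Russian

Tags

Toxic comment classification

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Task category

Text classification

Size category

10,000 < n < 100,000 examples

Key concepts

The dataset focuses on sensitive topics such as homophobia, politics, and racism, covering 18 topics in total.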
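The 18-topic labelling fits a multi-label text classification setup. A minimal sketch, assuming a message may carry several topic labels and that labels arrive as plain strings (the actual column names and label strings are not documented on this page): each message's topics become a 0/1 vector over a fixed topic vocabulary. `to_multi_hot` and `VOCAB` are hypothetical names, and `VOCAB` lists only the three topics named above, not the full set of 18.

```python
from typing import Iterable, List, Sequence


def to_multi_hot(topics: Iterable[str], vocabulary: Sequence[str]) -> List[int]:
    """Map a message's topic labels to a 0/1 vector aligned with `vocabulary`."""
    index = {name: i for i, name in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for topic in topics:
        if topic not in index:
            # Fail loudly so vocabulary/data mismatches are caught early.
            raise ValueError(f"unknown topic: {topic!r}")
        vector[index[topic]] = 1
    return vector


# Hypothetical partial vocabulary: only these three of the 18 topics are named on this page.
VOCAB = ["homophobia", "politics", "racism"]

print(to_multi_hot(["politics", "racism"], VOCAB))  # [0, 1, 1]
```

A message with no sensitive topic maps to an all-zero vector, which a multi-label classifier with one sigmoid output per topic can represent directly.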

Citation

@inproceedings{babakov-etal-2021-detecting,
    title = "Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company{'}s Reputation",
    author = "Babakov, Nikolay and Logacheva, Varvara and Kozlova, Olga and Semenov, Nikita and Panchenko, Alexander",
    booktitle = "Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Kiyv, Ukraine",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.bsnlp-1.4",
    pages = "26--36",
    abstract = "Not all topics are equally {``}flammable{''} in terms of toxicity: a calm discussion of turtles or fishing less often fuels inappropriate toxic dialogues than a discussion of politics or sexual minorities. We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labelling a dataset for appropriateness. While toxicity in user-generated data is well-studied, we aim at defining a more fine-grained notion of inappropriateness. The core of inappropriateness is that it can harm the reputation of a speaker. This is different from toxicity in two respects: (i) inappropriateness is topic-related, and (ii) inappropriate message is not toxic but still unacceptable. We collect and release two datasets for Russian: a topic-labelled dataset and an appropriateness-labelled dataset. We also release pre-trained classification models trained on this data.",
}
