
NiGuLa/Russian_Sensitive_Topics

Source: Hugging Face | Updated: 2023-05-12 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/NiGuLa/Russian_Sensitive_Topics
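Resource summary:
This dataset is intended for classifying inappropriate statements on sensitive topics. It covers 18 sensitive topics, such as homophobia, politics, and racism. The first version was released at the Balto-Slavic NLP workshop at EACL 2021; the current version is larger and has been filtered.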
Provider:
NiGuLa
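Summary of original information

Dataset overview

Dataset language

  • Russian

Dataset tags

  • Toxic comment classification

License

  • Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Task category

  • Text classification

Size category

  • 10,000 < dataset size < 100,000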

Dataset concept

  • The dataset focuses on sensitive topics, such as homophobia, politics, and racism, covering 18 topics in total.

Citation information

@inproceedings{babakov-etal-2021-detecting,
    title = "Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company{'}s Reputation",
    author = "Babakov, Nikolay and Logacheva, Varvara and Kozlova, Olga and Semenov, Nikita and Panchenko, Alexander",
    booktitle = "Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Kiyv, Ukraine",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.bsnlp-1.4",
    pages = "26--36",
    abstract = "Not all topics are equally {``}flammable{''} in terms of toxicity: a calm discussion of turtles or fishing less often fuels inappropriate toxic dialogues than a discussion of politics or sexual minorities. We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labelling a dataset for appropriateness. While toxicity in user-generated data is well-studied, we aim at defining a more fine-grained notion of inappropriateness. The core of inappropriateness is that it can harm the reputation of a speaker. This is different from toxicity in two respects: (i) inappropriateness is topic-related, and (ii) inappropriate message is not toxic but still unacceptable. We collect and release two datasets for Russian: a topic-labelled dataset and an appropriateness-labelled dataset. We also release pre-trained classification models trained on this data.",
}
