s-nlp/ru_paradetox_content
ParaDetox: Detoxification with Parallel Data (Russian)
Dataset overview
- License: openrail++
- Task category: text classification
- Language: Russian
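Dataset content
- Source: the collection pipeline of the Russian ParaDetox dataset.
- Collection process: crowdsourced on the Yandex.Toloka platform in three steps:
  - Task 1: paraphrase generation. Workers rewrite a sentence to remove its toxicity while keeping the content unchanged.
  - Task 2: content preservation check. Workers are shown the generated paraphrase next to the original and asked whether the two are close in meaning.
  - Task 3: toxicity check. Workers verify that the toxicity was actually removed from the paraphrase.
- This repository: the results of Task 2 (content preservation check), restricted to samples whose label confidence is at least 90%. It contains 10,975 text pairs, of which 2,812 are negative examples.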
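The 90% confidence cut-off described above can be illustrated with a minimal sketch. The record layout (`original`, `paraphrase`, `same_meaning`, `confidence`) and the example rows are hypothetical and are not the dataset's actual schema; the sketch only shows what keeping samples with label confidence of at least 0.90 looks like.

```python
from dataclasses import dataclass


@dataclass
class LabeledPair:
    """One crowdsourced Task 2 judgment (hypothetical layout, not the real schema)."""
    original: str        # toxic source sentence
    paraphrase: str      # detoxified rewrite from Task 1
    same_meaning: bool   # aggregated answer: are the two close in meaning?
    confidence: float    # aggregated label confidence in [0, 1]


def filter_confident(pairs, threshold=0.90):
    """Keep only pairs whose label confidence is at least `threshold`."""
    return [p for p in pairs if p.confidence >= threshold]


# Made-up placeholder rows, used only to exercise the filter.
pairs = [
    LabeledPair("source A", "rewrite A", True, 0.97),
    LabeledPair("source B", "rewrite B", False, 0.90),
    LabeledPair("source C", "rewrite C", True, 0.62),
]

kept = filter_confident(pairs)
print(len(kept), "of", len(pairs), "pairs kept")
```

The comparison is inclusive (`>=`), so a pair with confidence exactly 0.90 is kept, matching the "at least 90%" wording.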
Dataset size
- Total pairs: 10,975
- Negative pairs: 2,812
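As a quick check of the class balance implied by these counts, the arithmetic can be sketched as follows. It assumes the labels are binary, so every pair that is not negative counts as positive; the card itself states only the total and the number of negative pairs.

```python
# Counts stated in the dataset card.
TOTAL_PAIRS = 10_975
NEGATIVE_PAIRS = 2_812

# Assumption: binary labels, so the remaining pairs are positive.
positive_pairs = TOTAL_PAIRS - NEGATIVE_PAIRS

negative_share = NEGATIVE_PAIRS / TOTAL_PAIRS
# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = positive_pairs / TOTAL_PAIRS

print(f"positive pairs:    {positive_pairs}")
print(f"negative share:    {negative_share:.1%}")
print(f"majority baseline: {majority_baseline:.1%}")
```

Under that assumption roughly a quarter of the pairs are negative, so a classifier trained on this data needs to beat a majority-class accuracy of about 74% to be informative.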
Citation
@inproceedings{logacheva-etal-2022-study,
    title = "A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification",
    author = "Logacheva, Varvara and Dementieva, Daryna and Krotova, Irina and Fenogenova, Alena and Nikishina, Irina and Shavrina, Tatiana and Panchenko, Alexander",
    booktitle = "Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.humeval-1.8",
    doi = "10.18653/v1/2022.humeval-1.8",
    pages = "90--101",
    abstract = "It is often difficult to reliably evaluate models which generate text. Among them, text style transfer is a particularly difficult to evaluate, because its success depends on a number of parameters. We conduct an evaluation of a large number of models on a detoxification task. We explore the relations between the manual and automatic metrics and find that there is only weak correlation between them, which is dependent on the type of model which generated text. Automatic metrics tend to be less reliable for better-performing models. However, our findings suggest that, ChrF and BertScore metrics can be used as a proxy for human evaluation of text detoxification to some extent.",
}



