s-nlp/ru_non_detoxified
ParaDetox: Detoxification with Parallel Data (Russian)
Dataset Overview
- Task category: text classification
- Language: Russian
- License: openrail++
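Dataset Contents
- Dataset name: ParaDetox
- Data collection platform: Yandex.Toloka
- Data collection steps:
  - Task 1: Paraphrase generation. Workers are asked to remove the toxicity from a sentence without changing its meaning.
  - Task 2: Content preservation check. Workers are shown a generated paraphrase alongside its original and asked whether the two are close in meaning.
  - Task 3: Toxicity check. Workers verify whether the paraphrase author actually removed the toxicity.
- Dataset size: approximately 11,446 samples
- Special samples: the dataset contains the samples that annotators marked as impossible to detoxify. Possible reasons are that the text is not toxic, that the content itself is toxic, or that the text is unclear.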
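The three collection tasks and the "cannot be detoxified" flag can be read as a simple routing rule. The sketch below is only an illustration of that reading, not the authors' pipeline: the names `Annotation`, `SkipReason`, and `route`, the bucket labels, and the reduction of each check to a single boolean are all assumptions made for this example (the source does not say how worker answers are aggregated).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SkipReason(Enum):
    """Hypothetical labels for the three reasons listed above."""
    NOT_TOXIC = "not_toxic"          # the source text is not toxic
    TOXIC_CONTENT = "toxic_content"  # the content itself is toxic
    UNCLEAR = "unclear"              # the text is unclear


@dataclass
class Annotation:
    """One source sentence with its (hypothetical) crowdsourcing outcome."""
    source: str
    paraphrase: Optional[str] = None             # Task 1 output, if any
    skip_reason: Optional[SkipReason] = None     # set when marked non-detoxifiable
    meaning_preserved: Optional[bool] = None     # Task 2 verdict (assumed boolean)
    toxicity_removed: Optional[bool] = None      # Task 3 verdict (assumed boolean)


def route(a: Annotation) -> str:
    """Return the bucket a sample would land in under this sketch."""
    if a.skip_reason is not None:
        # Samples flagged in Task 1 are the kind this dataset collects.
        return "non_detoxified"
    if a.paraphrase is None:
        return "rejected"
    if a.meaning_preserved and a.toxicity_removed:
        # Passed both checks: a usable parallel pair.
        return "parallel"
    return "rejected"
```

For instance, `route(Annotation("some text", skip_reason=SkipReason.UNCLEAR))` falls into the `"non_detoxified"` bucket, while a paraphrase that passes both checks is routed to `"parallel"`.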
Citation
@inproceedings{logacheva-etal-2022-study,
    title = "A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification",
    author = "Logacheva, Varvara and Dementieva, Daryna and Krotova, Irina and Fenogenova, Alena and Nikishina, Irina and Shavrina, Tatiana and Panchenko, Alexander",
    booktitle = "Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.humeval-1.8",
    doi = "10.18653/v1/2022.humeval-1.8",
    pages = "90--101",
    abstract = "It is often difficult to reliably evaluate models which generate text. Among them, text style transfer is a particularly difficult to evaluate, because its success depends on a number of parameters. We conduct an evaluation of a large number of models on a detoxification task. We explore the relations between the manual and automatic metrics and find that there is only weak correlation between them, which is dependent on the type of model which generated text. Automatic metrics tend to be less reliable for better-performing models. However, our findings suggest that, ChrF and BertScore metrics can be used as a proxy for human evaluation of text detoxification to some extent.",
}



