
SINAI/CONAN-SP

Source: Hugging Face; updated 2024-08-07; indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/SINAI/CONAN-SP
Resource description:
---
license: cc-by-nc-sa-4.0
language:
- es
tags:
- counternarrative
- counter-speech
pretty_name: CONAN-SP
configs:
- config_name: default
  data_files:
  - split: exp1
    path: data/CONAN-SP_GPT3-exp1.csv
  - split: exp2
    path: data/CONAN-SP_GPT3-exp2.csv
  - split: exp3
    path: data/CONAN-SP_GPT3-exp3.csv
---

### Dataset Description

**Paper**: [Automatic counter-narrative generation for hate speech in Spanish](http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/download/6556/3956)

**Point of Contact**: mevallec@ujaen.es

CONAN-SP is a new Spanish dataset for counter-narrative generation. Each instance pairs a hate-speech comment (HS) with a corresponding counter-narrative (CN).

#### How is it constructed?

CONAN-SP is based on CONAN-KN ([Yi-Ling Chung et al., 2021](https://aclanthology.org/2021.findings-acl.79.pdf)). CONAN-KN consists of 195 HS-CN pairs covering multiple hate targets (Islamophobia, misogyny, antisemitism, racism, and homophobia), provided together with automatically retrieved relevant knowledge. Since CONAN-KN is in English, we used DeepL, an automatic translation tool, to translate the English pairs into Spanish.

To construct CONAN-SP, we removed the pairs with duplicated hate-speech texts and the examples used to calculate inter-annotator agreement. Each CONAN-SP pair combines a hate-speech text from CONAN-KN with a counter-narrative generated by the GPT-3.5 model. We did not apply any filter to the generated CNs. In addition, each HS-CN pair is labeled with the target of the offensive comment.

To obtain the CNs generated by GPT-3.5, we followed 3 different prompting strategies:

- **Exp1: General prompt**: task definition + 5 examples (1 for each target).
- **Exp2: 5 specific prompts** (1 per target): task definition + 3 examples for the same target.
- **Exp3: General prompt**: 5 examples (1 for each target).

| Experiment | #Instances |
|--|--|
| Experiment 1 | 84 |
| Experiment 2 | 70 |
| Experiment 3 | 84 |

In total, we obtained 238 HS-CN pairs across the 3 experiments. Human annotators labeled all of these pairs on the proposed metrics (Offensiveness, Stance, and Informativeness).

### Citation Information

```bibtex
@article{Vallecillo2023,
  author  = "Vallecillo, E. and Montejo, A. and Martín-Valdivia, M.T.",
  title   = "{Automatic counter-narrative generation for hate speech in Spanish}",
  journal = "Procesamiento del Lenguaje Natural",
  year    = 2023,
  volume  = "71",
  number  = "",
  pages   = "",
  note    = "",
  month   = ""
}
```
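The YAML `configs` section maps the three splits (`exp1`, `exp2`, `exp3`) to CSV files under `data/`. With the Hugging Face `datasets` library, a single split can typically be loaded with `load_dataset("SINAI/CONAN-SP", split="exp1")`. For a local copy of the repository, the following is a minimal standard-library sketch. It assumes the CSVs are comma-delimited UTF-8 files with a header row; the card does not document the column names, so the code keys rows by whatever the header provides rather than naming columns. `load_split` and `count_rows` are hypothetical helper names.

```python
import csv
from pathlib import Path

# Split-to-file mapping copied from the dataset card's YAML `configs` section.
SPLITS = {
    "exp1": "data/CONAN-SP_GPT3-exp1.csv",
    "exp2": "data/CONAN-SP_GPT3-exp2.csv",
    "exp3": "data/CONAN-SP_GPT3-exp3.csv",
}


def load_split(root, split):
    """Read one split as a list of dicts keyed by the CSV header row.

    Assumption: comma-delimited file with a header row. "utf-8-sig" reads
    plain UTF-8 and also strips a leading byte-order mark if one is present.
    """
    path = Path(root) / SPLITS[split]
    with path.open(newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def count_rows(root):
    """Return the number of parsed data records per split (header excluded)."""
    return {split: len(load_split(root, split)) for split in SPLITS}
```

Counting parsed records rather than raw lines keeps counter-narratives that contain quoted line breaks from inflating the totals. If each CSV row is one HS-CN pair, `sum(count_rows(repo_dir).values())` on the released files would be expected to match the 238 pairs reported above.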
Provider:
SINAI
Summary of original information

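Dataset description

Dataset name: CONAN-SP

Contact: mevallec@ujaen.es

Overview: CONAN-SP is a new Spanish dataset for counter-narrative generation. It contains hate-speech comments (HS) and their corresponding counter-narratives (CN).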

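Construction

CONAN-SP is based on CONAN-KN (Yi-Ling Chung et al., 2021). CONAN-KN contains 195 HS-CN pairs covering multiple hate targets (Islamophobia, misogyny, antisemitism, racism, and homophobia), together with automatically retrieved relevant knowledge. Since CONAN-KN is in English, we used the DeepL automatic translation tool to translate the English pairs into Spanish.

To construct CONAN-SP, we removed the pairs with duplicated hate-speech texts and the examples used to calculate inter-annotator agreement. Each CONAN-SP pair combines a hate-speech text from CONAN-KN with a counter-narrative generated by the GPT-3.5 model. We did not apply any filter to the generated CNs. In addition, each HS-CN pair is associated with the target of the offensive comment.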
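The selection step described above can be sketched as follows. The card specifies none of these details, so the sketch rests on stated assumptions: source records are dicts with hypothetical `id`, `hs`, and `target` keys; a "duplicate" is an exact match after stripping surrounding whitespace, and the first occurrence is kept; and the agreement examples are supplied as a set of IDs. `select_hate_speech` is a hypothetical name. The original CN is dropped because each selected HS is later paired with a newly generated counter-narrative.

```python
def select_hate_speech(pairs, agreement_ids):
    """Keep one entry per distinct hate-speech text, skipping agreement examples.

    Hypothetical input: an iterable of dicts with "id", "hs" and "target" keys.
    Assumptions: duplicates are exact matches after stripping whitespace, and
    the first occurrence of each hate-speech text is the one kept.
    """
    seen = set()
    selected = []
    for pair in pairs:
        if pair["id"] in agreement_ids:
            continue  # used to calculate inter-annotator agreement
        key = pair["hs"].strip()
        if key in seen:
            continue  # duplicated hate-speech text
        seen.add(key)
        # Keep the HS text and its target; the CN is generated separately.
        selected.append({"id": pair["id"], "hs": key, "target": pair["target"]})
    return selected
```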

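To obtain the CNs generated by GPT-3.5, we followed three different prompting strategies:

  • Exp1: General prompt: task definition + 5 examples (1 per target).
  • Exp2: 5 specific prompts (1 per target): task definition + 3 examples for the same target.
  • Exp3: General prompt: 5 examples (1 per target).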

| Experiment | Instances |
|--|--|
| Experiment 1 | 84 |
| Experiment 2 | 70 |
| Experiment 3 | 84 |
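The three prompting strategies can be sketched as few-shot prompt builders. This is a minimal illustration, not the authors' prompts: the card gives neither the task definition nor the example texts, so both are caller-supplied placeholders, and the `HS:`/`CN:` layout is a hypothetical format. Exp3 is read here as the Exp1 prompt without the task definition, since the card lists only the 5 examples for it. All function names are hypothetical.

```python
from dataclasses import dataclass

# The five hate targets covered by CONAN-KN and CONAN-SP.
TARGETS = ["islamophobia", "misogyny", "antisemitism", "racism", "homophobia"]


@dataclass
class Example:
    """A demonstration HS-CN pair labeled with its hate target."""
    target: str
    hs: str
    cn: str


def build_prompt(hs, shots, task_definition=None):
    """Join an optional task definition, the demonstrations, and the query.

    The "HS:"/"CN:" layout is a hypothetical format, not the authors' template.
    """
    parts = [task_definition] if task_definition else []
    parts += [f"HS: {ex.hs}\nCN: {ex.cn}" for ex in shots]
    parts.append(f"HS: {hs}\nCN:")  # the model completes the counter-narrative
    return "\n\n".join(parts)


def one_per_target(pool):
    """Pick the first demonstration available for each of the five targets."""
    return [next(ex for ex in pool if ex.target == t) for t in TARGETS]


def exp1_prompt(hs, pool, task_definition):
    # Exp1: general prompt = task definition + 5 examples (1 per target).
    return build_prompt(hs, one_per_target(pool), task_definition)


def exp2_prompt(hs, target, pool, task_definition):
    # Exp2: one specific prompt per target = task definition + 3 same-target examples.
    shots = [ex for ex in pool if ex.target == target][:3]
    if len(shots) < 3:
        raise ValueError(f"need 3 examples for target {target!r}")
    return build_prompt(hs, shots, task_definition)


def exp3_prompt(hs, pool):
    # Exp3: general prompt = 5 examples (1 per target); no task definition (assumed).
    return build_prompt(hs, one_per_target(pool))
```

The design difference this makes visible: Exp2 trades target coverage for depth, since every demonstration matches the query's target, whereas Exp1 and Exp3 show each target exactly once.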

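In total, we obtained 238 hate-speech/counter-narrative pairs across the three experiments (84 + 70 + 84). Human annotators labeled all of these pairs on the proposed metrics: Offensiveness, Stance, and Informativeness.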

License information

The CONAN-SP dataset is released under the CC BY-NC-SA 4.0 license (cc-by-nc-sa-4.0).

Citation information

```bibtex
@article{Vallecillo2023,
  author  = "Vallecillo, E. and Montejo, A. and Martín-Valdivia, M.T.",
  title   = "{Automatic counter-narrative generation for hate speech in Spanish}",
  journal = "Procesamiento del Lenguaje Natural",
  year    = 2023,
  volume  = "71",
  number  = "",
  pages   = "",
  note    = "",
  month   = ""
}
```
