
HiTZ/CONAN-EUS

Hugging Face · Updated 2024-03-15 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/HiTZ/CONAN-EUS
Resource description:
---
task_categories:
- text-generation
language:
- eu
- es
- en
tags:
- counternarratives
- hate speech
- multilinguality
- LLMs
- LLM
pretty_name: conan_eus
size_categories:
- 10K<n<100K
configs:
- config_name: eu
  data_files:
  - split: train
    path:
    - data/eu/eu_train.csv
    - data/eu/eu_train_MT.csv
  - split: validation
    path:
    - data/eu/eu_val.csv
    - data/eu/eu_val_MT.csv
  - split: test
    path:
    - data/eu/eu_test.csv
    - data/eu/eu_test_MT.csv
- config_name: es
  data_files:
  - split: train
    path:
    - data/es/es_train.csv
    - data/es/es_train_MT.csv
  - split: validation
    path:
    - data/es/es_val.csv
    - data/es/es_val_MT.csv
  - split: test
    path:
    - data/es/es_test.csv
    - data/es/es_test_MT.csv
- config_name: en
  data_files:
  - split: train
    path: data/en/en_train.csv
  - split: validation
    path: data/en/en_val.csv
  - split: test
    path: data/en/en_test.csv
---

**Content Warning**: This dataset contains examples of offensive language that do not reflect the authors’ views.

# CONAN-EUS: Basque and Spanish Parallel Counter Narratives Dataset

CONAN-EUS was created by professionally translating all 6654 English HS-CN pairs of the original [CONAN](https://aclanthology.org/P19-1271.pdf) dataset into **Basque and Spanish**. For experimentation we generated train, validation and test splits such that no HS-CN pair occurs in more than one split.

<table style="width:33%">
  <tr>
    <th>CONAN-EUS Splits</th>
    <th>Total HS-CN Count</th>
  </tr>
  <tr>
    <td>train</td>
    <td>4833</td>
  </tr>
  <tr>
    <td>validation</td>
    <td>537</td>
  </tr>
  <tr>
    <td>test</td>
    <td>1278</td>
  </tr>
</table>

- 📖 Paper: [Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation](https://arxiv.org/abs/2403.09159) In LREC-COLING 2024.
- 💻 GitHub Repo (Data and Code): [https://github.com/ixa-ehu/conan-e/](https://github.com/ixa-ehu/conan-e/)

The CONAN (COunter NArratives through Nichesourcing) dataset was published by [Chung et al., 2019](https://aclanthology.org/P19-1271.pdf) and is publicly available at [https://github.com/marcoguerini/CONAN](https://github.com/marcoguerini/CONAN).

## Data

This repository contains the Basque and Spanish CONAN-EUS data, in Machine Translated (MT) and post-edited forms. Furthermore, to facilitate experimentation we also share the generated splits of the original CONAN English data ([https://github.com/marcoguerini/CONAN](https://github.com/marcoguerini/CONAN)).

+ data/eu: train/val/test sets with both the MT and post-edited Basque data
+ data/es: train/val/test sets with both the MT and post-edited Spanish data
+ data/en: train/val/test sets of the original CONAN English data

## HS-CN example

The CONAN dataset includes HS-CN pairs on the topic of Muslims and Islam. An example of an HS-CN pair in Basque, Spanish and English is shown below:

| HS | CN |
|----|----|
| Musulmanok ez dugu gure kultura aberastu dezakeen ezer erabilgarririk. | Zer egin dute guretzat musulmanek? Beno, kafea, erlojuak, kamerak, higienea, unibertsitateak, tresna kirurgikoak, mapak, musika, aljebra. |
| Los musulmanes no tienen nada útil que pueda enriquecer nuestra cultura. | ¿Qué han hecho los musulmanes por nosotros? Bueno, café, relojes, cámaras, higiene, universidades, instrumentos quirúrgicos, mapas, música, álgebra. |
| Muslims do not have anything useful that can enrich our culture. | What have Muslims ever done for us? Well, Coffee, Clocks, Cameras, Hygiene, Universities, Surgical Instruments, Maps, Music, Algebra. |

## Citation

If you use CONAN-EUS please **cite the following paper**:

```bibtex
@inproceedings{bengoetxea-et-al-2024,
  title={{B}asque and {S}panish {C}ounter {N}arrative {G}eneration: {D}ata {C}reation and {E}valuation},
  author={Jaione Bengoetxea and Yi-Ling Chung and Marco Guerini and Rodrigo Agerri},
  year={2024},
  publisher = "Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)",
}
```

If you also use the English splits then you **should also cite the original CONAN paper**:

```bibtex
@inproceedings{chung-etal-2019-conan,
  title = "{CONAN} - {CO}unter {NA}rratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech",
  author = "Chung, Yi-Ling and Kuzmenko, Elizaveta and Tekiroglu, Serra Sinem and Guerini, Marco",
  booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
  year = "2019",
  pages = "2819--2829"
}
```

**Contact**: [Rodrigo Agerri](https://ragerri.github.io/), HiTZ Center - Ixa, University of the Basque Country UPV/EHU
Provider:
HiTZ
Summary of original information

CONAN-EUS: Basque and Spanish Parallel Counter Narratives Dataset

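Dataset overview

CONAN-EUS was created by professionally translating all 6654 English HS-CN pairs of the original CONAN dataset into Basque and Spanish. For experimentation, train, validation, and test splits were generated so that no HS-CN pair occurs in more than one split.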
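
The no-overlap property described above can be checked mechanically. The following is a minimal sketch, assuming each split has already been read into a list of (HS, CN) string tuples. The function name `shared_pairs` and the toy data are hypothetical, and since the source does not give the CSV column names, loading is left out.

```python
from itertools import combinations


def shared_pairs(splits):
    """Return, for every pair of splits, the HS-CN pairs they have in common.

    `splits` maps a split name to an iterable of (hs, cn) tuples.
    An empty result means no HS-CN pair occurs in more than one split.
    """
    as_sets = {name: set(pairs) for name, pairs in splits.items()}
    overlaps = {}
    for left, right in combinations(sorted(as_sets), 2):
        common = as_sets[left] & as_sets[right]
        if common:
            overlaps[(left, right)] = common
    return overlaps


# Toy data standing in for the real splits (hypothetical strings).
toy_splits = {
    "train": [("hs one", "cn one"), ("hs two", "cn two")],
    "validation": [("hs three", "cn three")],
    "test": [("hs four", "cn four")],
}
print(shared_pairs(toy_splits))  # {}
```

Comparing whole (HS, CN) tuples is only one reading of "no pairs across splits"; a stricter check could compare the HS texts alone.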

Dataset structure

  • Task category: text generation
  • Languages: Basque, Spanish, English
  • Tags: counternarratives, hate speech, multilinguality, LLMs, LLM
  • Size category: 10K<n<100K

Data file configuration

  • Basque (eu)
    • Train: data/eu/eu_train.csv, data/eu/eu_train_MT.csv
    • Validation: data/eu/eu_val.csv, data/eu/eu_val_MT.csv
    • Test: data/eu/eu_test.csv, data/eu/eu_test_MT.csv
  • Spanish (es)
    • Train: data/es/es_train.csv, data/es/es_train_MT.csv
    • Validation: data/es/es_val.csv, data/es/es_val_MT.csv
    • Test: data/es/es_test.csv, data/es/es_test_MT.csv
  • English (en)
    • Train: data/en/en_train.csv
    • Validation: data/en/en_val.csv
    • Test: data/en/en_test.csv
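
The configuration above maps each language to its split files. The following is a minimal sketch of that mapping in Python, with an optional loader built on the Hugging Face `datasets` library. `split_files` and `load_config` are hypothetical helper names; the repository id and config names are taken from this card.

```python
LANGS_WITH_MT = {"eu", "es"}  # configs that also ship machine-translated files
SPLITS = {"train": "train", "validation": "val", "test": "test"}


def split_files(lang):
    """Return {split: [paths]} for one config, mirroring the card's file layout."""
    if lang not in LANGS_WITH_MT | {"en"}:
        raise ValueError(f"unknown config: {lang!r}")
    files = {}
    for split, short in SPLITS.items():
        paths = [f"data/{lang}/{lang}_{short}.csv"]
        if lang in LANGS_WITH_MT:
            paths.append(f"data/{lang}/{lang}_{short}_MT.csv")
        files[split] = paths
    return files


def load_config(lang):
    """Load one config from the Hub (needs `pip install datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the mapping works offline

    return load_dataset("HiTZ/CONAN-EUS", lang)


print(split_files("eu")["train"])
# ['data/eu/eu_train.csv', 'data/eu/eu_train_MT.csv']
```

Because the `eu` and `es` configs list two files per split, a call such as `load_config("eu")` would be expected to return splits that combine the post-edited and MT rows. To work with only one of the two forms, the individual CSV files can be read directly.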

Dataset splits

| Split | Total HS-CN pairs |
|-------|-------------------|
| Train | 4833 |
| Validation | 537 |
| Test | 1278 |
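
As a quick worked check of the counts above, the sketch below computes each split's share of the total; the variable names are illustrative. The three counts sum to 6648, while the card cites 6654 pairs for the original CONAN data, and the source does not explain the difference.

```python
split_counts = {"train": 4833, "validation": 537, "test": 1278}

total = sum(split_counts.values())
# Percentage of all pairs in each split, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in split_counts.items()}

print(total)   # 6648
print(shares)  # {'train': 72.7, 'validation': 8.1, 'test': 19.2}
```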

Data example

The CONAN dataset contains HS-CN pairs on the topic of Muslims and Islam. An example HS-CN pair in Basque, Spanish, and English:

| HS | CN |
|----|----|
| Musulmanok ez dugu gure kultura aberastu dezakeen ezer erabilgarririk. | Zer egin dute guretzat musulmanek? Beno, kafea, erlojuak, kamerak, higienea, unibertsitateak, tresna kirurgikoak, mapak, musika, aljebra. |
| Los musulmanes no tienen nada útil que pueda enriquecer nuestra cultura. | ¿Qué han hecho los musulmanes por nosotros? Bueno, café, relojes, cámaras, higiene, universidades, instrumentos quirúrgicos, mapas, música, álgebra. |
| Muslims do not have anything useful that can enrich our culture. | What have Muslims ever done for us? Well, Coffee, Clocks, Cameras, Hygiene, Universities, Surgical Instruments, Maps, Music, Algebra. |

Citation

If you use CONAN-EUS, please cite the following paper:

```bibtex
@inproceedings{bengoetxea-et-al-2024,
  title={{B}asque and {S}panish {C}ounter {N}arrative {G}eneration: {D}ata {C}reation and {E}valuation},
  author={Jaione Bengoetxea and Yi-Ling Chung and Marco Guerini and Rodrigo Agerri},
  year={2024},
  publisher = "Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)",
}
```

If you also use the English splits, you should also cite the original CONAN paper:

```bibtex
@inproceedings{chung-etal-2019-conan,
  title = "{CONAN} - {CO}unter {NA}rratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech",
  author = "Chung, Yi-Ling and Kuzmenko, Elizaveta and Tekiroglu, Serra Sinem and Guerini, Marco",
  booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
  year = "2019",
  pages = "2819--2829"
}
```
