HiTZ/CONAN-EUS
CONAN-EUS: Basque and Spanish Parallel Counter Narratives Dataset
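Dataset Overview
CONAN-EUS was created by having professional translators translate all 6654 English HS-CN (hate speech and counter-narrative) pairs of the original CONAN dataset into Basque and Spanish. For the experiments, we generated train, validation, and test splits, ensuring that no HS-CN pair appears in more than one split.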
Dataset Structure
- Task category: text generation
- Languages: Basque, Spanish, English
- Tags: counternarratives, hate speech, multilinguality, LLMs, LLM
- Size category: 10K<n<100K
Data File Configuration
- Basque (eu)
  - Train: data/eu/eu_train.csv, data/eu/eu_train_MT.csv
  - Validation: data/eu/eu_val.csv, data/eu/eu_val_MT.csv
  - Test: data/eu/eu_test.csv, data/eu/eu_test_MT.csv
- Spanish (es)
  - Train: data/es/es_train.csv, data/es/es_train_MT.csv
  - Validation: data/es/es_val.csv, data/es/es_val_MT.csv
  - Test: data/es/es_test.csv, data/es/es_test_MT.csv
- English (en)
  - Train: data/en/en_train.csv
  - Validation: data/en/en_val.csv
  - Test: data/en/en_test.csv
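The file layout above follows a regular naming pattern, which can be sketched as a small path helper. This is a minimal sketch, not part of the dataset's own tooling: the function name `split_files` and the `root` parameter are hypothetical, and the meaning of the `_MT` suffix is not explained in this card, so the helper only reproduces the listed paths.

```python
# Languages that ship an additional *_MT.csv file per split, per the listing above.
LANGS_WITH_MT = {"eu", "es"}
LANGS = {"eu", "es", "en"}
SPLITS = ("train", "val", "test")


def split_files(lang: str, split: str, root: str = "data") -> list[str]:
    """Return the CSV paths listed in the card for one language and split.

    `root` is an assumed local directory that mirrors the repository layout.
    """
    if lang not in LANGS:
        raise ValueError(f"unknown language code: {lang!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    base = f"{root}/{lang}/{lang}_{split}"
    files = [f"{base}.csv"]
    if lang in LANGS_WITH_MT:
        files.append(f"{base}_MT.csv")
    return files


if __name__ == "__main__":
    for lang in sorted(LANGS):
        for split in SPLITS:
            print(lang, split, split_files(lang, split))
```

The returned paths can then be handed to any CSV reader. The column names inside the files are not documented here, so they should be inspected before use.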
Dataset Splits
| Split | Total HS-CN pairs |
|---|---|
| Train | 4833 |
| Validation | 537 |
| Test | 1278 |
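The requirement that no HS-CN pair appears in more than one split can be checked with a few lines of code. The sketch below is illustrative only: it assumes each split has already been loaded as a list of `(hate_speech, counter_narrative)` tuples, and the function name `cross_split_overlap` is hypothetical. The split sizes are copied from the table above.

```python
# Split sizes as reported in the table above.
SPLIT_SIZES = {"train": 4833, "val": 537, "test": 1278}


def cross_split_overlap(splits: dict[str, list[tuple[str, str]]]) -> set[tuple[str, str]]:
    """Return every (HS, CN) pair that occurs in more than one split."""
    seen: set[tuple[str, str]] = set()
    overlap: set[tuple[str, str]] = set()
    for rows in splits.values():
        unique = set(rows)          # ignore duplicates inside a single split
        overlap |= unique & seen    # pairs already seen in an earlier split
        seen |= unique
    return overlap


if __name__ == "__main__":
    # Toy data standing in for the real CSV contents.
    toy = {
        "train": [("hs1", "cn1"), ("hs2", "cn2")],
        "val": [("hs3", "cn3")],
        "test": [("hs2", "cn2")],  # deliberately leaked pair
    }
    print(cross_split_overlap(toy))
    print(sum(SPLIT_SIZES.values()))
```

An empty result means the splits are disjoint. Note that the three split sizes sum to 6648, slightly fewer than the 6654 pairs mentioned in the overview; the card does not explain the difference.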
Data Example
The CONAN dataset contains HS-CN pairs on the topic of Muslims and Islam. Below is an example of one HS-CN pair in Basque, Spanish, and English:
| HS | CN |
|---|---|
| Musulmanok ez dugu gure kultura aberastu dezakeen ezer erabilgarririk. | Zer egin dute guretzat musulmanek? Beno, kafea, erlojuak, kamerak, higienea, unibertsitateak, tresna kirurgikoak, mapak, musika, aljebra. |
| Los musulmanes no tienen nada útil que pueda enriquecer nuestra cultura. | ¿Qué han hecho los musulmanes por nosotros? Bueno, café, relojes, cámaras, higiene, universidades, instrumentos quirúrgicos, mapas, música, álgebra. |
| Muslims do not have anything useful that can enrich our culture. | What have Muslims ever done for us? Well, Coffee, Clocks, Cameras, Hygiene, Universities, Surgical Instruments, Maps, Music, Algebra. |
Citation
If you use CONAN-EUS, please cite the following paper:

    @inproceedings{bengoetxea-et-al-2024,
      title={{B}asque and {S}panish {C}ounter {N}arrative {G}eneration: {D}ata {C}reation and {E}valuation},
      author={Jaione Bengoetxea and Yi-Ling Chung and Marco Guerini and Rodrigo Agerri},
      year={2024},
      publisher = "Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)",
    }

If you also use the English split, you should additionally cite the original CONAN paper:

    @inproceedings{chung-etal-2019-conan,
      title = "{CONAN} - {CO}unter {NA}rratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech",
      author = "Chung, Yi-Ling and Kuzmenko, Elizaveta and Tekiroglu, Serra Sinem and Guerini, Marco",
      booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
      year = "2019",
      pages = "2819--2829"
    }



