
HiTZ/cometa

Hugging Face · Updated 2024-04-15 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/HiTZ/cometa
Resource overview:

---
license: apache-2.0
task_categories:
- token-classification
language:
- es
pretty_name: CoMeta
size_categories:
- 1K<n<10K
---

# 🪁 CoMeta

CoMeta is a manually annotated corpus for metaphor detection in Spanish, consisting of 3633 sentences drawn from texts in multiple domains. We believe that CoMeta is the largest publicly available dataset with metaphorical annotations in general-domain texts for the Spanish language.

- **Repository:** Code and dataset in tabulated format available at https://github.com/ixa-ehu/cometa
- **Paper:** [Leveraging a New Spanish Corpus for Multilingual and Cross-lingual Metaphor Detection](https://aclanthology.org/2022.conll-1.16/)

## Dataset Structure

- **tokens:** the text split into a list of tokens.
- **tags:** list of metaphor annotations, one for each token.
  - 0: literal
  - 1: metaphor

## Citation

If you use CoMeta, please cite our work:

```
@inproceedings{sanchez-bayona-agerri-2022-leveraging,
    title = "Leveraging a New {S}panish Corpus for Multilingual and Cross-lingual Metaphor Detection",
    author = "Sanchez-Bayona, Elisa and Agerri, Rodrigo",
    editor = "Fokkens, Antske and Srikumar, Vivek",
    booktitle = "Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates (Hybrid)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.conll-1.16",
    doi = "10.18653/v1/2022.conll-1.16",
    pages = "228--240",
    abstract = "The lack of wide coverage datasets annotated with everyday metaphorical expressions for languages other than English is striking. This means that most research on supervised metaphor detection has been published only for that language. In order to address this issue, this work presents the first corpus annotated with naturally occurring metaphors in Spanish large enough to develop systems to perform metaphor detection. The presented dataset, CoMeta, includes texts from various domains, namely, news, political discourse, Wikipedia and reviews. In order to label CoMeta, we apply the MIPVU method, the guidelines most commonly used to systematically annotate metaphor on real data. We use our newly created dataset to provide competitive baselines by fine-tuning several multilingual and monolingual state-of-the-art large language models. Furthermore, by leveraging the existing VUAM English data in addition to CoMeta, we present the, to the best of our knowledge, first cross-lingual experiments on supervised metaphor detection. Finally, we perform a detailed error analysis that explores the seemingly high transfer of everyday metaphor across these two languages and datasets.",
}
```

## Dataset Card Contact

{elisa.sanchez, rodrigo.agerri}@ehu.eus
Provider:
HiTZ
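Summary of the original information

CoMeta dataset overview

Basic information

  • License: Apache-2.0
  • Task category: token classification
  • Language: Spanish
  • Dataset name: CoMeta
  • Size: 1K<n<10K

Dataset description

CoMeta is a manually annotated dataset for metaphor detection in Spanish, containing 3633 sentences from multiple domains. Its authors believe it to be the largest publicly available dataset of metaphor annotations on general-domain Spanish text.

Dataset structure

  • tokens: the text split into a list of tokens.
  • tags: list of metaphor annotations, one for each token.
    • 0: literal
    • 1: metaphor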
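
As a minimal sketch of how the two parallel fields can be read together: the record below is a hypothetical example invented for illustration (it is not taken from CoMeta), and the helper names `metaphor_tokens` and `labelled` are likewise assumptions. It also assumes that tags are stored as the integers 0 and 1, as the list above suggests; the actual storage type in the released files may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical record in the tokens/tags shape described above.
# The sentence and its labels are invented, not drawn from the corpus.
example: Dict[str, list] = {
    "tokens": ["El", "tiempo", "vuela", "."],
    "tags": [0, 0, 1, 0],
}

# Mapping from tag value to label name, as listed in the dataset structure.
LABELS = {0: "literal", 1: "metaphor"}


def labelled(record: Dict[str, list]) -> List[Tuple[str, str]]:
    """Pair every token with the name of its label."""
    tokens, tags = record["tokens"], record["tags"]
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [(tok, LABELS[tag]) for tok, tag in zip(tokens, tags)]


def metaphor_tokens(record: Dict[str, list]) -> List[str]:
    """Return only the tokens annotated as metaphorical (tag 1)."""
    return [tok for tok, label in labelled(record) if label == "metaphor"]


if __name__ == "__main__":
    print(labelled(example))
    print(metaphor_tokens(example))
```

Because the two lists are aligned by position, a length check before zipping catches malformed records early instead of silently dropping trailing tokens.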

Citation

If you use the CoMeta dataset, please cite the following paper:

@inproceedings{sanchez-bayona-agerri-2022-leveraging,
    title = "Leveraging a New {S}panish Corpus for Multilingual and Cross-lingual Metaphor Detection",
    author = "Sanchez-Bayona, Elisa and Agerri, Rodrigo",
    editor = "Fokkens, Antske and Srikumar, Vivek",
    booktitle = "Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates (Hybrid)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.conll-1.16",
    doi = "10.18653/v1/2022.conll-1.16",
    pages = "228--240",
    abstract = "The lack of wide coverage datasets annotated with everyday metaphorical expressions for languages other than English is striking. This means that most research on supervised metaphor detection has been published only for that language. In order to address this issue, this work presents the first corpus annotated with naturally occurring metaphors in Spanish large enough to develop systems to perform metaphor detection. The presented dataset, CoMeta, includes texts from various domains, namely, news, political discourse, Wikipedia and reviews. In order to label CoMeta, we apply the MIPVU method, the guidelines most commonly used to systematically annotate metaphor on real data. We use our newly created dataset to provide competitive baselines by fine-tuning several multilingual and monolingual state-of-the-art large language models. Furthermore, by leveraging the existing VUAM English data in addition to CoMeta, we present the, to the best of our knowledge, first cross-lingual experiments on supervised metaphor detection. Finally, we perform a detailed error analysis that explores the seemingly high transfer of everyday metaphor across these two languages and datasets.",
}
