gsarti/magpie
Dataset Overview
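Dataset name: MAGPIE
Dataset description: MAGPIE is a large sense-annotated corpus of potentially idiomatic expressions (PIEs), based on the British National Corpus (BNC). The dataset contains 37,000 samples annotated as fully figurative or literal, covering 1,482 idioms that contain nouns, numerals, or adjectives that are colors.
Language: English (BCP-47 en)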
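Dataset structure:
- Data instances: each instance contains a sentence, an annotation, an idiom, a usage label, a variant, and part-of-speech tags.
- Data splits: the training split contains 44,451 instances.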
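The instance layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`sentence`, `annotation`, `idiom`, `usage`, `variant`, `pos_tags`), the value types, the label strings "figurative" and "literal", and the per-token 0/1 annotation format are guesses based on the field list, and the two records are invented toy examples, not rows from MAGPIE.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class PieInstance:
    """One potentially idiomatic expression (PIE) in context.

    Field names follow the field list in the card; types are assumptions.
    """
    sentence: str          # sentence containing the expression
    annotation: List[int]  # assumed: one flag per token, 1 = token belongs to the idiom
    idiom: str             # canonical form of the idiom
    usage: str             # assumed label set: "figurative" or "literal"
    variant: str           # assumed: relation of the surface form to the canonical form
    pos_tags: List[str]    # one part-of-speech tag per token


# Invented toy records for illustration only; not taken from the corpus.
TOY_INSTANCES = [
    PieInstance(
        sentence="He saw red when he heard the news",
        annotation=[0, 1, 1, 0, 0, 0, 0, 0],
        idiom="see red",
        usage="figurative",
        variant="inflected",
        pos_tags=["PRON", "VERB", "ADJ", "SCONJ", "PRON", "VERB", "DET", "NOUN"],
    ),
    PieInstance(
        sentence="She saw red paint on the wall",
        annotation=[0, 1, 1, 0, 0, 0, 0],
        idiom="see red",
        usage="literal",
        variant="inflected",
        pos_tags=["PRON", "VERB", "ADJ", "NOUN", "ADP", "DET", "NOUN"],
    ),
]


def usage_counts(instances: List[PieInstance]) -> Counter:
    """Count how many instances carry each usage label."""
    return Counter(inst.usage for inst in instances)


def idiom_tokens(inst: PieInstance) -> List[str]:
    """Return the whitespace tokens flagged as part of the idiom."""
    tokens = inst.sentence.split()
    return [tok for tok, flag in zip(tokens, inst.annotation) if flag == 1]


if __name__ == "__main__":
    print(usage_counts(TOY_INSTANCES))
    for inst in TOY_INSTANCES:
        print(inst.usage, idiom_tokens(inst))
```

Counting usage labels this way is a quick check of the class balance before using the data for text classification. The real records would presumably be obtained with the Hugging Face `datasets` library under the repository id gsarti/magpie; their actual field types should be checked against the loaded data rather than taken from this sketch.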
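License: Creative Commons Attribution 4.0 (CC-BY-4.0)
Task categories: text classification, text-to-text generation, translation
Dataset creation: expert-generated; see the original papers "MAGPIE: A Large Corpus of Potentially Idiomatic Expressions" and "Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation".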
Citation information (BibTeX):

@inproceedings{haagsma-etal-2020-magpie,
    title = "{MAGPIE}: A Large Corpus of Potentially Idiomatic Expressions",
    author = "Haagsma, Hessel and Bos, Johan and Nissim, Malvina",
    booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2020.lrec-1.35",
    pages = "279--287",
    language = "English",
    ISBN = "979-10-95546-34-4",
}

@inproceedings{dankers-etal-2022-transformer,
    title = "Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation",
    author = "Dankers, Verna and Lucas, Christopher and Titov, Ivan",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.252",
    doi = "10.18653/v1/2022.acl-long.252",
    pages = "3608--3626",
}




