inseq/scat
Dataset Overview
Dataset Name
- SCAT (Supporting Context for Ambiguous Translations corpus)
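Dataset Summary
- SCAT is an English-to-French translation dataset annotated with human rationales: the context words that annotators relied on to resolve ambiguous pronoun anaphora in multi-sentence translation.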
Languages
- Source language: English (en)
- Target language: French (fr)
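Dataset Structure
- Data instance format: each instance contains several fields, such as context_en, en, context_fr, and fr, which hold the preceding context, the source sentence, and the target sentence.
- Data splits: the dataset is divided into train, validation, and test splits.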
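To make the instance format concrete, here is a minimal sketch of how the four fields named above could be combined into a context-aware source/target pair. The field names come from this card; the example sentences, the `build_pair` helper, and the choice of a single space as separator are assumptions made for illustration, not part of the dataset or its tooling.

```python
from typing import Dict, Tuple

# Hypothetical instance: the keys follow the dataset card, but the
# sentences are invented for illustration and are not taken from SCAT.
example: Dict[str, str] = {
    "context_en": "I bought a new lamp yesterday.",
    "en": "It is very bright.",
    "context_fr": "J'ai acheté une nouvelle lampe hier.",
    "fr": "Elle est très lumineuse.",
}


def build_pair(instance: Dict[str, str], sep: str = " ") -> Tuple[str, str]:
    """Prepend the context to the current sentence on both language sides."""
    source = f"{instance['context_en']}{sep}{instance['en']}".strip()
    target = f"{instance['context_fr']}{sep}{instance['fr']}".strip()
    return source, target


source, target = build_pair(example)
print(source)
print(target)
```

In this invented example the English pronoun "It" is ambiguous on its own; only the context sentence ("lamp", feminine "lampe" in French) licenses "Elle" in the translation, which is the kind of dependency the annotations are meant to capture.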
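Dataset Creation
- Annotation process: annotated by 20 freelance English-French translators recruited through the Upwork platform.
- Source data: 14K examples selected from the OpenSubtitles2018 dataset.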
Licensing Information
- License: unknown
Citation Information
- Original SCAT dataset:
  @inproceedings{yin-etal-2021-context,
      title = "Do Context-Aware Translation Models Pay the Right Attention?",
      author = "Yin, Kayo and Fernandes, Patrick and Pruthi, Danish and Chaudhary, Aditi and Martins, Andr{\'e} F. T. and Neubig, Graham",
      booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
      month = aug,
      year = "2021",
      address = "Online",
      publisher = "Association for Computational Linguistics",
      url = "https://aclanthology.org/2021.acl-long.65",
      doi = "10.18653/v1/2021.acl-long.65",
      pages = "788--801",
  }
- SCAT+ (current version):
  @inproceedings{sarti-etal-2023-quantifying,
      title = "Quantifying the Plausibility of Context Reliance in Neural Machine Translation",
      author = "Sarti, Gabriele and Chrupa{\l}a, Grzegorz and Nissim, Malvina and Bisazza, Arianna",
      booktitle = "The Twelfth International Conference on Learning Representations (ICLR 2024)",
      month = may,
      year = "2024",
      address = "Vienna, Austria",
      publisher = "OpenReview",
      url = "https://openreview.net/forum?id=XTHfNGI3zT"
  }



