mteb/NTREX
Dataset overview
Name: NTREX -- News Test References for MT Evaluation
Languages: 128 languages, including Afrikaans, Amharic, Arabic, Azerbaijani, Bashkir, Belarusian, Bengali, Tibetan, Bosnian, Bulgarian, Catalan, Czech, Sorani Kurdish, Welsh, Danish, German, Dhivehi, Dzongkha, Greek, English, Estonian, Basque, Ewe, Faroese, Persian, Fijian, Filipino, Finnish, French, Irish, Galician, Gujarati, Hausa, Hebrew, Hindi, Hmong, Croatian, Hungarian, Armenian, Igbo, Indonesian, Icelandic, Italian, Japanese, Kannada, Georgian, Kazakh, Khmer, Kinyarwanda, Kyrgyz, Northern Kurdish, Korean, Lao, Latvian, Lithuanian, Luxembourgish, Malayalam, Marathi, Hassaniya Arabic, Macedonian, Malagasy, Maltese, Mongolian, Maori, Malay, Burmese, Ndebele, Nepali, Dutch, Norwegian Nynorsk, Norwegian Bokmål, Northern Sotho, Chichewa, Oromo, Punjabi (Gurmukhi), Polish, Portuguese, Dari, Pashto, Romanian, Russian, Tachelhit, Sinhala, Slovak, Slovenian, Samoan, Shona, Sindhi, Somali, Spanish, Albanian, Serbian, Swati, Swahili, Swedish, Tahitian, Tamil, Tatar, Telugu, Tajik, Thai, Tigrinya, Tongan, Tswana, Turkmen, Turkish, Uighur, Ukrainian, Urdu, Uzbek, Venda, Vietnamese, Wolof, Xhosa, Yoruba, Cantonese, Chinese (Simplified), Chinese (Traditional), Zulu.
License: CC-BY-SA-4.0
Multilinguality: translation
Task categories: translation
Size: 1997
Configurations:
- Default configuration:
  - Data files:
    - Test split: test.parquet
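The default configuration has a single test split, stored in test.parquet. What follows is a minimal loading sketch. It assumes the Hugging Face `datasets` library is installed and that calling `load_dataset` with the repository id `mteb/NTREX` resolves to this default configuration. The helper name `load_ntrex_test` and its optional `loader` argument are hypothetical. They exist only so the Hub call can be replaced when working offline.

```python
from typing import Any, Callable, Optional

# Repository id and split name as given on this card; the default
# configuration exposes only a "test" split (test.parquet).
REPO_ID = "mteb/NTREX"
SPLIT = "test"


def load_ntrex_test(loader: Optional[Callable[..., Any]] = None) -> Any:
    """Load the NTREX test split (hypothetical helper, not part of any library).

    With no `loader`, this uses `datasets.load_dataset`, which is assumed
    to be installed. Any callable accepting (repo_id, split=...) can be
    passed instead, e.g. a stub for offline testing.
    """
    if loader is None:
        # Imported lazily so this module imports even without `datasets`.
        from datasets import load_dataset
        loader = load_dataset
    return loader(REPO_ID, split=SPLIT)
```

As a usage example, `ds = load_ntrex_test()` followed by `print(ds)` shows the column names and row count. This card does not describe the column layout, so inspect `ds.column_names` before relying on specific field names.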
Citation information
If you use this dataset, please cite:
@inproceedings{federmann-etal-2022-ntrex,
    title = "{NTREX}-128 {--} News Test References for {MT} Evaluation of 128 Languages",
    author = "Federmann, Christian and Kocmi, Tom and Xin, Ying",
    booktitle = "Proceedings of the First Workshop on Scaling Up Multilingual Evaluation",
    month = "nov",
    year = "2022",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.sumeval-1.4",
    pages = "21--24",
}
Please also cite the WMT 2019 findings paper, which provides the English source data:
@inproceedings{barrault-etal-2019-findings,
    title = "Findings of the 2019 Conference on Machine Translation ({WMT}19)",
    author = {Barrault, Lo{\"i}c and Bojar, Ond{\v{r}}ej and Costa-juss{\`a}, Marta R. and Federmann, Christian and Fishel, Mark and Graham, Yvette and Haddow, Barry and Huck, Matthias and Koehn, Philipp and Malmasi, Shervin and Monz, Christof and M{\"u}ller, Mathias and Pal, Santanu and Post, Matt and Zampieri, Marcos},
    editor = "Bojar, Ond{\v{r}}ej and Chatterjee, Rajen and Federmann, Christian and Fishel, Mark and Graham, Yvette and Haddow, Barry and Huck, Matthias and Yepes, Antonio Jimeno and Koehn, Philipp and Martins, Andr{\'e} and Monz, Christof and Negri, Matteo and N{\'e}v{\'e}ol, Aur{\'e}lie and Neves, Mariana and Post, Matt and Turchi, Marco and Verspoor, Karin",
    booktitle = "Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-5301",
    doi = "10.18653/v1/W19-5301",
    pages = "1--61",
}




