ngoan/WikiMatrix.en-vi
Dataset Card for WikiMatrix English-Vietnamese Parallel Sentences
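Dataset Summary
The WikiMatrix English-Vietnamese Parallel Sentences dataset contains English and Vietnamese parallel sentences extracted from the WikiMatrix project. It is intended as a resource for tasks such as machine translation and cross-lingual understanding.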
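As a rough illustration of how parallel sentences might be handled in code, here is a minimal sketch. It assumes a tab-separated layout with one English/Vietnamese pair per line; this card does not specify the file format, so that layout, the `SentencePair` type, the `parse_tsv_lines` helper, and the sample sentences are all hypothetical and not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class SentencePair:
    """One aligned sentence pair (hypothetical container, not part of the dataset)."""
    en: str  # English side
    vi: str  # Vietnamese side


def parse_tsv_lines(lines: Iterable[str]) -> Iterator[SentencePair]:
    """Yield pairs from lines of the ASSUMED form 'english<TAB>vietnamese'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip rows that do not have exactly two fields
        en, vi = (p.strip() for p in parts)
        if en and vi:
            yield SentencePair(en, vi)


# Illustrative input only; these sentences are made up for the example.
sample = ["Hello\tXin chào\n", "Thank you\tCảm ơn\n", "broken line\n", "\n"]
pairs = list(parse_tsv_lines(sample))
```

Malformed and empty rows are dropped rather than raising an error, which is a common choice for mined corpora where a small amount of noise is expected.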
Supported Tasks and Leaderboards
- Machine translation
- Cross-lingual understanding
Languages
- English
- Vietnamese
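Additional Information
Licensing Information
This dataset is released under the Creative Commons Attribution-ShareAlike license.
Citation Information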
[1] Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman. WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia. arXiv, July 11, 2019.
[2] Mikel Artetxe and Holger Schwenk. Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings. arXiv, Nov 3, 2018.
[3] Mikel Artetxe and Holger Schwenk. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond. arXiv, Dec 26, 2018.
[4] Ye Qi, Devendra Sachan, Matthieu Felix, Sarguna Padmanabhan and Graham Neubig. When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation? NAACL, pages 529-535, 2018.



