ngoan/WikiMatrix.en-vi

Name: ngoan/WikiMatrix.en-vi
Creator: ngoan
Published: 2023-09-06 10:31:37
License: No description provided

Source: Hugging Face. Updated 2023-09-06. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/ngoan/WikiMatrix.en-vi
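The dataset behind this link can presumably be loaded with the Hugging Face `datasets` library. The sketch below is a minimal illustration, not a documented recipe from the card: the helper names `dataset_page_url` and `load_wikimatrix` are hypothetical, and routing downloads through the mirror by setting the `HF_ENDPOINT` environment variable is an assumption about how the mirror is meant to be used. The card does not document split or column names, so the code assumes nothing about them.

```python
import os

REPO_ID = "ngoan/WikiMatrix.en-vi"   # repository ID taken from this listing
MIRROR = "https://hf-mirror.com"     # host of the download link in this listing


def dataset_page_url(repo_id: str, endpoint: str = MIRROR) -> str:
    """Build the dataset page URL for a given Hub endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_wikimatrix(repo_id: str = REPO_ID, use_mirror: bool = False):
    """Download the dataset and return whatever load_dataset yields.

    Split and column names are not documented in the card, so inspect the
    returned object rather than relying on specific names.
    """
    if use_mirror:
        # Assumption: the Hub client reads HF_ENDPOINT when it is imported,
        # so the variable is set before the import below.
        os.environ.setdefault("HF_ENDPOINT", MIRROR)
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset  # requires `pip install datasets`
    return load_dataset(repo_id)
```

Calling `load_wikimatrix()` and printing the result should reveal the available splits and columns, which is a sensible first step before wiring the data into a translation pipeline.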

Resource overview:

---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/datasets-cards
{}
---

# Dataset Card for WikiMatrix English-Vietnamese Parallel Sentences

### Dataset Summary

The WikiMatrix English-Vietnamese Parallel Sentences dataset contains parallel sentences in English and Vietnamese extracted from the WikiMatrix project. This dataset is a valuable resource for tasks such as machine translation and cross-lingual understanding.

### Supported Tasks and Leaderboards

- Machine Translation
- Cross-lingual Understanding

### Languages

- English
- Vietnamese

## Additional Information

### Licensing Information

The dataset is distributed under the Creative Commons Attribution-ShareAlike License.

### Citation Information

[1] Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, [*WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia*](https://arxiv.org/abs/1907.05791) arXiv, July 11 2019.

[2] Mikel Artetxe and Holger Schwenk, [*Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings*](https://arxiv.org/abs/1811.01136) arXiv, Nov 3 2018.

[3] Mikel Artetxe and Holger Schwenk, [*Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond*](https://arxiv.org/abs/1812.10464) arXiv, Dec 26 2018.

[4] Ye Qi, Devendra Sachan, Matthieu Felix, Sarguna Padmanabhan and Graham Neubig, [*When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?*](https://www.aclweb.org/anthology/papers/N/N18/N18-2084/) NAACL, pages 529-535, 2018.

Provider:

ngoan

Summary of original information