ngoan/WikiMatrix.en-vi

Source: Hugging Face. Updated 2023-09-06. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/ngoan/WikiMatrix.en-vi
Resource description:
---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/datasets-cards
{}
---

# Dataset Card for WikiMatrix English-Vietnamese Parallel Sentences

### Dataset Summary

The WikiMatrix English-Vietnamese Parallel Sentences dataset contains parallel sentences in English and Vietnamese extracted from the WikiMatrix project. This dataset is a valuable resource for tasks such as machine translation and cross-lingual understanding.

### Supported Tasks and Leaderboards

- Machine Translation
- Cross-lingual Understanding

### Languages

- English
- Vietnamese

## Additional Information

### Licensing Information

The dataset is distributed under the Creative Commons Attribution-ShareAlike License.

### Citation Information

[1] Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, [*WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia*](https://arxiv.org/abs/1907.05791) arXiv, July 11 2019.

[2] Mikel Artetxe and Holger Schwenk, [*Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings*](https://arxiv.org/abs/1811.01136) arXiv, Nov 3 2018.

[3] Mikel Artetxe and Holger Schwenk, [*Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond*](https://arxiv.org/abs/1812.10464) arXiv, Dec 26 2018.

[4] Ye Qi, Devendra Sachan, Matthieu Felix, Sarguna Padmanabhan and Graham Neubig, [*When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?*](https://www.aclweb.org/anthology/papers/N/N18/N18-2084/) NAACL, pages 529-535, 2018.
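
As a rough illustration of how the parallel sentences described in the summary might be prepared for machine translation, the sketch below loads the repository with the Hugging Face `datasets` library and turns rows into (English, Vietnamese) pairs. The repository id comes from the download link; the column names `en` and `vi`, and the helper names `load_wikimatrix_en_vi` and `iter_pairs`, are assumptions made for illustration, since the card does not document the schema.

```python
from typing import Iterable, Iterator, Mapping, Tuple


def load_wikimatrix_en_vi():
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so iter_pairs() below works without the `datasets` package.
    from datasets import load_dataset

    # Repository id taken from the download link; no split is requested,
    # so whatever splits the repository defines are returned.
    return load_dataset("ngoan/WikiMatrix.en-vi")


def iter_pairs(
    rows: Iterable[Mapping[str, str]],
    src_key: str = "en",  # assumed column name, not confirmed by the card
    tgt_key: str = "vi",  # assumed column name, not confirmed by the card
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs, skipping rows with an empty side."""
    for row in rows:
        src = (row.get(src_key) or "").strip()
        tgt = (row.get(tgt_key) or "").strip()
        if src and tgt:
            yield src, tgt


if __name__ == "__main__":
    # Offline demonstration on hand-written rows; no download is attempted here.
    sample = [
        {"en": "Hello.", "vi": "Xin chào."},
        {"en": "   ", "vi": "Câu này không có bản tiếng Anh."},
    ]
    for en, vi in iter_pairs(sample):
        print(en, "|", vi)
```

Before relying on the default keys, print the object returned by `load_wikimatrix_en_vi()` to see which splits and columns the repository actually provides, then pass the real column names to `iter_pairs`.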
Provider:
ngoan
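Summary of original information

Dataset Card for WikiMatrix English-Vietnamese Parallel Sentences

Dataset Summary

The WikiMatrix English-Vietnamese Parallel Sentences dataset contains parallel sentences in English and Vietnamese extracted from the WikiMatrix project. It is a valuable resource for tasks such as machine translation and cross-lingual understanding.

Supported Tasks and Leaderboards

  • Machine Translation
  • Cross-lingual Understanding

Languages

  • English
  • Vietnamese

Additional Information

Licensing Information

The dataset is distributed under the Creative Commons Attribution-ShareAlike License.

Citation Information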

[1] Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia arXiv, July 11 2019.

[2] Mikel Artetxe and Holger Schwenk, Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings arXiv, Nov 3 2018.

[3] Mikel Artetxe and Holger Schwenk, Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond arXiv, Dec 26 2018.

[4] Ye Qi, Devendra Sachan, Matthieu Felix, Sarguna Padmanabhan and Graham Neubig, When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation? NAACL, pages 529-535, 2018.
