English-Nepali Parallel Corpus

DataCite Commons | Updated 2022-06-01 | Indexed 2025-04-15

Download link:

https://live.european-language-grid.eu/catalogue/corpus/921


Resource description:

The Nepali Monolingual written corpus is one of the 3 resources that constitute the Nepali National Corpus. The Nepali National Corpus was produced in 2006 in the framework of the project Bhasha Sanchar (“language communication”), also known as Nelralec (Nepali Language Resources and Localization for Education and Communication), funded by the EU Asia IT&C programme, reference number ASIE/2004/091-777.

This corpus consists of a collection of national development texts in English and Nepali and is divided as follows:

- a small set of data aligned at the sentence level (27,060 English words; 21,756 Nepali words), provided in the TMX format (XML file);
- a larger set of texts aligned at the document level (617,340 English words; 596,571 Nepali words), provided in raw text and in the original word processing format;
- an additional set of monolingual data in Nepali (386,879 words in Nepali), provided in raw text and in the original word processing format.
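
Since the sentence-aligned subset is distributed as TMX, the following is a minimal sketch of how such a file might be read into English-Nepali sentence pairs using only the Python standard library. It assumes the file follows the usual TMX layout (`tu` units containing `tuv` variants with a `seg` child) and that the variants are tagged with the language codes `en` and `ne`; neither the codes nor the file name used below are confirmed by the catalogue entry, and the function name `read_tmx_pairs` is purely illustrative.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(source, src_lang="en", tgt_lang="ne"):
    """Yield (source, target) segment pairs from a TMX path or file object.

    The language codes are assumptions; adjust them to whatever the
    actual file declares on its <tuv> elements.
    """
    tree = ET.parse(source)
    for tu in tree.getroot().iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary subtag, so "en-US" is treated as "en".
                segments[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src_lang in segments and tgt_lang in segments:
            yield segments[src_lang], segments[tgt_lang]
```

With a hypothetical local file name, usage would look like `for en, ne in read_tmx_pairs("english-nepali.tmx"): ...`. Translation units missing either language are skipped rather than raising an error, which seems a reasonable default for a small aligned set.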


Provider:

ELG

Created:

2022-06-01
