
Nepali Monolingual written corpus

Source: DataCite Commons. Updated 2022-06-01; indexed 2024-07-13.
Download link:
https://live.european-language-grid.eu/catalogue/corpus/911
Resource description:
The Nepali Monolingual written corpus is one of the three resources that constitute the Nepali National Corpus. The Nepali National Corpus was produced in 2006 within the project Bhasha Sanchar (“language communication”), also known as Nelralec (Nepali Language Resources and Localization for Education and Communication), which was funded by the EU Asia IT&C programme, reference number ASIE/2004/091-777.

The Nepali Monolingual written corpus comprises the core corpus (core sample) and the general corpus.

The core sample (CS) is a collection of Nepali written texts of 2,000 words each, drawn from 15 different genres and published between 1990 and 1992. It is based on the FLOB/FROWN corpora and contains 802,000 words.

The general corpus (GC) consists of written texts collected opportunistically from a wide range of sources, such as websites, newspapers, books, publishers and authors. It contains 1,400,000 words. This part of the corpus was intended to support analyses that depend on a very large corpus.

The written corpus is morphologically annotated. A part-of-speech (POS) tagset, the Nelralec Tagset, was produced within the project. It is a categorisation system for the manual and automated analysis of morphosyntactic units in Nepali.
Provider:
ELG
Created:
2022-06-01