
Southern Kurdish and Laki Corpus

Source: arXiv | Updated: 2023-04-04 | Indexed: 2024-06-21
Download link:
https://github.com/sinaahmadi/KurdishLID
Description:
The Southern Kurdish and Laki Corpus was created by the Department of Computer Science at George Mason University to provide language data for two low-resource languages, Southern Kurdish and Laki. It contains 16,003 records collected from local news websites, broadcast content, and fieldwork. Building it involved challenges in orthography and standardization, in obtaining data sources, and in digitizing handwritten material. The corpus is intended primarily for language identification and is meant to support language technologies such as speech recognition and machine translation, helping to preserve and promote the use of both languages.
Provided by:
Department of Computer Science, George Mason University
Created:
2023-04-04