
Uzbek Syllable Dataset for Linguistic and Natural Language Processing Research

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/th2mkg56m2
Description:
This dataset presents a structured collection of Uzbek language syllables designed to support research in linguistics and natural language processing (NLP). It contains syllabified textual units derived from Uzbek words, enabling analysis at the subword (syllable) level.

The primary goal of the dataset is to facilitate the development and evaluation of computational models for tasks such as syllabification, text segmentation, speech processing, and language modeling. It is particularly useful for low-resource language research, where high-quality annotated data is limited.

The dataset is provided in a tabular (TSV) format, ensuring compatibility with common data processing tools and machine learning frameworks. Each entry represents syllable-level information extracted and organized for efficient computational use.

This resource can be applied in various domains, including:

- Natural language processing (NLP)
- Computational linguistics
- Speech recognition and synthesis
- Text normalization and tokenization
- Educational and linguistic analysis of the Uzbek language

The dataset aims to contribute to the advancement of Uzbek language technologies by providing a reliable and structured foundation for both academic research and practical applications.
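Because the description states only that the data is tab-separated, reading it might look roughly like the sketch below. The column layout is not documented here, so the code stays generic: it reads each line into a list of fields and counts the values in whichever column the caller selects. The file name `syllables.tsv`, the helper names, and the sample rows (syllable, then source word) are illustrative assumptions, not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Made-up sample in an ASSUMED layout (syllable <TAB> source word).
# The real file's columns may differ; check the file before relying on this.
SAMPLE = (
    "ki\tkitob\n"
    "tob\tkitob\n"
    "mak\tmaktab\n"
    "tab\tmaktab\n"
    "ki\tkino\n"
    "no\tkino\n"
)


def read_tsv_rows(handle):
    """Yield each non-empty TSV line as a list of string fields."""
    # QUOTE_NONE treats quote characters as ordinary text, which is
    # usually what is wanted for plain linguistic data.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:
            yield row


def count_column(rows, column=0):
    """Count how often each value occurs in the given column index."""
    return Counter(row[column] for row in rows if len(row) > column)


if __name__ == "__main__":
    rows = list(read_tsv_rows(io.StringIO(SAMPLE)))
    print(count_column(rows, column=0).most_common(3))

    # For a downloaded file (hypothetical name), the same helpers apply:
    # with open("syllables.tsv", encoding="utf-8", newline="") as f:
    #     rows = list(read_tsv_rows(f))
```

Using the standard-library `csv` module keeps the sketch dependency-free; the same rows could equally be loaded into a data frame once the actual columns are known.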
Created:
2026-04-10