Uzbek Syllable Dataset for Linguistic and Natural Language Processing Research
Source: NIAID Data Ecosystem. Indexed: 2026-05-10
Download link:
https://data.mendeley.com/datasets/th2mkg56m2
Description:
This dataset presents a structured collection of Uzbek language syllables designed to support research in linguistics and natural language processing (NLP). The dataset contains syllabified textual units derived from Uzbek words, enabling analysis at the subword (syllable) level.
The primary goal of this dataset is to facilitate the development and evaluation of computational models for tasks such as syllabification, text segmentation, speech processing, and language modeling. It is particularly useful for low-resource language research, where high-quality annotated data is limited.
The dataset is provided in a tabular (TSV) format, ensuring compatibility with common data processing tools and machine learning frameworks. Each entry represents syllable-level information extracted and organized for efficient computational use.
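As a rough illustration of working with a TSV file like this, the sketch below parses tab-separated rows with Python's standard `csv` module and tallies syllable frequencies. The listing does not document the dataset's actual columns, so the header names `word` and `syllables`, the hyphen separator, and the sample rows are all assumptions made for illustration; adjust them to match the real file.

```python
import csv
from collections import Counter
from typing import Iterable


def parse_tsv(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse tab-separated lines (first line is the header) into dicts."""
    return list(csv.DictReader(lines, delimiter="\t"))


def syllable_counts(
    rows: list[dict[str, str]], column: str, separator: str = "-"
) -> Counter:
    """Count how often each syllable occurs in the given column.

    Assumes each cell holds one syllabified word, with syllables joined
    by `separator` (a hypothetical layout, not confirmed by the dataset).
    """
    counts: Counter = Counter()
    for row in rows:
        counts.update(part for part in row[column].split(separator) if part)
    return counts


if __name__ == "__main__":
    # Illustrative in-memory sample; NOT taken from the dataset itself.
    sample = [
        "word\tsyllables\n",
        "kitob\tki-tob\n",
        "maktab\tmak-tab\n",
        "bola\tbo-la\n",
    ]
    rows = parse_tsv(sample)
    for syllable, count in syllable_counts(rows, "syllables").most_common():
        print(syllable, count)
```

To read the downloaded file instead of the in-memory sample, pass an open file handle, for example `parse_tsv(open(path, newline="", encoding="utf-8"))`, where `path` is wherever the TSV was saved.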
This resource can be applied in various domains, including:
- Natural language processing (NLP)
- Computational linguistics
- Speech recognition and synthesis
- Text normalization and tokenization
- Educational and linguistic analysis of the Uzbek language
The dataset aims to contribute to the advancement of Uzbek language technologies by providing a reliable and structured foundation for both academic research and practical applications.
Created:
2026-04-10



