Uzbek Syllable Dataset for Linguistic and Natural Language Processing Research

Source: NIAID Data Ecosystem (indexed 2026-05-10)

Download link:

https://data.mendeley.com/datasets/th2mkg56m2


Resource description:

This dataset presents a structured collection of Uzbek language syllables designed to support research in linguistics and natural language processing (NLP). The dataset contains syllabified textual units derived from Uzbek words, enabling analysis at the subword (syllable) level.

The primary goal of this dataset is to facilitate the development and evaluation of computational models for tasks such as syllabification, text segmentation, speech processing, and language modeling. It is particularly useful for low-resource language research, where high-quality annotated data is limited.

The dataset is provided in a tabular (TSV) format, ensuring compatibility with common data processing tools and machine learning frameworks. Each entry represents syllable-level information extracted and organized for efficient computational use.

This resource can be applied in various domains, including:

- Natural language processing (NLP)
- Computational linguistics
- Speech recognition and synthesis
- Text normalization and tokenization
- Educational and linguistic analysis of the Uzbek language

The dataset aims to contribute to the advancement of Uzbek language technologies by providing a reliable and structured foundation for both academic research and practical applications.
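Because the description states only that the data ships as TSV, the following is a minimal sketch of how such a file might be read and summarized with the Python standard library. The column layout is not documented here, so the single `syllable` column, the header row, and the sample rows are all assumptions made for illustration; check the actual file before relying on any of them.

```python
import csv
import io
from collections import Counter


def read_tsv_rows(handle):
    """Yield each non-empty row of a tab-separated stream as a list of strings."""
    # QUOTE_NONE treats quote characters as ordinary text, which is the
    # safer default for linguistic data that may contain apostrophes.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:
            yield row


def syllable_frequencies(rows, column=0, has_header=True):
    """Count how often each value appears in the chosen column.

    `column` and `has_header` are assumptions about the file layout,
    not facts taken from the dataset documentation.
    """
    rows = iter(rows)
    if has_header:
        next(rows, None)  # skip the assumed header row
    counts = Counter()
    for row in rows:
        if column < len(row):
            counts[row[column].strip()] += 1
    return counts


# Hypothetical sample standing in for the real file contents.
SAMPLE = "syllable\nki\ntob\nki\n"

counts = syllable_frequencies(read_tsv_rows(io.StringIO(SAMPLE)))
print(counts.most_common(1))  # [('ki', 2)]
```

To run this against the downloaded file, replace the `io.StringIO(SAMPLE)` handle with `open(path, encoding="utf-8", newline="")`, where `path` points to the local copy of the TSV.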

Created:

2026-04-10
