Dataset for Sentiment and Named Entity Analysis in Uzbek Texts

Source: Mendeley Data (indexed 2026-04-18)

Download link:

https://data.mendeley.com/datasets/y2d5pcyrzz

Description:

This dataset contains 15,000 synthetically generated Uzbek sentences annotated for sentiment (positive/neutral/negative) and for named entities in three categories: PER, ORG, and LOC. It comprises two subsets: the Hybrid Synthetic Corpus (12,000 sentences), generated from templates combined with lexical polarity resources and curated NER gazetteers, and the Manual-Style Synthetic Corpus (3,000 sentences), created from short natural-style patterns with a higher emoji frequency to reflect conversational usage. Each record provides the fields id, text, sentiment, entities (JSON), entity_type (JSON, aligned with entities), polarity_score, polarity_source, token_count, and emojis (JSON). Emojis appear in ~30% of the hybrid subset and ~39% of the manual-style subset, and are grouped into positive, neutral, and negative classes. The dataset is released in CSV and XLSX formats (UTF-8) and distributed under CC BY 4.0.
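Because three of the fields above (entities, entity_type, emojis) are stored as JSON inside CSV cells, reading the file takes one extra decoding step. The following is a minimal sketch, not the dataset authors' tooling. It assumes the CSV header uses exactly the field names listed above and that each JSON cell holds a flat list. The two sample rows, including the `lexicon` value for polarity_source, are invented for illustration and are not taken from the dataset.

```python
import csv
import io
import json

# Hypothetical sample rows that mimic the documented columns.
# They are illustrative only and do not come from the real dataset.
SAMPLE_CSV = '''id,text,sentiment,entities,entity_type,polarity_score,polarity_source,token_count,emojis
1,"Toshkent juda chiroyli shahar 😊",positive,"[""Toshkent""]","[""LOC""]",0.8,lexicon,5,"[""😊""]"
2,"Anvar bugun ishga keldi",neutral,"[""Anvar""]","[""PER""]",0.0,lexicon,4,"[]"
'''

# Columns the description marks as JSON-encoded.
JSON_COLUMNS = ("entities", "entity_type", "emojis")


def parse_rows(handle):
    """Read CSV rows and decode the JSON and numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        for col in JSON_COLUMNS:
            # Treat an empty cell as an empty list (an assumption).
            row[col] = json.loads(row[col]) if row[col] else []
        row["polarity_score"] = float(row["polarity_score"])
        row["token_count"] = int(row["token_count"])
        # entity_type is described as aligned with entities, so pair them up.
        row["entity_pairs"] = list(zip(row["entities"], row["entity_type"]))
        rows.append(row)
    return rows


records = parse_rows(io.StringIO(SAMPLE_CSV))
for rec in records:
    print(rec["id"], rec["sentiment"], rec["entity_pairs"], rec["emojis"])
```

To apply this to the downloaded file, pass an open file handle instead of the in-memory sample, for example `open(path, encoding="utf-8", newline="")`, where `path` is whatever name the CSV has after download.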

Created:

2026-01-23
