
Dataset for Sentiment and Named Entity Analysis in Uzbek Texts

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/y2d5pcyrzz
Description:
This dataset contains 15,000 synthetically generated Uzbek sentences annotated for sentiment (positive/neutral/negative) and named entities in three categories: PER, ORG, and LOC. It includes two subsets: Hybrid Synthetic Corpus (12,000) generated via templates with lexical polarity resources and curated NER gazetteers, and Manual-Style Synthetic Corpus (3,000) created using short natural-style patterns with higher emoji frequency to reflect conversational usage. Each record provides: id, text, sentiment, entities (JSON), entity_type (JSON aligned with entities), polarity_score, polarity_source, token_count, and emojis (JSON). Emoji presence is ~30% in the hybrid subset and ~39% in the manual-style subset, with emojis grouped into positive/neutral/negative classes. The dataset is released in CSV and XLSX (UTF-8) and distributed under CC BY 4.0.
Created:
2026-01-23