
A Multi-Source Synthetic Dataset for Uzbek Sentiment Analysis, Named Entity Recognition, and Normalization

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/khnmnp4t7v
Description:
This repository provides a multi-source synthetic Uzbek dataset for (i) sentiment classification (Positive/Neutral/Negative) and (ii) named entity recognition with PER/LOC/ORG/DATE labels, plus auxiliary resources for emoji-aware modeling and text normalization. The main file contains 10,000 unique sentences with aligned entity spans (surface forms + types) and an emoji-aware score in [-1,1]. Emoji usage is source-dependent (news ~15%, social ~75%, dialog ~55%) to better reflect real communication styles. All data were generated programmatically from rule-based templates and lexicons; no copyrighted or real user content was used. Primary formats are CSV and JSONL (XLSX provided only for convenience).
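As a rough illustration of the record structure described above, the following Python sketch loads JSONL lines and checks each record against the stated label sets and score range. The field names (`text`, `source`, `sentiment`, `emoji_score`, `entities`, `surface`, `type`) are assumptions made for this example, not the dataset's documented schema, and the two sample records are invented for the demonstration rather than taken from the dataset.

```python
import json

# Label sets taken from the description; everything else below is assumed.
SENTIMENTS = {"Positive", "Neutral", "Negative"}
ENTITY_TYPES = {"PER", "LOC", "ORG", "DATE"}


def load_jsonl(lines):
    """Parse an iterable of JSONL lines, skipping blank ones."""
    return [json.loads(line) for line in lines if line.strip()]


def validate_record(rec):
    """Return a list of problems for one record (empty list means it passed)."""
    problems = []
    text = rec.get("text", "")

    if rec.get("sentiment") not in SENTIMENTS:
        problems.append("unknown sentiment label")

    score = rec.get("emoji_score")
    if not isinstance(score, (int, float)) or not -1.0 <= score <= 1.0:
        problems.append("emoji_score outside [-1, 1]")

    for ent in rec.get("entities", []):
        if ent.get("type") not in ENTITY_TYPES:
            problems.append("unknown entity type")
        surface = ent.get("surface")
        # The surface form should occur verbatim in the sentence.
        if not surface or surface not in text:
            problems.append("entity surface form not found in text")

    return problems


# Invented sample lines (hypothetical schema): one valid, one deliberately invalid.
sample = [
    '{"text": "Toshkent shahrida bugun havo juda yaxshi 😊", "source": "social", '
    '"sentiment": "Positive", "emoji_score": 0.8, '
    '"entities": [{"surface": "Toshkent", "type": "LOC"}]}',
    '{"text": "Xizmat yomon edi", "source": "dialog", '
    '"sentiment": "Mixed", "emoji_score": 1.5, "entities": []}',
]

records = load_jsonl(sample)
for rec in records:
    print(rec["source"], validate_record(rec))
```

To run the same checks on the real file, pass an open file handle to `load_jsonl` and adjust the field names to match the actual column names in the CSV or JSONL release.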
Created:
2026-01-02