
uznlp-uz/uz_med_sentiment

Source: Hugging Face. Updated 2026-04-02; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/uznlp-uz/uz_med_sentiment
Resource description:
---
pretty_name: UzMedSentiment
language:
- uz
license: cc-by-4.0  # ⚠️ change if needed
task_categories:
- text-classification
task_ids:
- sentiment-classification
size_categories:
- 100K<n<1M
configs:
- config_name: default
  data_files:
  - split: train
    path: dataset(UzMedSentiment).tsv
  sep: "\t"
---

# UzMedSentiment

## 📌 Dataset Summary

**UzMedSentiment** is a large-scale Uzbek medical-domain sentiment dataset designed for **aspect-based sentiment analysis** and **auxiliary linguistic signal detection**.

Each record contains:

- user-generated medical text
- aspect label
- sentiment label
- additional annotations (negation, speculation, sarcasm, ADR flags, etc.)

The dataset is released as a **single TSV file** compatible with the Hugging Face `datasets` library.

---

## 🎯 Supported Tasks

- Sentiment classification
- Aspect-based sentiment analysis (ABSA)
- Medical text mining (Uzbek)
- Detection of:
  - negation
  - speculation
  - sarcasm
  - adverse drug reactions (ADR)

---

## 🌍 Languages

- Uzbek (primary)

Script distribution:

- `uz-latin`: 165,779 rows
- `uz-kiril`: 991 rows

⚠️ Note: Some entries may include noise, code-switching, or non-standard spelling.

---

## 📊 Dataset Size

- Total rows: **166,770**
- Format: **TSV**
- Columns: **13**
- Avg. text length: **85.49 chars**
- Median length: **73 chars**
- Max length: **2,494 chars**

---

## 📈 Label Distribution

### Sentiment

- `NEU`: 146,845
- `POS`: 17,915
- `NEG`: 2,010

⚠️ Dataset is **highly imbalanced** toward neutral class.
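To make the imbalance concrete, the sketch below turns the sentiment counts listed above into inverse-frequency class weights of the form `n_total / (n_classes * n_c)`. This is a common "balanced" heuristic, not something the dataset card prescribes; the helper name `balanced_class_weights` is hypothetical, and whether to use weighting, resampling, or neither is left to the reader.

```python
# Sentiment counts copied from the label distribution reported in the card.
SENTIMENT_COUNTS = {"NEU": 146_845, "POS": 17_915, "NEG": 2_010}


def balanced_class_weights(counts):
    """Return inverse-frequency weights n_total / (n_classes * n_c) per class.

    Assumption: this 'balanced' scheme is one common heuristic for imbalanced
    classification; it is not taken from the dataset card.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


weights = balanced_class_weights(SENTIMENT_COUNTS)
total_rows = sum(SENTIMENT_COUNTS.values())

# Print each class with its share of the data and its weight,
# from most frequent (smallest weight) to rarest (largest weight).
for label, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    share = SENTIMENT_COUNTS[label] / total_rows
    print(f"{label}: share={share:.2%}, weight={weight:.3f}")
```

The resulting weights could be passed to a weighted loss during training. Because `NEG` is so rare, its weight is large, so reporting per-class metrics alongside accuracy is a reasonable precaution.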
---

### Aspect Labels

- diagnostika
- dori
- infratuzilma
- kutish-vaqti
- muolaja
- narx
- parhez
- shifokor-munosabati
- simptom
- xizmat
- szolg‘a/xizmat (inconsistent variant)

---

### Source Platforms

- youtube: 61,747
- telegram: 36,232
- instagram: 26,965
- tiktok: 16,140
- facebook: 13,463
- twitter_x: 8,508
- quora: 3,116
- forum: 520
- web: 72
- web-komment: 7

---

## 🧱 Dataset Structure

### Hugging Face Usage

```python
from datasets import load_dataset

dataset = load_dataset("uznlp-uz/uz_med_sentiment")
print(dataset["train"][0])
```
Provider:
uznlp-uz