An Annotated Corpus of Uzbek Business Reviews for Aspect-Based Sentiment Analysis
Source: Zenodo. Updated: 2026-02-26. Indexed: 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.18790638
Description:
This dataset contains 5,038 annotated business reviews designed for Aspect-Based Sentiment Analysis (ABSA). The reviews were scraped from Commeta Sharh, a publicly accessible business review platform in Uzbekistan. The dataset captures the natural linguistic diversity of the region, featuring mixed-language text (including Russian and Uzbek) alongside Uzbek-language metadata categories.
The corpus spans 630 unique businesses across 23 domains (e.g., Education/Ta'lim). It is intended for evaluating low-resource and code-switched NLP models, in particular on extracting business aspects and classifying the sentiment polarity expressed toward each.
Dataset Characteristics
Total Reviews: 5,038 (filtered from a larger pool, excluding entries with fewer than five words).
Businesses Covered: 630
Business Domains: 23
Task: Aspect-Based Sentiment Analysis (ABSA) – Aspect Term Extraction (ATE) and Aspect Polarity Classification (APC).
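The length filter mentioned under Total Reviews could be sketched as follows. This is a minimal illustration, assuming words are counted by splitting on whitespace; the record does not say how words were counted, and the function name `has_min_words` and the sample strings are hypothetical.

```python
def has_min_words(text: str, min_words: int = 5) -> bool:
    """Return True if text has at least min_words whitespace-separated tokens.

    Assumption: whitespace tokenization. The dataset description only says
    that entries with fewer than five words were excluded.
    """
    return len(text.split()) >= min_words


# Invented sample reviews (Uzbek and Russian), not taken from the corpus.
reviews = [
    "Yaxshi",
    "Zo'r joy",
    "Juda yaxshi xizmat, hammaga tavsiya qilaman",
    "Очень хороший сервис и персонал",
]

# Keep only reviews that meet the five-word minimum.
kept = [r for r in reviews if has_min_words(r)]
print(len(kept), "of", len(reviews), "reviews kept")
```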
Data Structure
The dataset is provided in JSON format. Each entry represents a single user review and contains the following fields:
review_id: A unique identifier for the review.
text: The raw text of the user review.
business_name: The name of the reviewed business.
business_category: The domain/industry of the business (e.g., "Ta'lim" for Education).
user_rating: The numerical rating given by the user (typically 1-5).
aspects: A list of extracted aspects, where each aspect contains:
    term: The specific word or phrase from the text representing the aspect.
    category: The broader category of the aspect (e.g., "xizmat" for service, "boshqalar" for others).
    polarity: The sentiment expressed toward the aspect (positive, negative, or neutral).
num_aspects: The total count of aspects identified in the text.
annotation_source: The model used for the automated annotation pipeline (e.g., qwen2.5-7b-finetuned).
parse_success: A boolean indicating if the model output was successfully parsed into the JSON structure.
raw_output: The raw JSON string generated by the fine-tuned LLM before parsing.
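As a rough illustration of how these fields fit together, the sketch below parses one entry, keeps only entries whose annotation parsed cleanly, and counts aspect polarities. The field names follow the list above; everything else is an assumption: the top-level layout (a JSON array of entries), the helper names `usable_reviews` and `polarity_counts`, and every value in the sample entry, which is invented and not taken from the corpus.

```python
import json
from collections import Counter

# Hypothetical sample entry. Field names follow the description above;
# all values are invented, and the top-level array layout is an assumption.
SAMPLE_JSON = """
[
  {
    "review_id": "r-0001",
    "text": "Xizmat juda yaxshi, lekin narxlar qimmat.",
    "business_name": "Example Business",
    "business_category": "Ta'lim",
    "user_rating": 4,
    "aspects": [
      {"term": "Xizmat", "category": "xizmat", "polarity": "positive"},
      {"term": "narxlar", "category": "boshqalar", "polarity": "negative"}
    ],
    "num_aspects": 2,
    "annotation_source": "qwen2.5-7b-finetuned",
    "parse_success": true,
    "raw_output": "{}"
  }
]
"""

VALID_POLARITIES = {"positive", "negative", "neutral"}


def usable_reviews(entries):
    """Yield entries whose annotation parsed and is internally consistent."""
    for entry in entries:
        if not entry.get("parse_success"):
            continue  # model output could not be parsed into the structure
        aspects = entry.get("aspects", [])
        if entry.get("num_aspects") != len(aspects):
            continue  # declared count disagrees with the aspect list
        if all(a.get("polarity") in VALID_POLARITIES for a in aspects):
            yield entry


def polarity_counts(entries):
    """Count aspect polarities across the given entries."""
    return Counter(a["polarity"] for e in entries for a in e["aspects"])


entries = json.loads(SAMPLE_JSON)
clean = list(usable_reviews(entries))
counts = polarity_counts(clean)
print(dict(counts))
```

Since the annotations come from an automated pipeline, checking parse_success and comparing num_aspects with the length of aspects before use is a cheap way to screen out malformed entries.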
Provider:
Zenodo
Created:
2026-02-26



