
ilkayO/yildizsezar-turkish-reviews

Source: Hugging Face · Updated 2026-04-25 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/ilkayO/yildizsezar-turkish-reviews
Description:
---
language:
- tr
task_categories:
- text-classification
tags:
- sentiment-analysis
- ecommerce
- synthetic-data
- llama
license: cc-by-nc-4.0
size_categories:
- 100K<n<1M
---

# 📊 YıldızSezar: Turkish E-Commerce Reviews (Real + LLaMA Synthetic)

[![Hugging Face Spaces](https://img.shields.io/badge/🤗%20Test%20The%20Model-Live%20Demo-blue)](https://huggingface.co/spaces/ilkayO/YildizSezar-Demo)
[![GitHub](https://img.shields.io/badge/GitHub-Source%20Code-black?logo=github)](https://github.com/ilkay-onay/YildizSezar-Review-Classifier)

This dataset is a collection of Turkish customer reviews from e-commerce platforms, labeled with 1-to-5 star ratings. It was developed to train multi-class sentiment analysis models.

It is the official dataset for the peer-reviewed paper: [A Star Rating-Based Approach in BERT-Based Sentiment Analysis of Customer Feedback](https://dergipark.org.tr/tr/pub/ij3dptdi/article/1732179).

## 📌 Dataset Overview

One of the biggest challenges in analyzing e-commerce reviews is **class imbalance**: users mostly leave either 5-star (very happy) or 1-star (very angry) reviews. To address this, I augmented the real-world dataset by generating over **900,000 synthetic Turkish reviews** that specifically target the minority classes (2, 3, and 4 stars) using **LLaMA-8B-DPO**. The resulting dataset is highly balanced and morphologically complex.

- **Total Synthetic Samples:** ~900,000
- **Task:** 5-Class Sentiment Analysis / Star Rating Prediction
- **Language:** Turkish

## 📂 Data Splits

The dataset is ready to use and pre-split for training, validation, and testing. Each record has two fields:

| Feature | Type | Description |
|---|---|---|
| `review_text` | `string` | The cleaned customer review text (HTML tags removed, lowercased). |
| `star_rating` | `integer` | The rating given by the customer (from 1 to 5). |

## 💻 Quick Usage

```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (all splits)
dataset = load_dataset("ilkayO/yildizsezar-turkish-reviews")

# Inspect the first record of the training split
print(dataset['train'][0])
# Example Output: {'review_text': 'ürün çok kaliteli tavsiye ederim', 'star_rating': 5}
```

## ⚖️ License

This dataset is published under the **CC BY-NC 4.0** license for research and academic purposes.

*(If you are interested in commercial applications or custom NLP data pipelines for your enterprise, please check the contact details on my [Model Page](https://huggingface.co/ilkayO/yildizsezar-convbert).)*
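
## 🔎 Inspecting the Rating Distribution

Since the card describes the data as balanced across the five star classes, it may be worth checking the label distribution before training. The following is a minimal sketch, not part of the original card: the helper names `star_distribution` and `to_class_index` are hypothetical, and the toy rows are made up for illustration. It assumes only the two documented fields, `review_text` and `star_rating`.

```python
from collections import Counter


def star_distribution(rows):
    """Return {star: fraction} for stars 1..5 over an iterable of records.

    Each record is assumed to be a dict with an integer 'star_rating' field,
    as described in the dataset card.
    """
    counts = Counter(row["star_rating"] for row in rows)
    total = sum(counts.values())
    if total == 0:
        # No records: report an all-zero distribution instead of dividing by zero
        return {star: 0.0 for star in range(1, 6)}
    return {star: counts.get(star, 0) / total for star in range(1, 6)}


def to_class_index(star_rating):
    """Map a 1..5 star rating to a 0..4 class index (hypothetical convention)."""
    if not 1 <= star_rating <= 5:
        raise ValueError(f"star_rating must be in 1..5, got {star_rating}")
    return star_rating - 1


if __name__ == "__main__":
    # Toy rows for illustration only; these are NOT taken from the dataset
    toy_rows = [
        {"review_text": "placeholder a", "star_rating": 5},
        {"review_text": "placeholder b", "star_rating": 1},
        {"review_text": "placeholder c", "star_rating": 3},
        {"review_text": "placeholder d", "star_rating": 5},
    ]
    for star, fraction in star_distribution(toy_rows).items():
        print(f"{star} stars: {fraction:.2%}")
```

With the real data, the same helper could be applied to a loaded split, for example `star_distribution(dataset['train'])`, since iterating over a split yields one dict per record. Shifting labels to 0..4 is a common convention for classification heads, but whether the accompanying model uses it is not stated in the card.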
Provided by:
ilkayO