ilkayO/yildizsezar-turkish-reviews
Source: Hugging Face. Updated 2026-04-25; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/ilkayO/yildizsezar-turkish-reviews
Description:
---
language:
- tr
task_categories:
- text-classification
tags:
- sentiment-analysis
- ecommerce
- synthetic-data
- llama
license: cc-by-nc-4.0
size_categories:
- 100K<n<1M
---
# 📊 YıldızSezar: Turkish E-Commerce Reviews (Real + LLaMA Synthetic)
[Demo Space](https://huggingface.co/spaces/ilkayO/YildizSezar-Demo)
[GitHub repository](https://github.com/ilkay-onay/YildizSezar-Review-Classifier)
This dataset is a collection of Turkish customer reviews from e-commerce platforms, each labeled with a 1-to-5 star rating. It was developed for training multi-class sentiment analysis models that predict the star rating from the review text.
It is the official dataset for the peer-reviewed paper: [A Star Rating-Based Approach in BERT-Based Sentiment Analysis of Customer Feedback](https://dergipark.org.tr/tr/pub/ij3dptdi/article/1732179).
## 📌 Dataset Overview
One of the biggest challenges in analyzing e-commerce reviews is **class imbalance**: users mostly leave either 5-star (very happy) or 1-star (very angry) reviews, so the middle ratings are underrepresented.
To address this, I augmented the real-world dataset by generating over **900,000 synthetic Turkish reviews** specifically targeting the minority classes (2, 3, and 4 stars) using **LLaMA-8B-DPO**. The resulting dataset is much more balanced across the five star classes and morphologically complex.
- **Total Synthetic Samples:** ~900,000
- **Task:** 5-Class Sentiment Analysis / Star Rating Prediction
- **Language:** Turkish
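To make the class-imbalance point concrete, the share of each star rating can be measured before and after augmentation. The following is a minimal sketch, not the author's tooling: the helper name `rating_distribution` and the toy ratings are invented for illustration and do not reflect the real class proportions.

```python
from collections import Counter


def rating_distribution(ratings):
    """Return a dict mapping each star value (1..5) to its share of the samples."""
    counts = Counter(ratings)
    total = sum(counts.values())
    if total == 0:
        # No samples: report a zero share for every class.
        return {star: 0.0 for star in range(1, 6)}
    return {star: counts.get(star, 0) / total for star in range(1, 6)}


# Toy ratings, invented for illustration only (not taken from the dataset).
toy_ratings = [5, 5, 5, 1, 1, 3, 5, 1, 4, 2]
distribution = rating_distribution(toy_ratings)

for star, share in distribution.items():
    print(f"{star} stars: {share:.0%}")
```

On the real data, the same function could be fed the `star_rating` column of a split to check how evenly the five classes are represented.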
## 📂 Data Splits
The dataset is ready to use and pre-split into training, validation, and test sets. Every record has the following fields:
| Feature | Type | Description |
|---|---|---|
| `review_text` | `string` | The cleaned customer review text (HTML tags removed, lowercased). |
| `star_rating` | `integer` | The rating given by the customer (from 1 to 5). |
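The card does not say how the cleaning was implemented, so the following is only a hypothetical sketch of comparable preprocessing for new text, plus a check that a record matches the schema above. The function names `clean_review` and `is_valid_record` are assumptions. One detail worth noting for Turkish: Python's built-in `str.lower()` maps `I` to `i` and `İ` to `i` plus a combining dot, so the sketch handles the dotted and dotless letters explicitly.

```python
import re

# Simple pattern for HTML tags; adequate for review snippets, not a full HTML parser.
_TAG_RE = re.compile(r"<[^>]+>")


def clean_review(text: str) -> str:
    """Strip HTML tags, lowercase with Turkish casing rules, and collapse whitespace."""
    no_tags = _TAG_RE.sub(" ", text)
    # Turkish casing: "I" lowercases to dotless "ı", and "İ" lowercases to "i".
    lowered = no_tags.replace("I", "ı").replace("İ", "i").lower()
    return " ".join(lowered.split())


def is_valid_record(record: dict) -> bool:
    """Check that a record has a string review_text and an integer star_rating in 1..5."""
    text = record.get("review_text")
    rating = record.get("star_rating")
    return (
        isinstance(text, str)
        and isinstance(rating, int)
        and not isinstance(rating, bool)  # bool is a subclass of int; reject it
        and 1 <= rating <= 5
    )


print(clean_review("<b>ÜRÜN</b> ÇOK İYİ"))
```

Applying the same normalization to text at inference time keeps it closer to the lowercased, tag-free form described in the table.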
## 💻 Quick Usage
```python
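from datasets import load_dataset

# Download the dataset from the Hugging Face Hub; without a `split` argument
# this returns a DatasetDict keyed by split name.
dataset = load_dataset("ilkayO/yildizsezar-turkish-reviews")

# Inspect the first record of the training split.
print(dataset['train'][0])
# Example Output: {'review_text': 'ürün çok kaliteli tavsiye ederim', 'star_rating': 5}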
```
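For 5-class training, most classification heads expect zero-based class ids, while `star_rating` runs from 1 to 5. A minimal sketch of that conversion follows; the `label` column name and both helper names are assumptions chosen for illustration, not part of the dataset.

```python
def add_label(example: dict) -> dict:
    """Return a copy of the record with a zero-based `label` (0..4) derived from star_rating."""
    rating = example["star_rating"]
    if not 1 <= rating <= 5:
        raise ValueError(f"star_rating out of range: {rating}")
    return {**example, "label": rating - 1}


def label_to_stars(label: int) -> int:
    """Map a predicted class id (0..4) back to a star rating (1..5)."""
    return label + 1


record = {"review_text": "ürün çok kaliteli tavsiye ederim", "star_rating": 5}
labeled = add_label(record)
print(labeled["label"], label_to_stars(labeled["label"]))
```

With the `datasets` library, a per-record function of this shape can be applied to every split through `dataset.map(add_label)`.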
## ⚖️ License
This dataset is published under the **CC BY-NC 4.0** license for research and academic purposes.
*(If you are interested in commercial applications or custom NLP data pipelines for your enterprise, please check the contact details on my [Model Page](https://huggingface.co/ilkayO/yildizsezar-convbert).)*
Provider:
ilkayO



