turkish-nlp-suite/MusteriYorumlari

Source: Hugging Face | Updated: 2024-11-01 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/turkish-nlp-suite/MusteriYorumlari
Resource description:
---
annotations_creators:
- Duygu Altinok
language:
- tr
license:
- cc-by-sa-4.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- sentiment-classification
pretty_name: MusteriYorumlari
tags:
- sentiment
dataset_info:
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          0: 1_star
          1: 2_star
          2: 3_star
          3: 4_star
          4: 5_star
  splits:
  - name: train
    num_bytes: 46979645
    num_examples: 73920
  - name: validation
    num_bytes: 733500
    num_examples: 15000
  - name: test
    num_bytes: 742661
    num_examples: 15000
  download_size: 58918801
data_files:
- split: train
  path: data/train-*
- split: validation
  path: data/valid-*
- split: test
  path: data/test-*
---

# MüşteriYorumları - A Large-Scale Customer Sentiment Analysis Dataset for Turkish

<img src="https://raw.githubusercontent.com/turkish-nlp-suite/.github/main/profile/musteriyorumlarilogo.png" width="30%" height="30%">

## Dataset Summary

MüşteriYorumları is a Turkish e-commerce customer review dataset of 103,920 reviews scraped from Hepsiburada.com and Trendyol.com. The reviews cover a wide range of product categories, including apparel, food, baby products, and books. Ratings range from 1 to 5 stars and are distributed as follows:

| Star rating | Count |
|---|---|
| 1 | 12,873 |
| 2 | 11,472 |
| 3 | 18,054 |
| 4 | 31,207 |
| 5 | 30,314 |
| Total | 103,920 |

The distribution is skewed toward positive reviews: 4- and 5-star reviews together number 61,521, about 59% of the dataset. For more dataset statistics, please refer to the [research paper]().
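As a quick sanity check on the figures in the distribution table and the split metadata, the Python sketch below recomputes each rating's share of the dataset and the combined 4-5 star share, and confirms that the split sizes add up to the same total. It assumes, based on the `class_label` names in the metadata, that label id `k` corresponds to `k + 1` stars; the constants and helper names (`STAR_COUNTS`, `label_to_stars`, `star_shares`) are illustrative only and are not part of the dataset's tooling or any library.

```python
"""Sanity-check the MüşteriYorumları star distribution and split sizes.

All numbers are copied by hand from the dataset card; nothing is downloaded.
"""

# Review counts per star rating, from the distribution table.
STAR_COUNTS = {1: 12_873, 2: 11_472, 3: 18_054, 4: 31_207, 5: 30_314}

# Example counts per split, from the dataset_info metadata.
SPLIT_SIZES = {"train": 73_920, "validation": 15_000, "test": 15_000}

# class_label names from the metadata; list index == label id.
LABEL_NAMES = ["1_star", "2_star", "3_star", "4_star", "5_star"]


def label_to_stars(label_id: int) -> int:
    """Map a label id (0-4) to a star rating (1-5), assuming id k means k+1 stars."""
    if not 0 <= label_id < len(LABEL_NAMES):
        raise ValueError(f"label id out of range: {label_id}")
    return label_id + 1


def star_shares(counts: dict[int, int]) -> dict[int, float]:
    """Return each star rating's fraction of all reviews."""
    total = sum(counts.values())
    return {star: n / total for star, n in counts.items()}


if __name__ == "__main__":
    total = sum(STAR_COUNTS.values())
    # The per-star counts and the split sizes should describe the same reviews.
    assert total == sum(SPLIT_SIZES.values())

    shares = star_shares(STAR_COUNTS)
    for star, share in shares.items():
        print(f"{star} star: {STAR_COUNTS[star]:>6,} reviews ({share:.1%})")
    print(f"4-5 stars combined: {shares[4] + shares[5]:.1%} of {total:,}")

    # A label of 4 corresponds to the "5_star" class.
    print(LABEL_NAMES[4], "->", label_to_stars(4), "stars")
```

Run as a script, it reports that 4- and 5-star reviews make up 59.2% of the 103,920 reviews, which is the skew described in the summary.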
## Dataset Instances

An instance looks like this (label `4` is the `5_star` class):

```
{
  "text": "SÜPEEEER KALİTE",
  "label": 4
}
```

## Data Split

| Name | Train | Validation | Test |
|---------|----:|---:|---:|
| MüşteriYorumları Customer Reviews | 73,920 | 15,000 | 15,000 |

## Benchmarking

This dataset is part of the [SentiTurca](https://huggingface.co/datasets/turkish-nlp-suite/SentiTurca) benchmark, where its subset is named **e-commerce**, following GLUE-style task naming. Model benchmarking results can be found in the SentiTurca Hugging Face repo, and the benchmarking scripts are in the [SentiTurca GitHub repo](https://github.com/turkish-nlp-suite/SentiTurca).

For this dataset, we benchmarked the transformer-based model BERTurk and several LLMs. The results are as follows:

| Model | Accuracy / F1 |
|---|---|
| Gemini 1.0 Pro | 1.0/1.0 |
| GPT-4 Turbo | 0.64/0.63 |
| Claude 3 Sonnet | 0.57/0.53 |
| Llama 3 70B | 0.58/0.55 |
| Qwen2-72B | 0.53/0.50 |
| BERTurk | 0.66/0.64 |

For a critique of the results, misclassified instances, and more, please consult the [research paper]().

## Citation

Coming soon!
Provided by:
turkish-nlp-suite