turkish-nlp-suite/MusteriYorumlari

Name: turkish-nlp-suite/MusteriYorumlari
Creator: turkish-nlp-suite
Published: 2024-11-01 15:22:59
License: cc-by-sa-4.0

Source: Hugging Face. Updated 2024-11-01; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/turkish-nlp-suite/MusteriYorumlari

Resource description:

---
annotations_creators:
- Duygu Altinok
language:
- tr
license:
- cc-by-sa-4.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- sentiment-classification
pretty_name: MusteriYorumlari
tags:
- sentiment
dataset_info:
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          0: 1_star
          1: 2_star
          2: 3_star
          3: 4_star
          4: 5_star
  splits:
  - name: train
    num_bytes: 46979645
    num_examples: 73920
  - name: validation
    num_bytes: 733500
    num_examples: 15000
  - name: test
    num_bytes: 742661
    num_examples: 15000
  download_size: 58918801
data_files:
- split: train
  path: data/train-*
- split: validation
  path: data/valid-*
- split: test
  path: data/test-*
---

# MüşteriYorumları - A Large Scale Customer Sentiment Analysis Dataset for Turkish

<img src="https://raw.githubusercontent.com/turkish-nlp-suite/.github/main/profile/musteriyorumlarilogo.png" width="30%" height="30%">

## Dataset Summary

MüşteriYorumları is a dataset of about 103K Turkish e-commerce customer reviews scraped from Hepsiburada.com and Trendyol.com. The reviews cover a wide range of product categories, including apparel, food items, baby products, and books. Each review carries a rating from 1 to 5 stars. The star distribution is as follows:

| star rating | count |
|---|---|
| 1 | 12,873 |
| 2 | 11,472 |
| 3 | 18,054 |
| 4 | 31,207 |
| 5 | 30,314 |
| total | 103,920 |

The distribution is skewed toward reviews with 4 or more stars, which make up about 59% of the dataset. For more information about dataset statistics, please refer to the [research paper]().
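The label scheme and the distribution table can be cross-checked with a few lines of Python. The following is a minimal sketch, not part of the dataset's tooling: the constants are copied from the card's metadata and table, while the helper names `label_to_stars` and `star_shares` are hypothetical and introduced here only for illustration.

```python
# Class names in index order, as listed in the card's class_label metadata.
LABEL_NAMES = ["1_star", "2_star", "3_star", "4_star", "5_star"]

# Review counts per star rating, copied from the distribution table.
STAR_COUNTS = {1: 12873, 2: 11472, 3: 18054, 4: 31207, 5: 30314}

# Split sizes, copied from the card's metadata.
SPLIT_SIZES = {"train": 73920, "validation": 15000, "test": 15000}


def label_to_stars(label):
    """Convert a zero-based class index (0-4) to a star rating (1-5).

    Hypothetical helper: relies on the "<n>_star" naming shown in the card.
    """
    if not 0 <= label < len(LABEL_NAMES):
        raise ValueError(f"label must be in 0..4, got {label}")
    return int(LABEL_NAMES[label].split("_")[0])


def star_shares(counts):
    """Return each star rating's fraction of all reviews (hypothetical helper)."""
    total = sum(counts.values())
    return {star: n / total for star, n in counts.items()}


if __name__ == "__main__":
    shares = star_shares(STAR_COUNTS)
    for star, share in shares.items():
        print(f"{star} star: {share:.1%}")
    print(f"4+ stars: {shares[4] + shares[5]:.1%}")
```

Note that the stored labels are zero-based, so a label of `4` corresponds to a 5-star review. The per-star counts and the three split sizes both sum to 103,920.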
## Dataset Instances

An instance looks like:

```
{
  "text": "SÜPEEEER KALİTE",
  "label": 4 # 5 stars
}
```

## Data Split

| name |train|validation|test|
|---------|----:|---:|---:|
|MüşteriYorumları Customer Reviews|73920|15000|15000|

## Benchmarking

This dataset is part of the [SentiTurca](https://huggingface.co/datasets/turkish-nlp-suite/SentiTurca) benchmark, where the subset is named **e-commerce**, following the naming of the GLUE tasks. Model benchmarking information can be found in the SentiTurca HF repo, and benchmarking scripts can be found in the [SentiTurca Github repo](https://github.com/turkish-nlp-suite/SentiTurca).

For this dataset we benchmarked a transformer-based model, BERTurk, and a handful of LLMs. The results for each model are as follows:

| Model | acc./F1 |
|---|---|
| Gemini 1.0 Pro | 1.0/1.0 |
| GPT-4 Turbo | 0.64/0.63 |
| Claude 3 Sonnet | 0.57/0.53 |
| Llama 3 70B | 0.58/0.55 |
| Qwen2-72B | 0.53/0.50 |
| BERTurk | 0.66/0.64 |

For a critique of the results, misclassified instances, and more, please consult the [research paper]().

## Citation

Coming soon!!

Provided by:

turkish-nlp-suite
