turkish-nlp-suite/beyazperde-all-movie-reviews
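Dataset Card for turkish-nlp-suite/beyazperde-all-movie-reviews
Dataset Description
- Dataset name: BeyazPerde All Movie Reviews
- Domain: Social media
Dataset Summary
Beyazperde Movie Reviews is a Turkish sentiment analysis dataset scraped from the popular movie review website Beyazperde.com. All Movie Reviews contains audience reviews of movies from all periods. The star-rating distribution is as follows:
| Star rating | Count |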
|---|---|
| 0.5 | 3,635 |
| 1.0 | 2,325 |
| 1.5 | 1,077 |
| 2.0 | 1,902 |
| 2.5 | 4,767 |
| 3.0 | 4,347 |
| 3.5 | 6,495 |
| 4.0 | 9,486 |
| 4.5 | 3,652 |
| 5.0 | 7,594 |
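| Total | 45,280 |
The star ratings look fairly balanced. The dataset poses the challenge of understanding sentiment at a finer granularity, splitting positive sentiment into "very positive" and "okay positive".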
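To make the fine-grained sentiment idea concrete, here is a minimal sketch that groups the star-rating counts from the table into coarse sentiment buckets. The thresholds and the `rating_to_label` helper are assumptions chosen for illustration; the dataset itself ships only the numeric `rating` and does not prescribe any label mapping.

```python
# Counts copied from the star-rating table in this card.
RATING_COUNTS = {
    0.5: 3635, 1.0: 2325, 1.5: 1077, 2.0: 1902, 2.5: 4767,
    3.0: 4347, 3.5: 6495, 4.0: 9486, 4.5: 3652, 5.0: 7594,
}


def rating_to_label(rating: float) -> str:
    """Map a star rating to a sentiment label (hypothetical thresholds)."""
    if rating <= 2.0:
        return "negative"
    if rating <= 3.0:
        return "neutral"
    if rating <= 4.0:
        return "okay positive"
    return "very positive"


def label_distribution(counts: dict) -> dict:
    """Sum review counts per sentiment label."""
    totals = {}
    for rating, n in counts.items():
        label = rating_to_label(rating)
        totals[label] = totals.get(label, 0) + n
    return totals


total = sum(RATING_COUNTS.values())
label_counts = label_distribution(RATING_COUNTS)
for label, n in label_counts.items():
    print(f"{label}: {n} ({n / total:.1%})")
```

Under these assumed thresholds the buckets hold 8,939, 9,114, 15,981 and 11,246 reviews respectively, so the two positive classes together account for more than half of the data. Different cut-offs would shift these numbers.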
Dataset Instances
An instance from this dataset looks as follows:
```json
{
  "movie": "Avatar",
  "text": "Açıkçası film beklentilerimi karşılayamadı. Tabi her şeyin ilki güzel ama son seride iyi olabilirdi. Filmde görsel olarak her şey güzeldi kendimi filmi izledikten sonra ıslanmış gibi hissettim :D Puan kırdığım noktalar filmin bilim kurgudan fantastiğe doğru kayması. Ardından sır kapısına döndürüp iyilik yapan iyilik bulur moduna girmesi. Çoğu sahnelerin çocuklara hitap etmesi. Neyse serinin üçüncü filmi sağlam olucak gibi...",
  "rating": 3.5
}
```
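As a rough illustration of working with a record of this shape, the sketch below parses one and checks its three fields. The record is an abbreviated copy of the example instance (the `text` is cut to its first sentence), and `validate_review` with its rules is a hypothetical helper, not a schema published with the dataset.

```python
import json

# Abbreviated copy of the example record: "text" is cut to its first sentence.
RAW_RECORD = (
    '{"movie": "Avatar", '
    '"text": "Açıkçası film beklentilerimi karşılayamadı.", '
    '"rating": 3.5}'
)

REQUIRED_FIELDS = ("movie", "text", "rating")


def validate_review(record: dict) -> dict:
    """Check field presence and types (assumed rules, not an official schema)."""
    for field in REQUIRED_FIELDS:
        if field not in record:
            raise ValueError(f"missing field: {field}")
    if not isinstance(record["movie"], str) or not isinstance(record["text"], str):
        raise ValueError("movie and text must be strings")
    rating = record["rating"]
    if isinstance(rating, bool) or not isinstance(rating, (int, float)):
        raise ValueError("rating must be a number")
    # Ratings in the distribution table run from 0.5 to 5.0 in half-star steps.
    if not 0.5 <= rating <= 5.0 or (rating * 2) != int(rating * 2):
        raise ValueError(f"rating outside the half-star scale: {rating}")
    return record


review = validate_review(json.loads(RAW_RECORD))
print(review["movie"], review["rating"])
```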
Data Splits
| Name | Train | Validation | Test |
|---|---|---|---|
| BeyazPerde All Movie Reviews | 35,280 | 5,000 | 5,000 |
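As a quick sanity check, the split sizes can be compared against the total of the star-rating table. This small sketch only does arithmetic on the numbers reported in this card; the dictionary keys are illustrative names, not confirmed split identifiers.

```python
# Split sizes copied from the table in this card; key names are illustrative.
SPLIT_SIZES = {"train": 35280, "validation": 5000, "test": 5000}

# Total reported in the star-rating distribution table.
REPORTED_TOTAL = 45280

split_total = sum(SPLIT_SIZES.values())
fractions = {name: size / split_total for name, size in SPLIT_SIZES.items()}

print("splits cover all reviews:", split_total == REPORTED_TOTAL)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The three splits add up to the 45,280 reviews in the rating table, with roughly 78% of the data in the training split and about 11% each in validation and test.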
Citation
If you use this dataset in your own work, please cite A Diverse Set of Freely Available Linguistic Resources for Turkish:
```bibtex
@inproceedings{altinok-2023-diverse,
    title = "A Diverse Set of Freely Available Linguistic Resources for {T}urkish",
    author = "Altinok, Duygu",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.768",
    pages = "13739--13750",
    abstract = "This study presents a diverse set of freely available linguistic resources for Turkish natural language processing, including corpora, pretrained models and education material. Although Turkish is spoken by a sizeable population of over 80 million people, Turkish linguistic resources for natural language processing remain scarce. In this study, we provide corpora to allow practitioners to build their own applications and pretrained models that would assist industry researchers in creating quick prototypes. The provided corpora include named entity recognition datasets of diverse genres, including Wikipedia articles and supplement products customer reviews. In addition, crawling e-commerce and movie reviews websites, we compiled several sentiment analysis datasets of different genres. Our linguistic resources for Turkish also include pretrained spaCy language models. To the best of our knowledge, our models are the first spaCy models trained for the Turkish language. Finally, we provide various types of education material, such as video tutorials and code examples, that can support the interested audience on practicing Turkish NLP. The advantages of our linguistic resources are three-fold: they are freely available, they are first of their kind, and they are easy to use in a broad range of implementations. Along with a thorough description of the resource creation process, we also explain the position of our resources in the Turkish NLP world.",
}
```




