turkish-nlp-suite/sinefil-movie-reviews
Source: Hugging Face. Updated 2024-07-15; indexed 2024-07-22.
Download link:
https://hf-mirror.com/datasets/turkish-nlp-suite/sinefil-movie-reviews
Resource summary:
Sinefil Movie Reviews is a Turkish sentiment analysis dataset of movie reviews scraped from the popular movie review website Sinefil.com. It contains audience reviews of movies from all periods. Each instance records the review URL, the review text, and a score. Scores range from 1 to 9.9 and are skewed toward the high end, with most falling between 7 and 9. The dataset is split into 32914 training instances, 5000 validation instances, and 5000 test instances.
Provider:
turkish-nlp-suite
Summary of original information
Sinefil Movie Reviews
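Dataset overview
Sinefil Movie Reviews is a Turkish sentiment analysis dataset of movie reviews, scraped from the popular movie review website Sinefil.com. It contains audience reviews of movies from all periods.
Score distribution
The score field ranges from 1 to 9.9, distributed as follows:
| Score range | Count |
|---|---|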
| 1-2 | 2323 |
| 2-3 | 874 |
| 3-4 | 1252 |
| 4-5 | 1881 |
| 5-6 | 3484 |
| 6-7 | 6636 |
| 7-8 | 11430 |
| 8-9 | 10412 |
| 9-9.9 | 4622 |
| Total | 42914 |
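Scores are concentrated in the 7-9 range (21842 of 42914 reviews, about 51%), while scores below 7 are less common (16450 reviews, about 38%).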
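The skew can be checked directly from the table. The following is a minimal sketch that recomputes the total and a few shares from the bin counts listed above; the variable and function names are illustrative and are not part of the dataset.

```python
# Bin counts copied from the score distribution table above.
score_bins = {
    "1-2": 2323,
    "2-3": 874,
    "3-4": 1252,
    "4-5": 1881,
    "5-6": 3484,
    "6-7": 6636,
    "7-8": 11430,
    "8-9": 10412,
    "9-9.9": 4622,
}

total = sum(score_bins.values())


def share(bins):
    """Fraction of all reviews that fall into the given bins."""
    return sum(score_bins[b] for b in bins) / total


below_7 = share(["1-2", "2-3", "3-4", "4-5", "5-6", "6-7"])
from_7_to_9 = share(["7-8", "8-9"])
at_least_7 = share(["7-8", "8-9", "9-9.9"])

print(f"total reviews: {total}")
print(f"below 7:       {below_7:.1%}")
print(f"7 to 9:        {from_7_to_9:.1%}")
print(f"7 and above:   {at_least_7:.1%}")
```

A classifier trained on these scores without rebalancing would see far more high-score than low-score examples, which is worth keeping in mind when choosing evaluation metrics.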
Data instance
An instance from the dataset looks like this:
```json
{
  "url": "https://www.sinefil.com/title/yorq1ly",
  "text": "Maryyi oynayan oyuncu, rolü çok abartıyor; zaten Mary karakterinin ezikliğine, sünepeliğine bir yerden sonra tahammül edemiyorsunuz! Bazıları bunu iyi oyunculuk diye çeviriyor; fakat artık yüzünü görmeye dayanamadım o kadının! Bunları dışında sıcak, rahatlatan, kendini izleten bir film. Kafa dağıtmak için izlenebileceklerden...",
  "score": 6.9
}
```
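As a rough illustration of how such a record might be consumed, the sketch below parses the instance shown above, checks that the score lies in the documented 1 to 9.9 range, and maps it to a binary label. The `Review` class, the helper names, and the 7.0 threshold are all assumptions made for this example; the dataset itself only provides the `url`, `text`, and `score` fields and does not define sentiment labels.

```python
import json
from dataclasses import dataclass

# The instance from the dataset card, reproduced as a JSON string.
RAW = r'''{
  "url": "https://www.sinefil.com/title/yorq1ly",
  "text": "Maryyi oynayan oyuncu, rolü çok abartıyor; zaten Mary karakterinin ezikliğine, sünepeliğine bir yerden sonra tahammül edemiyorsunuz! Bazıları bunu iyi oyunculuk diye çeviriyor; fakat artık yüzünü görmeye dayanamadım o kadının! Bunları dışında sıcak, rahatlatan, kendini izleten bir film. Kafa dağıtmak için izlenebileceklerden...",
  "score": 6.9
}'''


@dataclass
class Review:
    url: str
    text: str
    score: float


def parse_review(raw: str) -> Review:
    """Parse one JSON record and validate the documented score range."""
    record = json.loads(raw)
    score = float(record["score"])
    if not 1.0 <= score <= 9.9:
        raise ValueError(f"score {score} outside documented range 1 to 9.9")
    return Review(url=record["url"], text=record["text"], score=score)


def to_label(score: float, threshold: float = 7.0) -> str:
    """Hypothetical binarization; the threshold is an assumption, not part of the dataset."""
    return "positive" if score >= threshold else "negative"


review = parse_review(RAW)
print(review.url, review.score, to_label(review.score))
```

Because most scores sit at 7 or above, any fixed threshold produces imbalanced classes; treating the score as a regression target is an equally reasonable choice.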
Data splits
The dataset is divided into training, validation, and test sets as follows:
| Name | Train | Validation | Test |
|---|---|---|---|
| Sinefil Movie Reviews | 32914 | 5000 | 5000 |
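The split sizes in the table can be sanity-checked after download. The sketch below compares observed split sizes with the documented ones and reports each split's share of the total. It assumes the Hugging Face `datasets` library is installed and that the splits are named `train`, `validation`, and `test`; those names are a guess based on the table and are not confirmed by the source. The helper names are illustrative.

```python
# Split sizes as documented in the table above.
EXPECTED_SPLITS = {"train": 32914, "validation": 5000, "test": 5000}


def split_fractions(sizes):
    """Share of the whole dataset taken by each split."""
    total = sum(sizes.values())
    return {name: n / total for name, n in sizes.items()}


def check_splits(actual, expected=EXPECTED_SPLITS):
    """Return a list of human-readable mismatches (empty if all sizes match)."""
    return [
        f"{name}: expected {n}, got {actual.get(name)}"
        for name, n in expected.items()
        if actual.get(name) != n
    ]


def load_split_sizes(repo_id="turkish-nlp-suite/sinefil-movie-reviews"):
    """Download the dataset and return its split sizes.

    Assumption: the `datasets` library is available and the repository
    loads with its default configuration. Not called at import time
    because it needs network access.
    """
    from datasets import load_dataset

    dataset = load_dataset(repo_id)
    return {name: split.num_rows for name, split in dataset.items()}


fractions = split_fractions(EXPECTED_SPLITS)
for name, frac in fractions.items():
    print(f"{name:<10} {EXPECTED_SPLITS[name]:>6}  {frac:.1%}")
```

With network access, `check_splits(load_split_sizes())` would return an empty list if the downloaded splits match the documented sizes.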



