blinoff/kinopoisk
收藏Hugging Face2024-06-10 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/blinoff/kinopoisk
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- ru
multilinguality:
- monolingual
pretty_name: Kinopoisk
size_categories:
- 10K<n<100K
task_categories:
- text-classification
task_ids:
- sentiment-classification
license: mit
---
### Dataset Summary
Kinopoisk movie reviews dataset (TOP250 & BOTTOM100 rank lists).
In total it contains 36,591 reviews from July 2004 to November 2012.
With following distribution along the 3-point sentiment scale:
- Good: 27,264;
- Bad: 4,751;
- Neutral: 4,576.
### Data Fields
Each sample contains the following fields:
- **part**: rank list top250 or bottom100;
- **movie_name**;
- **review_id**;
- **author**: review author;
- **date**: date of a review;
- **title**: review title;
- **grade3**: sentiment score Good, Bad or Neutral;
- **grade10**: sentiment score on a 10-point scale parsed from text;
- **content**: review text.
### Python
```python3
import pandas as pd
df = pd.read_json('kinopoisk.jsonl', lines=True)
df.sample(5)
```
### Citation
```
@article{blinov2013research,
title={Research of lexical approach and machine learning methods for sentiment analysis},
author={Blinov, PD and Klekovkina, Maria and Kotelnikov, Eugeny and Pestov, Oleg},
journal={Computational Linguistics and Intellectual Technologies},
volume={2},
number={12},
pages={48--58},
year={2013}
}
```
提供机构:
blinoff
原始信息汇总
数据集概述
名称: Kinopoisk电影评论数据集(TOP250 & BOTTOM100排名列表)
语言: 俄语
多语言性: 单语种
大小: 10K<n<100K
任务类别: 文本分类
任务ID: 情感分类
总数据量: 36,591条评论
时间范围: 2004年7月至2012年11月
情感分布:
- 好: 27,264
- 坏: 4,751
- 中性: 4,576
数据字段
每个样本包含以下字段:
- part: 排名列表(top250或bottom100)
- movie_name: 电影名称
- review_id: 评论ID
- author: 评论作者
- date: 评论日期
- title: 评论标题
- grade3: 情感评分(好、坏或中性)
- grade10: 从文本解析的10分制情感评分
- content: 评论文本



