Amazon reviews
Source: www.kaggle.com · Updated 2021-05-15 · Indexed 2025-01-16
Download link:
https://www.kaggle.com/kritanjalijain/amazon-reviews
Resource description:
Amazon Review Polarity Dataset
### OVERVIEW
The full dataset, from the Stanford Network Analysis Project (SNAP), contains 34,686,770 Amazon reviews from 6,643,669 users on 2,441,053 products. This subset contains 1,800,000 training samples and 200,000 testing samples for each sentiment polarity, that is, 3,600,000 training samples and 400,000 testing samples in total.
### ORIGIN
The Amazon reviews dataset consists of reviews from Amazon. The data span a period of 18 years and include ~35 million reviews up to March 2013. Each review includes product and user information, a rating, and a plaintext review. For more information, please refer to the following paper: J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.
### DESCRIPTION
The Amazon reviews polarity dataset is constructed by treating review scores 1 and 2 as negative and scores 4 and 5 as positive. Samples with a score of 3 are ignored. In the dataset, class 1 is negative and class 2 is positive. Each class has 1,800,000 training samples and 200,000 testing samples.
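The labeling rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the code used to build the dataset; the function name `score_to_polarity` and the constants are hypothetical.

```python
from typing import Optional

NEGATIVE = 1  # class index for negative reviews, per the description
POSITIVE = 2  # class index for positive reviews, per the description


def score_to_polarity(score: int) -> Optional[int]:
    """Map a 1-5 star score to a polarity class, or None if the review is dropped."""
    if score in (1, 2):
        return NEGATIVE
    if score in (4, 5):
        return POSITIVE
    if score == 3:
        return None  # score-3 reviews are ignored
    raise ValueError(f"score must be between 1 and 5, got {score!r}")


# Made-up scores for illustration; the score-3 entry is filtered out.
scores = [5, 3, 1, 4, 2]
labels = [p for p in map(score_to_polarity, scores) if p is not None]
print(labels)  # [2, 1, 2, 1]
```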
If you need help extracting the `train.csv` and `test.csv` files, check out the [starter code](https://www.kaggle.com/kritanjalijain/amazon-reviews-starter-nlp).
The files `train.csv` and `test.csv` contain the training and testing samples, respectively, as comma-separated values.
Each CSV has 3 columns, `polarity`, `title`, and `text`, which correspond to the class index (1 or 2), the review title, and the review text.
* polarity - 1 for negative and 2 for positive
* title - review heading
* text - review body
The review title and text are enclosed in double quotes ("), and any internal double quote is escaped as 2 double quotes (""). New lines are escaped as a backslash followed by an "n" character, that is "\n".
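As a rough sketch of how this format could be read, Python's standard `csv` module already handles the double-quote convention, so only the `\n` sequences need to be converted by hand. The names `Review`, `unescape_newlines`, and `read_reviews` are hypothetical, the sample row is made up, and the code assumes the files have no header row.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Review(NamedTuple):
    polarity: int  # 1 = negative, 2 = positive
    title: str     # review heading
    text: str      # review body


def unescape_newlines(field: str) -> str:
    # The files store a line break as the two characters backslash + "n".
    return field.replace("\\n", "\n")


def read_reviews(lines: Iterable[str]) -> Iterator[Review]:
    # csv.reader handles the quoting: fields wrapped in double quotes,
    # with "" standing for one literal double quote.
    for polarity, title, text in csv.reader(lines):
        yield Review(int(polarity), unescape_newlines(title), unescape_newlines(text))


# Made-up sample row (not taken from the dataset); assumes no header row.
sample = io.StringIO('"2","Great ""value""","Works well.\\nWould buy again."\n')
reviews = list(read_reviews(sample))
print(reviews[0].title)
print(reviews[0].text)
```

For the real files, the same function could be fed an open file handle, for example `open("train.csv", newline="", encoding="utf-8")`; the encoding is an assumption and should be checked against the downloaded data.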
### CITATION
The Amazon reviews polarity dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu). It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. [Character-level Convolutional Networks for Text Classification](https://arxiv.org/abs/1509.01626). Advances in Neural Information Processing Systems 28 (NIPS 2015).
Provider:
Kaggle