Amazon Review Polarity
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2024-07-28
Download link:
https://figshare.com/articles/dataset/Amazon_Review_Polarity/13232501/1
Resource description:
Amazon Review Polarity Dataset
Version 3, Updated 09/09/2015

ORIGIN
The Amazon reviews dataset consists of reviews from Amazon. The data span a period of 18 years, including ~35 million reviews up to March 2013. Reviews include product and user information, ratings, and a plaintext review. For more information, please refer to the following paper: J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.
The Amazon reviews polarity dataset was constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the above dataset. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

DESCRIPTION
The Amazon reviews polarity dataset is constructed by taking review scores 1 and 2 as negative, and 4 and 5 as positive. Samples with score 3 are ignored. In the dataset, class 1 is negative and class 2 is positive. Each class has 1,800,000 training samples and 200,000 testing samples.
The files train.csv and test.csv contain the training and testing samples, respectively, as comma-separated values. Each file has 3 columns, corresponding to class index (1 or 2), review title, and review text. The review title and text are enclosed in double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed by an "n" character, that is "\n".
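The file format described above can be read with Python's standard `csv` module, whose default dialect already treats `""` inside a quoted field as a literal double quote. The following is a minimal sketch, not the dataset author's tooling: the helper name `parse_rows`, the `LABELS` mapping, and the two sample rows are made up for illustration, and the sketch assumes a backslash followed by `n` never occurs in a review as literal text.

```python
import csv
import io

# Class index -> polarity, as stated in the dataset description.
LABELS = {"1": "negative", "2": "positive"}


def parse_rows(lines):
    """Yield (label, title, text) tuples from an iterable of CSV lines.

    Hypothetical helper for illustration only.
    """
    # The default csv dialect uses '"' as the quote character and reads
    # a doubled quote ("") inside a quoted field as one literal quote.
    reader = csv.reader(lines)
    for class_index, title, text in reader:
        yield (
            LABELS[class_index],
            # Turn the two-character sequence backslash + n into a newline.
            title.replace("\\n", "\n"),
            text.replace("\\n", "\n"),
        )


# Two invented rows in the documented format (not real dataset content).
sample = io.StringIO(
    '"2","Great ""little"" gadget","Works well.\\nWould buy again."\n'
    '"1","Broke quickly","Stopped working after a week."\n'
)
rows = list(parse_rows(sample))
```

To read the real files, the same function could be given a file object such as `open("train.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption, since the description does not state one.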
Provider:
figshare
Created:
2020-11-13
Collected summary
Dataset introduction

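Background and challenges
Background overview
The Amazon Review Polarity dataset is a text classification benchmark built from Amazon reviews, which are labeled positive or negative according to their rating. Each class has 1.8 million training samples and 200,000 test samples. The data is provided in CSV format with three columns: class index, review title, and review text.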
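The labeling rule and split sizes can be written down in a few lines. This is a sketch of the rule as the dataset description states it (scores 1 and 2 are negative, 4 and 5 are positive, 3 is dropped), not the code used to build the dataset; the function name `polarity_class` and the constant names are hypothetical.

```python
from typing import Optional

# Per-class sample counts, as stated in the dataset description.
TRAIN_PER_CLASS = 1_800_000
TEST_PER_CLASS = 200_000
NUM_CLASSES = 2

# Totals follow by multiplication: two balanced classes per split.
TRAIN_TOTAL = NUM_CLASSES * TRAIN_PER_CLASS
TEST_TOTAL = NUM_CLASSES * TEST_PER_CLASS


def polarity_class(score: int) -> Optional[int]:
    """Map a 1-5 review score to a class index.

    Returns 1 (negative) for scores 1 and 2, 2 (positive) for scores
    4 and 5, and None for score 3, which the dataset ignores.
    """
    if score in (1, 2):
        return 1  # negative
    if score in (4, 5):
        return 2  # positive
    if score == 3:
        return None  # neutral reviews are not part of the dataset
    raise ValueError(f"score must be between 1 and 5, got {score}")
```

Under these counts, the training file holds 3,600,000 rows and the test file 400,000 rows, evenly split between the two classes.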
The summary above was collected and generated by 遇见数据集.



