
Amazon movie reviews

DataCite Commons, updated 2025-04-01, indexed 2025-04-16
Download link:
https://data.mendeley.com/datasets/kb5nv7dbtm
Description:
Secondary data. The primary data can be found here: https://snap.stanford.edu/data/web-Movies.html

File allReviews.csv contains 7,911,684 Amazon movie reviews. The reviewer's profile name, the review summary, and the review text have been removed from the primary data. The data span more than 10 years, including all reviews up to October 2012. Each row contains 6 fields:
1. Product ID (e.g. B003AI2VGA)
2. User ID (e.g. A141HP4LYPWMSR)
3. Number of thumbs-up received by the review (e.g. 7)
4. Total thumb count of the review, i.e. thumbs-up plus thumbs-down (e.g. 7)
5. Rating on a discrete Likert scale from 1 to 5 (e.g. 3)
6. Time of the review as Unix time (e.g. 1182729600)

For example, a sample row from this file is:
B003AI2VGA,A1I7QGUDP043DG,8,10,5,1164844800

File Reviewers50plus.csv contains the user IDs of all 16,341 reviewers with more than 50 reviews each.
File MovieID177k.csv contains the product IDs of all 177,111 movies reviewed by the reviewers with more than 50 reviews.
File Set2userid2000.csv contains the user IDs of the 2,000 reviewers from Reviewers50plus.csv with the largest difference between thumbs-up and thumbs-down.
The four files in the "Product Ratings" folder contain product ratings for all the movies in MovieID177k.csv, derived using 4 different techniques. Each file has 2 columns: product ID and product rating.
The 9 files in the "Recommended Experts" folder contain 37 different sets of "recommended expert reviewers". Each file contains 200 rows of user IDs.

Primary data citation: "McAuley, J. J., & Leskovec, J. (2013, May). From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. In Proceedings of the 22nd international conference on World Wide Web (pp. 897-908)."
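The row layout of allReviews.csv is simple enough to parse with the Python standard library. The sketch below is a minimal illustration, not the dataset authors' code. It assumes the file has no header row. Because the description does not say how the thumb difference behind Set2userid2000.csv was aggregated, it assumes a per-reviewer sum of (thumbs-up minus thumbs-down), where thumbs-down is field 4 minus field 3. The names `Review`, `read_reviews`, `active_reviewers`, and `top_by_net_thumbs` are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator, List, Sequence, Set, TextIO


@dataclass(frozen=True)
class Review:
    product_id: str    # field 1
    user_id: str       # field 2
    thumbs_up: int     # field 3
    thumbs_total: int  # field 4: thumbs-up + thumbs-down
    rating: int        # field 5: 1 to 5
    unix_time: int     # field 6: seconds since the Unix epoch

    @property
    def thumbs_down(self) -> int:
        # Derived from fields 3 and 4; the file has no thumbs-down column.
        return self.thumbs_total - self.thumbs_up

    @property
    def timestamp(self) -> datetime:
        return datetime.fromtimestamp(self.unix_time, tz=timezone.utc)


def read_reviews(stream: TextIO) -> Iterator[Review]:
    """Yield one Review per 6-field row (assumption: no header row)."""
    for row in csv.reader(stream):
        if not row:
            continue  # skip blank lines
        pid, uid, up, total, rating, t = row
        yield Review(pid, uid, int(up), int(total), int(rating), int(t))


def active_reviewers(reviews: Sequence[Review], min_reviews: int = 50) -> Set[str]:
    """User IDs with more than `min_reviews` reviews (cf. Reviewers50plus.csv)."""
    counts = Counter(r.user_id for r in reviews)
    return {uid for uid, n in counts.items() if n > min_reviews}


def top_by_net_thumbs(reviews: Sequence[Review], candidates: Set[str], k: int) -> List[str]:
    """Top-k candidates by summed (thumbs-up - thumbs-down).

    Assumption: the difference is summed per reviewer; ties are broken by user ID.
    """
    net = defaultdict(int)
    for r in reviews:
        if r.user_id in candidates:
            net[r.user_id] += r.thumbs_up - r.thumbs_down
    return sorted(net, key=lambda uid: (-net[uid], uid))[:k]


if __name__ == "__main__":
    # Parse the sample row quoted in the description.
    sample = io.StringIO("B003AI2VGA,A1I7QGUDP043DG,8,10,5,1164844800\n")
    review = next(read_reviews(sample))
    print(review.thumbs_down, review.timestamp.date())  # 2 2006-11-30
```

Applied to the full file, this would be `reviews = list(read_reviews(f))` followed by `top_by_net_thumbs(reviews, active_reviewers(reviews), 2000)`. Holding 7.9 million Python objects in memory is expensive, though, so streaming the CSV twice or using a columnar tool such as pandas is likely more practical at that scale. The result is only expected to match Set2userid2000.csv if the aggregation assumption above is correct.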

Provider:
Mendeley
Created:
2020-08-02
Background
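The dataset contains more than 7.9 million Amazon movie reviews spanning over 10 years, with per-review ratings, helpfulness (thumb) votes, and timestamps. It also includes lists of highly active reviewers and product ratings computed with several different techniques, making it suitable for data mining and big-data analysis.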
The summary above was collected and generated by 遇见数据集.