E-Commerce Book Review (ECBR)

Science Data Bank, updated 2023-11-10, indexed 2026-04-23
Download link:
https://www.scidb.cn/detail?dataSetId=7f6740bd88a84e17a63d6e46ca53a194
Description:
The data consist of book reviews collected from an e-commerce platform (Jingdong), a professional book review website (Douban Books), and a book-buying platform (Dangdang). Jingdong, one of the largest shopping platforms in China, hosts a large number of high-quality book reviews in its book section. Douban Books is a well-known Chinese book review website whose reviews are well received by the public. Dangdang is a well-known Chinese book-buying platform with a large, richly classified catalog and wide-ranging book reviews of good research value [23].
In this paper, a web crawler was used to obtain the book name, review text, and number of likes from the three platforms. A total of 115,000 records were crawled. Records with little useful information, duplicate content, false information, or advertising, as well as reviews that were too long or too short, were removed, leaving 80,125 valid records. The data set is divided into a training set, a test set, and a validation set in the ratio 8:1:1.
With reference to the method proposed by Zhang Yifei et al. [24], this paper makes the following improvements to the way standard summaries are obtained. For each book, the reviews with the most likes are treated as ideal reviews: given the number of reviews under the book and a ratio of ideal reviews to book reviews of 1:10, the top 10% of reviews by number of likes are selected as ideal reviews. Each review of the book is then split into clauses at delimiters such as commas, periods, question marks, and exclamation points.
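The split, the ideal-review selection, and the clause splitting described above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the record layout `(book_name, review_text, likes)`, the function names, the random seed, the rounding rule (integer division with at least one ideal review per book), and the exact delimiter set are all assumptions.

```python
import random
import re
from collections import defaultdict

# Hypothetical record layout: (book_name, review_text, likes).

def split_dataset(records, seed=0):
    """Shuffle records and split them into train/test/validation at 8:1:1."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10
    n_test = n // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]
    return train, test, valid

def select_ideal_reviews(records, one_in=10):
    """For each book, keep one review in `one_in`, ranked by likes (at least one)."""
    by_book = defaultdict(list)
    for book, text, likes in records:
        by_book[book].append((likes, text))
    ideal = {}
    for book, reviews in by_book.items():
        # Assumed rounding rule: floor of count / one_in, never fewer than one.
        k = max(1, len(reviews) // one_in)
        ranked = sorted(reviews, key=lambda r: r[0], reverse=True)
        ideal[book] = [text for _, text in ranked[:k]]
    return ideal

# Assumed delimiter set: ASCII and full-width commas, periods,
# question marks, and exclamation points.
_DELIMS = re.compile(r"[,.?!，。？！]+")

def split_clauses(review):
    """Split one review into non-empty clauses at the delimiters."""
    return [c.strip() for c in _DELIMS.split(review) if c.strip()]
```

Integer division keeps the 1:10 ratio free of floating-point rounding surprises; a different rounding rule would change how many ideal reviews small books receive.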
An ideal review is an objective and fair review that conveys the main theme of the book and can arouse emotional resonance in users, so the review as a whole is rich in semantic information about emotion and theme. For this reason, the ideal review is not split into clauses in the similarity calculation; instead, the similarity between each clause of a review and the whole ideal review is calculated. The top 10% of sentences by similarity are selected as the standard summary of the review. In this paper, the similarity between sentences is measured by the cosine of the angle between their sentence vectors.
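The similarity-based selection above might look like the sketch below. The description does not say how sentence vectors are produced, so `char_vector` (simple character counts) is only a stand-in assumption; any encoder that returns a dict-like sparse vector could be passed as `embed`. The function names, the handling of a single ideal review per call, and the "at least one clause" rule are likewise hypothetical.

```python
import math
from collections import Counter

def char_vector(text):
    """Stand-in sentence vector: character counts (not the dataset's encoder)."""
    return Counter(ch for ch in text if not ch.isspace())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def standard_summary(clauses, ideal_review, one_in=10, embed=char_vector):
    """Keep the one-in-`one_in` clauses most similar to the whole ideal review."""
    ideal_vec = embed(ideal_review)  # the ideal review is not split into clauses
    scored = [(cosine(embed(c), ideal_vec), i) for i, c in enumerate(clauses)]
    k = max(1, len(clauses) // one_in)  # assumed rule: at least one clause
    top = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    keep = sorted(i for _, i in top)  # restore the original clause order
    return [clauses[i] for i in keep]
```

For example, `standard_summary(split_into_clauses(review), ideal_review)` would return the selected clauses in their original order, where `split_into_clauses` is whatever clause splitter is in use.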
Provider:
Beijing Information Science & Technology University (北京信息科技大学)
Created:
2023-08-10