AmaSum
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/abrazinskas/selsum
Resource summary:
AmaSum is built primarily on summaries of consumer products written by professional reviewers, drawn from four major professional product review platforms. Each summary gives a verdict (the reviewer's overall conclusion) along with the product's pros and cons, capturing the most important information about the item. The data were collected with HTML scrapers, and the dataset includes both verified and unverified Amazon reviews of the products. To improve data quality, short reviews and low-quality summaries were removed during preprocessing. Even after this filtering, the dataset is substantially larger than existing alternatives: it covers over 31,000 products, with an average of 320 reviews per product. The target task is opinion summarization.
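The entry structure described above (a verdict, pros, and cons paired with many user reviews) and the length-based filtering step can be sketched in Python. This is a minimal illustration under stated assumptions: the field names (`product_id`, `verdict`, `pros`, `cons`, `reviews`), the helper functions, and the word-count thresholds are all hypothetical and do not reflect the actual file format or cutoffs used by the AmaSum authors; the repository linked above is the authority on the real layout.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List

# Hypothetical record layout; field names are illustrative only,
# not the actual AmaSum/SelSum file format.
@dataclass
class ProductEntry:
    product_id: str
    verdict: str                      # professional reviewer's overall conclusion
    pros: List[str]
    cons: List[str]
    reviews: List[str] = field(default_factory=list)  # Amazon user reviews

# Hypothetical thresholds; the source does not state the cutoffs used.
MIN_REVIEW_WORDS = 10
MIN_SUMMARY_WORDS = 5


def word_count(text: str) -> int:
    """Count whitespace-delimited tokens."""
    return len(text.split())


def summary_text(entry: ProductEntry) -> str:
    """Concatenate verdict, pros, and cons into one summary string."""
    return " ".join([entry.verdict, *entry.pros, *entry.cons])


def preprocess(entries: List[ProductEntry]) -> List[ProductEntry]:
    """Drop products with short summaries and remove short reviews."""
    kept = []
    for e in entries:
        if word_count(summary_text(e)) < MIN_SUMMARY_WORDS:
            continue  # summary too short: discard the whole product
        long_reviews = [r for r in e.reviews if word_count(r) >= MIN_REVIEW_WORDS]
        kept.append(ProductEntry(e.product_id, e.verdict,
                                 list(e.pros), list(e.cons), long_reviews))
    return kept


def avg_reviews_per_product(entries: List[ProductEntry]) -> float:
    """Average number of reviews per product (0.0 for an empty list)."""
    return mean(len(e.reviews) for e in entries) if entries else 0.0


if __name__ == "__main__":
    data = [
        ProductEntry("p1", "Great budget blender.", ["Cheap", "Quiet"], ["Small jar"],
                     ["Works well for smoothies every single morning, very happy with it.",
                      "Meh."]),
        ProductEntry("p2", "Ok.", [], [],
                     ["Fine product overall, does what it says on the box really."]),
    ]
    cleaned = preprocess(data)
    print([e.product_id for e in cleaned], avg_reviews_per_product(cleaned))
```

Whitespace word counts are only one simple proxy for "short"; the original preprocessing may rely on different criteria.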



