JSSICE/Multi-Domain-Sentiment-Dataset
Source: Hugging Face. Updated 2022-12-04; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/JSSICE/Multi-Domain-Sentiment-Dataset
Resource description:
Using it for assessment.
# Dataset for Multi Domain (Including Kitchen, Books, DVDs, and Electronics)
[Multi-Domain Sentiment Dataset](https://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html) by John Blitzer, Mark Dredze, Fernando Pereira.
### Description:
The Multi-Domain Sentiment Dataset contains product reviews from Amazon.com covering four product types (domains): Kitchen, Books, DVDs, and Electronics. Each domain has several thousand reviews, but the exact number varies by domain. Reviews contain star ratings (1 to 5 stars) that can be converted into binary labels if needed. This page describes the data. If you have questions, please email me directly (email found here).
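The description does not state a rule for turning star ratings into binary labels. The following is a minimal sketch under one common convention, which is an assumption and not confirmed by this page: ratings above 3 count as positive, ratings below 3 as negative, and 3-star reviews are left unlabeled. The function name `rating_to_label` is hypothetical.

```python
from typing import Optional


def rating_to_label(stars: float) -> Optional[int]:
    """Map a 1-5 star rating to 1 (positive), 0 (negative), or None.

    Assumed threshold: 3 stars is treated as ambiguous and left unlabeled.
    """
    if not 1 <= stars <= 5:
        raise ValueError(f"rating out of range: {stars}")
    if stars > 3:
        return 1
    if stars < 3:
        return 0
    return None  # neutral rating, no binary label assigned


labels = [rating_to_label(s) for s in (1.0, 2.0, 3.0, 4.0, 5.0)]
print(labels)
```

Returning `None` for neutral ratings keeps the decision about whether to drop or keep those reviews with the caller.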
A few notes regarding the data.
1) There are 4 directories, one for each of the four domains. Each directory contains 3 files: positive.review, negative.review and unlabeled.review. (The books directory doesn't contain the unlabeled file, but a link to it is below.) While the positive and negative files contain positive and negative reviews, these aren't necessarily the splits we used in the experiments. We randomly drew from the three files, ignoring the file names.
2) Each file uses a pseudo-XML scheme to encode the reviews. Most of the fields are self-explanatory. The reviews have a unique ID field that isn't very unique. If a review has two unique ID fields, ignore the one containing only a number.
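Because the files are only pseudo-XML, a strict XML parser may reject them, so a tolerant regular-expression reader is one option. The sketch below assumes each review is wrapped in `<review>...</review>` with flat `<field>value</field>` children; the tag names (`unique_id`, `rating`, `review_text`) and the sample record are assumptions for illustration and are not specified on this page. `parse_reviews` and `pick_unique_id` are hypothetical helper names. The second helper applies the note above about ignoring the purely numeric ID.

```python
import re
from typing import Dict, List, Optional

# Assumed layout: <review> blocks containing flat <tag>value</tag> fields.
REVIEW_RE = re.compile(r"<review>(.*?)</review>", re.DOTALL)
FIELD_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_reviews(text: str) -> List[Dict[str, List[str]]]:
    """Return one dict per review, mapping each tag name to all its values."""
    reviews = []
    for block in REVIEW_RE.findall(text):
        fields: Dict[str, List[str]] = {}
        for name, value in FIELD_RE.findall(block):
            # A tag may repeat (e.g. two ID fields), so keep every value.
            fields.setdefault(name, []).append(value.strip())
        reviews.append(fields)
    return reviews


def pick_unique_id(ids: List[str]) -> Optional[str]:
    """Prefer the first ID that is not purely numeric."""
    for candidate in ids:
        if not candidate.isdigit():
            return candidate
    return ids[0] if ids else None


# Made-up record in the assumed layout, not taken from the dataset.
sample = """
<review>
<unique_id>
12345
</unique_id>
<unique_id>
B000EXAMPLE:hypothetical_title:reviewer
</unique_id>
<rating>
5.0
</rating>
<review_text>
Works well.
</review_text>
</review>
"""

parsed = parse_reviews(sample)
print(pick_unique_id(parsed[0]["unique_id"]))
```

To read a real file, pass its contents to `parse_reviews`. If a file turns out to contain bytes that are not valid UTF-8, opening it with `open(path, encoding="utf-8", errors="replace")` avoids a decoding error at the cost of replacing those bytes.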
### Link to download the data:
Multi-Domain Sentiment Dataset (30 MB) [domain_sentiment_data.tar.gz](https://www.cs.jhu.edu/~mdredze/datasets/sentiment/domain_sentiment_data.tar.gz)
Books unlabeled data (2 MB) [book.unlabeled.gz](https://www.cs.jhu.edu/~mdredze/datasets/sentiment/book.unlabeled.gz)
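Fetching and unpacking the main archive can be sketched with the Python standard library alone. The helper names `download` and `extract_tar_gz` and the local paths in the usage note are hypothetical; the URL is the one listed above. The extraction step checks that every member stays inside the output directory before unpacking, as a precaution when handling archives from the network.

```python
import tarfile
import urllib.request
from pathlib import Path
from typing import List

ARCHIVE_URL = (
    "https://www.cs.jhu.edu/~mdredze/datasets/sentiment/"
    "domain_sentiment_data.tar.gz"
)


def download(url: str, dest: Path) -> Path:
    """Fetch url to dest, skipping the request if dest already exists."""
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
    return dest


def extract_tar_gz(archive: Path, out_dir: Path) -> List[str]:
    """Extract a .tar.gz into out_dir and return the member names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    root = out_dir.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse members whose path would land outside out_dir.
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(root)
        return [member.name for member in members]
```

A typical call would be `extract_tar_gz(download(ARCHIVE_URL, Path("data/domain_sentiment_data.tar.gz")), Path("data"))`. Judging by its name, book.unlabeled.gz is a plain gzip file and not a tarball, in which case the standard `gzip` module would be the tool to open it.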
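Provider:
JSSICE
Summary of original information
Dataset overview
Dataset name
Multi-Domain Sentiment Dataset
Dataset source
Created by John Blitzer, Mark Dredze, and Fernando Pereira; the data comes from Amazon.com.
Dataset contents
Product reviews from four product types (domains):
- Kitchen
- Books
- DVDs
- Electronics
Each domain contains several thousand reviews; the exact number varies by domain. Reviews carry star ratings from 1 to 5, which can be converted into binary labels if needed.
Dataset structure
- Each domain has its own directory containing three files: positive.review, negative.review, and unlabeled.review.
- The books directory does not contain unlabeled.review, but a download link is provided.
- Reviews in each file are encoded in a pseudo-XML format, with self-explanatory fields and an ID field that is not unique.
Dataset download
- Main dataset: domain_sentiment_data.tar.gz, 30 MB.
- Books unlabeled data: book.unlabeled.gz, 2 MB.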



