fancyzhx/yelp_polarity
收藏数据集卡片 for "yelp_polarity"
数据集描述
数据集摘要
这是一个用于二元情感分类的大型Yelp评论数据集。数据集包含560,000条高度极性的训练评论和38,000条测试评论。数据集由Xiang Zhang从Yelp Dataset Challenge 2015中提取构建,并首次在以下论文中用作文本分类基准:
- 论文: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
支持的任务和排行榜
语言
数据集结构
数据实例
plain_text
- 下载的数据集文件大小: 166.38 MB
- 生成的数据集大小: 441.74 MB
- 磁盘总使用量: 608.12 MB
训练集的一个示例如下:
json { "label": 0, "text": ""Unfortunately, the frustration of being Dr. Goldbergs patient is a repeat of the experience Ive had with so many other doctor..." }
数据字段
所有拆分中的数据字段相同。
plain_text
text: 一个string特征。label: 一个分类标签,可能的值包括1(0),2(1)。
数据拆分
| 名称 | 训练集 | 测试集 |
|---|---|---|
| plain_text | 560000 | 38000 |
数据集创建
策划理由
源数据
初始数据收集和规范化
源语言生产者是谁?
注释
注释过程
注释者是谁?
个人和敏感信息
使用数据的注意事项
数据集的社会影响
偏见的讨论
其他已知限制
附加信息
数据集策展人
许可信息
引用信息
bibtex @article{zhangCharacterlevelConvolutionalNetworks2015, archivePrefix = {arXiv}, eprinttype = {arxiv}, eprint = {1509.01626}, primaryClass = {cs}, title = {Character-Level {{Convolutional Networks}} for {{Text Classification}}}, abstract = {This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.}, journal = {arXiv:1509.01626 [cs]}, author = {Zhang, Xiang and Zhao, Junbo and LeCun, Yann}, month = sep, year = {2015}, }
贡献
感谢@patrickvonplaten, @lewtun, @mariamabarham, @thomwolf, @julien-c 添加此数据集。




