
SST (Stanford Sentiment Treebank)

OpenDataLab | Updated 2026-03-29 | Added 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/SST
Resource description:
The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that supports a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford Parser and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges. Each phrase is labeled as negative, somewhat negative, neutral, somewhat positive, or positive. With all five labels, the corpus is referred to as SST-5 or SST fine-grained. For binary classification experiments on full sentences (neutral sentences are discarded, and the remaining sentences are classified as negative/somewhat negative versus somewhat positive/positive), the dataset is referred to as SST-2 or SST binary.
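The relationship between the five-class and binary label sets can be illustrated with a minimal Python sketch. It assumes the five classes are encoded as integers 0 to 4 in the order listed above; that encoding, the function names `sst5_to_sst2` and `to_binary`, and the sample sentences are illustrative assumptions, not part of the dataset release, so check the actual label format of the files you download before reusing it.

```python
from typing import List, Optional, Tuple

# Assumed encoding (hypothetical): class indices 0..4 in the order given in the description.
SST5_LABELS = ["negative", "somewhat negative", "neutral", "somewhat positive", "positive"]
NEUTRAL = SST5_LABELS.index("neutral")  # index 2 under the assumed encoding


def sst5_to_sst2(label: int) -> Optional[int]:
    """Map a five-class index to a binary label (0 = negative, 1 = positive).

    Returns None for the neutral class, which the binary setting discards.
    """
    if not 0 <= label < len(SST5_LABELS):
        raise ValueError(f"label must be in 0..{len(SST5_LABELS) - 1}, got {label}")
    if label == NEUTRAL:
        return None
    return 0 if label < NEUTRAL else 1


def to_binary(examples: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Convert (sentence, five-class label) pairs to binary pairs, dropping neutral ones."""
    binary = []
    for sentence, label in examples:
        mapped = sst5_to_sst2(label)
        if mapped is not None:
            binary.append((sentence, mapped))
    return binary


if __name__ == "__main__":
    # Made-up sentences for illustration only; they are not taken from the corpus.
    sample = [("a dull, lifeless film", 0), ("it exists", 2), ("quietly charming", 3)]
    for sentence, label in to_binary(sample):
        print(label, sentence)
```

Because neutral sentences are dropped, the binary set built this way is smaller than the 11,855 sentences of the full corpus.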
Provider:
OpenDataLab
Created:
2022-04-24