SST (Stanford Sentiment Treebank)

OpenDataLab, updated 2026-03-29, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/SST

Description:

The Stanford Sentiment Treebank (SST) is a corpus of fully labeled parse trees that supports analysis of how sentiment composes across the phrases of a sentence. It is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. The sentences were parsed with the Stanford Parser, yielding a total of 215,154 unique phrases from the parse trees, each annotated by 3 human judges. Each phrase is labeled as negative, somewhat negative, neutral, somewhat positive, or positive. The corpus with all five labels is referred to as SST-5 or SST fine-grained. For binary classification experiments on full sentences, neutral sentences are discarded, negative and somewhat negative are merged into one class, and somewhat positive and positive into the other; this version is referred to as SST-2 or SST binary.
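
The relationship between the five SST-5 labels and the two SST-2 classes can be illustrated with a minimal Python sketch. It assumes labels are stored as the plain strings used in the description above; the actual files distributed for the corpus may encode labels differently. The helper names `to_binary` and `binarize` and the sample sentences are hypothetical and not taken from the corpus.

```python
from typing import List, Optional, Tuple

# The five fine-grained (SST-5) labels, as named in the description.
# Assumption: labels are plain strings; real files may use another encoding.
FINE_GRAINED_LABELS = (
    "negative",
    "somewhat negative",
    "neutral",
    "somewhat positive",
    "positive",
)


def to_binary(label: str) -> Optional[str]:
    """Map an SST-5 label to an SST-2 label, or None for neutral."""
    if label not in FINE_GRAINED_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    if label == "neutral":
        return None  # neutral sentences are discarded in the binary setting
    return "negative" if label.endswith("negative") else "positive"


def binarize(examples: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop neutral examples and collapse the rest to two classes."""
    result = []
    for text, label in examples:
        binary_label = to_binary(label)
        if binary_label is not None:
            result.append((text, binary_label))
    return result


# Hypothetical sentences, for illustration only.
samples = [
    ("A wonderful film.", "positive"),
    ("It runs for two hours.", "neutral"),
    ("Somewhat dull in places.", "somewhat negative"),
]
binary_samples = binarize(samples)
```

Here `binary_samples` keeps only the first and third sentences, labeled positive and negative respectively; the neutral sentence is dropped.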

Provider:

OpenDataLab

Created:

2022-04-24
