
cobie_sst2

Source: ModelScope community. Updated 2025-12-05; indexed 2025-02-01.
Download link:
https://modelscope.cn/datasets/BSC-LT/cobie_sst2
# Dataset Card for cobie_sst2

This dataset is a modification of the original [SST-2](https://huggingface.co/datasets/stanfordnlp/sst2) dataset, adapted for evaluating cognitive biases in LLMs.

## Language(s)

- English (`en`)

## Dataset Summary

The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.

## Dataset Structure

The modifications to the dataset are designed to evaluate cognitive biases in a few-shot setting at two levels of task complexity. We use 25,000 instances from the original dataset as test instances; the remaining ones serve as few-shot examples. Each test instance is prompted with every possible unbalanced 4-shot distribution. To increase the complexity of the original task, we also introduce an additional neutral example between the first two and the last two examples.

**Dataset Fields**

- `idx`: original sentence id, in the format `<original_partition>_<original_id>`.
- `sentence`: test sentence.
- `label`: sentiment of the test sentence, either "negative" (`0`) or "positive" (`1`).
- `dist`: few-shot distribution (`0000`, `1111`, `0001`, `0010`, `0100`, `1000`, `1110`, `1101`, `1011`, `0111`).
- `shot<n>_idx`: original id of the example sentence, in the format `<original_partition>_<original_id>`, where `<n>` is the shot number.
- `shot<n>_sent`: example sentence.
- `shot<n>_label`: sentiment of the example sentence.
- `few_shot_string`: string with all 4 shots the sentence is prompted with.
- `few_shot_hard_string`: string with the same 4 shots plus an additional neutral example between the first two and the last two, to increase task complexity.
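
As an illustration of the structure described above, the following minimal sketch enumerates the unbalanced 4-shot label patterns that appear in the `dist` field and shows where the neutral example would sit in the harder variant. The function names (`unbalanced_distributions`, `with_neutral_example`, `split_idx`) and the sample values are hypothetical; the actual prompt template behind `few_shot_string` is not specified in this card, so the sketch works with plain lists of shots rather than formatted strings.

```python
from itertools import product


def unbalanced_distributions(n_shots: int = 4) -> list[str]:
    """Return every binary label pattern of length n_shots in which the
    number of negative (0) and positive (1) shots differs."""
    patterns = ("".join(bits) for bits in product("01", repeat=n_shots))
    return [p for p in patterns if p.count("0") != p.count("1")]


def with_neutral_example(shots: list[str], neutral: str) -> list[str]:
    """Place one neutral example between the first two and the last two shots.

    Assumption: shots are kept in their original order; how they are rendered
    into a prompt string is not covered here.
    """
    if len(shots) != 4:
        raise ValueError("expected exactly 4 shots")
    return shots[:2] + [neutral] + shots[2:]


def split_idx(idx: str) -> tuple[str, str]:
    """Split '<original_partition>_<original_id>' on the last underscore."""
    partition, _, original_id = idx.rpartition("_")
    return partition, original_id


if __name__ == "__main__":
    dists = unbalanced_distributions()
    print(len(dists))  # 10, the same patterns listed for the `dist` field
    print(with_neutral_example(["s1", "s2", "s3", "s4"], "neutral"))
    print(split_idx("train_42"))  # hypothetical id value
```

Of the 16 possible binary patterns for 4 shots, the 6 with two negative and two positive examples are balanced and therefore excluded, which leaves the 10 values listed for `dist`.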

## Supported Tasks and Leaderboards

- `sentiment-classification`

## Additional Information

**Dataset Curators**

Language Technologies Unit (LangTech) at the Barcelona Supercomputing Center.

This work has been promoted and financed by the Generalitat de Catalunya through the [Aina](https://projecteaina.cat/) project. This work is also funded by the Ministerio para la Transformación Digital y de la Función Pública and Plan de Recuperación, Transformación y Resiliencia - Funded by EU – NextGenerationEU within the framework of the project Desarrollo Modelos ALIA.

**Licensing Information**

This work is licensed under a [MIT License](https://github.com/YJiangcm/Movielens1M-Movie-Recommendation-System/blob/main/LICENSE) (same as original).

## Citation Information

```
@inproceedings{cobie,
  title={Cognitive Biases, Task Complexity, and Result Intepretability in Large Language Models},
  author={Mario Mina and Valle Ruiz-Fernández and Júlia Falcão and Luis Vasquez-Reina and Aitor Gonzalez-Agirre},
  booktitle={Proceedings of The 31st International Conference on Computational Linguistics (COLING)},
  year={2025 (to appear)}
}
```

Provider: maas
Created: 2025-01-26