ibm/claim_stance
Dataset Overview
Dataset Summary
Claim Stance
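The dataset contains 2,394 labeled Wikipedia claims for 55 topics. Each claim is annotated with its stance (PRO/CON) towards the topic, together with fine-grained annotations based on the semantic model of Stance Classification of Context-Dependent Claims: the topic target, the topic's sentiment towards its target, the claim target, the claim's sentiment towards its target, and the relation between the two targets.
The dataset is divided into a training set (25 topics, 1,039 claims) and a test set (30 topics, 1,355 claims).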
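Each row appears to hold one topic, with that topic's claims nested under `claims` (hence the `claims.` prefix in the field list below). A minimal sketch, assuming rows are plain Python dicts with that nesting, that tallies topics and claims per split; the two toy rows are hypothetical, not real records from the dataset.

```python
from collections import Counter

# Hypothetical toy rows shaped like the dataset: one row per topic,
# with that topic's claims nested in a list under "claims".
rows = [
    {"topicId": 1, "split": "train", "claims": [{"claimId": 10}, {"claimId": 11}]},
    {"topicId": 2, "split": "test", "claims": [{"claimId": 20}]},
]


def split_stats(rows):
    """Return {split: (number_of_topics, number_of_claims)}."""
    topics = Counter()
    claims = Counter()
    for row in rows:
        topics[row["split"]] += 1                    # each row is one topic
        claims[row["split"]] += len(row["claims"])   # count its nested claims
    return {split: (topics[split], claims[split]) for split in topics}


print(split_stats(rows))  # {'train': (1, 2), 'test': (1, 1)}
```

On the full data, the figures above imply `train` maps to (25, 1039) and `test` to (30, 1355), which together give the 55 topics and 2,394 claims.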
Claim Stance Topic
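This subset contains only the claims (column text) and their associated topics (column label), with a different train-validation-test split. It can be used for topic classification.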
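The card does not describe how this subset was built. A minimal sketch of how comparable (text, label) pairs could be derived from the nested Claim Stance rows, assuming the text is the corrected claim and the label is the topic text mapped to an integer id; the rows and topic strings are hypothetical, and the sketch does not reproduce the subset's own train-validation-test split.

```python
# Hypothetical nested rows; field names follow the Dataset Structure list.
rows = [
    {"topicText": "Topic A", "claims": [
        {"claimCorrectedText": "Claim one."},
        {"claimCorrectedText": "Claim two."},
    ]},
    {"topicText": "Topic B", "claims": [
        {"claimCorrectedText": "Claim three."},
    ]},
]


def to_topic_examples(rows):
    """Flatten rows into (text, label_id) pairs plus the sorted label vocabulary."""
    labels = sorted({row["topicText"] for row in rows})
    label_to_id = {label: i for i, label in enumerate(labels)}
    examples = [
        (claim["claimCorrectedText"], label_to_id[row["topicText"]])
        for row in rows
        for claim in row["claims"]
    ]
    return examples, labels


examples, labels = to_topic_examples(rows)
print(labels)    # ['Topic A', 'Topic B']
print(examples)  # [('Claim one.', 0), ('Claim two.', 0), ('Claim three.', 1)]
```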
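Dataset Structure
- topicId - internal topic ID
- split - train or test
- topicText - the topic text
- topicTarget - the sentiment target of the topic
- topicSentiment - the topic's sentiment towards its target (1: positive / -1: negative)
- claims.claimId - internal claim ID
- claims.stance - PRO or CON
- claims.claimCorrectedText - the corrected version of the claim
- claims.claimOriginalText - the original version of the claim
- claims.Compatible - whether the claim is compatible with the semantic model of Stance Classification of Context-Dependent Claims (yes/no)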
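The field list above can be mirrored as typed records. A minimal sketch using Python dataclasses, keeping the card's field names verbatim (including the capitalized `Compatible`) so they line up with the columns; the Python types and the `"yes"`/`"PRO"` string values are assumptions, and `compatible_claims` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Claim:
    claimId: int              # assumed integer ID
    stance: str               # "PRO" or "CON" (assumed label strings)
    claimCorrectedText: str
    claimOriginalText: str
    Compatible: str           # "yes" or "no" (assumed string values)


@dataclass
class Topic:
    topicId: int              # assumed integer ID
    split: str                # "train" or "test"
    topicText: str
    topicTarget: str
    topicSentiment: int       # 1 = positive, -1 = negative
    claims: List[Claim]


def compatible_claims(topic: Topic) -> List[Claim]:
    """Keep only the claims marked compatible, i.e. those with fine-grained annotations."""
    return [c for c in topic.claims if c.Compatible == "yes"]


# Hypothetical example record.
topic = Topic(1, "train", "hypothetical topic", "hypothetical target", 1, [
    Claim(10, "PRO", "text a", "text a", "yes"),
    Claim(11, "CON", "text b", "text b", "no"),
])
print([c.claimId for c in compatible_claims(topic)])  # [10]
```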
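The following fine-grained annotations are specified only for "compatible" claims:
- claims.claimTarget.text - the claim's sentiment target text (in the corrected version of the claim)
- claims.claimTarget.span.start - start offset of the claim target in the corrected claim text (e.g. 0)
- claims.claimTarget.span.end - end offset of the claim target in the corrected claim text (e.g. 31)
- claims.claimSentiment - the claim's sentiment towards its target (1: positive / -1: negative)
- claims.targetsRelation - the relation between the claim target and the topic target (1: consistent / -1: contrastive)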
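In the semantic model of Stance Classification of Context-Dependent Claims, the stance is derived from the two sentiments and the target relation. A minimal sketch, assuming the product formulation stance = claimSentiment × targetsRelation × topicSentiment with 1 mapped to PRO and -1 to CON; the span check additionally assumes `span.end` is exclusive (Python slice convention), which the card does not state. Both helper names and the example claim text are hypothetical.

```python
def predicted_stance(claim_sentiment: int, targets_relation: int, topic_sentiment: int) -> str:
    """Combine three +1/-1 annotations into PRO/CON (assumed product rule)."""
    for value in (claim_sentiment, targets_relation, topic_sentiment):
        if value not in (1, -1):
            raise ValueError(f"expected 1 or -1, got {value!r}")
    product = claim_sentiment * targets_relation * topic_sentiment
    return "PRO" if product == 1 else "CON"


def target_matches_span(corrected_text: str, target_text: str, start: int, end: int) -> bool:
    """Check that target_text sits at [start, end) in the corrected claim (end assumed exclusive)."""
    return corrected_text[start:end] == target_text


# Negative sentiment towards a target consistent with a positively viewed topic target -> CON.
print(predicted_stance(-1, 1, 1))   # CON
# Negative sentiment towards a contrasting target flips the sign again -> PRO.
print(predicted_stance(-1, -1, 1))  # PRO

claim = "Violent video games increase aggression."  # hypothetical claim text
print(target_matches_span(claim, "Violent video games", 0, 19))  # True
```

Comparing `predicted_stance` with `claims.stance` on compatible claims is one way to sanity-check that a local copy of the annotations was read correctly.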
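Licensing Information
The dataset is released under the following licensing and copyright terms:
- (c) Copyright Wikipedia
- (c) Copyright IBM 2014. Released under CC-BY-SA 3.0
Citation Information
If you use this dataset, please cite the following paper: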
@inproceedings{bar-haim-etal-2017-stance,
  title = "Stance Classification of Context-Dependent Claims",
  author = "Bar-Haim, Roy and Bhattacharya, Indrajit and Dinuzzo, Francesco and Saha, Amrita and Slonim, Noam",
  editor = "Lapata, Mirella and Blunsom, Phil and Koller, Alexander",
  booktitle = "Proceedings of the 15th Conference of the {E}uropean Chapter of the Association for Computational Linguistics: Volume 1, Long Papers",
  month = apr,
  year = "2017",
  address = "Valencia, Spain",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/E17-1024",
  pages = "251--261",
  abstract = "Recent work has addressed the problem of detecting relevant claims for a given controversial topic. We introduce the complementary task of Claim Stance Classification, along with the first benchmark dataset for this task. We decompose this problem into: (a) open-domain target identification for topic and claim (b) sentiment classification for each target, and (c) open-domain contrast detection between the topic and the claim targets. Manual annotation of the dataset confirms the applicability and validity of our model. We describe an implementation of our model, focusing on a novel algorithm for contrast detection. Our approach achieves promising results, and is shown to outperform several baselines, which represent the common practice of applying a single, monolithic classifier for stance classification.",
}
Improved stance classification results on this dataset were published in:
@inproceedings{bar-haim-etal-2017-improving,
  title = "Improving Claim Stance Classification with Lexical Knowledge Expansion and Context Utilization",
  author = "Bar-Haim, Roy and Edelstein, Lilach and Jochim, Charles and Slonim, Noam",
  editor = "Habernal, Ivan and Gurevych, Iryna and Ashley, Kevin and Cardie, Claire and Green, Nancy and Litman, Diane and Petasis, Georgios and Reed, Chris and Slonim, Noam and Walker, Vern",
  booktitle = "Proceedings of the 4th Workshop on Argument Mining",
  month = sep,
  year = "2017",
  address = "Copenhagen, Denmark",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/W17-5104",
  doi = "10.18653/v1/W17-5104",
  pages = "32--38",
  abstract = "Stance classification is a core component in on-demand argument construction pipelines. Previous work on claim stance classification relied on background knowledge such as manually-composed sentiment lexicons. We show that both accuracy and coverage can be significantly improved through automatic expansion of the initial lexicon. We also developed a set of contextual features that further improves the state-of-the-art for this task.",
}
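Notes
(1) The claim annotations, and the experiments reported in Stance Classification of Context-Dependent Claims and in Improving Claim Stance Classification with Lexical Knowledge Expansion and Context Utilization, are based on the corrected version of the claims. The original version is the claim exactly as it appears in the cleaned article, without further editing.
(2) The topics and claims partially overlap with the CE-EMNLP-2015 dataset.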




