ibm/claim_stance

Hugging Face · Updated 2023-11-15 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/ibm/claim_stance


Overview:

This dataset contains 2,394 annotated Wikipedia claims spanning 55 topics. Each claim is labeled with its stance towards its topic (Pro/Con) and, following the dataset's underlying semantic model, with fine-grained annotations: topic target, topic sentiment, claim target, claim sentiment, and the relation between the two targets. The data is split into a training set (25 topics, 1,039 claims) and a test set (30 topics, 1,355 claims). A subset named Claim Stance Topic is also provided for topic classification. Fields include topic ID, split, topic text, topic target, topic sentiment, claim ID, claim stance, corrected claim text, original claim text, and claim compatibility. The dataset carries copyright notices from Wikipedia and IBM and is released under CC-BY-SA 3.0.

Provider:

ibm

Summary of Original Information

Dataset Overview

Dataset Summary

Claim Stance

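The dataset contains 2,394 labeled Wikipedia claims for 55 topics. Each claim carries its stance towards the topic (Pro/Con) and fine-grained annotations based on the semantic model of Stance Classification of Context-Dependent Claims: the topic target, the topic sentiment towards its target, the claim target, the claim sentiment towards its target, and the relation between the targets.

The dataset is split into a training set (25 topics, 1,039 claims) and a test set (30 topics, 1,355 claims).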

Claim Stance Topic

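This subset contains only the claims (column text) associated with their topics (column label), uses a different train-validation-test split, and can be used for topic classification.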

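Dataset Structure

topicId - internal topic ID
split - train or test
topicText - topic text
topicTarget - sentiment target of the topic
topicSentiment - topic sentiment towards its target (1: positive / -1: negative)
claims.claimId - internal claim ID
claims.stance - Pro or Con
claims.claimCorrectedText - corrected version of the claim
claims.claimOriginalText - original version of the claim
claims.Compatible - whether the claim is compatible with the semantic model of Stance Classification of Context-Dependent Claims (yes/no)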

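The following fine-grained annotations are specified only for "compatible" claims:

claims.claimTarget.text - text of the claim sentiment target (in the corrected version of the claim)
claims.claimTarget.span.start - start offset of the claim target span (example value: 0)
claims.claimTarget.span.end - end offset of the claim target span (example value: 31)
claims.claimSentiment - claim sentiment towards its target (1: positive / -1: negative)
claims.targetsRelation - relation between the claim target and the topic target (1: consistent / -1: contrastive)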
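The three ±1 annotations above can be combined into an overall stance. The sketch below assumes that the stance sign is the product of topic sentiment, claim sentiment, and targets relation, which is how the decomposition in the cited paper is usually summarized. It also assumes that rows are flat mappings keyed by the dotted field names, that compatibility is stored as a yes/no-style value, and that stance labels are spelled "PRO" and "CON". None of these storage details are confirmed by this page.

```python
from typing import Mapping, Optional

# Field names follow the structure list above; treating them as flat
# dictionary keys is an assumption about how the rows are stored.
SIGN_FIELDS = ("topicSentiment", "claims.claimSentiment", "claims.targetsRelation")


def derive_stance(row: Mapping[str, object]) -> Optional[str]:
    """Compose a stance label from the fine-grained annotations of one row.

    Returns None for rows that are not marked as compatible, since the
    fine-grained annotations are only specified for compatible claims.
    """
    # Assumption: compatibility is encoded as a yes/no-style value.
    compatible = str(row.get("claims.Compatible", "")).strip().lower()
    if compatible not in {"yes", "true", "1"}:
        return None

    product = 1
    for key in SIGN_FIELDS:
        # Accept 1, -1, "1", "-1", 1.0, "-1.0" and similar encodings.
        value = int(float(str(row[key])))
        if value not in (1, -1):
            raise ValueError(f"{key} must be 1 or -1, got {row[key]!r}")
        product *= value

    # Assumption: "PRO"/"CON" spelling of the stance labels.
    return "PRO" if product == 1 else "CON"


# Hypothetical row, invented for illustration only.
example_row = {
    "claims.Compatible": "yes",
    "topicSentiment": 1,
    "claims.claimSentiment": -1,
    "claims.targetsRelation": 1,
}
print(derive_stance(example_row))
```

Comparing the derived label with claims.stance on compatible rows is a quick way to check whether the product assumption holds for the data actually loaded.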

License Information

The dataset is released under the following license and copyright terms:

(c) Copyright Wikipedia
(c) Copyright IBM 2014. Released under CC-BY-SA 3.0

Citation Information

If you use this dataset, please cite the following paper:

@inproceedings{bar-haim-etal-2017-stance,
    title = "Stance Classification of Context-Dependent Claims",
    author = "Bar-Haim, Roy and Bhattacharya, Indrajit and Dinuzzo, Francesco and Saha, Amrita and Slonim, Noam",
    editor = "Lapata, Mirella and Blunsom, Phil and Koller, Alexander",
    booktitle = "Proceedings of the 15th Conference of the {E}uropean Chapter of the Association for Computational Linguistics: Volume 1, Long Papers",
    month = apr,
    year = "2017",
    address = "Valencia, Spain",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/E17-1024",
    pages = "251--261",
    abstract = "Recent work has addressed the problem of detecting relevant claims for a given controversial topic. We introduce the complementary task of Claim Stance Classification, along with the first benchmark dataset for this task. We decompose this problem into: (a) open-domain target identification for topic and claim (b) sentiment classification for each target, and (c) open-domain contrast detection between the topic and the claim targets. Manual annotation of the dataset confirms the applicability and validity of our model. We describe an implementation of our model, focusing on a novel algorithm for contrast detection. Our approach achieves promising results, and is shown to outperform several baselines, which represent the common practice of applying a single, monolithic classifier for stance classification.",
}

Improved stance classification results on this dataset were published in:

@inproceedings{bar-haim-etal-2017-improving,
    title = "Improving Claim Stance Classification with Lexical Knowledge Expansion and Context Utilization",
    author = "Bar-Haim, Roy and Edelstein, Lilach and Jochim, Charles and Slonim, Noam",
    editor = "Habernal, Ivan and Gurevych, Iryna and Ashley, Kevin and Cardie, Claire and Green, Nancy and Litman, Diane and Petasis, Georgios and Reed, Chris and Slonim, Noam and Walker, Vern",
    booktitle = "Proceedings of the 4th Workshop on Argument Mining",
    month = sep,
    year = "2017",
    address = "Copenhagen, Denmark",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W17-5104",
    doi = "10.18653/v1/W17-5104",
    pages = "32--38",
    abstract = "Stance classification is a core component in on-demand argument construction pipelines. Previous work on claim stance classification relied on background knowledge such as manually-composed sentiment lexicons. We show that both accuracy and coverage can be significantly improved through automatic expansion of the initial lexicon. We also developed a set of contextual features that further improves the state-of-the-art for this task.",
}

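Notes

(1) The claim annotations and the experiments reported in Stance Classification of Context-Dependent Claims and in Improving Claim Stance Classification with Lexical Knowledge Expansion and Context Utilization are based on the corrected version of the claims. The original version is the claim as it appears in the cleaned article, with no further editing.

(2) The topics and claims partially overlap with the CE-EMNLP-2015 dataset.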

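Collected Summary

Dataset Introduction

Construction

In natural language processing, stance classification aims to determine whether a text supports or opposes a given topic. The Claim Stance dataset is built on a semantic-model framework and comprises 2,394 claims extracted from Wikipedia across 55 controversial topics. Each claim is annotated not only with its stance (Pro or Con) but also along several semantic dimensions: target identification, sentiment, and the relation between targets. The split is made by topic: the training set covers 1,039 claims from 25 topics and the test set covers 1,355 claims from 30 topics, which keeps the topics of the two splits separate.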
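The topic-level split can be sanity-checked by confirming that no topic ID occurs in both splits. The following is a minimal sketch; it assumes rows are mappings with the topicId and split fields listed in the dataset structure and that the split values are spelled "train" and "test". The sample rows are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping, Set


def topics_by_split(rows: Iterable[Mapping[str, Hashable]]) -> Dict[str, Set[Hashable]]:
    """Group the distinct topic IDs under the split they appear in."""
    groups: Dict[str, Set[Hashable]] = defaultdict(set)
    for row in rows:
        groups[str(row["split"])].add(row["topicId"])
    return dict(groups)


def shared_topics(
    rows: Iterable[Mapping[str, Hashable]],
    first: str = "train",   # assumed spelling of the split values
    second: str = "test",
) -> Set[Hashable]:
    """Return the topic IDs that occur in both splits (empty if disjoint)."""
    groups = topics_by_split(rows)
    return groups.get(first, set()) & groups.get(second, set())


# Hypothetical rows, not taken from the dataset.
sample_rows = [
    {"topicId": 1, "split": "train"},
    {"topicId": 1, "split": "train"},
    {"topicId": 2, "split": "train"},
    {"topicId": 3, "split": "test"},
]
print(shared_topics(sample_rows))
```

An empty result means every topic belongs to exactly one split, so test-set performance reflects generalization to unseen topics.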

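Features

The dataset's defining feature is its multi-layer semantic annotation scheme. Beyond the stance label, claims that are compatible with the semantic model are annotated with the topic target, topic sentiment, claim target, claim sentiment, and the relation between the targets. These structured annotations give models rich semantic cues for understanding how context-dependent stance is expressed. Each claim is provided in both its original and corrected form, which makes it possible to study how text normalization affects stance classification. The topic classification subset extends the dataset's scope further and supports cross-task transfer learning.

Usage

The dataset can be loaded directly from the Hugging Face platform in two configurations: claim_stance, intended for stance classification, with training and test sets; and claim_stance_topic, which provides a train-validation-test split for topic classification. Either the original or the corrected claim text can serve as the input, and the fine-grained annotations can be combined with it in a multi-task learning setup. The standardized format works with mainstream deep learning libraries, and the topic-level split supports reliable model evaluation, making the dataset a benchmark for stance detection, argument mining, and related research.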
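Loading the two configurations might look like the sketch below. It assumes the Hugging Face datasets library is installed, that the repository ID is ibm/claim_stance as given in the download link, and that rows expose the dotted field names as flat keys. The helper names load_config and claim_text are hypothetical and introduced only for illustration.

```python
from typing import Mapping

# Configuration names as given in the usage description above.
CONFIGS = ("claim_stance", "claim_stance_topic")


def load_config(name: str = "claim_stance"):
    """Load one configuration of the dataset from the Hugging Face Hub."""
    if name not in CONFIGS:
        raise ValueError(f"unknown configuration {name!r}; expected one of {CONFIGS}")
    # Imported lazily so the helpers below work without the library installed.
    from datasets import load_dataset

    # Assumption: the repository ID matches the download link on this page.
    return load_dataset("ibm/claim_stance", name)


def claim_text(row: Mapping[str, object], corrected: bool = True) -> str:
    """Pick the corrected or the original claim text as the model input.

    Assumption: the dotted field names are flat keys of each row.
    """
    key = "claims.claimCorrectedText" if corrected else "claims.claimOriginalText"
    return str(row[key])


# Hypothetical row, invented for illustration only.
demo_row = {
    "claims.claimCorrectedText": "Example claim, corrected form.",
    "claims.claimOriginalText": "example claim original form",
}
print(claim_text(demo_row))
print(claim_text(demo_row, corrected=False))
```

Since the reported annotations and experiments are based on the corrected claims, defaulting to the corrected text keeps inputs aligned with the published results. Calling load_config("claim_stance_topic") would fetch the topic classification subset instead, which needs network access.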

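Background and Challenges

Background Overview

In natural language processing, stance classification is a core task of argument mining: it aims to determine whether a text supports or opposes a given issue. The Claim Stance dataset, released by an IBM research team in 2017, focuses on the stance of context-dependent claims and contains 2,394 annotated Wikipedia claims across 55 controversial topics. Built on the semantic model of Stance Classification of Context-Dependent Claims, it annotates the overall stance of each claim towards its topic (Pro/Con) as well as fine-grained information: sentiment targets, sentiment polarity, and the relation between targets. The dataset has advanced computational argumentation, supplied key data for automated argument construction systems, and become an important benchmark in the field.

Current Challenges

The dataset addresses inherent challenges of stance classification: a claim's stance often depends heavily on context, and the same claim can take opposite stances under different topics, so models need deep semantic reasoning and contextual understanding. Construction posed several difficulties as well. First, extracting and normalizing controversial claims from Wikipedia means coping with noisy text and varied phrasing. Second, fine-grained annotations such as sentiment target identification and target relation judgment rely on complex manual semantic analysis, making annotation consistency costly to ensure. In addition, some claims in the dataset are only weakly related to their topic, which makes it harder for models to separate relevance from stance. Together, these challenges highlight the complexity of stance classification within natural language understanding.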

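Common Scenarios

Classic Use Cases

In natural language processing, stance classification aims to determine whether a text supports or opposes a given topic. As a benchmark for this task, the Claim Stance dataset is classically used to train and evaluate stance classification models. Researchers use its 2,394 annotated claims, which cover Pro/Con stances on 55 topics, together with fine-grained semantic annotations such as topic targets, sentiment polarity, and target relations, to build machine learning systems that understand context-dependent claims. This use case has improved models' ability to capture subtle differences in stance in complex contexts.

Practical Applications

In practice, the Claim Stance dataset supports the development of automated argumentation systems, especially on platforms that need to generate or assess claims in real time. For example, in social media content moderation, news fact-checking, and online debate assistance tools, models trained on the dataset can automatically identify the stance of user statements on controversial topics, helping to filter misleading information or build balanced argument summaries. It can also be used in educational technology to help students understand multiple perspectives on complex issues and strengthen critical thinking.

Derived Related Work

Several well-known studies build on the Claim Stance dataset. The original paper, Stance Classification of Context-Dependent Claims, first proposed the decomposed semantic model, and the follow-up study, Improving Claim Stance Classification with Lexical Knowledge Expansion and Context Utilization, further improved classification performance by automatically expanding the sentiment lexicon and introducing contextual features. Together, these works laid a methodological foundation for stance classification and prompted further exploration of multi-target sentiment analysis and contrast detection, advancing fine-grained semantic understanding in natural language processing.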

The above content was collected and summarized by 遇见数据集.
