SemEval-2020 Task 5 Dataset

Name: SemEval-2020 Task 5 Dataset
Creator: SemEval-2020 Task Organisers
License: Not specified

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/ThilinaRajapakse/simpletransformers

Description:

This dataset contains counterfactuals: textual expressions that describe events that did not occur or may not occur. The task is to detect counterfactual statements (Subtask 1) and to parse them into their antecedents and consequents (Subtask 2). In Subtask 1 the ratio of negative to positive samples is 88:12. To address this severe class imbalance, the study tried several methods: oversampling, SMOTE, undersampling, and weighted cross-entropy loss. Subtask 1 has 13,000 examples and Subtask 2 has 3,500; the test sets contain 7,000 sentences for Subtask 1 and 1,950 sentences for Subtask 2.
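To make the imbalance handling concrete, here is a minimal sketch of two of the listed remedies, weighted cross-entropy and random oversampling, applied to a toy label vector with the same 88:12 ratio. The function names, the inverse-frequency weighting heuristic, and the toy labels are assumptions chosen for illustration; the description above does not say which weights or sampling procedure were actually used.

```python
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = N / (K * count).

    This "balanced" heuristic is an assumption, not the documented setup.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood for binary labels.

    probs[i] is the predicted probability of the positive class (label 1).
    The sum is normalised by the total weight of the examples.
    """
    total_loss = 0.0
    total_weight = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        w = weights[y]
        total_loss += -w * math.log(p_true)
        total_weight += w
    return total_loss / total_weight


def random_oversample(examples, labels, seed=0):
    """Duplicate randomly chosen minority examples until all classes match
    the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(examples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(examples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


# Toy labels mirroring the 88:12 negative-to-positive ratio of Subtask 1.
labels = [0] * 88 + [1] * 12
examples = [f"sentence-{i}" for i in range(len(labels))]  # placeholder text

weights = inverse_frequency_weights(labels)
uninformed_loss = weighted_cross_entropy([0.5] * len(labels), labels, weights)
balanced_x, balanced_y = random_oversample(examples, labels)
```

With these weights, an error on a positive (counterfactual) example costs roughly seven times as much as an error on a negative one, which is the intended effect of loss weighting. Oversampling reaches a similar balance by changing the data instead of the loss.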

Provided by:

SemEval-2020 Task Organisers
