
SemEval-2020 Task 5 Dataset

Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/ThilinaRajapakse/simpletransformers
Description:
This dataset contains counterfactuals: textual expressions that describe events that did not occur or may not occur. Subtask 1 is highly imbalanced, with a negative-to-positive ratio of 88:12. To address this, several methods were tried, including oversampling, SMOTE, undersampling, and weighted cross-entropy loss. In terms of size, Subtask 1 has 13,000 examples and Subtask 2 has 3,500; the test sets contain 7,000 sentences for Subtask 1 and 1,950 sentences for Subtask 2. The task is to detect counterfactual statements and parse them into their antecedents and consequents.
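
As an illustration of one of the imbalance remedies named in the description, weighted cross-entropy loss, the sketch below derives class weights from the stated 88:12 ratio and applies them to a loss. It is a minimal example under stated assumptions, not the method used by any particular submission: the inverse-frequency weighting, the label encoding (0 = not counterfactual, 1 = counterfactual), the helper names, and the sample probabilities are all hypothetical.

```python
import math

def inverse_frequency_weights(class_counts):
    """Return weights w_c = N / (K * n_c), so rarer classes get larger weights."""
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {label: total / (num_classes * count)
            for label, count in class_counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true label), normalised by the sum of weights used."""
    weighted_losses = [weights[y] * -math.log(p[y]) for p, y in zip(probs, labels)]
    weight_sum = sum(weights[y] for y in labels)
    return sum(weighted_losses) / weight_sum

# Counts per 100 sentences, taken from the 88:12 ratio in the description.
# Assumed encoding: 0 = not counterfactual, 1 = counterfactual.
counts = {0: 88, 1: 12}
weights = inverse_frequency_weights(counts)

# Hypothetical model outputs: each row is [P(class 0), P(class 1)].
probs = [[0.9, 0.1], [0.6, 0.4]]
labels = [0, 1]
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights, an error on the rare positive class costs roughly seven times as much as an error on the negative class, which is the intended effect of weighting. Normalising by the sum of the weights is one common choice; dividing by the number of examples is another.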
Provided by:
SemEval-2020 Task Organisers