maximoss/rte3-multi
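Dataset Description
Dataset Summary
This repository contains all manually translated versions of the RTE-3 dataset, along with the original English version. RTE-3 has been translated into Italian (2012), German (2013), and French (2023). Unlike in other repositories, our French version and the earlier Italian and German versions are all annotated here with 3 classes (entailment, neutral, contradiction) rather than 2 (entailment, non-entailment).
To use the data for one language only, filter the dataset on the `language` column by selecting the value for the language you want.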
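As a minimal sketch of the filtering described above, the helper below keeps only the rows whose `language` field matches a requested value. It accepts any iterable of dict-like rows, which is also what iterating over a split loaded with the Hugging Face `datasets` library yields. The function name `filter_by_language`, the sample rows, and the language codes (`"fr"`, `"de"`) are assumptions made for illustration; check the values actually stored in the `language` column before relying on them.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def filter_by_language(rows: Iterable[Row], language: str) -> List[Row]:
    """Return only the rows whose `language` field equals `language`."""
    return [row for row in rows if row["language"] == language]


# Hypothetical rows that mimic the dataset's fields; this is not real data,
# and the language codes used here are assumed.
sample_rows: List[Row] = [
    {"id": 1, "language": "fr", "premise": "P1", "hypothesis": "H1", "label": 0},
    {"id": 1, "language": "de", "premise": "P2", "hypothesis": "H2", "label": 0},
    {"id": 2, "language": "fr", "premise": "P3", "hypothesis": "H3", "label": 2},
]

french_rows = filter_by_language(sample_rows, "fr")
print(len(french_rows))  # 2
```

Because the helper depends only on the `language` key, the same predicate can be reused with whatever loading mechanism you prefer.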
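Supported Tasks and Leaderboards
This dataset can be used for natural language inference (NLI), also known as recognizing textual entailment (RTE), which is a sentence-pair classification task.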
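Dataset Structure
Data Fields
- `id`: Index number.
- `language`: The language of the sentence pair.
- `premise`: The premise, translated into the target language.
- `hypothesis`: The hypothesis, translated into the target language.
- `label`: Classification label, with possible values 0 (entailment), 1 (neutral), 2 (contradiction).
- `label_text`: Classification label as text, with possible values entailment (0), neutral (1), contradiction (2).
- `task`: The specific NLP task the data comes from: information extraction (IE), information retrieval (IR), question answering (QA), or summarization (SUM).
- `length`: The length of the text of the sentence pair.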
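The relationship between `label` and `label_text` can be sketched as a small lookup table. The snippet below also shows one way to collapse the 3-class scheme into the 2-class scheme (entailment, non-entailment) used by other versions of RTE-3. The helper names `is_consistent` and `to_two_way`, and the exact string `"non-entailment"`, are assumptions for illustration and are not part of the dataset.

```python
from typing import Dict

# Mapping taken from the field descriptions above.
ID2LABEL: Dict[int, str] = {0: "entailment", 1: "neutral", 2: "contradiction"}
LABEL2ID: Dict[str, int] = {name: idx for idx, name in ID2LABEL.items()}


def is_consistent(row: dict) -> bool:
    """Check that `label` and `label_text` describe the same class."""
    return ID2LABEL.get(row["label"]) == row["label_text"]


def to_two_way(label: int) -> str:
    """Collapse a 3-class label id into the 2-class scheme.

    Neutral and contradiction are both mapped to "non-entailment"
    (an assumed spelling for the second class).
    """
    if label not in ID2LABEL:
        raise ValueError(f"unknown label id: {label}")
    return "entailment" if label == 0 else "non-entailment"


print(to_two_way(LABEL2ID["contradiction"]))  # non-entailment
```

Collapsing is lossy: once neutral and contradiction are merged, the original 3-class label cannot be recovered from the 2-class one.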
Data Splits
| Language | Development | Test |
|---|---|---|
| All languages | 3200 | 3200 |
| French | 800 | 800 |
| German | 800 | 800 |
| Italian | 800 | 800 |
| English | 800 | 800 |
For French RTE-3:

| Split | Entailment | Neutral | Contradiction |
|---|---|---|---|
| Development | 412 | 299 | 89 |
| Test | 410 | 318 | 72 |

| Split | Short | Long |
|---|---|---|
| Development | 665 | 135 |
| Test | 683 | 117 |

| Split | IE | IR | QA | SUM |
|---|---|---|---|---|
| Development | 200 | 200 | 200 | 200 |
| Test | 200 | 200 | 200 | 200 |
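The label counts above for French RTE-3 imply an imbalanced class distribution, which matters when choosing evaluation metrics. As a rough sketch, the snippet below recomputes each split's total and the share of its majority class from the counts in the label table; the names `LABEL_COUNTS` and `label_shares` are hypothetical.

```python
from typing import Dict

# Label counts for French RTE-3, copied from the label table above.
LABEL_COUNTS: Dict[str, Dict[str, int]] = {
    "development": {"entailment": 412, "neutral": 299, "contradiction": 89},
    "test": {"entailment": 410, "neutral": 318, "contradiction": 72},
}


def label_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each label's share of the split as a fraction of the total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


for split, counts in LABEL_COUNTS.items():
    shares = label_shares(counts)
    majority = max(shares, key=shares.get)
    print(f"{split}: total={sum(counts.values())}, "
          f"majority={majority} ({shares[majority]:.2%})")
# development: total=800, majority=entailment (51.50%)
# test: total=800, majority=entailment (51.25%)
```

A classifier that always predicts entailment would therefore score a little over 51% accuracy on either split, while contradiction accounts for only about 11% of the development set and 9% of the test set.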
Additional Information
Citation Information
BibTeX:
@inproceedings{skandalis-etal-2024-new-datasets,
    title = "New Datasets for Automatic Detection of Textual Entailment and of Contradictions between Sentences in {F}rench",
    author = "Skandalis, Maximos and Moot, Richard and Retor{\'e}, Christian and Robillard, Simon",
    editor = "Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italy",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1065",
    pages = "12173--12186",
    abstract = "This paper introduces DACCORD, an original dataset in French for automatic detection of contradictions between sentences. It also presents new, manually translated versions of two datasets, namely the well known dataset RTE3 and the recent dataset GQNLI, from English to French, for the task of natural language inference / recognising textual entailment, which is a sentence-pair classification task. These datasets help increase the admittedly limited number of datasets in French available for these tasks. DACCORD consists of 1034 pairs of sentences and is the first dataset exclusively dedicated to this task and covering among others the topic of the Russian invasion in Ukraine. RTE3-FR contains 800 examples for each of its validation and test subsets, while GQNLI-FR is composed of 300 pairs of sentences and focuses specifically on the use of generalised quantifiers. Our experiments on these datasets show that they are more challenging than the two already existing datasets for the mainstream NLI task in French (XNLI, FraCaS). For languages other than English, most deep learning models for NLI tasks currently have only XNLI available as a training set. Additional datasets, such as ours for French, could permit different training and evaluation strategies, producing more robust results and reducing the inevitable biases present in any single dataset.",
}
@inproceedings{giampiccolo-etal-2007-third,
    title = "The Third {PASCAL} Recognizing Textual Entailment Challenge",
    author = "Giampiccolo, Danilo and Magnini, Bernardo and Dagan, Ido and Dolan, Bill",
    booktitle = "Proceedings of the {ACL}-{PASCAL} Workshop on Textual Entailment and Paraphrasing",
    month = jun,
    year = "2007",
    address = "Prague",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W07-1401",
    pages = "1--9",
}
ACL:
Maximos Skandalis, Richard Moot, Christian Retoré, and Simon Robillard. 2024. New Datasets for Automatic Detection of Textual Entailment and of Contradictions between Sentences in French. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 12173–12186, Torino, Italy. ELRA and ICCL.
And
Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, and Bill Dolan. 2007. The Third PASCAL Recognizing Textual Entailment Challenge. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pages 1–9, Prague. Association for Computational Linguistics.




