
maximoss/gqnli-fr

Hugging Face | Updated 2024-05-18 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/maximoss/gqnli-fr
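Resource summary:
This dataset contains a manually translated French version of the GQNLI challenge dataset, which was originally written in English. GQNLI is an evaluation corpus designed to test how well language models reason with generalised quantifiers. The dataset can be used for natural language inference (NLI), also known as recognising textual entailment (RTE), which is a sentence-pair classification task. It contains the translated premises and hypotheses alongside the original English premises and hypotheses, with the labels entailment, neutral, and contradiction. The test set contains 97 entailment, 100 neutral, and 103 contradiction examples.
Provider:
maximoss
Summary of original information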

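Dataset overview

Dataset description

  • Name: GQNLI-French
  • Language: French (fr)
  • License: CC-BY-4.0
  • Task categories:
    • Text classification
    • Natural language inference
    • Multi-input text classification
  • Dataset size: fewer than 1,000 records (n<1K)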

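Dataset summary

This dataset is a manual French translation of the GQNLI challenge dataset, which was originally written in English. GQNLI is designed to test the generalised quantifier reasoning ability of language models.

Supported tasks and leaderboards

The dataset is suited to natural language inference (NLI), also known as recognising textual entailment (RTE), which is a sentence-pair classification task.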
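
As a rough illustration of what sentence-pair classification involves, the sketch below scores a classifier on premise/hypothesis pairs. The `accuracy` helper, the `always_neutral` baseline, and the two sample rows are hypothetical and are not taken from the dataset; only the field names (`premise`, `hypothesis`, `label`) and the label ids follow the dataset description.

```python
from typing import Callable, Iterable, Mapping

# A predictor maps a (premise, hypothesis) pair to a label id:
# 0 = entailment, 1 = neutral, 2 = contradiction.
Predictor = Callable[[str, str], int]


def accuracy(rows: Iterable[Mapping[str, object]], predict: Predictor) -> float:
    """Return the fraction of rows whose predicted id equals the gold `label`."""
    total = 0
    correct = 0
    for row in rows:
        total += 1
        if predict(str(row["premise"]), str(row["hypothesis"])) == row["label"]:
            correct += 1
    if total == 0:
        raise ValueError("no rows to evaluate")
    return correct / total


def always_neutral(premise: str, hypothesis: str) -> int:
    """Hypothetical baseline: ignore the input and always answer 1 (neutral)."""
    return 1


# Made-up rows for illustration only; they are NOT examples from GQNLI-FR.
sample_rows = [
    {"premise": "Trois chats dorment.",
     "hypothesis": "Au moins deux chats dorment.",
     "label": 0},
    {"premise": "Deux chiens courent.",
     "hypothesis": "Les chiens sont noirs.",
     "label": 1},
]

print(accuracy(sample_rows, always_neutral))  # 0.5 on these two made-up rows
```

Any real model can be plugged in by wrapping it in a function with the same `(premise, hypothesis) -> int` shape.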

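Dataset structure

Data fields

  • uid: Index number.
  • premise: The translated premise in the target language.
  • hypothesis: The translated hypothesis in the target language.
  • label: Classification label as an integer; possible values are 0 (entailment), 1 (neutral), 2 (contradiction).
  • label_text: Classification label as text; possible values are entailment (0), neutral (1), contradiction (2).
  • premise_original: The original premise from the English source dataset.
  • hypothesis_original: The original hypothesis from the English source dataset.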
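
The fields listed above can be modelled as a small typed record, which makes it easy to check that `label` and `label_text` agree. This is a minimal sketch: the class name `GqnliFrExample`, the `int` type for `uid`, the helper `load_test_split`, and the sample sentences are assumptions for illustration. The loader assumes the third-party `datasets` library is installed and that the repository exposes a `test` split, as the split table suggests; it is not called here.

```python
from dataclasses import dataclass

# Label ids and names as given in the field descriptions.
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}


@dataclass(frozen=True)
class GqnliFrExample:  # hypothetical name, not part of any library
    uid: int  # assumption: the index number is an integer
    premise: str
    hypothesis: str
    label: int
    label_text: str
    premise_original: str
    hypothesis_original: str

    def is_consistent(self) -> bool:
        """True when `label_text` is the name that belongs to `label`."""
        return LABEL_NAMES.get(self.label) == self.label_text


def load_test_split():
    """Assumed loading path via the Hugging Face `datasets` library."""
    from datasets import load_dataset  # imported lazily; needs network access
    return load_dataset("maximoss/gqnli-fr", split="test")


# Made-up record for illustration only; it is NOT taken from the dataset.
example = GqnliFrExample(
    uid=0,
    premise="Trois chats dorment.",
    hypothesis="Au moins deux chats dorment.",
    label=0,
    label_text="entailment",
    premise_original="Three cats are sleeping.",
    hypothesis_original="At least two cats are sleeping.",
)
print(example.is_consistent())  # True
```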

Data splits

Split  Entailment  Neutral  Contradiction
test   97          100      103
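
The three class counts can be checked against the 300 sentence pairs mentioned in the paper abstract, and they also give the accuracy of a trivial majority-class baseline. The snippet below is only a worked calculation on the numbers in the table; the variable names are illustrative.

```python
# Class counts for the test split, copied from the table.
counts = {"entailment": 97, "neutral": 100, "contradiction": 103}

total = sum(counts.values())  # 97 + 100 + 103 = 300
shares = {name: n / total for name, n in counts.items()}

# A classifier that always predicts the most frequent class.
majority_label = max(counts, key=counts.get)
majority_accuracy = counts[majority_label] / total

print(total)                        # 300
print(majority_label)               # contradiction
print(round(majority_accuracy, 4))  # 0.3433
```

Because the classes are nearly balanced, any model scoring close to one third is doing no better than guessing a single label.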

Citation information

BibTeX

@inproceedings{skandalis-etal-2024-new-datasets,
    title = "New Datasets for Automatic Detection of Textual Entailment and of Contradictions between Sentences in {F}rench",
    author = "Skandalis, Maximos and Moot, Richard and Retor{\'e}, Christian and Robillard, Simon",
    editor = "Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italy",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1065",
    pages = "12173--12186",
    abstract = "This paper introduces DACCORD, an original dataset in French for automatic detection of contradictions between sentences. It also presents new, manually translated versions of two datasets, namely the well known dataset RTE3 and the recent dataset GQNLI, from English to French, for the task of natural language inference / recognising textual entailment, which is a sentence-pair classification task. These datasets help increase the admittedly limited number of datasets in French available for these tasks. DACCORD consists of 1034 pairs of sentences and is the first dataset exclusively dedicated to this task and covering among others the topic of the Russian invasion in Ukraine. RTE3-FR contains 800 examples for each of its validation and test subsets, while GQNLI-FR is composed of 300 pairs of sentences and focuses specifically on the use of generalised quantifiers. Our experiments on these datasets show that they are more challenging than the two already existing datasets for the mainstream NLI task in French (XNLI, FraCaS). For languages other than English, most deep learning models for NLI tasks currently have only XNLI available as a training set. Additional datasets, such as ours for French, could permit different training and evaluation strategies, producing more robust results and reducing the inevitable biases present in any single dataset.",
}
