maximoss/mnli-nineeleven-fr-mt

Name: maximoss/mnli-nineeleven-fr-mt
Creator: maximoss
Published: 2024-02-04 12:38:08
License: No description provided

Hugging Face. Updated 2024-02-04; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/maximoss/mnli-nineeleven-fr-mt

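Resource overview:

The dataset contains 2,000 machine-translated French examples about the 9/11 terrorist attacks, drawn from the MultiNLI dataset. These examples are distinct from the validation and test subsets of XNLI. The dataset targets natural language inference, a sentence-pair classification task, and includes fields such as premise, hypothesis, and class label. The translation was produced with a recent neural machine translation model.

Provider:

maximoss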

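Summary of original information

Dataset overview

Dataset description

Language: French (fr)
Size: 1,000 < n < 10,000
License: BSD-2-Clause
Task categories:
- Text classification
- Natural language inference
- Multi-input text classification

Dataset summary

This dataset contains 2,000 machine-translated French examples from the MultiNLI data, all about the 9/11 terrorist attacks. These examples are distinct from those in the XNLI validation and test sets. The 26 examples that have no gold label in the original dataset have been assigned gold labels in this French version.

Supported tasks and leaderboards

The dataset is intended for natural language inference (NLI), also known as recognizing textual entailment (RTE), which is a sentence-pair classification task.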

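Dataset structure

Data fields

premise: the machine-translated premise in the target language.
hypothesis: the machine-translated hypothesis in the target language.
label: the class label; possible values are 0 (entailment), 1 (neutral), 2 (contradiction).
label_text: the class label as text; possible values are entailment (0), neutral (1), contradiction (2).
pairID: unique identifier for the pair.
promptID: unique identifier for the prompt.
premise_original: the original premise from the English source dataset.
hypothesis_original: the original hypothesis from the English source dataset.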
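
As a rough illustration of the field layout above, the following Python sketch models one record and checks that `label` and `label_text` agree. The class name `NLIExample`, the helper `is_consistent`, the field types (in particular treating `pairID` and `promptID` as strings), and every value in the sample record are assumptions made for illustration; none of them are taken from the dataset itself.

```python
from dataclasses import dataclass, replace

# Mapping between the integer label and its text form,
# as listed in the field descriptions.
ID2LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}


@dataclass(frozen=True)
class NLIExample:
    """One record; the field types are assumed, not documented."""
    premise: str
    hypothesis: str
    label: int
    label_text: str
    pairID: str      # assumed to be a string
    promptID: str    # assumed to be a string
    premise_original: str
    hypothesis_original: str


def is_consistent(example: NLIExample) -> bool:
    """Return True when label and label_text describe the same class."""
    return ID2LABEL.get(example.label) == example.label_text


# Invented placeholder record, NOT an actual row of the dataset.
sample = NLIExample(
    premise="Il pleut.",
    hypothesis="Le temps est humide.",
    label=0,
    label_text="entailment",
    pairID="placeholder-pair",
    promptID="placeholder-prompt",
    premise_original="It is raining.",
    hypothesis_original="The weather is wet.",
)

# A copy whose text label no longer matches the integer label.
mismatched = replace(sample, label_text="neutral")
```

A check of this kind can be useful after loading or re-exporting the data, since the two label columns carry the same information in different forms.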

Data splits

Name	Entailment	Neutral	Contradiction
mnli_fr	705	641	654
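
The per-class counts in the table can be checked against the stated total of 2,000 examples, and they also give the accuracy of a trivial majority-class baseline. The short Python sketch below does that arithmetic; the variable names are illustrative, and the baseline is only a reference point computed from the counts, not a result reported by the dataset authors.

```python
# Per-class counts copied from the mnli_fr row of the table.
counts = {"entailment": 705, "neutral": 641, "contradiction": 654}

# The counts should add up to the 2,000 examples the dataset reports.
total = sum(counts.values())

# Share of each class, in percent.
shares = {name: 100 * n / total for name, n in counts.items()}

# Accuracy of always predicting the most frequent class.
majority_class = max(counts, key=counts.get)
majority_accuracy = counts[majority_class] / total

for name, share in shares.items():
    print(f"{name}: {share:.2f}%")
print(f"majority class: {majority_class} ({majority_accuracy:.4f})")
```

The classes are close to balanced, so any classifier worth evaluating on this set should score clearly above the majority-class figure.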

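Dataset creation

The dataset was machine-translated from English into French with opus-mt-tc-big, a recent neural machine translation model. The translation was completed on 29 March 2023.

Citation information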

BibTeX:

@InProceedings{N18-1101,
  author = "Williams, Adina and Nangia, Nikita and Bowman, Samuel",
  title = "A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference",
  booktitle = "Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "1112--1122",
  location = "New Orleans, Louisiana",
  url = "http://aclweb.org/anthology/N18-1101"
}

ACL:

Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122, New Orleans, Louisiana. Association for Computational Linguistics.
