tommasobonomo/sem_augmented_fever_nli

Name: tommasobonomo/sem_augmented_fever_nli
Creator: tommasobonomo
Published: 2024-07-12 15:13:29
License: MIT

Source: Hugging Face. Updated 2024-07-12; indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/tommasobonomo/sem_augmented_fever_nli

Summary:

This dataset is a random downsample of the FEVER dataset, adapted for the Natural Language Inference (NLI) task. It contains premise-hypothesis pairs whose labels are mapped from FEVER's three verdicts: supports, refutes, and not enough info. The dataset is also augmented with semantic annotations, namely Word Sense Disambiguation (WSD) and Semantic Role Labeling (SRL) information, added with the AMuSE-WSD and InVeRo tools. It is divided into training, validation, and test sets of 51086, 2288, and 2287 samples respectively.
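
The label mapping described above can be sketched as a small lookup. The summary only says that labels are "mapped from" the three FEVER verdicts; the NLI names used here (entailment, contradiction, neutral) are the conventional choice and an assumption, not label strings confirmed for this dataset. `FEVER_TO_NLI` and `fever_to_nli` are hypothetical names introduced for illustration.

```python
# Assumed mapping: the right-hand names are the usual NLI classes,
# not label strings confirmed for this dataset.
FEVER_TO_NLI = {
    "SUPPORTS": "entailment",
    "REFUTES": "contradiction",
    "NOT ENOUGH INFO": "neutral",
}


def fever_to_nli(verdict: str) -> str:
    """Map a FEVER verdict to an (assumed) NLI label, ignoring case and extra spaces."""
    key = " ".join(verdict.upper().split())
    try:
        return FEVER_TO_NLI[key]
    except KeyError:
        raise ValueError(f"unknown FEVER verdict: {verdict!r}") from None
```

Before relying on this mapping, check the distinct values of the `label` column in the actual data.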

Provider:

tommasobonomo

Original information summary

Dataset overview

Basic information

Name: Semantically-augmented FEVER for NLI
Language: English
License: MIT
Size: 10K<n<100K

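Dataset features

id: string
premise: string
hypothesis: string
label: string
wsd: structured data with detailed semantic information for the premise and hypothesis, such as index, text, part of speech, lemma, and synset ID
srl: structured data with semantic role labeling information for the premise and hypothesis, such as token index, raw text, verb frames, roles, scores, and spans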
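
A minimal sketch of loading the data and inspecting one example, assuming the Hugging Face `datasets` package is installed and the Hub is reachable. The six column names come from the feature list above; the helper names (`load_splits`, `missing_columns`, `format_pair`) are hypothetical, and the inner layout of `wsd` and `srl` is deliberately not assumed here.

```python
from typing import Any

DATASET_ID = "tommasobonomo/sem_augmented_fever_nli"
EXPECTED_COLUMNS = ("id", "premise", "hypothesis", "label", "wsd", "srl")


def load_splits():
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the helpers below work without the `datasets` package.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)


def missing_columns(example: dict[str, Any]) -> list[str]:
    """Return the expected column names that are absent from one example."""
    return [name for name in EXPECTED_COLUMNS if name not in example]


def format_pair(example: dict[str, Any]) -> str:
    """Render the plain-text part of an example on a single line."""
    return f"[{example['label']}] {example['premise']} => {example['hypothesis']}"
```

A typical session would call `load_splits()`, pick a split from the returned object, and pass its first example to `missing_columns` and `format_pair`. The split names are not verified here, so list the keys of the returned object first.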

Dataset splits

Training set: 51086 samples, 357653267 bytes
Validation set: 2288 samples, 15794078 bytes
Test set: 2287 samples, 15736002 bytes

Download and dataset size

Download size: 77623798 bytes
Dataset size: 389183347 bytes
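
The figures above can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers reported on this page: it confirms that the per-split byte counts add up to the reported dataset size, and derives the split proportions and the download-to-dataset size ratio. The variable names are illustrative.

```python
# (samples, bytes) per split, copied from the figures above.
SPLITS = {
    "train": (51086, 357653267),
    "validation": (2288, 15794078),
    "test": (2287, 15736002),
}
DOWNLOAD_BYTES = 77623798
DATASET_BYTES = 389183347

total_examples = sum(n for n, _ in SPLITS.values())
total_bytes = sum(b for _, b in SPLITS.values())

# The per-split byte counts should add up to the reported dataset size.
assert total_bytes == DATASET_BYTES

for name, (n, b) in SPLITS.items():
    share = n / total_examples
    print(f"{name:<10} {n:>6} samples  {share:6.2%}  {b / n:,.0f} bytes/sample")

print(f"total      {total_examples} samples")
print(f"download is {DOWNLOAD_BYTES / DATASET_BYTES:.1%} of the dataset size")
```

The total of 55661 samples falls inside the stated 10K<n<100K size bucket, and the split works out to roughly 92% training data with about 4% each for validation and test.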

Configuration

Default configuration: contains the path settings for the training, validation, and test data

Dataset augmentation

The entire dataset was semantically augmented with AMuSE-WSD and InVeRo, adding Word Sense Disambiguation (WSD) and Semantic Role Labeling (SRL) information.
