zhili312/Textual-Natural-Contextual-Classification

Name: zhili312/Textual-Natural-Contextual-Classification
Creator: zhili312
Published: 2023-10-30 06:45:42
License: not described on this page (the dataset card metadata specifies apache-2.0)

Source: Hugging Face. Updated 2023-10-30; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/zhili312/Textual-Natural-Contextual-Classification

Resource overview:

---
license: apache-2.0
task_categories:
- text-classification
language:
- en
pretty_name: TNCC
size_categories:
- 1K<n<10K
---

Given the scarcity of datasets for understanding natural language in visual scenes, we introduce a novel textual entailment dataset, named Textual Natural Contextual Classification (TNCC). This dataset is formulated on the foundation of Crisscrossed Captions (https://github.com/google-research-datasets/Crisscrossed-Captions), an image captioning dataset supplied with human-rated semantic similarity ratings on a continuous scale from 0 to 5.

We tailor the dataset to suit a binary classification task. Specifically, sentence pairs with annotation scores exceeding 4 are categorized as positive (entailment), whereas pairs with scores less than 1 are marked as negative (non-entailment).

The TNCC dataset is partitioned into training, validation, and testing sets, containing 3,600, 1,200, and 1,560 instances, respectively.

If you use this dataset for academic research, please cite the NeurIPS 2023 paper titled 'Back-Modality: Leveraging Modal Transformation for Data Augmentation'.
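The thresholding rule described in the card can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' preprocessing code: the function names, the 1/0 label encoding, and the example sentences are hypothetical, and the handling of pairs scored between 1 and 4 (dropped here) is an assumption, since the card does not state what happens to them.

```python
from typing import Iterable, List, Optional, Tuple

# Thresholds taken from the dataset card; everything else is an assumption.
POSITIVE_THRESHOLD = 4.0  # scores strictly above this count as entailment
NEGATIVE_THRESHOLD = 1.0  # scores strictly below this count as non-entailment


def label_pair(score: float) -> Optional[int]:
    """Map a 0-5 similarity score to 1, 0, or None (hypothetical encoding)."""
    if not 0.0 <= score <= 5.0:
        raise ValueError(f"score {score} is outside the 0-5 rating scale")
    if score > POSITIVE_THRESHOLD:
        return 1  # positive (entailment)
    if score < NEGATIVE_THRESHOLD:
        return 0  # negative (non-entailment)
    return None  # mid-range pair: assumed to be left out of the dataset


def build_binary_pairs(
    rated_pairs: Iterable[Tuple[str, str, float]],
) -> List[Tuple[str, str, int]]:
    """Keep only the pairs that receive a binary label."""
    labeled = []
    for first, second, score in rated_pairs:
        label = label_pair(score)
        if label is not None:
            labeled.append((first, second, label))
    return labeled


if __name__ == "__main__":
    # Made-up example pairs, not drawn from TNCC or Crisscrossed Captions.
    examples = [
        ("A dog runs on the beach.", "A dog is running along the shore.", 4.6),
        ("A dog runs on the beach.", "A man cooks dinner.", 0.3),
        ("A dog runs on the beach.", "A dog sits in a park.", 2.5),
    ]
    for first, second, label in build_binary_pairs(examples):
        print(label, "|", first, "|", second)
```

Both comparisons are strict, matching the card's wording ("exceeding 4", "less than 1"), so a pair rated exactly 4 or exactly 1 receives no label in this sketch.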

Provider:

zhili312

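Summary of original information

Dataset overview

Dataset name

Name: TNCC

Dataset type

Type: textual entailment dataset

Task category

Task: text classification

Language

Language: English

Dataset size

Size: 1K<n<10K

Dataset description

Description: TNCC is built on Crisscrossed Captions, an image captioning dataset that provides human-rated semantic similarity scores ranging from 0 to 5. TNCC adapts this data to a binary classification task: sentence pairs with scores above 4 are labeled positive (entailment), and pairs with scores below 1 are labeled negative (non-entailment).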

Dataset splits

Splits:
- Training set: 3,600 instances
- Validation set: 1,200 instances
- Test set: 1,560 instances
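
As a quick sanity check on the split sizes listed above, the following sketch totals the three splits, computes each split's share, and confirms the total falls inside the stated 1K<n<10K size category. The dictionary keys are illustrative names, not necessarily the split names used in the hosted dataset.

```python
# Split sizes as stated in the dataset card; the key names are assumptions.
SPLIT_SIZES = {"train": 3600, "validation": 1200, "test": 1560}

total = sum(SPLIT_SIZES.values())  # 3600 + 1200 + 1560 = 6360

# Fraction of all instances that falls into each split.
fractions = {name: size / total for name, size in SPLIT_SIZES.items()}

# The card's size category is 1K<n<10K.
in_size_category = 1_000 < total < 10_000

if __name__ == "__main__":
    print(f"total instances: {total}")
    for name, share in fractions.items():
        print(f"{name}: {share:.1%}")
    print(f"within 1K<n<10K: {in_size_category}")
```

The split works out to roughly 57% training, 19% validation, and 25% test, for 6,360 instances in total.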

Citation information

Citation: If you use this dataset for academic research, please cite the NeurIPS 2023 paper "Back-Modality: Leveraging Modal Transformation for Data Augmentation".
