zhili312/Textual-Natural-Contextual-Classification
Source: Hugging Face. Updated 2023-10-30; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/zhili312/Textual-Natural-Contextual-Classification
Resource overview:
---
license: apache-2.0
task_categories:
- text-classification
language:
- en
pretty_name: TNCC
size_categories:
- 1K<n<10K
---
Because datasets for understanding natural language in visual scenes are scarce, we introduce a new textual entailment dataset, Textual Natural Contextual Classification (TNCC).
This dataset is built on Crisscrossed Captions (https://github.com/google-research-datasets/Crisscrossed-Captions), an image captioning dataset with human semantic similarity ratings on a continuous scale from 0 to 5.
We adapt the dataset to a binary classification task: sentence pairs with annotation scores above 4 are labeled positive (entailment), and pairs with scores below 1 are labeled negative (non-entailment).
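The thresholding rule can be sketched as a small Python function. This is a minimal illustration, not the authors' preprocessing code: the names `tncc_label` and `build_pairs`, the 1/0 label encoding, and the choice to drop pairs scoring from 1 to 4 inclusive are all assumptions, since the card does not say how those pairs are handled.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical label encoding; the dataset card does not specify one.
ENTAILMENT = 1
NON_ENTAILMENT = 0


def tncc_label(score: float) -> Optional[int]:
    """Map a 0-5 human similarity score to a binary TNCC-style label.

    Scores above 4 map to entailment and scores below 1 to non-entailment.
    Scores from 1 to 4 inclusive return None (assumed to be discarded).
    """
    if not 0.0 <= score <= 5.0:
        raise ValueError(f"score {score} is outside the 0-5 scale")
    if score > 4.0:
        return ENTAILMENT
    if score < 1.0:
        return NON_ENTAILMENT
    return None


def build_pairs(
    rated: Iterable[Tuple[str, str, float]],
) -> List[Tuple[str, str, int]]:
    """Keep only the (sentence_a, sentence_b, score) triples that get a label."""
    labeled = []
    for sent_a, sent_b, score in rated:
        label = tncc_label(score)
        if label is not None:
            labeled.append((sent_a, sent_b, label))
    return labeled


if __name__ == "__main__":
    # Placeholder pairs, not taken from the dataset.
    rated = [("a1", "b1", 4.6), ("a2", "b2", 0.2), ("a3", "b3", 2.8)]
    print(build_pairs(rated))  # [('a1', 'b1', 1), ('a2', 'b2', 0)]
```

The strict inequalities mirror the wording "above 4" and "below 1", so in this sketch a pair scored exactly 4 or exactly 1 receives no label.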
The TNCC dataset is partitioned into training, validation, and testing sets, containing 3,600, 1,200, and 1,560 instances, respectively.
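As a quick arithmetic check, the three splits sum to 6,360 instances, which is consistent with the `1K<n<10K` size category in the front matter. The short Python sketch below computes the total and each split's share; the dictionary keys are illustrative labels and are not necessarily the split names used in the Hugging Face repository.

```python
# Split sizes as stated in the dataset card; key names are illustrative.
SPLITS = {"train": 3600, "validation": 1200, "test": 1560}

total = sum(SPLITS.values())
# The YAML front matter declares size_categories: 1K<n<10K.
assert 1_000 < total < 10_000

for name, count in SPLITS.items():
    # Print each split's size and its fraction of the whole dataset.
    print(f"{name:<10} {count:>5}  {count / total:6.1%}")
print(f"{'total':<10} {total:>5}")
```

Running it prints shares of about 56.6% (train), 18.9% (validation), and 24.5% (test).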
If you use this dataset for academic research, please cite the NeurIPS 2023 paper titled 'Back-Modality: Leveraging Modal Transformation for Data Augmentation'.
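Provided by:
zhili312
Summary of original information
Dataset overview
Dataset name
- Name: TNCC
Dataset type
- Type: textual entailment dataset
Task category
- Task: text classification
Language
- Language: English
Dataset size
- Size: 1K<n<10K
Dataset description
- Description: TNCC is built on the Crisscrossed Captions image captioning dataset, which provides human-rated semantic similarity scores from 0 to 5. TNCC adapts these scores for binary classification: sentence pairs scoring above 4 are labeled positive (entailment), and pairs scoring below 1 are labeled negative (non-entailment).
Dataset splits
- Splits:
  - Training set: 3,600 instances
  - Validation set: 1,200 instances
  - Test set: 1,560 instances
Citation
- Citation: If you use this dataset for academic research, please cite the NeurIPS 2023 paper "Back-Modality: Leveraging Modal Transformation for Data Augmentation".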



