KETI-AIR/kor_snli
Source: Hugging Face. Updated 2023-11-15. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/KETI-AIR/kor_snli
Resource description:
---
language:
- ko
license: cc-by-4.0
size_categories:
- 100K<n<1M
task_categories:
- text-classification
task_ids:
- natural-language-inference
- multi-input-text-classification
dataset_info:
  features:
  - name: premise
    dtype: string
  - name: hypothesis
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': entailment
          '1': neutral
          '2': contradiction
  - name: data_index_by_user
    dtype: int32
  splits:
  - name: train
    num_bytes: 85943643
    num_examples: 550152
  - name: validation
    num_bytes: 1631544
    num_examples: 10000
  - name: test
    num_bytes: 1638084
    num_examples: 10000
  download_size: 27268480
  dataset_size: 89213271
---
# Dataset Card for kor_snli
## Licensing Information
The data is distributed under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.
## Source Data Citation Information
```
@inproceedings{snli:emnlp2015,
Author = {Bowman, Samuel R. and Angeli, Gabor and Potts, Christopher and Manning, Christopher D.},
Booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
Publisher = {Association for Computational Linguistics},
Title = {A large annotated corpus for learning natural language inference},
Year = {2015}
}
```
Provider:
KETI-AIR
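Summary of original information
Dataset overview
Basic information
- Language: Korean
- License: CC BY 4.0
- Size category: 100K<n<1M
Task categories
- Text classification
- Natural language inference
- Multi-input text classification
Dataset structure
Features
- premise: string
- hypothesis: string
- label: class label
  - 0: entailment
  - 1: neutral
  - 2: contradiction
- data_index_by_user: 32-bit integer (int32)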
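The `label` feature stores an integer class id, so downstream code usually needs to map it back to a class name. The following is a minimal sketch of that mapping in plain Python, using only the id-to-name table from the card. The helper `decode_label` and the `example` record are hypothetical illustrations, and the placeholder strings are not real dataset rows.

```python
# Class names in id order, as listed in the card's class_label feature.
LABEL_NAMES = ["entailment", "neutral", "contradiction"]
# Reverse lookup from class name to integer id.
LABEL_TO_ID = {name: idx for idx, name in enumerate(LABEL_NAMES)}


def decode_label(label_id: int) -> str:
    """Return the class name for an integer value of the `label` column."""
    if not 0 <= label_id < len(LABEL_NAMES):
        raise ValueError(f"unknown label id: {label_id}")
    return LABEL_NAMES[label_id]


# Hypothetical record with the four fields named in the card
# (placeholder text, not taken from the dataset).
example = {
    "premise": "<Korean premise sentence>",
    "hypothesis": "<Korean hypothesis sentence>",
    "label": 2,
    "data_index_by_user": 0,
}

print(decode_label(example["label"]))  # contradiction
```

Ids outside the three listed classes raise a `ValueError` in this sketch. Whether such ids ever occur in the data is not stated in the card.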
Splits
- Train
  - Bytes: 85943643
  - Examples: 550152
- Validation
  - Bytes: 1631544
  - Examples: 10000
- Test
  - Bytes: 1638084
  - Examples: 10000
Size
- Download size: 27268480 bytes
- Dataset size: 89213271 bytes
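As a quick consistency check on the figures above, the per-split byte counts should add up to the reported dataset size. The sketch below uses only the numbers from the card. The variable names are illustrative, and the per-example averages are derived values, not figures reported by the dataset authors.

```python
# Split statistics copied from the dataset card.
SPLITS = {
    "train": {"num_bytes": 85943643, "num_examples": 550152},
    "validation": {"num_bytes": 1631544, "num_examples": 10000},
    "test": {"num_bytes": 1638084, "num_examples": 10000},
}
DOWNLOAD_SIZE = 27268480  # bytes, as reported in the card
DATASET_SIZE = 89213271   # bytes, as reported in the card

total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

# The split byte counts sum exactly to the reported dataset size.
assert total_bytes == DATASET_SIZE

for name, stats in SPLITS.items():
    share = stats["num_examples"] / total_examples
    avg_bytes = stats["num_bytes"] / stats["num_examples"]
    print(f"{name}: {share:.1%} of examples, {avg_bytes:.1f} bytes per example")

print(f"total examples: {total_examples}")  # total examples: 570152
print(f"download/dataset size ratio: {DOWNLOAD_SIZE / DATASET_SIZE:.2f}")
```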



