
KETI-AIR/kor_glue

Hugging Face, updated 2023-12-05, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/KETI-AIR/kor_glue
Resource description:
---
dataset_info:
- config_name: cola
  features:
  - name: data_index_by_user
    dtype: int32
  - name: label
    dtype: int32
  - name: sentence
    dtype: string
  splits:
  - name: train
    num_bytes: 569511
    num_examples: 8551
  - name: validation
    num_bytes: 72661
    num_examples: 1043
  - name: test
    num_bytes: 72979
    num_examples: 1063
  download_size: 381894
  dataset_size: 715151
- config_name: mrpc
  features:
  - name: data_index_by_user
    dtype: int32
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype: int32
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 1078522
    num_examples: 3668
  - name: validation
    num_bytes: 120306
    num_examples: 408
  - name: test
    num_bytes: 504069
    num_examples: 1725
  download_size: 1176356
  dataset_size: 1702897
- config_name: qnli
  features:
  - name: data_index_by_user
    dtype: int32
  - name: label
    dtype: int32
  - name: question
    dtype: string
  - name: sentence
    dtype: string
  splits:
  - name: train
    num_bytes: 28343211
    num_examples: 104743
  - name: validation
    num_bytes: 1507016
    num_examples: 5463
  - name: test
    num_bytes: 1510880
    num_examples: 5463
  download_size: 21097078
  dataset_size: 31361107
- config_name: qqp
  features:
  - name: data_index_by_user
    dtype: int32
  - name: question1
    dtype: string
  - name: question2
    dtype: string
  - name: label
    dtype: int32
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 64564524
    num_examples: 363846
  download_size: 40798086
  dataset_size: 64564524
- config_name: wnli
  features:
  - name: data_index_by_user
    dtype: int32
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype: int32
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 132171
    num_examples: 635
  - name: validation
    num_bytes: 15331
    num_examples: 71
  - name: test
    num_bytes: 47430
    num_examples: 146
  download_size: 80151
  dataset_size: 194932
configs:
- config_name: cola
  data_files:
  - split: train
    path: cola/train-*
  - split: validation
    path: cola/validation-*
  - split: test
    path: cola/test-*
- config_name: mrpc
  data_files:
  - split: train
    path: mrpc/train-*
  - split: validation
    path: mrpc/validation-*
  - split: test
    path: mrpc/test-*
- config_name: qnli
  data_files:
  - split: train
    path: qnli/train-*
  - split: validation
    path: qnli/validation-*
  - split: test
    path: qnli/test-*
- config_name: qqp
  data_files:
  - split: train
    path: qqp/train-*
- config_name: wnli
  data_files:
  - split: train
    path: wnli/train-*
  - split: validation
    path: wnli/validation-*
  - split: test
    path: wnli/test-*
license: cc-by-4.0
---

# Dataset Card for "kor_glue"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

# Source Data Citation Information

```
@article{warstadt2018neural,
  title={Neural Network Acceptability Judgments},
  author={Warstadt, Alex and Singh, Amanpreet and Bowman, Samuel R},
  journal={arXiv preprint arXiv:1805.12471},
  year={2018}
}
@inproceedings{wang2019glue,
  title={{GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please see the source to see the correct citation for each contained dataset.
```
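
The configuration and split names declared in the card metadata correspond to the arguments of the Hugging Face `datasets` loader. The following is a minimal sketch, assuming the `datasets` library is installed and the Hub (or a mirror) is reachable. The helper name `load_kor_glue` and the `SPLITS` table are illustrative and are not part of the dataset card.

```python
# Splits declared per configuration in the card's `configs` metadata.
SPLITS = {
    "cola": ("train", "validation", "test"),
    "mrpc": ("train", "validation", "test"),
    "qnli": ("train", "validation", "test"),
    "qqp": ("train",),
    "wnli": ("train", "validation", "test"),
}


def load_kor_glue(config: str, split: str = "train"):
    """Load one split of one kor_glue configuration (hypothetical helper)."""
    if config not in SPLITS:
        raise ValueError(f"unknown config {config!r}; expected one of {sorted(SPLITS)}")
    if split not in SPLITS[config]:
        raise ValueError(f"config {config!r} has no {split!r} split")
    # Imported lazily so the checks above work even without the library installed.
    from datasets import load_dataset

    return load_dataset("KETI-AIR/kor_glue", config, split=split)
```

Calling `load_kor_glue("cola", "validation")` would download the data on first use. `load_kor_glue("qqp", "test")` raises `ValueError` before any download, because the card declares only a train split for `qqp`.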
Provider:
KETI-AIR
Summary of original information

Dataset overview

Dataset configurations

COLA

  • Features:
    • data_index_by_user: int32
    • label: int32
    • sentence: string
  • Splits:
    • train: 569511 bytes, 8551 examples
    • validation: 72661 bytes, 1043 examples
    • test: 72979 bytes, 1063 examples
  • Download size: 381894 bytes
  • Dataset size: 715151 bytes

MRPC

  • Features:
    • data_index_by_user: int32
    • sentence1: string
    • sentence2: string
    • label: int32
    • idx: int32
  • Splits:
    • train: 1078522 bytes, 3668 examples
    • validation: 120306 bytes, 408 examples
    • test: 504069 bytes, 1725 examples
  • Download size: 1176356 bytes
  • Dataset size: 1702897 bytes

QNLI

  • Features:
    • data_index_by_user: int32
    • label: int32
    • question: string
    • sentence: string
  • Splits:
    • train: 28343211 bytes, 104743 examples
    • validation: 1507016 bytes, 5463 examples
    • test: 1510880 bytes, 5463 examples
  • Download size: 21097078 bytes
  • Dataset size: 31361107 bytes

QQP

  • Features:
    • data_index_by_user: int32
    • question1: string
    • question2: string
    • label: int32
    • idx: int32
  • Splits:
    • train: 64564524 bytes, 363846 examples
  • Download size: 40798086 bytes
  • Dataset size: 64564524 bytes

WNLI

  • Features:
    • data_index_by_user: int32
    • sentence1: string
    • sentence2: string
    • label: int32
    • idx: int32
  • Splits:
    • train: 132171 bytes, 635 examples
    • validation: 15331 bytes, 71 examples
    • test: 47430 bytes, 146 examples
  • Download size: 80151 bytes
  • Dataset size: 194932 bytes
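
The size figures listed for each configuration can be cross-checked arithmetically: in every case the dataset size equals the sum of the per-split byte counts. The sketch below runs that check and reports the ratio of download size to dataset size. It assumes, as is the usual convention for Hugging Face dataset metadata, that the download size refers to the stored files and the dataset size to the prepared data. The names `CONFIGS` and `check_sizes` are illustrative.

```python
# Byte counts copied from the configuration listing above.
CONFIGS = {
    "cola": {"splits": {"train": 569511, "validation": 72661, "test": 72979},
             "download_size": 381894, "dataset_size": 715151},
    "mrpc": {"splits": {"train": 1078522, "validation": 120306, "test": 504069},
             "download_size": 1176356, "dataset_size": 1702897},
    "qnli": {"splits": {"train": 28343211, "validation": 1507016, "test": 1510880},
             "download_size": 21097078, "dataset_size": 31361107},
    "qqp": {"splits": {"train": 64564524},
            "download_size": 40798086, "dataset_size": 64564524},
    "wnli": {"splits": {"train": 132171, "validation": 15331, "test": 47430},
             "download_size": 80151, "dataset_size": 194932},
}


def check_sizes(configs):
    """Verify split sums and return {config: download_size / dataset_size}."""
    ratios = {}
    for name, info in configs.items():
        total = sum(info["splits"].values())
        if total != info["dataset_size"]:
            raise ValueError(f"{name}: splits sum to {total}, not {info['dataset_size']}")
        ratios[name] = info["download_size"] / info["dataset_size"]
    return ratios


for name, ratio in check_sizes(CONFIGS).items():
    print(f"{name}: download is {ratio:.0%} of the dataset size")
```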

Data file paths

COLA

  • train: cola/train-*
  • validation: cola/validation-*
  • test: cola/test-*

MRPC

  • train: mrpc/train-*
  • validation: mrpc/validation-*
  • test: mrpc/test-*

QNLI

  • train: qnli/train-*
  • validation: qnli/validation-*
  • test: qnli/test-*

QQP

  • train: qqp/train-*

WNLI

  • train: wnli/train-*
  • validation: wnli/validation-*
  • test: wnli/test-*

License

  • cc-by-4.0